Grand Diomande Research

The Script Machines Can't Read

I brain-scanned three AI models. All of them are blind to my family's writing system. Here's what I found, why it matters, and what I built to fix it.

---

My family speaks Manding. Bambara, Maninka, depending on which side and which country. Over 40 million people speak these languages across West Africa. Guinea, Mali, Côte d'Ivoire, Senegal, Burkina Faso, The Gambia, and diaspora communities everywhere.

In 1949, a man named Solomana Kante sat down in Kankan, Guinea, and designed a writing system for us. Not borrowed from Arabic, not adapted from French colonial Latin. Built from scratch. Character by character.

He called it N'Ko. In every Manding language, that means "I say."

Kante made one rule that no other major writing system follows: every sound gets exactly one character. Every character represents exactly one sound. No exceptions. No silent letters. No "ough" that could be pronounced six different ways. Tone is marked explicitly. If you can hear it, you can write it. If you can see it, you can say it.

That was 77 years ago. Today, N'Ko has its own Unicode block, its own Wikipedia (6,000+ articles), its own keyboard apps, and millions of literate users.

No major AI system can read it.

What I did

I built something I started calling a brain scanner. The idea is simple: take a powerful AI model, feed it the same sentences in English and N'Ko, and measure what happens inside the model at every layer of processing.

Modern language models have dozens of layers stacked on top of each other. The first few layers read the raw characters. The middle layers think about what they mean. The final layers produce a response. By extracting the internal state at each layer, you can see exactly where things go right and where they go wrong.
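As a rough illustration of what "extracting the internal state at each layer" means, here is a minimal Python sketch. It uses a toy stack of random layers rather than a real model, and the names `run_with_layer_capture`, `strong_input`, and `weak_input` are hypothetical, not taken from the scanner's code. With a Hugging Face `transformers` model, the same per-layer states are typically obtained by passing `output_hidden_states=True` to the forward call.

```python
import numpy as np

def run_with_layer_capture(x, weights):
    """Run x through a stack of toy layers, saving the hidden state after each one."""
    hidden_states = [x]            # index 0 plays the role of the embedding output
    h = x
    for w in weights:
        h = np.tanh(h @ w)         # toy layer: linear map plus nonlinearity
        hidden_states.append(h)
    return hidden_states

rng = np.random.default_rng(0)
n_layers, width = 4, 8
weights = [rng.normal(scale=0.5, size=(width, width)) for _ in range(n_layers)]

# Stand-in inputs (assumption): a real scan would use the embeddings of the
# same sentence written in English and in N'Ko.
strong_input = rng.normal(size=(5, width))
weak_input = 0.1 * strong_input

strong_states = run_with_layer_capture(strong_input, weights)
weak_states = run_with_layer_capture(weak_input, weights)

# Compare the average signal strength layer by layer.
for layer, (s, w) in enumerate(zip(strong_states, weak_states)):
    s_norm = float(np.linalg.norm(s, axis=-1).mean())
    w_norm = float(np.linalg.norm(w, axis=-1).mean())
    print(f"layer {layer}: strong={s_norm:.3f} weak={w_norm:.3f}")
```

The point of keeping every intermediate state, not just the final output, is that the comparison can then be made at each depth separately.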

I ran this scan on three different models from three different companies: Qwen3-8B (Alibaba), Qwen2.5-7B (Alibaba, previous generation), and Mistral-7B (Mistral AI). Same 100 sentences in both languages. Same metrics. Same method. The only variable was the model.

Total cost: under $5. Everything ran on consumer hardware.

What I found

Every model has the same blind spot.

![Activation profiles: English vs N'Ko across 36 layers](images/activation_comparison.png)
*Per-layer activation profiles for English (blue) and N'Ko (orange). The model processes N'Ko at roughly 30% of the activation strength it shows for English.*

Four numbers tell the story:

L2 norm measures how loudly each layer is processing. For English, the signal is strong from start to finish. For N'Ko, the model is whispering. Across all three models, N'Ko activations are 63-72% weaker than English activations.

Shannon entropy measures how organized the processing is. English activations are structured and focused. N'Ko activations are diffuse, spread uniformly across all neurons because the model hasn't learned which ones matter. The model is confused.

Sparsity measures how many neurons are just off. At the embedding layer, where the model first encounters the text, over twice as many neurons shut down for N'Ko input compared to English. The model is losing information before it even starts thinking.

Kurtosis measures circuit specialization. High kurtosis means specific neurons fire strongly for specific patterns. The model knows what it's looking at. At the output layer, English kurtosis is 601. N'Ko kurtosis is 132. That's a 78% drop.
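For readers who want to see the four metrics as formulas, here is a minimal Python sketch using common textbook definitions. The exact definitions, thresholds, and normalization used in the scan are not given here, so everything below (including the `threshold` value and the function names) is an assumption for illustration.

```python
import numpy as np

def l2_norm(acts):
    """Mean L2 norm per token; acts has shape (tokens, hidden)."""
    return float(np.linalg.norm(acts, axis=-1).mean())

def shannon_entropy(acts, eps=1e-12):
    """Entropy in bits of each token's normalized |activation| distribution, averaged."""
    p = np.abs(acts)
    p = p / (p.sum(axis=-1, keepdims=True) + eps)
    return float(-(p * np.log2(p + eps)).sum(axis=-1).mean())

def sparsity(acts, threshold=1e-3):
    """Fraction of activations whose magnitude falls below the threshold."""
    return float((np.abs(acts) < threshold).mean())

def kurtosis(acts):
    """Pearson kurtosis (fourth standardized moment) over all activations."""
    x = acts.ravel()
    centered = x - x.mean()
    var = (centered ** 2).mean()
    return float((centered ** 4).mean() / (var ** 2))

rng = np.random.default_rng(0)

# "Focused": a single neuron fires strongly for every token.
focused = np.zeros((4, 64))
focused[:, 3] = 10.0

# "Diffuse": activity spread across all neurons with no clear structure.
diffuse = rng.normal(size=(4, 64))

for name, acts in [("focused", focused), ("diffuse", diffuse)]:
    print(name, l2_norm(acts), shannon_entropy(acts), sparsity(acts), kurtosis(acts))
```

In this toy setup the focused pattern has near-zero entropy and high kurtosis, while the diffuse pattern has high entropy and kurtosis near that of a normal distribution, which is the direction of the English versus N'Ko contrast described above.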

Here's the cross-model comparison:

| Model | Translation Tax | Embedding Sparsity (N'Ko/EN) | Output Kurtosis Deficit |
| --- | --- | --- | --- |
| Qwen3-8B | 3.30x | 2.21x | 78.1% |
| Qwen2.5-7B | 3.59x | 2.59x | 93.5% |
| Mistral-7B | 2.67x | 1.15x | 64.6% |

Three different architectures. Three different companies. Three different training pipelines. Same result.
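The numbers in the table and the 63-72 figure quoted for the L2 norm can be related with a little arithmetic. This sketch assumes the translation tax is the ratio of English to N'Ko activation norm, which is an interpretation and not a definition stated in this post.

```python
# Translation tax values copied from the table above.
taxes = {"Qwen3-8B": 3.30, "Qwen2.5-7B": 3.59, "Mistral-7B": 2.67}

def percent_weaker(tax):
    """If English activations are `tax` times larger, N'Ko is this percent smaller."""
    return 100.0 * (1.0 - 1.0 / tax)

for name, tax in taxes.items():
    print(f"{name}: {percent_weaker(tax):.1f}% weaker")
# Qwen3-8B: 69.7% weaker
# Qwen2.5-7B: 72.1% weaker
# Mistral-7B: 62.5% weaker

# Output-layer kurtosis for Qwen3-8B: English 601, N'Ko 132.
kurtosis_drop = 100.0 * (601 - 132) / 601
print(f"kurtosis drop: {kurtosis_drop:.1f}%")  # kurtosis drop: 78.0%
```

Under that assumption the three tax values reproduce the 63-72 range after rounding, and the kurtosis figures reproduce the roughly 78 percent deficit reported for Qwen3-8B.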

Why all three fail the same way

The answer is boring and important: training data.

Qwen3-8B has a vocabulary of 151,936 entries. Arabic, another right-to-left script, has over 4,200 dedicated entries. Words, syllables, common phrases.

N'Ko has 32. Just the individual characters. No words. No subwords. Nothing learned. Every N'Ko word has to be spelled out character by character, as if you had to read English one letter at a time without ever learning that "the" is a word.

This isn't a technology problem. The reasoning circuits in these models are perfectly capable. They work for any language they've been trained on. The issue is that N'Ko is statistically invisible in every major training dataset. The Unicode block U+07C0 through U+07FF is a rounding error.
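To make the vocabulary point concrete, here is a small Python sketch that counts how many vocabulary entries touch the N'Ko block and shows what character-by-character fallback looks like. The vocabulary is a hypothetical toy set, and the greedy longest-match tokenizer is a simplification: real models such as Qwen use byte-level BPE, which this does not reproduce.

```python
NKO_START, NKO_END = 0x07C0, 0x07FF  # the N'Ko Unicode block

def in_nko_block(ch):
    return NKO_START <= ord(ch) <= NKO_END

def count_nko_entries(vocab):
    """Count vocabulary entries that contain at least one N'Ko code point."""
    return sum(1 for tok in vocab if any(in_nko_block(c) for c in tok))

def char_fallback_tokenize(word, vocab):
    """Greedy longest-match tokenization that falls back to single characters."""
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            piece = word[i:j]
            # Accept a known piece, or a single character as a last resort.
            if piece in vocab or j == i + 1:
                tokens.append(piece)
                i = j
                break
    return tokens

# Hypothetical toy vocabulary: English has a learned word, N'Ko has only letters.
vocab = {"the", "th", "e", "\u07d2", "\u07de", "\u07cf"}

english_tokens = char_fallback_tokenize("the", vocab)
nko_tokens = char_fallback_tokenize("\u07d2\u07de\u07cf", vocab)  # the word N'Ko

print(count_nko_entries(vocab))   # 3
print(len(english_tokens))        # 1
print(len(nko_tokens))            # 3
```

The same three-character length costs one token in English and three in N'Ko, and none of the N'Ko tokens carries any learned meaning beyond "this is a letter."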

I tested whether newer models naturally fix this. They don't. Qwen3-8B (2025) is only marginally better than Qwen2.5-7B (2024). The gap between N'Ko and English isn't closing. The rising tide is not lifting all boats.

What I built

Diagnosing the problem was step one. Step two was building a system that actually works.

I built the first audio-to-N'Ko automatic speech recognition system. A frozen Whisper encoder feeds a character-level CTC decoder trained on 37 hours of Bambara speech. 28 architecture configurations tested. The production system: 46.9 million parameters, 29.4

For comparison, the only other Bambara ASR system (MALIBA-AI, Latin script output) uses approximately 2 billion parameters and achieves 45.73

The reason the smaller model can compete is N'Ko itself. Kante's 1:1 phoneme-to-character mapping eliminates the ambiguity that Latin orthography forces the decoder to learn. The CTC decoder needs to learn 36 clean output classes instead of navigating digraphs like "ny" (is that one sound or two?) and "ng" (velar nasal or sequence?). I proved this formally: given identical model capacity and training data, a bijective script like N'Ko yields a lower character error rate than a many-to-many script like Latin Bambara. The proof follows from the structure of the CTC loss function.
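For readers unfamiliar with CTC, here is a minimal Python sketch of greedy CTC decoding: take the best class per audio frame, merge consecutive repeats, and drop the blank symbol. This is the standard textbook procedure, not the production decoder, and the class ids and `id_to_char` mapping are hypothetical.

```python
import numpy as np

BLANK = 0  # CTC reserves one class for "no character here"

def ctc_greedy_decode(logits, id_to_char):
    """logits has shape (frames, classes). Argmax per frame, merge repeats, drop blanks."""
    best = logits.argmax(axis=-1).tolist()
    out, prev = [], None
    for idx in best:
        if idx != prev and idx != BLANK:
            out.append(id_to_char[idx])
        prev = idx
    return "".join(out)

# Hypothetical 3-letter alphabet: one class per N'Ko character, plus the blank.
id_to_char = {1: "\u07d2", 2: "\u07de", 3: "\u07cf"}

# Fake per-frame predictions: each character spans several frames.
frame_ids = [1, 1, 0, 2, 2, 2, 0, 3]
logits = np.eye(4)[frame_ids]          # one-hot scores, shape (8, 4)

decoded = ctc_greedy_decode(logits, id_to_char)
print(decoded == "\u07d2\u07de\u07cf")  # True
```

With a one-sound-one-character script, each output class corresponds to a single phoneme. A Latin-script decoder has to emit two classes in sequence for a digraph such as "ny", which is the ambiguity the paragraph above describes.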

A 4-state finite-state machine guarantees that every output sequence is a valid N'Ko syllable. 100
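The states and transitions of the actual validator are not spelled out in this post, so the following Python sketch is purely illustrative. It assumes a simplified syllable shape (optional consonant, then a vowel or syllabic nasal, then at most one combining mark) and uses four hypothetical states. The code point ranges follow my reading of the N'Ko Unicode chart and should be checked against it before reuse.

```python
from enum import Enum

class State(Enum):
    START = 0
    ONSET = 1     # just read a consonant, a vowel must follow
    NUCLEUS = 2   # just read a vowel or syllabic nasal
    MARKED = 3    # just read a tone or nasalization mark

def char_class(ch):
    """Classify a character (assumed ranges within U+07C0..U+07FF)."""
    cp = ord(ch)
    if 0x07CA <= cp <= 0x07D0:
        return "V"   # vowels
    if cp == 0x07D2:
        return "N"   # syllabic nasal
    if 0x07D3 <= cp <= 0x07E7:
        return "C"   # consonants
    if 0x07EB <= cp <= 0x07F3:
        return "M"   # combining tone and nasalization marks
    return None

TRANSITIONS = {
    (State.START, "C"): State.ONSET,
    (State.START, "V"): State.NUCLEUS,
    (State.START, "N"): State.NUCLEUS,
    (State.ONSET, "V"): State.NUCLEUS,
    (State.NUCLEUS, "M"): State.MARKED,
    (State.NUCLEUS, "C"): State.ONSET,
    (State.NUCLEUS, "V"): State.NUCLEUS,
    (State.NUCLEUS, "N"): State.NUCLEUS,
    (State.MARKED, "C"): State.ONSET,
    (State.MARKED, "V"): State.NUCLEUS,
    (State.MARKED, "N"): State.NUCLEUS,
}
ACCEPTING = {State.NUCLEUS, State.MARKED}

def is_valid(text):
    """Return True if text is a sequence of well-formed syllables under the toy grammar."""
    state = State.START
    for ch in text:
        state = TRANSITIONS.get((state, char_class(ch)))
        if state is None:
            return False
    return state in ACCEPTING

print(is_valid("\u07d2\u07de\u07cf"))  # True: nasal, then consonant + vowel
print(is_valid("\u07de\u07de"))        # False: two consonants in a row
```

A validator like this can be applied after decoding to reject malformed output, or during decoding to mask out characters that would lead to an invalid state.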

The three-zone failure and the fix

The brain scan revealed a three-zone failure pattern that appeared identically across all three models:

Zone 1 (embedding layers): The model can't read N'Ko characters. The vectors are sparse, poorly differentiated, and disconnected from downstream circuits.

Zone 2 (reasoning layers): The reasoning infrastructure exists and works perfectly for English. For N'Ko, the input signal is too weak and noisy. You can't reason about something you can't read.

Zone 3 (output layers): With no coherent reasoning to synthesize, the model produces near-random text. Kurtosis collapses. The model is guessing.

The fix targets Zone 1. A three-stage LoRA pipeline (continued pre-training, supervised fine-tuning, BPE-aware training) with a total of 63,700 training examples reduces the translation tax from 2.94x to 0.70x. That's a 76% reduction.
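For context on why LoRA is cheap enough for consumer hardware, here is a minimal Python sketch of the low-rank update at its core. The layer sizes, rank, and `alpha` below are hypothetical and unrelated to the actual pipeline, and the sketch does not show which layers were adapted or how the three stages were trained.

```python
import numpy as np

def lora_forward(x, w, a, b, alpha):
    """y = x W^T + (alpha / r) * x A^T B^T, with W frozen and only A and B trained."""
    r = a.shape[0]
    return x @ w.T + (alpha / r) * (x @ a.T) @ b.T

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 64, 64, 4, 8          # hypothetical sizes

w = rng.normal(size=(d_out, d_in))            # frozen base weight
a = rng.normal(scale=0.01, size=(r, d_in))    # trainable, small random init
b = np.zeros((d_out, r))                      # trainable, zero init

x = rng.normal(size=(3, d_in))

# With B at zero, the adapted layer starts out identical to the base layer.
assert np.allclose(lora_forward(x, w, a, b, alpha), x @ w.T)

full_params = w.size
lora_params = a.size + b.size
print(full_params, lora_params)  # 4096 512

# The reported change in translation tax, as a percentage.
reduction = 100.0 * (2.94 - 0.70) / 2.94
print(f"{reduction:.1f}%")  # 76.2%
```

Only the two small matrices are updated during training, so the base model's reasoning circuits stay untouched while the adapter learns the missing input mapping.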

The reasoning circuits were always there. They just needed an on-ramp.

The research

This work is documented in four research papers, all publicly available with code and data:

Paper 1: Dead Circuits -- Activation profiling methodology, single-model deep dive on Qwen3-8B, three-zone failure analysis, LoRA correction pipeline.

Paper 2: Living Speech -- First N'Ko ASR system, 28-architecture search, V1-V4 progression, cross-script bridge, FSM phonotactic validator.

Paper 3: Script Invisibility Is Structural -- Cross-model validation on three architectures (Qwen3-8B, Qwen2.5-7B, Mistral-7B), proving the deficit is universal and data-driven.

Paper 4: Does Script Design Matter? -- Formal proof of phonetic transparency advantage for CTC decoding, architectural evidence, cross-system comparison.

All code, scan data, training pairs, and experiment results: [github.com/Diomandeee/nko-brain-scanner](https://github.com/Diomandeee/nko-brain-scanner)

The formal proofs (5 theorems, 3 derivations) are in a companion document covering CTC loss gradients, translation tax bounds, FSM completeness, circuit death conditions, and LoRA rank-efficiency bounds.

What I'm working on next

Two more experiments are in progress:

Experiment B is the controlled test of whether N'Ko's script design gives it a measurable advantage over Latin Bambara for speech recognition. Same audio, same model architecture, two decoders, one outputting N'Ko characters and one outputting Latin characters. Training is running right now on two Mac Minis.
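A comparison like this is usually scored with character error rate (CER): the edit distance between the reference and the predicted text, divided by the reference length. Here is a minimal Python sketch of that standard metric. It is not the experiment's evaluation code, and the example strings are placeholders.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def cer(ref, hyp):
    """Character error rate relative to the reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)

# Placeholder example: one wrong vowel in a three-character N'Ko word.
reference = "\u07d2\u07de\u07cf"
hypothesis = "\u07d2\u07de\u07ce"
print(edit_distance("kitten", "sitting"))  # 3
print(cer(reference, hypothesis))
```

One caveat when comparing scripts: the same utterance has different character counts in N'Ko and Latin, and Python counts combining tone marks as separate characters, so the two CER values are only comparable if that normalization is handled consistently.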

Experiment C takes a cognitive twin (a language model fine-tuned on one person's conversation patterns) and retrains it on the same data translated to N'Ko. The question: does the twin behave differently when its internal representation uses your mother tongue's script instead of English?

Two more papers will follow from these experiments. Six papers total. $16 total compute. One researcher. One script.

Why this matters beyond N'Ko

N'Ko is not the only invisible script. Adlam (Fulani, ~40 million speakers), Tifinagh (Berber, ~30 million speakers), Vai, Osmanya, and dozens of others share the same data-poverty profile. The combined population affected by script invisibility likely exceeds 200 million people.

The AI revolution promises to democratize access to information, education, and professional tools. That promise has a blind spot the size of West Africa. Not because the technology can't handle these scripts. Because nobody included them in the training data.

Kante designed N'Ko so that West Africans could write their own languages with precision. He couldn't have known that 77 years later, his design would turn out to be computationally optimal for machine learning. The 1:1 phoneme mapping. The explicit tone marks. The exception-free syllable structure. Every one of these is a computational advantage waiting to be activated.

The activation just requires data. And data gaps can be filled.

---

Mohamed Diomande is an independent researcher. This work was conducted entirely on consumer hardware (Apple Silicon) at a total compute cost of $19. All code, data, and papers are open source.

N'Ko (ߒߞߏ) means "I say" in all Manding languages. These papers are the evidence for what Solomana Kante said in 1949: this script was built to work.
