working paper · preprint render candidate · score 100

Script Invisibility Is Structural: Activation Profiling Across Three LLM Families



Extracted abstract or opening context

A prior study demonstrated that Qwen3-8B processes N'Ko text with severely diminished neural activation compared to English, a phenomenon termed script invisibility. That finding left an open question: is the deficit specific to one model, or is it a structural property of all models trained on corpora where N'Ko is absent? We answer this by performing identical activation profiling (per-layer extraction of L2 norm, Shannon entropy, sparsity, and kurtosis) on three architecturally distinct models: Qwen3-8B (37 layers, Qwen architecture), Qwen2.5-7B (29 layers, previous-generation Qwen), and Mistral-7B (33 layers, Mistral architecture). All three process the same 100 parallel English/N'Ko sentence pairs. Every model exhibits the same failure signature. The average translation tax (ratio of English to N'Ko L2 norm) is 3.30× for Qwen3-8B, 3.59× for Qwen2.5-7B, and 2.67× for Mistral-7B. N'Ko activations are 66%–72% weaker than English across all architectures. Embedding-layer sparsity is 2.2–2.6× higher for N'Ko in both Qwen models. Output-layer kurtosis deficit ranges from 64.6% (Mistral) to 93.5% (Qwen2.5), indicating that no model has learned specialized circuits for N'Ko processing. Entropy inflation of 0.78–1.22 bits confirms that N'Ko activations are diffuse rather than structured across all three architectures. The consistency of these results across different model families, training pipelines, tokenizers, and companies establishes that script invisibility is a consequence of training data composition, not architectural design. We discuss implications for the 50+ scripts in Unicode that share N'Ko's data-poverty profile and argue that architectural innovation cannot substitute for representative training data. Total compute cost for all three scans: under $5. All code, data, and results are publicly available.
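The four per-layer statistics and the translation tax named in the abstract can be sketched as below. This is an illustrative sketch, not the paper's code. The abstract does not give the exact definitions, so several choices here are assumptions: entropy is computed over normalized absolute activations, sparsity is the fraction of units below a hypothetical threshold `eps`, kurtosis is excess kurtosis, and the translation tax is averaged over per-layer ratios. All function names are hypothetical, and each function takes one layer's activation vector as a plain list of floats.

```python
import math
from typing import Dict, Sequence


def l2_norm(acts: Sequence[float]) -> float:
    """Euclidean norm of one layer's activation vector."""
    return math.sqrt(sum(a * a for a in acts))


def shannon_entropy_bits(acts: Sequence[float]) -> float:
    """Entropy in bits. Assumption: |activation| values are normalized
    to sum to 1 and treated as a probability distribution."""
    mags = [abs(a) for a in acts]
    total = sum(mags)
    if total == 0.0:
        return 0.0
    return -sum((m / total) * math.log2(m / total) for m in mags if m > 0.0)


def sparsity(acts: Sequence[float], eps: float = 1e-3) -> float:
    """Fraction of units with |activation| < eps (eps is an assumed threshold)."""
    return sum(1 for a in acts if abs(a) < eps) / len(acts)


def excess_kurtosis(acts: Sequence[float]) -> float:
    """Fourth standardized moment minus 3 (population form)."""
    n = len(acts)
    mean = sum(acts) / n
    m2 = sum((a - mean) ** 2 for a in acts) / n
    m4 = sum((a - mean) ** 4 for a in acts) / n
    return m4 / (m2 * m2) - 3.0 if m2 > 0.0 else 0.0


def profile_layer(acts: Sequence[float]) -> Dict[str, float]:
    """Collect all four statistics for one layer."""
    return {
        "l2_norm": l2_norm(acts),
        "entropy_bits": shannon_entropy_bits(acts),
        "sparsity": sparsity(acts),
        "kurtosis": excess_kurtosis(acts),
    }


def translation_tax(english_norms: Sequence[float],
                    nko_norms: Sequence[float]) -> float:
    """Mean over layers of (English L2 norm / N'Ko L2 norm).
    Assumption: the average is taken over per-layer ratios."""
    ratios = [e / n for e, n in zip(english_norms, nko_norms)]
    return sum(ratios) / len(ratios)


# Toy values for illustration only; these are not data from the paper.
english_layers = [[0.9, -1.2, 0.4, 2.0], [1.5, -0.3, 0.8, -2.2]]
nko_layers = [[0.3, -0.4, 0.0, 0.5], [0.4, -0.1, 0.3, -0.6]]

tax = translation_tax([l2_norm(x) for x in english_layers],
                      [l2_norm(x) for x in nko_layers])
print(f"toy translation tax: {tax:.2f}x")
print(profile_layer(nko_layers[0]))
```

In practice the activation vectors would come from a model's per-layer hidden states, and an array library would replace the pure-Python loops. The standard library is used here only to keep the sketch self-contained.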

Promotion decision

What has to happen next

Compile/render the source, verify references and figures, then add to the curated atlas.

Why this is not always a full paper yet

Corpus pages are public-safe readers for discovered workspace artifacts. They are not automatically final papers. A corpus item becomes a polished paper only after the editable source, evidence checkpoints, references, figures, render path, and release status are attached through the paper schema.