From Dead Circuits to Living Speech: Activation Profiling, Script-Native Architecture Search, and Finite-State Phonotactics for N'Ko Automatic Speech Recognition

Extracted abstract or opening context

N'Ko is an alphabetic script serving over forty million Manding-language speakers across West Africa, engineered by Solomana Kanté in 1949 with a strict one-to-one phoneme-to-character mapping, explicit tonal diacritics, and zero spelling exceptions. We present a dual-thread investigation into why large language models fail on N'Ko and how to construct audio-to-N'Ko speech recognition that bypasses such models entirely. In the diagnostic thread, we perform activation profiling of Qwen2-72B-Instruct (4-bit NF4, A100 80 GB) processing one hundred parallel English/N'Ko sentence pairs across all eighty-one transformer layers, revealing a 2.90× translation tax measured by L2 norm ratio, thirty to sixty percent entropy inflation, an 85.8% kurtosis deficit at the output layer, and 150% higher sparsity at embedding. Circuit duplication analysis spanning fifty-five configurations under the Revisit Your Shoulders methodology shows zero N'Ko-advantageous configurations; the best N'Ko score of 0.067 barely exceeds the random baseline of 0.05. Three-stage LoRA fine-tuning comprising 17,360 continued pre-training examples, 21,240 supervised fine-tuning pairs, and 25,100 BPE-aware training instances reduces the translation tax to 0.70×, constituting a seventy-six percent reduction. In the constructive thread, we build the first audio-to-N'Ko automatic speech recognition system. A frozen Whisper large-v3 encoder feeds a character-level CTC decoder, and a twenty-eight-rule architecture search over BiLSTM and Transformer variants converges on a 46.9 M-parameter Transformer with four-fold temporal downsampling, achieving 33% character error rate and 70% word error rate on thirty-seven hours of Bambara speech from the bam-asr-early corpus (CC-BY-4.0). A four-state finite-state machine encoding N'Ko syllable phonotactics guarantees one hundred percent structural validity at negligible runtime cost. Total compute expenditure for both research threads is fourteen United States dollars.
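The translation-tax figures in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only: the abstract says the tax is "measured by L2 norm ratio" but does not give the exact aggregation, so the direction (N'Ko over English) and the use of a mean over vectors are assumptions, and the function names and toy vectors are hypothetical. Only the 2.90 and 0.70 values come from the abstract.

```python
import math


def l2_norm(vec):
    """Euclidean (L2) norm of one activation vector."""
    return math.sqrt(sum(x * x for x in vec))


def mean_l2_norm(vectors):
    """Average L2 norm over a batch of activation vectors."""
    return sum(l2_norm(v) for v in vectors) / len(vectors)


def translation_tax(nko_vectors, english_vectors):
    # ASSUMPTION: tax = mean N'Ko norm / mean English norm.
    # The abstract does not spell out the aggregation or direction.
    return mean_l2_norm(nko_vectors) / mean_l2_norm(english_vectors)


def relative_reduction(before, after):
    """Fractional drop from `before` to `after`."""
    return (before - after) / before


# Toy activations (hypothetical, chosen so the norms are exact).
english = [[3.0, 4.0], [6.0, 8.0]]    # norms 5 and 10, mean 7.5
nko = [[9.0, 12.0], [18.0, 24.0]]     # norms 15 and 30, mean 22.5
toy_tax = translation_tax(nko, english)

# Figures reported in the abstract: 2.90x before, 0.70x after fine-tuning.
reported_reduction = relative_reduction(2.90, 0.70)
print(f"toy tax: {toy_tax:.2f}x")
print(f"reported reduction: {reported_reduction:.1%}")
```

With the reported values, (2.90 − 0.70) / 2.90 ≈ 0.759, which rounds to the seventy-six percent quoted above.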
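For readers unfamiliar with character-level CTC output, the following minimal sketch shows standard greedy (best-path) CTC decoding: repeated frame labels are collapsed, then blanks are dropped. It is not taken from the artifact. The abstract does not say which decoding strategy the system uses, and the blank index, the function name, and the Latin-letter toy vocabulary are all hypothetical stand-ins for the real N'Ko character inventory.

```python
def ctc_greedy_decode(frame_ids, id_to_char, blank_id=0):
    """Collapse per-frame argmax labels into a character string.

    Standard CTC best-path rule: emit a label only when it differs
    from the previous frame's label and is not the blank symbol.
    """
    chars = []
    previous = None
    for idx in frame_ids:
        if idx != previous and idx != blank_id:
            chars.append(id_to_char[idx])
        previous = idx
    return "".join(chars)


# Hypothetical toy vocabulary; index 0 is assumed to be the CTC blank.
toy_vocab = {1: "b", 2: "a", 3: "n"}

# Per-frame argmax labels. The blank between the two runs of 2 is what
# lets the decoder emit the same character twice in a row.
frames = [0, 1, 1, 0, 2, 2, 2, 0, 0, 2, 3]
decoded = ctc_greedy_decode(frames, toy_vocab)
print(decoded)
```

Because the decoder works on individual characters, a one-to-one phoneme-to-character script such as N'Ko needs no separate pronunciation lexicon at this stage; any structural constraint, such as the syllable finite-state machine mentioned in the abstract, would be applied on top of this output.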

Promotion decision

What has to happen next

Compile/render the source, verify references and figures, then add to the curated atlas.

Why this is not always a full paper yet

Corpus pages are public-safe readers for discovered workspace artifacts. They are not automatically final papers. A corpus item becomes a polished paper only after the editable source, evidence checkpoints, references, figures, render path, and release status are attached through the paper schema.