Mohamed Diomande

% ===================================================================
% TITLE PAGE
% ===================================================================
\begin{center}
{\LARGE\bfseries From Dead Circuits to Living Speech:\\[0.3em]
Activation Profiling, Script-Native Architecture Search,\\[0.3em]
and Finite-State Phonotactics for \nko{} Automatic Speech Recognition\par}
\vspace{1.5em}
{\large Mohamed Diomande}\\[0.3em]
{\normalsize Independent Researcher}\\[0.2em]
{\normalsize\texttt{[email]}}
\end{center}

% ===================================================================
% ABSTRACT
% ===================================================================
\begin{quote}
\noindent\textbf{Abstract.}\quad
\nko{} is an alphabetic script serving more than forty million speakers of Manding languages across West Africa. Solomana Kant\'e designed it in 1949 with a strict one-to-one mapping between phonemes and characters, explicit tonal diacritics, and no spelling exceptions. We present a two-thread investigation into why large language models fail on \nko{} and how to build audio-to-\nko{} speech recognition that bypasses such models entirely. In the diagnostic thread, we profile the activations of Qwen2-72B-Instruct (4-bit NF4 quantization on an A100 80\,GB GPU) on one hundred parallel English/\nko{} sentence pairs across all eighty-one hidden-state layers (the embedding output and eighty transformer blocks). The profiles reveal a $2.90\times$ translation tax (the L2-norm ratio between \nko{} and English activations), thirty to sixty percent entropy inflation, an 85.8\% kurtosis deficit at the output layer, and 150\% higher sparsity at the embedding layer. A circuit-duplication analysis of fifty-five configurations under the Revisit Your Shoulders methodology finds no configuration that favors \nko{}: the best \nko{} score, 0.067, barely exceeds the random baseline of 0.05. Three-stage LoRA fine-tuning, comprising 17,360 continued-pretraining examples, 21,240 supervised fine-tuning pairs, and 25,100 BPE-aware training instances, reduces the translation tax from $2.90\times$ to $0.70\times$, a seventy-six percent reduction.
In the constructive thread, we build what is, to our knowledge, the first audio-to-\nko{} automatic speech recognition system. A frozen Whisper large-v3 encoder feeds a character-level CTC decoder. A twenty-eight-rule architecture search over BiLSTM and Transformer variants converges on a 46.9\,M-parameter Transformer with four-fold temporal downsampling. This model achieves a 33\% character error rate (CER) and a 70\% word error rate (WER) on thirty-seven hours of Bambara speech from the bam-asr-early corpus (CC-BY-4.0). A four-state finite-state machine encoding \nko{} syllable phonotactics guarantees one hundred percent structural validity of the output at negligible runtime cost. Total compute expenditure across both research threads was fourteen US dollars.
\end{quote}
% =================================================================== %
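The diagnostic metrics named in the abstract can be made concrete with a small sketch. The excerpt does not define them precisely, so the code below assumes common definitions:

- the translation tax is the ratio of L2 norms of paired N'Ko and English activation vectors at one layer;
- sparsity is the fraction of near-zero entries, using an assumed threshold `eps`;
- kurtosis is excess kurtosis;
- entropy is the Shannon entropy of a softmax-normalized vector.

All function names are hypothetical. In practice, per-layer vectors could come from Hugging Face Transformers with `output_hidden_states=True`, which returns the embedding output plus one hidden state per transformer block. How those states are pooled into a single vector is also an assumption.

```python
import math

# Hypothetical metric definitions; the abstract does not give the paper's exact formulas.


def l2_norm(v):
    """Euclidean (L2) norm of an activation vector."""
    return math.sqrt(sum(x * x for x in v))


def translation_tax(h_nko, h_en):
    """Assumed definition: L2 norm of the N'Ko activation divided by the L2 norm
    of the paired English activation at the same layer."""
    return l2_norm(h_nko) / l2_norm(h_en)


def sparsity(v, eps=1e-3):
    """Fraction of entries whose magnitude is below an assumed threshold eps."""
    return sum(1 for x in v if abs(x) < eps) / len(v)


def excess_kurtosis(v):
    """Fourth standardized moment minus 3 (0 for a Gaussian)."""
    n = len(v)
    mean = sum(v) / n
    var = sum((x - mean) ** 2 for x in v) / n
    m4 = sum((x - mean) ** 4 for x in v) / n
    return m4 / var ** 2 - 3.0


def softmax_entropy(v):
    """Shannon entropy (in nats) of softmax(v); an assumed stand-in for the paper's entropy."""
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in v]
    z = sum(exps)
    return -sum((e / z) * math.log(e / z) for e in exps)


def relative_reduction(before, after):
    """Fractional drop from `before` to `after`, e.g. for a tax or kurtosis value."""
    return (before - after) / before


# The abstract's figures: translation tax goes from 2.90x to 0.70x.
print(f"{relative_reduction(2.90, 0.70):.1%}")  # prints 75.9%
```

The last line checks the abstract's arithmetic: a drop from 2.90× to 0.70× is about 75.9%, consistent with the reported seventy-six percent reduction.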
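The abstract does not list the four states of the phonotactic machine or the N'Ko character classes it uses. The sketch below therefore uses a hypothetical machine over a toy alphabet:

- `b` and `m` stand for consonants;
- `a` and `o` stand for vowels;
- `^` stands for a tone mark.

The function names are also hypothetical. The sketch shows one way a four-state FSM can guarantee structurally valid output. During greedy CTC decoding, labels the FSM cannot accept from its current state are masked out at each frame, and a dangling consonant onset is trimmed at the end.

```python
# Hypothetical four-state syllable FSM over a toy alphabet; the real N'Ko
# character classes and the paper's actual states are not given in the abstract.
CHAR_CLASS = {"b": "C", "m": "C", "a": "V", "o": "V", "^": "T"}  # C=consonant, V=vowel, T=tone mark

START, ONSET, NUCLEUS, TONED = range(4)
TRANSITIONS = {
    START: {"C": ONSET, "V": NUCLEUS},
    ONSET: {"V": NUCLEUS},  # a consonant must be followed by a vowel
    NUCLEUS: {"T": TONED, "C": ONSET, "V": NUCLEUS},
    TONED: {"C": ONSET, "V": NUCLEUS},  # at most one tone mark per vowel
}
ACCEPTING = {START, NUCLEUS, TONED}  # ending on a bare onset is invalid


def is_valid(text):
    """Run the FSM over `text`; reject unknown characters and illegal transitions."""
    state = START
    for ch in text:
        state = TRANSITIONS[state].get(CHAR_CLASS.get(ch))
        if state is None:
            return False
    return state in ACCEPTING


def greedy_ctc(frames, labels, blank=0):
    """Plain greedy CTC: argmax per frame, collapse repeats, drop blanks."""
    out, prev = [], blank
    for scores in frames:
        best = max(range(len(scores)), key=scores.__getitem__)
        if best != blank and best != prev:
            out.append(labels[best])
        prev = best
    return "".join(out)


def constrained_greedy_ctc(frames, labels, blank=0):
    """Greedy CTC that only emits labels the FSM can accept from its current state."""
    state, prev, out = START, blank, []
    for scores in frames:
        best = blank
        for idx, score in enumerate(scores):
            if idx == blank or idx == prev:
                allowed = True  # blank or CTC repeat: emits nothing, state unchanged
            else:
                allowed = CHAR_CLASS.get(labels[idx]) in TRANSITIONS[state]
            if allowed and score > scores[best]:
                best = idx
        if best != blank and best != prev:
            state = TRANSITIONS[state][CHAR_CLASS[labels[best]]]
            out.append(labels[best])
        prev = best
    if state not in ACCEPTING:
        out.pop()  # only ONSET is non-accepting; dropping its consonant restores validity
    return "".join(out)


LABELS = ["_", "b", "m", "a", "o", "^"]  # index 0 is the CTC blank
FRAMES = [  # made-up per-frame scores from a CTC output layer
    [0.10, 0.70, 0.10, 0.05, 0.03, 0.02],
    [0.05, 0.02, 0.60, 0.30, 0.02, 0.01],
    [0.90, 0.02, 0.02, 0.03, 0.02, 0.01],
    [0.10, 0.05, 0.05, 0.05, 0.05, 0.70],
    [0.10, 0.05, 0.80, 0.02, 0.02, 0.01],
]

print(greedy_ctc(FRAMES, LABELS), is_valid(greedy_ctc(FRAMES, LABELS)))  # bm^m False
print(constrained_greedy_ctc(FRAMES, LABELS))  # ba^
```

ONSET is entered only from accepting states, so dropping the final consonant always leaves a valid string. This is what makes the validity guarantee unconditional in this sketch. Per-frame masking costs one dictionary lookup per candidate label, in line with the "negligible runtime cost" claim. The paper may apply its FSM differently, for example as post-hoc filtering or inside beam search.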
