N'Ko as Computational Infrastructure: Script-Native Speech Recognition, a Phonemically Interpretable Error Metric, and Admissible Tone Correction

Extracted abstract or opening context

\begin{abstract} This manuscript consolidates a multi-paper research program on \nko{}, Manding automatic speech recognition, script visibility in large language models, and trajectory-conditioned decoding. The central argument is that \nko{} should not be treated as a decorative or interchangeable rendering of Manding. For machine-learning systems it functions as computational infrastructure: it determines what tokenizers can represent, what hidden circuits are available, how acoustic evidence is aligned to symbols, whether reported error rates measure speech recognition or merely agreement with an inherited orthographic convention, and how tone itself can be encoded and reconstructed from acoustic evidence. The manuscript integrates five project papers and subsequent audit notes into a single canonical account. The representation studies show that current LLM families accept \nko{} Unicode strings while internally underrepresenting the script, as seen in inflated translation cost, weak activation geometry, entropy gaps, sparsity inflation, kurtosis deficits, and poor circuit-duplication yield. The speech papers show a progression from early CTC systems to frozen-Whisper-feature decoders and then to a trajectory-conditioned Transformer CTC decoder. In the canonical architecture, 1280-dimensional Whisper large-v3 features are projected into a 768-dimensional decoder space, downsampled temporally, and decoded by a six-layer Transformer CTC head. The anticipatory component computes a seven-dimensional trajectory state $z_t$ for each timestep (commitment, uncertainty, transition pressure, recovery margin, phase stiffness, novelty, and stability) and injects it as an attention-logit bias $B_{ij}^{(m)}$ before CTC emission. The mathematical claim is a measurement claim, not an automatic leaderboard guarantee. 
I formalize a transparent-script proposition: if a normalized script map $f_N:\Phi\rightarrow\Sigma_N$ is bijective over the target phoneme inventory, then character edit distance over $f_N(\phi_{1:U})$ preserves the phoneme-edit structure up to explicitly modeled normalization choices. A Latin transcription relation with variable-length digraphs, optional tone marking, and spelling variation does not have the same property. This makes \nko{} CER a more phonemically interpretable metric for Manding ASR than Latin WER, even though it is not a perfect phoneme error rate and still depends on normalization, reference quality, and tone/diacritic policy. The strongest retained ASR artifact is an archived checkpoint trained on a \corpusn{}-pair Bambara corpus snapshot, with a 232,476/29,060/29,060 train/validation/test split, learning rate 0.0003, batch size 32, dropout 0.1, seed 42, and reported test \cer{} of \anchorcer{}. This is the canonical
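
The transparent-script proposition stated in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the six-symbol phoneme inventory, the placeholder letters standing in for a one-to-one script map $f_N$, and the Latin-like table with a single digraph are all hypothetical and are not the manuscript's actual inventory, normalization, or scoring code. The sketch only shows that character edit distance under a one-symbol-per-phoneme map tracks phoneme edit distance, while a digraph map can inflate or hide phoneme errors.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


# Hypothetical toy inventory; "NY" stands for one phoneme (a palatal nasal).
PHONEMES = ["a", "n", "y", "NY", "k", "o"]

# Assumed one-to-one map: each phoneme gets exactly one placeholder character.
TRANSPARENT = {p: chr(ord("A") + i) for i, p in enumerate(PHONEMES)}

# Assumed Latin-like map in which the single phoneme "NY" is spelled "ny".
DIGRAPH = {"a": "a", "n": "n", "y": "y", "NY": "ny", "k": "k", "o": "o"}


def render(phonemes, table):
    """Spell a phoneme sequence with the given phoneme-to-string table."""
    return "".join(table[p] for p in phonemes)


def distances(ref, hyp):
    """Return (phoneme, transparent-character, digraph-character) distances."""
    return (
        edit_distance(ref, hyp),
        edit_distance(render(ref, TRANSPARENT), render(hyp, TRANSPARENT)),
        edit_distance(render(ref, DIGRAPH), render(hyp, DIGRAPH)),
    )


# Case 1: one phoneme substitution. The digraph spelling counts it twice.
inflated = distances(["NY", "a"], ["k", "a"])

# Case 2: two phoneme errors. The digraph spellings are identical strings.
hidden = distances(["n", "y", "a"], ["NY", "a"])

print(inflated, hidden)
```

In both cases the first two numbers of each tuple agree, which is the property the proposition asserts for a bijective map. The third number diverges in opposite directions: the digraph map overcounts in the first case and reports no error in the second. Tone marking, normalization, and reference quality are outside this toy example.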

Promotion decision

What has to happen next

Compile/render the source, verify references and figures, then add to the curated atlas.

Why this is not always a full paper yet

Corpus pages are public-safe readers for discovered workspace artifacts. They are not automatically final papers. A corpus item becomes a polished paper only after the editable source, evidence checkpoints, references, figures, render path, and release status are attached through the paper schema.