Reading Tone from the Signal: Featural Acoustic Coding for Tone Resolution in N'Ko Speech Recognition

Full HTML reader

Read the full artifact

Extracted abstract or opening context

\begin{abstract} Script-native automatic speech recognition for the Manding languages, written in N'Ko (\code{U+07C0--U+07FF}), reaches a meaningful error regime, but the strongest released decoder is \emph{toneless}: it emits N'Ko consonants and vowels and drops the tone diacritics that are lexically and grammatically contrastive in Manding. Tone is therefore restored downstream, and the planned mechanism is a text-context language model that infers tone from orthographic context alone. We make three claims that reframe this problem as an \emph{acoustic} one. First, we correct the tone inventory: N'Ko has seven combining tone marks (\code{U+07EB--U+07F1}), short and long forms of high, low, rising, and a native descending (falling) tone at \code{U+07EE}; a widely reused syllable codebook had mislabeled the long marks as length, which propagated into downstream tooling. Second, we measure an empirical tone prior from a corpus of 105 transcribed N'Ko lesson frames (12{,}541 N'Ko characters, 3{,}316 tone marks, harvested by vision-language OCR): \textbf{65.8\%} of syllables carry marked high/low register, \textbf{33.3\%} are unmarked mid, and only \textbf{0.9\%} are contour tones, so tone resolution is, to first order, a three-way non-contour register classification. Third, we establish that text-only tone resolution is weak: a context model on the same corpus reaches a \tder{} (tone-diacritic error rate) of \textbf{50.8\%}, i.e.\ text alone misses roughly half of all tones, leaving large headroom for acoustic evidence that the toneless recognizer discards. We then define \emph{Featural Acoustic Coding} (FAC), a featural treatment of the N'Ko syllable in which the tone mark is a register-plus-contour primitive read directly from the fundamental frequency, and we place tone restoration as a conservative, governance-gated correction in an \emph{Anticipation Geometry Partition} (AGP) layer that reuses the recognizer's own trajectory geometry. We give the architecture, the reproducibility structure (the core claims run standalone, without the acoustic model), a preregistered fusion evaluation, and an honest account of what remains, principally read-speech ground truth for the acoustic tone-diacritic error rate. A secondary section relates FAC to Lexical Acoustic Coding and reports that the featural pitch advantage over a lexical carrier is real but conditional on tonal density; the durable advantage is token efficiency, not reconstruction fidelity. \end{abstract} N'Ko is a script created in 1949 by Solomana Kant\'e for the tonal Manding language family (Bambara, Dioula, Maninka), spoken by tens of millions of people in West Africa, and encoded in Unicode at \code{U+07C0--U+07FF} \citep{unicode_nko}. Unlike the Latin orthography of Bambara, N'Ko was engine

Promotion decision

What has to happen next

Compile/render the source, verify references and figures, then add to the curated atlas.

Why this is not always a full paper yet

Corpus pages are public-safe readers for discovered workspace artifacts. They are not automatically final papers. A corpus item becomes a polished paper only after the editable source, evidence checkpoints, references, figures, render path, and release status are attached through the paper schema.