The Model That Listened in N'Ko

Extracted abstract or opening context

Do not make the system hear Bambara, write Latin, and then ask another model to recover N'Ko afterward. That route repeats the exact problem the brain scan found: the generic model is weak in N'Ko, so a post-hoc restoration step can sound fluent while losing the script's evidence.

There was a practical problem first. No large N'Ko-labeled speech corpus was waiting to be downloaded. Existing Bambara automatic speech recognition (ASR) data was mostly in Latin script. To train a direct N'Ko decoder, the project needed targets in N'Ko, so the Latin transcripts had to be bridged into N'Ko through the International Phonetic Alphabet (IPA) in two stages. The Latin-to-IPA stage handles Bambara digraphs and vowel forms. The IPA-to-N'Ko stage maps the sound inventory into N'Ko characters.

This is where the script argument became concrete. The bridge did not just move letters around. It exposed every place where Latin Bambara made the model guess. "ny" has to be consumed as a single palatal nasal, not as "n" plus "y." "ng" has to be treated as the velar nasal. Toned vowels need Unicode decomposition before lookup. Extended IPA symbols from real transcriptions needed explicit handling. Right-to-left rendering needed marks so the output would not visually scramble in Latin-dominant environments.

These were engineering bugs, but they were also linguistic evidence. Each bug was a place where script choice changed what the machine had to infer.
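
The Latin-to-IPA fixes described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's code: the function names, the two-entry digraph table, and the choice of U+200F as the directional mark are all hypothetical, and the IPA-to-N'Ko table is left out because the source does not give it.

```python
import unicodedata

# Hypothetical, deliberately tiny table; the project's real inventory is not
# shown in the source.
DIGRAPHS = {"ny": "\u0272", "ng": "\u014b"}  # ɲ (palatal nasal), ŋ (velar nasal)
RLM = "\u200f"  # RIGHT-TO-LEFT MARK; which mark the project used is an assumption


def latin_to_ipa(word: str) -> str:
    """Greedy left-to-right scan that tries a digraph before a single letter."""
    # NFD splits a precomposed toned vowel such as "á" into "a" + U+0301,
    # so the base letter can be looked up independently of its tone mark.
    text = unicodedata.normalize("NFD", word.lower())
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in DIGRAPHS:
            out.append(DIGRAPHS[pair])
            i += 2
        else:
            # Characters without a table entry pass through unchanged here.
            out.append(text[i])
            i += 1
    return "".join(out)


def split_tone(char: str) -> tuple[str, str]:
    """Return (base letter, combining marks) for one possibly toned character."""
    decomposed = unicodedata.normalize("NFD", char)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    marks = "".join(c for c in decomposed if unicodedata.combining(c))
    return base, marks


def wrap_rtl(text: str) -> str:
    """Bracket a string with right-to-left marks."""
    return f"{RLM}{text}{RLM}"
```

With this sketch, latin_to_ipa("nyama") consumes "ny" as one unit and returns "ɲama", and split_tone("á") returns the base "a" plus the combining acute accent. A real pipeline would also need a rule for cases where "n" and "g" belong to different syllables, which this greedy scan does not attempt.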

Promotion decision

What has to happen next

Attach run IDs, datasets, metrics, and reproduction commands.

Why this is not always a full paper yet

Corpus pages are public-safe readers for discovered workspace artifacts. They are not automatically final papers. A corpus item becomes a polished paper only after the editable source, evidence checkpoints, references, figures, render path, and release status are attached through the paper schema.