N'Ko as an Extensible Phonemic Substrate

Extracted abstract or opening context

> Drafted 2026-06-01 from Mohamed's questions:
> If N'Ko can mechanically represent missing sounds by composition, what does that imply?
> Do we still need ASR retraining? Can English/French be converted into N'Ko labels?
> Can phrase-level expression transfer ride on top of the same substrate?

**N'Ko can serve as an extensible phonemic substrate for speech systems: a mechanically auditable representation layer where sounds from multiple languages can be encoded by documented N'Ko diacritics and bounded composition, then used as the target for ASR, governed correction, and self-training.**

In plain terms: N'Ko is not just an output script. It can become the intermediate sound-code that lets a low-resource system manufacture cleaner data, compare outputs phonemically, and grow adapters without needing a giant pretraining corpus.

1. **Baseline**: what the current `IPA_TO_NKO` table already supports.
2. **Unicode extensions**: documented N'Ko foreign-sound combinations from the Unicode Core Specification, Chapter 19, Table 19-3.
3. **Full compositional layer**: internal computational encodings for sounds not covered by baseline or Unicode-documented combinations.

The important claim: **we can push representational coverage past 90% without model training.** That is a rulebook/transliteration result, not a neural result.
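The three layers can be read as an ordered table lookup, which is also why coverage can be measured without training a model. The sketch below is a minimal illustration under stated assumptions: `BASELINE` is a tiny stand-in for the real `IPA_TO_NKO` table (whose contents are not shown here), and the entries in `UNICODE_EXT` and `COMPOSED` are placeholders, not combinations copied from Table 19-3 or from any internal encoding. The names `encode` and `coverage` are hypothetical.

```python
from collections import Counter

# Layer 1: stand-in for the real IPA_TO_NKO table (assumed contents).
# Values are plain N'Ko letters: BA, TA, SA, A.
BASELINE = {"b": "\u07D3", "t": "\u07D5", "s": "\u07DB", "a": "\u07CA"}

# Layer 2: placeholder for Unicode-documented foreign-sound combinations.
# This entry is illustrative only; consult Table 19-3 for the real ones.
UNICODE_EXT = {"ʃ": "\u07DB\u07F3"}

# Layer 3: placeholder for internal compositional encodings (hypothetical).
COMPOSED = {"θ": "\u07D5\u07F3"}

# Lookup order matters: earlier layers win, so documented forms are preferred.
LAYERS = [("baseline", BASELINE), ("unicode", UNICODE_EXT), ("composed", COMPOSED)]


def encode(phonemes):
    """Return (phoneme, label or None, layer name) for each input phoneme."""
    out = []
    for p in phonemes:
        for name, table in LAYERS:
            if p in table:
                out.append((p, table[p], name))
                break
        else:
            # No layer has an entry: record the gap instead of guessing.
            out.append((p, None, "uncovered"))
    return out


def coverage(phonemes):
    """Share of the phoneme inventory resolved by each layer."""
    counts = Counter(layer for _, _, layer in encode(phonemes))
    total = len(phonemes)
    return {layer: n / total for layer, n in counts.items()}


if __name__ == "__main__":
    inventory = ["b", "a", "ʃ", "θ", "x"]
    for layer, share in coverage(inventory).items():
        print(f"{layer}: {share:.0%}")
```

Because every label records the layer that produced it, an auditor can see which outputs rest on documented Unicode combinations and which rest on internal conventions. Coverage here is counted over a phoneme inventory; weighting by corpus frequency would be a different, equally mechanical, measurement.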

Promotion decision

What has to happen next

Attach run IDs, datasets, metrics, and reproduction commands.

Why this is not always a full paper yet

Corpus pages are public-safe readers for discovered workspace artifacts. They are not automatically final papers. A corpus item becomes a polished paper only after the editable source, evidence checkpoints, references, figures, render path, and release status are attached through the paper schema.