N'Ko as an Extensible Phonemic Substrate for Governed Low-Resource Speech

Extracted abstract or opening context

Low-resource speech systems usually fail twice: first because there is too little paired audio and text data, and second because the available evaluation scripts do not preserve the phonemic structure of the language being measured. This paper argues that N'Ko offers a different path. N'Ko is a phonetic, right-to-left script designed for Manding languages, with marks for tone and nasalization and documented diacritics for foreign sounds. It can therefore function as an extensible phonemic substrate: a deterministic sound-code for constructing labels, auditing errors, and governing self-correction. We validate the representation layer with a coverage evaluator over Manding, French, and English phoneme inventories. Baseline N'Ko covers the Manding inventory completely, 63.9% of a French inventory, and 58.5% of an English inventory. Unicode-documented foreign-sound combinations raise French to 80.6% and English to 73.2%. A bounded full-compositional layer reaches 100.0% on all three tested inventories, passing a 90% coverage gate without model training. We then connect this representation result to a governed correction experiment on 500 N'Ko ASR rows. An ungoverned Gemma-based proposer degrades character error rate (CER) from 0.3106 to 0.4701 (+15.94pp), while the AGP gate limits the damage to 0.3120 (+0.14pp). A minimal-edit LoRA reduces blind harm but still fails after gating (0.3156, +0.50pp) because small wrong edits slip through an edit-size gate. The conclusion is precise: N'Ko representation coverage can be extended mechanically, but direct audio recognition and self-improving correction still require acoustic evidence and correctness-aware governance.
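
The claim that small wrong edits slip through an edit-size gate can be illustrated with a minimal sketch. It is not the paper's AGP gate or its evaluation code: the function names (`levenshtein`, `cer`, `edit_size_gate`), the `max_edits` threshold, and the Latin-letter toy rows are all assumptions made for illustration. The sketch scores CER as total character edits divided by total reference characters, and models a gate that sees only how large a proposed edit is, never whether it is correct.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def cer(hyps: List[str], refs: List[str]) -> float:
    """Corpus-level CER: total edits over total reference characters."""
    edits = sum(levenshtein(h, r) for h, r in zip(hyps, refs))
    return edits / sum(len(r) for r in refs)


def edit_size_gate(original: str, proposal: str, max_edits: int = 1) -> str:
    """Hypothetical gate: accept a proposal only if it changes at most
    max_edits characters. It never consults the reference, so it cannot
    tell a small correct edit from a small wrong one."""
    return proposal if levenshtein(original, proposal) <= max_edits else original


# Toy rows in Latin letters (illustrative only, not data from the experiment).
refs      = ["kalan", "bamako", "fanga", "sebe"]
asr       = ["kalam", "bamako", "fanga", "sebe"]   # one real ASR error
proposals = ["kalam", "bomoko", "fango", "sebe"]   # one large and one small wrong edit

gated = [edit_size_gate(a, p) for a, p in zip(asr, proposals)]

baseline_cer = cer(asr, refs)
ungated_cer = cer(proposals, refs)
gated_cer = cer(gated, refs)

for name, value in [("baseline", baseline_cer),
                    ("ungated", ungated_cer),
                    ("gated", gated_cer)]:
    delta_pp = (value - baseline_cer) * 100  # change in percentage points
    print(f"{name}: CER={value:.4f} ({delta_pp:+.2f}pp)")
```

In this toy run the gate blocks the two-character rewrite of "bamako" but accepts the one-character change from "fanga" to "fango", so the gated CER stays above the baseline. A correctness-aware gate would need evidence beyond edit size, such as acoustic agreement, to reject that row.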

Promotion decision

What has to happen next

Convert into the standard paper schema, add citations, and render a draft PDF.

Why this is not always a full paper yet

Corpus pages are public-safe readers for discovered workspace artifacts. They are not automatically final papers. A corpus item becomes a polished paper only after the editable source, evidence checkpoints, references, figures, render path, and release status are attached through the paper schema.