Grand Diomande Research · Full HTML Reader

N'Ko as an Extensible Phonemic Substrate

> Drafted 2026-06-01 from Mohamed's questions:
> If N'Ko can mechanically represent missing sounds by composition, what does that imply?
> Do we still need ASR retraining? Can English/French be converted into N'Ko labels?
> Can phrase-level expression transfer ride on top of the same substrate?

The Thesis

The paper should not be framed as "we built an N'Ko ASR." That is too small.

The stronger thesis is:

N'Ko can serve as an extensible phonemic substrate for speech systems: a mechanically
auditable representation layer where sounds from multiple languages can be encoded by
documented N'Ko diacritics and bounded composition, then used as the target for ASR,
governed correction, and self-training.

In plain terms: N'Ko is not just an output script. It can become the intermediate
sound-code that lets a low-resource system manufacture cleaner data, compare outputs
phonemically, and grow adapters without needing a giant pretraining corpus.

What We Have Proven So Far

1. Representation Coverage Works

We added a coverage evaluator:

`[home]/Desktop/NKo/scripts/evaluate_phoneme_coverage.py`

It separates three layers:

1. Baseline: what the current `IPA_TO_NKO` table already supports.
2. Unicode extensions: documented N'Ko foreign-sound combinations from the Unicode
Core Specification, Chapter 19, Table 19-3.
3. Full compositional layer: internal computational encodings for sounds not covered
by baseline or Unicode-documented combinations.

Latest gate:

```text
language   layer                   covered  coverage
english    baseline                24/41      58.5%
english    unicode_extensions      30/41      73.2%
english    full_compositional      41/41     100.0%

french     baseline                23/36      63.9%
french     unicode_extensions      29/36      80.6%
french     full_compositional      36/36     100.0%

manding    baseline                27/27     100.0%
manding    full_compositional      27/27     100.0%
```

The important claim: we can push representational coverage past 90% without any model
training. That is a rulebook/transliteration result, not a neural result.
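A minimal sketch of how such a layered gate could be computed, assuming each layer is just a set of supported phoneme symbols and that coverage is cumulative across layers. The function name, the toy inventory, and the assignment of symbols to layers are illustrative placeholders, not the contents of `evaluate_phoneme_coverage.py` or of the project's tables.

```python
from typing import Dict, Iterable, List, Set, Tuple


def layered_coverage(
    inventory: Iterable[str],
    layers: List[Tuple[str, Set[str]]],
) -> Dict[str, Tuple[int, int, float]]:
    """Return cumulative (covered, total, percent) per layer, in layer order."""
    phonemes = set(inventory)
    supported: Set[str] = set()
    report: Dict[str, Tuple[int, int, float]] = {}
    for name, symbols in layers:
        supported |= symbols  # each layer adds to everything before it
        covered = len(phonemes & supported)
        report[name] = (covered, len(phonemes), 100.0 * covered / len(phonemes))
    return report


# Toy data only: which symbol sits in which layer is an arbitrary assumption.
toy_inventory = ["a", "b", "k", "v", "y", "ɥ"]
toy_layers = [
    ("baseline", {"a", "b", "k"}),
    ("unicode_extensions", {"v", "y"}),
    ("full_compositional", {"ɥ"}),
]

report = layered_coverage(toy_inventory, toy_layers)
for layer, (covered, total, pct) in report.items():
    print(f"{layer:<20} {covered}/{total}  {pct:5.1f}%")
```

Because no model is involved, a check like this can run as an ordinary unit test that fails when coverage drops below a chosen threshold.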

2. Unicode Already Gives Us Part of the Extension Mechanism

This is not arbitrary symbol invention. Unicode documents N'Ko diacritic behavior for
foreign sounds, including combinations for sounds such as French `[y]`, `[ə]`, `[ʀ]`,
and foreign consonants like `[v]`, `[θ]`, `[ʃ]`, `[ʒ]`, `[x]`, `[q]`.

That means the substrate claim has a real standards anchor:

  • N'Ko has native combining tone marks.
  • N'Ko has a nasalization mark.
  • N'Ko has a double-dot mechanism.
  • Unicode documents combinations for non-Manding sounds.

Our internal layer should be framed as computational extension of a documented N'Ko
principle, not as proposed official orthographic reform.
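The composition principle can be checked mechanically. The sketch below assumes that "bounded composition" means one N'Ko base letter followed by a small, capped number of N'Ko combining marks. The helper names and the `max_marks` limit are hypothetical, and the example deliberately does not claim which sound any particular combination stands for.

```python
import unicodedata

NKO_BLOCK = range(0x07C0, 0x0800)  # Unicode N'Ko block


def is_nko_letter(ch: str) -> bool:
    """True for N'Ko base letters (general category Lo inside the block)."""
    return ord(ch) in NKO_BLOCK and unicodedata.category(ch) == "Lo"


def is_nko_mark(ch: str) -> bool:
    """True for N'Ko combining marks (general category Mn inside the block)."""
    return ord(ch) in NKO_BLOCK and unicodedata.category(ch) == "Mn"


def compose(base: str, *marks: str, max_marks: int = 2) -> str:
    """Build base + marks, rejecting anything outside the bounded pattern."""
    if not is_nko_letter(base):
        raise ValueError("base must be a single N'Ko letter")
    if len(marks) > max_marks:
        raise ValueError("too many combining marks for a bounded composition")
    if not all(is_nko_mark(m) for m in marks):
        raise ValueError("marks must be N'Ko combining marks")
    return base + "".join(marks)


FA = "\u07DD"          # NKO LETTER FA
DOUBLE_DOT = "\u07F3"  # NKO COMBINING DOUBLE DOT ABOVE
NASAL = "\u07F2"       # NKO COMBINING NASALIZATION MARK

sequence = compose(FA, DOUBLE_DOT)
print([f"U+{ord(c):04X}" for c in sequence])
```

A validator of this shape is also one way to score the "emits valid N'Ko sequences" criterion for model output.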

3. The Current ASR/Correction Stack Validates the Need for Governance

The AGP pilot validates the architecture's safety story:

```text
ASR baseline CER:        0.3106
Old raw proposer CER:    0.4701  (+15.94pp, catastrophic)
Old AGP-gated CER:       0.3120  (+0.14pp, mostly neutralized)
```

The gate protected the transcript from an over-generative N'Ko corrector.

The min-edit LoRA experiment did not close the loop:

```text
Min-edit raw proposer:   +11.63pp
Min-edit gated:          +0.50pp
Accepted-worse:          69/500
```

Interpretation: the min-edit model learned to make smaller edits, but many were wrong,
so the size gate admitted bad small edits. Loop-v1 failed in a useful way: the next gate
must judge correctness/evidence, not just edit size.
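To make the failure mode concrete, here is a minimal sketch of a gate that looks only at edit size. It assumes the gate compares the proposal to the ASR hypothesis by character-level Levenshtein distance against a fixed cap; the actual AGP gate may use other signals. The strings are Latin placeholders standing in for N'Ko text.

```python
def edit_distance(a: str, b: str) -> int:
    """Character-level Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def size_gate(hypothesis: str, proposal: str, cap: int = 2) -> bool:
    """Accept a proposal purely because it is a small edit of the hypothesis."""
    return edit_distance(hypothesis, proposal) <= cap


reference = "kalan"       # ground truth (unknown to the gate)
hypothesis = "kalam"      # ASR output, one character off
good_proposal = "kalan"   # small and correct
bad_proposal = "kolam"    # small and wrong

for name, proposal in [("good", good_proposal), ("bad", bad_proposal)]:
    accepted = size_gate(hypothesis, proposal)
    after = edit_distance(proposal, reference)
    print(f"{name}: accepted={accepted}, errors vs reference={after}")
```

Both proposals pass the gate, but the second one raises the error count from 1 to 2. That is the accepted-worse case: the gate has no way to tell the two apart.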

4. Oracle Headroom Is Real

The full oracle run proves the gate has large upside if proposals become trustworthy:

```text
29,060 rows
oracle cap=2 accepted:       3,302
oracle rejected:             25,758
edit_too_large rejections:   22,914
median rejected edit size:   9
p90 rejected edit size:      26
accepted-worse:              0
```

With relative gating disabled, the full oracle sweep shows large reachable headroom:

```text
cap=2      -0.46pp
cap=8      -5.57pp
cap=12     -9.97pp
cap=999    -29.15pp
```

But real proposer sweeps go the opposite way:

```text
cap=2      +0.14pp
cap=8      +2.36pp
cap=12     +3.99pp
cap=999    +15.59pp
```

Conclusion: the bottleneck is proposal correctness, not representational coverage.
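The oracle sweep can be pictured with a short sketch. It assumes the oracle accepts a proposal only when it is strictly closer to the reference than the hypothesis is and its edit size is within the cap, and that CER is total character errors over total reference characters. Relative gating is not modeled, and the rows are toy Latin placeholders.

```python
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, str, str]  # (hypothesis, proposal, reference)


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def cer(pairs: List[Tuple[str, str]]) -> float:
    """Corpus-level CER over (output, reference) pairs."""
    errors = sum(levenshtein(out, ref) for out, ref in pairs)
    return errors / sum(len(ref) for _, ref in pairs)


def oracle_sweep(rows: List[Row], caps: Iterable[int]) -> Dict[int, float]:
    """Return CER delta in percentage points for each cap."""
    base = cer([(h, r) for h, _, r in rows])
    deltas: Dict[int, float] = {}
    for cap in caps:
        chosen = []
        for h, p, r in rows:
            improves = levenshtein(p, r) < levenshtein(h, r)
            small = levenshtein(h, p) <= cap
            chosen.append((p if improves and small else h, r))
        deltas[cap] = 100.0 * (cer(chosen) - base)
    return deltas


toy_rows = [
    ("kalam", "kalan", "kalan"),  # small helpful edit
    ("bxxxx", "baaaa", "baaaa"),  # large helpful edit
    ("kalam", "kolam", "kalan"),  # small harmful edit, never accepted
]
deltas = oracle_sweep(toy_rows, caps=[2, 8])
for cap, delta in deltas.items():
    print(f"cap={cap:<4} {delta:+.2f}pp")
```

By construction the oracle never accepts a worsening edit, so every delta is at most zero and loosening the cap can only help. A real proposer has no access to the reference, which is why its sweep can move in the opposite direction.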

What This Means for Retraining

There are three different tasks. They should not be collapsed.

Layer A: Writing / Encoding

Does this require retraining? No.

If the input is text or IPA, N'Ko compositional encoding is deterministic:

```text
English/French text -> phonemes/IPA -> N'Ko compositional encoding
```

This is enough to build datasets, labels, coverage reports, round-trip tests, and
phonemic metrics.
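A minimal sketch of the deterministic step from IPA to N'Ko, assuming a greedy longest-match lookup over a mapping table. `TOY_IPA_TO_NKO` is a tiny illustrative subset written for this example and is not the project's `IPA_TO_NKO` table; the nasal-vowel entry simply pairs a vowel letter with the N'Ko nasalization mark.

```python
from typing import Dict

# Illustrative subset only; keys are IPA strings, values are N'Ko sequences.
TOY_IPA_TO_NKO: Dict[str, str] = {
    "a": "\u07CA",              # NKO LETTER A
    "i": "\u07CC",              # NKO LETTER I
    "b": "\u07D3",              # NKO LETTER BA
    "k": "\u07DE",              # NKO LETTER KA
    "m": "\u07E1",              # NKO LETTER MA
    "a\u0303": "\u07CA\u07F2",  # nasalized a -> A + NKO COMBINING NASALIZATION MARK
}


def encode_ipa(ipa: str, table: Dict[str, str]) -> str:
    """Encode an IPA string by greedy longest match; fail loudly on gaps."""
    keys = sorted(table, key=len, reverse=True)  # try multi-character keys first
    out = []
    i = 0
    while i < len(ipa):
        for key in keys:
            if ipa.startswith(key, i):
                out.append(table[key])
                i += len(key)
                break
        else:
            raise KeyError(f"no N'Ko encoding for IPA symbol at position {i}")
    return "".join(out)


encoded = encode_ipa("baka", TOY_IPA_TO_NKO)
print(encoded.encode("unicode_escape").decode("ascii"))
```

Raising on an unmapped symbol, instead of skipping it, keeps coverage gaps visible when labels are generated in bulk.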

Layer B: Hearing / ASR

Does this require retraining? Usually yes.

If we want direct speech recognition:

```text
audio -> N'Ko phonemic output
```

then the model needs training or adaptation on audio paired with N'Ko labels.

But the labels do not need to be manually written in N'Ko. For English or French, we can
generate them:

```text
audio + transcript
  -> phonemizer
  -> IPA
  -> N'Ko compositional target
  -> ASR fine-tune target
```

So the data requirement is not "find English-N'Ko pairs." It is:

Find audio with transcripts, then generate N'Ko phonemic labels mechanically.

That is a major practical advantage.

Layer C: Featural Hearing

Could this avoid retraining? Maybe, but only if the acoustic model is actually
featural.

If the model predicts whole phoneme symbols, it will not reliably hear sounds it never
trained on. A Manding-trained phoneme model cannot magically hear English `[θ]` just
because we invented a symbol for it.

But if the model predicts features:

```text
voicing + place + manner + vowel height + rounding + nasality + tone
```

then some unseen phonemes may become zero-shot compositions of known features. Example:

```text
v = labial + fricative + voiced
```

The model may have heard labial, fricative, and voiced separately even if it never heard
`v` as a whole. This is the deeper reason FAC (featural acoustic coding) matters. FAC is
not just tone. It is the path from a symbol-compositional script to a feature-compositional
acoustic model.

This is a hypothesis, not yet proven. It needs a dedicated experiment.
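One cheap precondition for that experiment can be checked before any audio is involved: does the training inventory contain every feature the unseen phoneme needs? The sketch below assumes a simplified feature table written for this example; the feature labels and the "seen" inventory are illustrative, and passing this check says nothing about whether a model would actually recognize the sound.

```python
from typing import Dict, Iterable, Set

# Simplified, hand-written feature bundles (illustrative, not a full system).
FEATURES: Dict[str, Set[str]] = {
    "b": {"labial", "stop", "voiced"},
    "f": {"labial", "fricative", "voiceless"},
    "d": {"alveolar", "stop", "voiced"},
    "v": {"labial", "fricative", "voiced"},
    "θ": {"dental", "fricative", "voiceless"},
}


def unseen_features(phoneme: str, seen: Iterable[str]) -> Set[str]:
    """Features of `phoneme` that no phoneme in `seen` carries."""
    known: Set[str] = set().union(*(FEATURES[p] for p in seen))
    return FEATURES[phoneme] - known


seen_inventory = ["b", "f", "d"]  # hypothetical training inventory
missing_for_v = unseen_features("v", seen_inventory)
missing_for_theta = unseen_features("θ", seen_inventory)
print(sorted(missing_for_v))      # no missing features
print(sorted(missing_for_theta))  # one missing place feature
```

Under these toy assumptions `v` is fully composable from heard features, while `θ` needs a place of articulation the model never saw. That split is what a zero-shot experiment would need to separate.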

The English Experiment

The clean English experiment has two versions.

Experiment 1: Indirect N'Ko ASR

Use existing English ASR as the hearing layer:

```text
English audio
  -> existing ASR transcript
  -> English phonemizer
  -> IPA
  -> N'Ko compositional encoding
```

This proves N'Ko can function as a downstream phonemic representation layer. It does not
prove direct acoustic recognition into N'Ko.

Metrics:

  • phoneme inventory coverage;
  • N'Ko encoding validity;
  • round-trip recoverability from N'Ko -> IPA;
  • ambiguity rate;
  • CER/phoneme error between generated N'Ko and reference phoneme labels.
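Two of these metrics, round-trip recoverability and ambiguity rate, follow directly from the mapping table. The sketch below assumes a token-level table and defines ambiguity rate as the share of distinct N'Ko codes that map back to more than one IPA symbol. The table, including its deliberate collision, is invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set


def invert(table: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each N'Ko code to the set of IPA symbols that produce it."""
    inverse: Dict[str, Set[str]] = defaultdict(set)
    for ipa, nko in table.items():
        inverse[nko].add(ipa)
    return dict(inverse)


def ambiguity_rate(table: Dict[str, str]) -> float:
    """Fraction of N'Ko codes that decode to more than one IPA symbol."""
    inverse = invert(table)
    return sum(1 for sources in inverse.values() if len(sources) > 1) / len(inverse)


def round_trip_ok(tokens: List[str], table: Dict[str, str]) -> bool:
    """True if every token can be recovered uniquely from its N'Ko code."""
    inverse = invert(table)
    return all(inverse[table[t]] == {t} for t in tokens)


# Toy table with one deliberate collision: two rhotics share one code.
toy_table = {
    "a": "\u07CA",  # NKO LETTER A
    "b": "\u07D3",  # NKO LETTER BA
    "r": "\u07D9",  # NKO LETTER RA
    "ʁ": "\u07D9",  # same code on purpose, to show an ambiguous entry
}

rate = ambiguity_rate(toy_table)
print(f"ambiguity rate: {rate:.3f}")
print(round_trip_ok(["b", "a"], toy_table), round_trip_ok(["ʁ", "a"], toy_table))
```

A nonzero ambiguity rate means the encoding loses a contrast that the source language makes, which a coverage count alone would not reveal.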

Experiment 2: Direct N'Ko ASR Fine-Tune

Generate N'Ko labels from an English speech corpus, then fine-tune an ASR model:

```text
audio -> N'Ko compositional label
```

Possible data sources:

  • LibriSpeech for English;
  • Common Voice for English/French;
  • any aligned speech/transcript corpus.

Training set construction:

```text
audio.wav
transcript.txt
phonemizer(transcript) -> IPA
IPA_TO_NKO_EXTENDED(IPA) -> target_nko.txt
train ASR on audio.wav -> target_nko.txt
```

This tests whether the representation can become a model target, not just a conversion
output.
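The construction above can be sketched as a manifest builder. It assumes the phonemizer and the N'Ko encoder are passed in as plain callables, so any real phonemizer could be wrapped and plugged in. The stub lexicon, stub table, file names, and manifest keys are all hypothetical and exist only to make the example runnable.

```python
import json
from typing import Callable, Dict, Iterable, Iterator, Tuple


def build_manifest(
    pairs: Iterable[Tuple[str, str]],
    phonemize_fn: Callable[[str], str],
    encode_fn: Callable[[str], str],
) -> Iterator[Dict[str, str]]:
    """Yield one training row per (audio path, transcript) pair."""
    for audio_path, transcript in pairs:
        ipa = phonemize_fn(transcript)
        yield {
            "audio": audio_path,
            "transcript": transcript,
            "ipa": ipa,
            "target_nko": encode_fn(ipa),
        }


# Stand-ins so the sketch runs without external tools or data.
STUB_LEXICON = {"mama": "mama", "baba": "baba"}
STUB_TABLE = {"a": "\u07CA", "b": "\u07D3", "m": "\u07E1"}


def stub_phonemize(text: str) -> str:
    return STUB_LEXICON[text.lower()]


def stub_encode(ipa: str) -> str:
    return "".join(STUB_TABLE[symbol] for symbol in ipa)


rows = list(
    build_manifest(
        [("clip_0001.wav", "mama"), ("clip_0002.wav", "baba")],
        stub_phonemize,
        stub_encode,
    )
)
for row in rows:
    print(json.dumps(row))  # JSON Lines, non-ASCII escaped by default
```

Keeping the transcript and IPA alongside the N'Ko target makes each row auditable: a wrong label can be traced to either the phonemizer or the encoder.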

Pass criteria:

  • direct model emits valid N'Ko sequences;
  • N'Ko phoneme error rate beats a naive transcript-to-N'Ko baseline under noisy audio;
  • errors cluster around acoustically hard distinctions, not around script coverage;
  • no catastrophic degradation from the added compositional symbols.

The French Experiment

French is a better cross-lingual stress test than English because its gaps are
structured:

  • nasal vowels: `[ɑ̃]`, `[ɛ̃]`, `[ɔ̃]`, `[œ̃]`;
  • front-rounded vowels: `[y]`, `[ø]`, `[œ]`;
  • French R: `[ʁ]`;
  • glide `[ɥ]`.

This gives the paper a cleaner figure:

```text
baseline N'Ko        -> misses the French-specific vowel system
Unicode extensions   -> recovers part of it
full composition     -> recovers all tested French phonemes
```

The French result is important because it shows the substrate does not fail randomly.
It fails exactly where the target language has feature combinations outside the Manding
inventory, and those combinations can be filled by rules.

Phrase-Level Transfer

The phrase idea is real, but it belongs above the phoneme layer.

There are three levels:

```text
sound        -> N'Ko phonemic substrate
word/phrase  -> language-specific lexicon and morphology
meaning      -> semantic/cultural expression layer
```

N'Ko can preserve the sound. It does not automatically preserve the meaning.

But once every phrase has a stable phonemic representation, we can attach semantic
metadata:

```text
source phrase
phonemic N'Ko encoding
literal gloss
intended meaning
cultural/pragmatic note
target-language expression
```

This is where borrowed phrases become powerful. A phrase from French, English, Bambara,
Arabic, or another language can be represented in N'Ko phonetically, while a semantic
adapter learns how that phrase is used.

That gives us a two-channel bridge:

1. Sound channel: how it is pronounced.
2. Meaning channel: what it does culturally or pragmatically.

This is bigger than transliteration. It becomes a phrase memory system.
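One possible shape for an entry in such a phrase memory is sketched below, assuming one record per phrase with the metadata fields listed earlier. The class name, field names, and the two channel helpers are hypothetical, and the N'Ko field holds a placeholder string where a phonemic encoder's output would go.

```python
from dataclasses import asdict, dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class PhraseRecord:
    source_phrase: str
    source_language: str
    nko_phonemic: str                 # sound channel
    literal_gloss: str
    intended_meaning: str
    pragmatic_note: str = ""
    target_expression: Optional[str] = None

    def sound_channel(self) -> str:
        """How the phrase is pronounced, as an N'Ko phonemic string."""
        return self.nko_phonemic

    def meaning_channel(self) -> Dict[str, Optional[str]]:
        """What the phrase does, independent of how it sounds."""
        return {
            "literal_gloss": self.literal_gloss,
            "intended_meaning": self.intended_meaning,
            "pragmatic_note": self.pragmatic_note,
            "target_expression": self.target_expression,
        }


record = PhraseRecord(
    source_phrase="ça va",
    source_language="fr",
    nko_phonemic="<output of the phonemic encoder>",  # placeholder, not real N'Ko
    literal_gloss="it goes",
    intended_meaning="casual 'how are you?' or 'I'm fine'",
    pragmatic_note="informal; works as both question and answer",
)
print(sorted(asdict(record)))
```

Separating the two channels in the record means the sound side can be filled mechanically while the meaning side is curated or learned later.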

Paper Angle

Working title:

N'Ko as an Extensible Phonemic Substrate for Governed Low-Resource Speech Systems

Short thesis:

For low-resource speech, the problem is not only lack of data. It is lack of a
trustworthy representation layer. N'Ko provides a mechanically auditable phonemic
substrate; governance turns model errors into safe training data; together they form a
data flywheel.

Contribution List

1. Formal representation layer: N'Ko as a phonemic substrate with baseline,
Unicode-extension, and internal-compositional tiers.
2. Coverage evaluator: a reproducible gate showing Manding, French, and English
inventory coverage, with a 90% threshold.
3. Governed correction result: ungoverned N'Ko proposer harms transcripts; AGP
neutralizes most damage.
4. Oracle headroom analysis: large CER gains are possible if proposals are
trustworthy.
5. Self-training path: rejected/would-improve rows become training data, but loop-v1
shows correctness gating is needed.
6. Cross-lingual extension path: English/French ASR targets can be generated from
ordinary audio-transcript corpora using phonemizer -> IPA -> N'Ko composition.
7. Phrase-layer vision: N'Ko as the sound channel for phrase transfer, with semantic
metadata layered on top.

Table of Contents

1. Introduction: the data desert and the representation problem.
2. N'Ko as computational infrastructure.
3. Unicode-supported and compositional extension of N'Ko.
4. Coverage evaluation across Manding, French, and English.
5. From representation to ASR labels.
6. Direct ASR into N'Ko: experiment design.
7. Governed correction: AGP and the harm/neutralization result.
8. Why loop-v1 failed: small wrong edits and the need for correctness gating.
9. Featural acoustic coding: how no-retraining hearing could become possible.
10. Phrase transfer: sound channel plus meaning channel.
11. Limitations and governance.
12. Conclusion: N'Ko as substrate, not endpoint.

What To Do Next

Immediate

1. Promote the coverage evaluator into the NKo test suite with a 90% coverage threshold.
2. Export the coverage table as a paper artifact.
3. Implement an extended `IPA_TO_NKO` mapping behind an explicit experimental flag.

ASR Proof

4. Build an English mini-dataset:
   - audio;
   - transcript;
   - phonemizer output;
   - N'Ko compositional label.
5. Train/fine-tune a small ASR head to emit N'Ko labels.
6. Compare:
   - English ASR -> phonemizer -> N'Ko;
   - direct audio -> N'Ko;
   - oracle transcript -> N'Ko.

Correction Loop Proof

7. Add a correctness/evidence gate beyond edit size:
   - acoustic agreement;
   - n-best support;
   - confusion-pair support;
   - proposal logprob margin;
   - reject small unsupported edits.
8. Re-run the 500-row N'Ko correction loop.
9. Run the full 29k rows only once the 500-row loop shows a negative CER delta.
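A minimal sketch of what an evidence gate could look like, assuming only two of the listed signals are available: n-best support and the proposer's log-probability margin. Acoustic agreement and confusion-pair support are left out. The `Proposal` fields, thresholds, and all rejection labels other than `edit_too_large` are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class Proposal:
    text: str
    edit_size: int         # edit distance from the ASR hypothesis
    logprob_margin: float  # proposer logprob(proposal) - logprob(hypothesis)


def evidence_gate(
    proposal: Proposal,
    nbest: Sequence[str],
    cap: int = 2,
    min_margin: float = 1.0,
) -> Tuple[bool, str]:
    """Accept only small edits that also have independent support."""
    if proposal.edit_size == 0:
        return False, "no_change"
    if proposal.edit_size > cap:
        return False, "edit_too_large"
    if proposal.text not in nbest:
        return False, "no_nbest_support"  # small but unsupported edit
    if proposal.logprob_margin < min_margin:
        return False, "low_margin"
    return True, "accepted"


# Latin placeholders stand in for N'Ko strings.
asr_nbest = ["kalam", "kalan", "kalaa"]  # hypothetical ASR n-best list
supported = evidence_gate(Proposal("kalan", 1, 2.3), asr_nbest)
unsupported = evidence_gate(Proposal("kolam", 1, 2.3), asr_nbest)
print(supported, unsupported)
```

Both proposals are one-character edits, so a size-only gate would admit both. Here the second is rejected because nothing in the ASR n-best list backs it up.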

The Honest Boundary

This is a breakthrough if framed correctly:

Breakthrough proven now: N'Ko can be extended into a high-coverage phonemic
representation layer across tested inventories.

Breakthrough not yet proven: a Manding-trained acoustic model can hear English/French
sounds zero-shot.

Next proof: generate N'Ko labels from an English/French speech corpus and train direct
audio -> N'Ko. That is the bridge from representation breakthrough to ASR breakthrough.

Promotion Decision

Attach run IDs, datasets, metrics, and reproduction commands.

Source Anchor

nko-brain-scanner/NKO-PHONEMIC-SUBSTRATE-PROPOSAL.md