Grand Diomande Research · Full HTML Reader

The N'Ko Paper — Strategy, Angle, and Outline

> Written 2026-06-01, after the session that ran the AGP oracle/real benchmark, the
> budget two-regime proof, and launched the minimal-edit retrain. This is the paper
> the evidence actually supports: not the four-paper measurement split, and not a
> tone paper. It is the one that closes the loop.

## The angle (Mohamed's own framing)
"It went beyond ASR because we're limited with data and we need to figure out how we
can make more." That is the thesis. The paper is NOT "we built an N'Ko ASR." It is:

A self-improving speech system that manufactures its own training data, governed by
a single trajectory-uncertainty geometry, for a language that has no corpus to train on.

ASR is the entry point. The contribution is the LOOP: decode → govern → correct →
recycle-as-data → retrain. The whole point is data generation under governance.

## Why this is the right paper (3 reasons)
1. It subsumes everything we built. ASR anchor, z_t trajectory state, AGP gate,
FAC/tone, SFT recycler — they stop being 5 contributions and become 5 STAGES of one
machine. The four-paper measurement split presents N'Ko as a measuring stick; this
presents it as the substrate of a self-improving model. Bigger, truer, one story.
2. The low-resource constraint is the engine, not a caveat. High-resource: train a
big LM on a giant corpus, done. N'Ko CAN'T (the data desert is real — ~500h public
audio, no large toned text). So the system is FORCED to grow its own prior from its
own ASR output + a small reference set + acceptance rules. The constraint is what
makes the architecture necessary and novel.
3. We have the experiments, and they're honest. Most ran THIS session on artifacts
already on disk. The killer result is a true one (below).

## The central empirical result (the spine)
On 500 real rows (Gemma corrector, mac5):
- ASR baseline CER = 0.3106.
- Ungoverned corrector (blind accept) = 0.4701 = +15.94pp WORSE. An LLM emitting
valid N'Ko ACTIVELY DESTROYS a low-resource transcript.
- AGP-gated = 0.3120 = +0.14pp. Governance neutralizes about 99% of that harm
(+15.94pp cut to +0.14pp).
- Budget two-regime proof: with TRUSTWORTHY (oracle) proposals, raising the edit budget
yields up to -26.12pp CER with ZERO accepted-worse; with the real corrector the
same lever makes it worse. → The bottleneck is provably PROPOSER QUALITY, not the
gate or the budget. The headroom is real and reachable.
- loop-v1 (in progress): retrain a MINIMAL-EDIT proposer on the rejected-would-improve
pairs → does CER go negative? That's the loop closing.
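
The deltas above can be checked with a few lines of arithmetic. The sketch below assumes CER is corpus-level character edit distance divided by total reference characters (the source does not say whether the reported figures are corpus-level or a per-row mean), and all function names are illustrative. Recomputing from the rounded CERs gives about +15.95pp for the ungoverned corrector, so the reported +15.94pp presumably comes from unrounded values.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def corpus_cer(refs: list, hyps: list) -> float:
    """Assumed definition: total edits over total reference characters."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_chars = sum(len(r) for r in refs)
    return total_edits / total_chars


def delta_pp(cer: float, baseline: float) -> float:
    """Difference from the baseline in percentage points."""
    return (cer - baseline) * 100


# CER values as reported in the list above (500 real rows).
BASELINE, UNGOVERNED, GATED = 0.3106, 0.4701, 0.3120

harm = delta_pp(UNGOVERNED, BASELINE)      # harm of blind acceptance
residual = delta_pp(GATED, BASELINE)       # harm left after gating
neutralized = 1 - residual / harm          # share of harm removed by the gate

print(f"harm={harm:.2f}pp residual={residual:.2f}pp neutralized={neutralized:.1%}")
```

The neutralized share is a ratio of the two deltas, so it is insensitive to the small rounding difference in the harm figure.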

The story arc: an ungoverned generator is dangerous → governance makes it safe but not
yet useful → the gate's own rejections are training data → recycling them makes the
generator useful → CER drops → the loop turns. Data scarcity solved by manufacturing.
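
The step from "governance makes it safe" to "the gate's own rejections are training data" can be sketched as follows. This is a minimal illustration, not the AGP gate: the real gate uses z_t partitions and admissibility rules that are not reproduced here, so `budget_gate` stands in with an edit-budget check only. `Row`, `budget_gate`, and `mine_rejected_would_improve` are hypothetical names, and the strings are Latin placeholders rather than N'Ko data.

```python
from dataclasses import dataclass


@dataclass
class Row:
    hyp: str       # ASR hypothesis
    proposal: str  # corrector output
    ref: str       # reference transcript (available only on gold rows)


def edit_distance(a: str, b: str) -> int:
    """Character-level Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def budget_gate(row: Row, budget: int) -> bool:
    """Stand-in gate (assumption): accept only proposals within the edit budget."""
    return edit_distance(row.hyp, row.proposal) <= budget


def mine_rejected_would_improve(rows: list, budget: int) -> list:
    """Collect (hyp, proposal) pairs the gate rejected but that lower the error."""
    pairs = []
    for row in rows:
        if budget_gate(row, budget):
            continue  # accepted proposals are not mined here
        before = edit_distance(row.ref, row.hyp)
        after = edit_distance(row.ref, row.proposal)
        if after < before:
            pairs.append((row.hyp, row.proposal))
    return pairs


# Toy rows, invented for illustration only.
rows = [
    Row(hyp="abcdef", proposal="abcxef", ref="abcxef"),  # 1 edit: accepted
    Row(hyp="abcdef", proposal="abxyzf", ref="abxyzf"),  # 3 edits: rejected, would improve
    Row(hyp="abcdef", proposal="zzzzzz", ref="abcdef"),  # 6 edits: rejected, would harm
]
sft_pairs = mine_rejected_would_improve(rows, budget=1)
print(sft_pairs)
```

Mining needs a reference to know which rejections would have helped, so in this sketch the recycled pairs can only come from rows that have gold transcripts.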

## The unifying mechanism (the rigor spine, ties to the trajectory thesis)
One 7-dim trajectory state z_t, reused at three levels: DECODE (attention bias in CTC),
GOVERN (AGP partitions: stable/boundary/uncertain/novelty), and now LEARN (which
rejected pairs become SFT data). The falsifiable claim + ablation: frozen decode-time
z_t predicts govern-admissibility and learn-value above chance and above per-layer
recompute. "One geometry, three roles" = what makes it an architecture, not three tools.
(Caveat to keep honest: the z_t↔partition map is partly definitional; the ablation is
what converts it from framing to result. NOT yet run — checkpoint on HD1 now.)
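
One way the ablation's pass/fail criterion could be scored is sketched below. It assumes each row has a binary label (for example, admissible or not) plus two scores: one from a probe on the frozen decode-time z_t and one from a per-layer recompute baseline. The AUC comparison, the function names, and the toy numbers are all assumptions for illustration; the ablation has not been run and nothing here is a result.

```python
def auc(scores: list, labels: list) -> float:
    """ROC AUC via pairwise comparison of positives and negatives (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def ablation_verdict(frozen_scores: list, recompute_scores: list, labels: list) -> dict:
    """Check the two halves of the claim: above chance, and above recompute."""
    frozen_auc = auc(frozen_scores, labels)
    recompute_auc = auc(recompute_scores, labels)
    return {
        "frozen_auc": frozen_auc,
        "recompute_auc": recompute_auc,
        "above_chance": frozen_auc > 0.5,
        "above_recompute": frozen_auc > recompute_auc,
    }


# Made-up scores, only to show the shape of the comparison.
labels = [1, 1, 0, 0]
frozen = [0.9, 0.8, 0.3, 0.1]
recompute = [0.9, 0.2, 0.3, 0.1]
verdict = ablation_verdict(frozen, recompute, labels)
print(verdict)
```

A real run would also need confidence intervals (for example, a bootstrap over rows) before "above chance" or "above recompute" could be claimed.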

## Where N'Ko (the script) sits — don't lose the measurement thesis
Two orthogonal axes that meet:
- SCRIPT axis (why N'Ko): bijective, phoneme-transparent → CER is a phonemically
interpretable metric (Latin WER is not). This makes the loop AUDITABLE: every
accept/reject is interpretable at the phoneme level. The transparent script is what
lets governance have a trustworthy signal at all.
- ARCHITECTURE axis (the reusable primitive): z_t uncertainty geometry, script-agnostic.
The script axis is WHY the governance signal is trustworthy; the architecture axis is
HOW it's reused. Tone/FAC = the reconstruction refinement on accepted output, the last
faithfulness layer (not load-bearing for the loop; toneless pairs train it fine).

## Cross-lingual substrate extension
Mohamed's later question sharpened the script axis: if N'Ko is phonemic and has a
documented extension mechanism, can it represent sounds beyond Manding without retraining?

The answer is yes for representation, not automatically for hearing. A new companion note
captures this spine:

`NKO-PHONEMIC-SUBSTRATE-PROPOSAL.md`

Validated coverage gate:
- Manding baseline/full: 27/27 = 100%
- French baseline 23/36 = 63.9%; full 36/36 = 100%
- English baseline 24/41 = 58.5%; full 41/41 = 100%
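
The coverage figures are simple ratios of mapped phonemes to inventory size. The sketch below shows that computation under stated assumptions: `coverage` is a hypothetical helper, and the toy inventory and mappings are invented placeholders, not the actual Manding, French, or English inventories from the companion note. Only the final lines use the counts reported above.

```python
def coverage(inventory: set, mapped: set) -> tuple:
    """Return (covered, total, percent) for a phoneme inventory."""
    covered = len(inventory & mapped)
    total = len(inventory)
    return covered, total, 100 * covered / total


# Toy example (assumption): four target phonemes, two mapped by the
# baseline letters, the rest reached through the extension mechanism.
toy_inventory = {"p", "b", "ʒ", "y"}
toy_baseline = {"p", "b"}
toy_extended = toy_baseline | {"ʒ", "y"}

print(coverage(toy_inventory, toy_baseline))
print(coverage(toy_inventory, toy_extended))

# Consistency check on the counts reported in the list above.
french_baseline_pct = round(100 * 23 / 36, 1)
english_baseline_pct = round(100 * 24 / 41, 1)
print(french_baseline_pct, english_baseline_pct)
```

This measures representation only: a phoneme counts as covered when a written form exists for it, which says nothing about whether the acoustic model can hear it.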

Strategic boundary: no retraining is needed for the text/IPA→N'Ko representation layer.
Direct audio→N'Ko still needs ASR adaptation unless the acoustic model is truly featural
and can compose unseen phonemes from heard features. That makes FAC the bridge from
representational coverage to possible zero-shot acoustic coverage.

## TABLE OF CONTENTS (proposed)
Working title: "Growing a Language Model from Its Own Errors: Governed Self-Correction
for Low-Resource Speech, with N'Ko as Computational Infrastructure."

1. Introduction — the data desert; you can't pretrain a corpus you don't have; thesis
= grow the prior under governance; contributions list.
2. The Data Problem for N'Ko — 40M speakers, ~zero AI data, ~500h public audio, no
large toned text. Why standard recipes fail. Why self-generation is forced.
3. N'Ko as Computational Infrastructure — bijective/transparent script; CER as a
phonemically interpretable metric; the property that makes governance auditable.
(absorbs the measurement thesis)
4. The Trajectory State z_t — 7-dim uncertainty geometry; how it's estimated;
the "one geometry, three roles" claim stated up front.
5. Stage 1 — Decode — Anticipatory Transformer CTC; the 20.57% anchor (and
its honest provenance bounds); toneless target by design.
6. Stage 2 — Govern (AGP) — partitions, admissibility, the Rust gate, row contract,
provenance witnesses. The conservatism-as-safety argument.
7. Stage 3 — Correct & Recycle — the corrector as proposer; the harm result
(+15.94pp ungoverned); governance neutralization (+0.14pp); budget two-regime proof
(-26pp oracle headroom); rejected pairs → SFT; the minimal-edit retrain; loop-v1 CER.
8. The z_t Transfer Ablation — the rigor experiment: does frozen decode z_t predict
govern + learn value? (the falsifiable core)
9. Reconstruction & Tone (FAC) — the faithfulness refinement on accepted output;
corrected 7-mark inventory; tone = text prior × acoustic evidence. (companion-pillar)
10. Self-Improving Loop & Data Flywheel — putting it together; OCR-lesson corpus +
AfVoices; the loop as a data manufacturing process; honest projections, not hype.
11. Limitations — anchor not a fresh strict repro; OCR tone-gold provisional;
read-speech gold still needed; accepted-worse leakage on the Python gate; loop shown
for N turns not asymptote.
12. Conclusion — for low-resource languages, the model IS the loop; governance is
what makes self-training safe; N'Ko's transparency is what makes the signal trustworthy.

## Experiment ledger (what's DONE vs NEEDED)
- DONE (this session, on-disk artifacts): harm result, governance neutralization, budget
two-regime, rejection anatomy (89% ...).
- IN PROGRESS: minimal-edit retrain (mac5) → loop-v1 CER.
- NEEDED: z_t transfer ablation (checkpoint on HD1 now), Rust-gate at-scale, the full 29k
real run AFTER the proposer is fixed, read-speech tone gold (separate, for §9).

## Strategic call
Lead with the LOOP (this doc). Keep paper/final/ as the venue-split conference set that
FEEDS this umbrella. The flagship paper_canonical (unfrozen this session, retitled
"N'Ko as Computational Infrastructure") becomes the umbrella — re-spine it from
"measurement consolidation" to "the governed self-correcting loop," with measurement as
§3 and the loop as the spine. Don't collapse to one monolith; don't leave as disconnected
ideas. One thesis, many papers, the loop as the through-line.

Promotion Decision

Attach run IDs, datasets, metrics, and reproduction commands.

Source Anchor

nko-brain-scanner/PAPER-STRATEGY.md
