Proposal · Experiment writeup · Candidate score: 22

The N'Ko Paper — Strategy, Angle, and Outline

> Written 2026-06-01, after the session that ran the AGP oracle/real benchmark and the budget two-regime proof, and launched the minimal-edit retrain. This is the paper the evidence actually supports: not the four-paper measurement split, and not a tone paper. It is the one that closes the loop.




Extracted abstract or opening context

## The angle (Mohamed's own framing)

"It went beyond ASR because we're limited with data and we need to figure out how we can make more."

That is the thesis. The paper is NOT "we built an N'Ko ASR." It is:

**A self-improving speech system that manufactures its own training data, governed by a single trajectory-uncertainty geometry, for a language that has no corpus to train on.**

ASR is the entry point. The contribution is the LOOP: decode → govern → correct → recycle-as-data → retrain. The whole point is data generation under governance.

## Why this is the right paper (3 reasons)

1. **It subsumes everything we built.** ASR anchor, z_t trajectory state, AGP gate, FAC/tone, SFT recycler: they stop being 5 separate contributions and become 5 STAGES of one machine. The four-paper measurement split presents N'Ko as a measuring stick; this presents it as the substrate of a self-improving model. Bigger, truer, one story.
2. **The low-resource constraint is the engine, not a caveat.** In a high-resource setting you train a big LM on a giant corpus and you're done. N'Ko CAN'T: the data desert is real (~500 h of public audio, no large toned text). So the system is FORCED to grow its own prior from its own ASR output, a small reference set, and acceptance rules. The constraint is what makes the architecture necessary and novel.
3. **We have the experiments, and they're honest.** Most ran THIS session on artifacts already on disk. The killer result is a true one (below).
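The decode → govern → correct → recycle-as-data → retrain loop can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the system's implementation: the trajectory-uncertainty score is approximated here by the mean per-step entropy of the decoder's output distributions, and the acceptance rule is a single threshold on that score. `Hypothesis`, `trajectory_uncertainty`, `one_round`, `run_loop`, and the `decode`/`correct`/`retrain` callables are all hypothetical names; the real z_t geometry, AGP gate, and FAC/tone corrector are not reproduced.

```python
from dataclasses import dataclass
from math import log
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass
class Hypothesis:
    utterance_id: str
    tokens: List[str]
    # Per-step output distributions along the decode trajectory.
    # Hypothetical stand-in for the z_t trajectory state.
    step_probs: List[List[float]]


Decoder = Callable[[str], Hypothesis]


def step_entropy(probs: List[float]) -> float:
    """Shannon entropy (in nats) of one decoding step."""
    return -sum(p * log(p) for p in probs if p > 0)


def trajectory_uncertainty(hyp: Hypothesis) -> float:
    """Mean per-step entropy: a placeholder, not the paper's geometry."""
    if not hyp.step_probs:
        return float("inf")  # no trajectory -> treat as maximally uncertain
    return sum(step_entropy(p) for p in hyp.step_probs) / len(hyp.step_probs)


def one_round(decode: Decoder,
              correct: Callable[[Hypothesis], Hypothesis],
              audio_ids: Iterable[str],
              threshold: float) -> List[Tuple[str, List[str]]]:
    """decode -> govern -> correct -> recycle-as-data for one pass."""
    accepted = []
    for aid in audio_ids:
        hyp = decode(aid)                            # decode
        if trajectory_uncertainty(hyp) > threshold:  # govern: drop uncertain trajectories
            continue
        fixed = correct(hyp)                         # correct
        accepted.append((fixed.utterance_id, fixed.tokens))  # recycle as (id, label) pair
    return accepted


def run_loop(decode: Decoder,
             correct: Callable[[Hypothesis], Hypothesis],
             retrain: Callable[[List[Tuple[str, List[str]]]], Decoder],
             audio_ids: List[str],
             threshold: float,
             rounds: int) -> Tuple[Decoder, Dict[str, List[str]]]:
    """Run the round repeatedly, retraining on everything accepted so far."""
    dataset: Dict[str, List[str]] = {}
    for _ in range(rounds):
        for uid, tokens in one_round(decode, correct, audio_ids, threshold):
            dataset[uid] = tokens                    # newest label wins per utterance
        decode = retrain(list(dataset.items()))      # retrain -> updated decoder
    return decode, dataset
```

The point of the structure is that each stage is a swappable callable: replacing `trajectory_uncertainty` with the actual governance signal, or `correct` with the FAC/tone stage, leaves the loop itself unchanged. Entropy is used only because it is a standard, easily checked uncertainty proxy.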

Promotion decision

What has to happen next

Attach run IDs, datasets, metrics, and reproduction commands.

Why this is not yet a full paper

Corpus pages are public-safe readers for discovered workspace artifacts. They are not automatically final papers. A corpus item becomes a polished paper only after the editable source, evidence checkpoints, references, figures, render path, and release status are attached through the paper schema.