N'Ko Publication Readiness and Narrative Plan - 2026-05-03
You can still talk about the 20.57% CER result, but it should be framed as an archived checkpoint anchor, not as a completed May 2026 reproduction and not as proof that all controlled ASR comparisons are closed.
Bottom Line
You can still talk about the 20.57% CER result, but it should be framed as an
archived checkpoint anchor, not as a completed May 2026 reproduction and not as proof
that all controlled ASR comparisons are closed.
The strongest public narrative is not "we solved Bambara ASR." The strongest narrative
is:
> Modern AI systems do not just perform badly on N'Ko. They reveal a deeper
> infrastructure failure: the script is underrepresented in model vocabularies,
> activations, benchmarks, and evaluation metrics. N'Ko's bijective design gives us a
> cleaner way to measure and build Manding language technology, and the archived
> 20.57% CER checkpoint helps justify the research direction.
That narrative lets you use the 20.57% CER result without claiming the work is
finished.
Current Evidence Status
Confirmed local ASR anchor
Artifact:
`[home]/Desktop/nko-brain-scanner/local_results_cache/paper4_reproduction_35205256/results.json`
Facts:
- Script: N'Ko
- Mode: trajectory
- Test CER: 20.57%
- LR: `0.0003`
- Batch size: `32`
- Dropout: `0.1`
- Seed: `42`
- Split: `232476/29060/29060`
- Best validation loss: `0.6358872798606507`
- Epochs trained: `47`
- Checkpoint exists locally:
`[home]/Desktop/nko-brain-scanner/local_results_cache/paper4_reproduction_35205256/best.pt`
Interpretation:
- Safe to call it an archived N'Ko trajectory checkpoint reporting 20.57% test CER.
- Safe to use it as the strongest retained ASR anchor in the repo.
- Unsafe to call it a completed May 2026 strict reproduction.
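The recorded split can be cross-checked against the 290,596-pair corpus size quoted in the abstract wording later in this plan. A minimal sketch, assuming the three numbers are train/validation/test in that order (the order is not stated above); `split_summary` is a hypothetical helper, not a function from the repo:

```python
# Sanity-check the recorded split sizes.
# Assumption: the numbers are train/validation/test, in that order.
SPLIT = "232476/29060/29060"


def split_summary(split: str) -> dict:
    """Parse an 'a/b/c' split string and report total and rounded fractions."""
    sizes = [int(part) for part in split.split("/")]
    total = sum(sizes)
    return {
        "sizes": sizes,
        "total": total,
        "fractions": [round(size / total, 4) for size in sizes],
    }


summary = split_summary(SPLIT)
print(summary["total"])      # 290596
print(summary["fractions"])  # [0.8, 0.1, 0.1]
```

The total matches the 290,596 pairs cited elsewhere in this plan, and the proportions are consistent with an 80/10/10 split.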
Non-comparable low-LR matrix
Artifact root:
`[home]/Desktop/nko-brain-scanner/local_results_cache/paper4_same_snapshot_20260422_safe_lr1e4`
Results:
- N'Ko baseline: 31.38%
- N'Ko TAR: 31.69%
- N'Ko trajectory+TTT: 31.12%
- Latin baseline: 31.66%
- Latin trajectory: 32.81%
Interpretation:
- These runs show the training stack worked under a conservative low-LR profile.
- They do not validate or refute the 20.57% CER checkpoint, which was trained at
`lr=0.0003`.
Strict May 2026 audit
Status:
- Correctly configured.
- Did not complete.
- Produced no final CER.
- Vast instance destroyed and billing stopped.
Interpretation:
- Mention only as a transparency note if needed.
- Do not cite as evidence.
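The three evidence tiers above could be kept as a small machine-readable ledger so that the README and papers draw from one source. This is only a sketch: the class, field names, and status labels are hypothetical, and `strict_may_2026_audit` is a placeholder name because the audit left no artifact directory to point at.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EvidenceItem:
    name: str                   # artifact directory name, or a placeholder label
    status: str                 # hypothetical labels, see LEDGER below
    lr: Optional[float]         # learning rate if stated in this plan, else None
    usable_as_asr_anchor: bool  # may this item back the 20.57% CER statement?
    note: str


LEDGER = [
    EvidenceItem("paper4_reproduction_35205256", "archived_anchor", 0.0003, True,
                 "Reports 20.57% test CER; not a completed May 2026 reproduction."),
    EvidenceItem("paper4_same_snapshot_20260422_safe_lr1e4", "non_comparable", 0.0001, False,
                 "Shows the training stack ran under a low-LR profile only."),
    EvidenceItem("strict_may_2026_audit", "incomplete", None, False,
                 "Produced no final CER; transparency note only."),
]


def asr_anchors(ledger):
    """Return the names of items that may be cited as the ASR anchor."""
    return [item.name for item in ledger if item.usable_as_asr_anchor]


print(asr_anchors(LEDGER))  # ['paper4_reproduction_35205256']
```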
Paper-by-Paper Evaluation
Paper 1 - Dead Circuits
File:
`[home]/Desktop/nko-brain-scanner/paper/current/paper1_dead_circuits.tex`
Readiness: strongest public research asset.
Core value:
- Mechanistic story: N'Ko is absent or weak in LLM representations.
- Evidence types: translation tax, activation norm differences, entropy gap,
sparsity, kurtosis deficit, tokenizer comparison, LoRA remediation.
- This is the cleanest "AI infrastructure ignores indigenous scripts" paper.
Public posture:
- Lead with this.
- It is not dependent on the incomplete Vast audit.
- It should be packaged with Paper 3 or extended by Paper 3.
Needed before posting:
- Compile public PDF.
- Verify all figures/tables have source scripts or result JSONs.
- Add an artifact manifest with paths and hashes.
- Remove any venue/cost language that is internal rather than scholarly.
Recommendation:
- Make this the flagship paper.
Paper 2 - Living Speech
File:
`[home]/Desktop/nko-brain-scanner/paper/current/paper2_living_speech.tex`
PDF exists:
`[home]/Desktop/nko-brain-scanner/paper/current/paper2_living_speech.pdf`
Readiness: technical report, not a clean benchmark submission.
Core value:
- Documents the ASR build path: BiLSTM CTC, Transformer CTC, Whisper LoRA, bridge,
FSM, and practical N'Ko transcription constraints.
- Shows engineering progress and lessons.
Main caveats:
- V3 metrics are validation metrics, not independent test metrics.
- V4 reaches 29.4%, but the per-sample difference is not statistically
significant: 20 wins, 19 losses, 11 ties, sign-test `p=0.44`.
- Round-trip WER includes bridge conversion error.
- It should not be framed as production-ready ASR.
Public posture:
- Position as "building a first script-native N'Ko ASR stack."
- Do not sell it as SOTA.
- Use it to support the larger story that script-native evaluation and tooling are
possible.
Recommendation:
- Publish later as a technical report or blog-backed preprint after wording cleanup.
Paper 3 - Script Invisibility Across Architectures
File:
`[home]/Desktop/nko-brain-scanner/paper/current/paper3_cross_model.tex`
Readiness: strong, but should probably merge with Paper 1.
Core value:
- Shows N'Ko invisibility is not just Qwen-specific.
- Cross-model evidence across Qwen3-8B, Qwen2.5-7B, and Mistral-7B supports a
structural diagnosis.
- Strengthens Paper 1's thesis.
Main caveat:
- It should not overstate "all models" or "universal" beyond the tested families.
Public posture:
- Best used as the validation section of a stronger combined paper:
`Dead Circuits: Script Invisibility Across Large Language Model Families`.
Recommendation:
- Combine with Paper 1 unless you specifically want two shorter preprints.
Paper 4 - Script Design Affects ASR
File:
`[home]/Desktop/nko-brain-scanner/paper/current/paper4_script_advantage.tex`
PDF exists:
`[home]/Desktop/nko-brain-scanner/paper/current/paper4_script_advantage.pdf`
Readiness: conceptually important but public-risky until wording is corrected.
Core value:
- The theory is strong: bijective script CER is more phonemically meaningful than
Latin-script WER/CER for Manding.
- The archived 20.57% CER checkpoint is its retained empirical anchor.
- It gives the basis for Paper 8.
Main caveats:
- Current draft language around "fully verified" and "fresh reproduction" should be
revised before public posting because the strict May 2026 audit did not complete.
- The low-LR ablations should be explicitly labeled non-comparable in the abstract and
conclusion.
- It cannot claim a completed controlled proof of N'Ko superiority over Latin under
matched hyperparameters.
Public posture:
- Do not publish as a final empirical benchmark paper yet.
- Convert into either:
  - a theory/metrics paper, or
  - a technical report with an explicit "unresolved audit" section.
Recommendation:
- Freeze as a technical report.
- Extract the clean metric argument into Paper 8.
Paper 5 - Deployment Properties
File:
`[home]/Desktop/nko-brain-scanner/paper/current/paper5_deployment.tex`
PDF exists:
`[home]/Desktop/nko-brain-scanner/paper/current/paper5_deployment.pdf`
Readiness: narrative/roadmap paper, not a finished empirical paper.
Core value:
- Connects the 20.57% CER checkpoint to deployment properties:
speaker-level TTT, generalization, and actionability.
- Makes the case that script choice affects downstream system behavior, not only
within-distribution CER.
Main caveats:
- Several deployment results are historical/provisional.
- TTT and Djoko claims were not rerun as a complete current-snapshot artifact bundle.
- Some wording is too strong for the current evidence, especially if it implies
controlled deployment proof.
Public posture:
- Use as a whitepaper, project narrative, or future-work essay.
- Do not make it the first formal research submission.
Recommendation:
- Keep as the "why this matters" document, not the empirical centerpiece.
The Best Publication Package
First public package
Title:
`Dead Circuits: Script Invisibility Across Large Language Model Families`
Inputs:
- Paper 1
- Paper 3
Message:
- AI systems do not merely lack performance on N'Ko; they lack internal circuit
support for it.
- This is measurable through activation profiling.
- The pattern appears across multiple model families.
- Targeted script-aware training can repair some of the gap.
Why this should go first:
- It is the strongest evidence.
- It does not depend on the unfinished ASR audit.
- It creates the broader intellectual frame for all later ASR papers.
Second public package
Title:
`Against WER: Why Character Error Rate on Bijective Scripts Is the Correct Metric for Manding ASR Evaluation`
Inputs:
- Paper 4 theory section
- Paper 5 metric section
- Existing ASR evidence as motivation, not as the main proof
Message:
- Latin WER is a weak metric for Manding ASR because Latin Bambara spelling is
inconsistent, tone is underrepresented, and digraphs obscure phonemic boundaries.
- N'Ko CER is not perfect, but it is much closer to phonemic accuracy because N'Ko is
bijective.
- The 20.57% result matters because N'Ko
CER becomes interpretable in a way Latin WER is not.
Why this should go second:
- It does not need new compute.
- It lets you talk about the 20.57% CER while being scientifically careful.
- It turns the unresolved benchmark into a broader metric argument.
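The metric argument rests on how CER and WER are computed. The sketch below uses the standard definitions (Levenshtein edit distance divided by reference length, over characters for CER and over whitespace-separated words for WER). It is not the repo's evaluation code; details such as whether spaces count as characters or how text is normalized are assumptions, and the strings are placeholders rather than real Bambara or N'Ko text.

```python
def edit_distance(ref, hyp) -> int:
    """Levenshtein distance with unit cost for insert, delete, substitute."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate; spaces are counted as characters here."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def wer(ref: str, hyp: str) -> float:
    """Word error rate over whitespace-separated tokens."""
    ref_words = ref.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp.split()) / len(ref_words)


# Placeholder strings: one character differs in a two-word utterance.
reference = "abcde fghij"
hypothesis = "abcdx fghij"
print(round(cer(reference, hypothesis), 3))  # 0.091
print(wer(reference, hypothesis))            # 0.5
```

A single-character difference costs one full word under WER, which is why spelling variation in an inconsistent orthography can inflate WER without any change in the underlying acoustic error.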
Third public package
Title:
`Script-Native ASR for N'Ko: Systems, Metrics, and Deployment Lessons`
Inputs:
- Paper 2
- Paper 4
- Paper 5
Message:
- This is a technical report of the ASR program: what worked, what failed, what remains
unresolved.
- It preserves the 20.57% CER checkpoint as an archived anchor.
Why this should go third:
- It is valuable, but it has the most caveats.
- It should not lead the public rollout.
How To Talk About The 20.57% CER
Safe wording for abstracts
> We preserve an archived N'Ko trajectory ASR checkpoint trained on a 290,596-pair
> Bambara corpus snapshot, reporting 20.57% test CER under its recorded training
> settings (`lr=0.0003`, batch size 32, seed 42). A later strict audit was launched
> but did not complete, so we treat this result as an anchor for future validation
> rather than as a newly reproduced benchmark.
Safe wording for a project page
> The strongest retained ASR artifact is an archived N'Ko trajectory checkpoint at
> 20.57% CER. A later strict reproduction audit did not
> finish before compute funding ended, so the result is presented as an anchor, not a
> final leaderboard claim.
Safe wording for social posts
> We reached an archived 20.57% CER on N'Ko ASR. The bigger point:
> for Manding languages, script-native evaluation may be more important than chasing
> Latin WER. N'Ko gives us a phonemically grounded metric that Latin orthography cannot.
Unsafe wording to avoid
- "We reproduced 20.57% CER."
- "N'Ko definitively beats Latin under matched conditions."
- "TAR/TTT improves the 20.57% CER result."
- "This is production-ready Bambara ASR."
- "N'Ko CER and Latin WER are directly comparable."
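A coarse wording check can catch these phrasings before a draft goes public. The sketch below is an assumption-heavy illustration: the pattern list paraphrases the unsafe wording in this plan (plus "fully verified", "fresh reproduction", and "SOTA", which are flagged elsewhere in it), and `flag_unsafe` is a hypothetical helper.

```python
import re

# Hypothetical patterns paraphrasing the unsafe wording listed in this plan.
# They are deliberately broad and should be tuned before real use.
UNSAFE_PATTERNS = {
    "claims a reproduction": r"\breproduced\b",
    "claims full verification": r"\bfully verified\b|\bfresh reproduction\b",
    "claims a decisive win": r"\bdefinitively beats\b",
    "claims production readiness": r"\bproduction-ready\b",
    "claims state of the art": r"\bSOTA\b|\bstate[- ]of[- ]the[- ]art\b",
}


def flag_unsafe(text: str) -> list:
    """Return (line_number, label) pairs for lines matching an unsafe pattern."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in UNSAFE_PATTERNS.items():
            if re.search(pattern, line, flags=re.IGNORECASE):
                hits.append((lineno, label))
    return hits


draft = "We reproduced the result.\nWe preserve an archived checkpoint anchor."
print(flag_unsafe(draft))  # [(1, 'claims a reproduction')]
```

Hits need human review: a careful sentence such as "rather than as a newly reproduced benchmark" would also match, so the check narrows attention and does not decide anything.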
Upload and Distribution Plan
GitHub
Purpose:
- Canonical project repository.
- Source code, papers, figures, result manifests, and closeout docs.
Before posting:
- Add a top-level `RESEARCH.md` or update `README.md` with:
  - paper list
  - safe evidence ledger
  - 20.57% CER checkpoint status
  - caveats
  - links to PDFs and artifacts
- Add a `docs/artifact-manifest-2026-05-03.md` with hashes for:
  - `results.json`
  - `best.pt`
  - `vocab.json`
  - `split.json`
  - relevant paper PDFs
- Compile Paper 1 and Paper 3 PDFs, since only Papers 2, 4, and 5 currently have PDFs
  in `paper/current`.
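The manifest hashes can be generated with the standard library. A minimal sketch, assuming SHA-256 is the desired hash and that `vocab.json` and `split.json` sit in the same directory as `results.json` and `best.pt` (neither is stated above); `ARTIFACT_ROOT` is a placeholder relative path, and the table layout is only a suggestion.

```python
import hashlib
from pathlib import Path

# Placeholder location; substitute the real checkpoint directory.
ARTIFACT_ROOT = Path("local_results_cache/paper4_reproduction_35205256")
ARTIFACT_NAMES = ["results.json", "best.pt", "vocab.json", "split.json"]


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large checkpoints are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest_rows(root: Path, names: list) -> list:
    """Build Markdown table rows: file name, size in bytes, SHA-256."""
    rows = ["| File | Bytes | SHA-256 |", "| --- | --- | --- |"]
    for name in names:
        path = root / name
        if path.is_file():
            rows.append(f"| `{name}` | {path.stat().st_size} | `{sha256_of(path)}` |")
        else:
            # Record absences explicitly rather than silently skipping them.
            rows.append(f"| `{name}` | missing | n/a |")
    return rows


print("\n".join(manifest_rows(ARTIFACT_ROOT, ARTIFACT_NAMES)))
```

Recording missing files as rows keeps the manifest honest about what was actually retained at closeout.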
arXiv or preprint server
Best first upload:
- Combined Paper 1 + Paper 3.
Best second upload:
- Paper 8, once written from the metric argument.
Do not upload first:
- Paper 4 as-is.
- Paper 5 as-is.
Reason:
- The ASR papers require careful caveat editing before public posting.
Zenodo
Purpose:
- DOI-backed archive of the repo state and artifact manifest.
Recommended:
- Create a release after cleaning README and PDFs.
- Archive the release on Zenodo.
- Include the closeout documents so the evidence boundaries are preserved.
Hugging Face
Purpose:
- Host dataset/model cards if licensing and size permit.
Potential uploads:
- Model card for the 20.57% CER checkpoint.
- Dataset card for the feature/pair artifact if redistribution is allowed.
Critical wording:
- "Archived checkpoint anchor."
- "Strict May 2026 reproduction incomplete."
- "Not a final leaderboard claim."
OSF
Purpose:
- Research-project landing page for papers, protocols, caveats, and artifacts.
Useful because:
- OSF is good for mixed evidence packages, preregistration-style notes, and negative or
incomplete audit documentation.
Blog / project site
Purpose:
- Narrative explanation for non-reviewer audiences.
Best story sequence:
1. The script machines cannot read.
2. N'Ko exposes the blind spot.
3. Why Latin WER is the wrong target for Manding ASR.
4. What the 20.57% CER checkpoint shows.
5. What remains to be validated.
Papers with Code
Use only after:
- A preprint exists.
- Code and artifacts are organized.
- Claims are stable.
Do not use it to imply a leaderboard win from the 20.57% CER checkpoint.
Public Narrative Arc
Short version
N'Ko is a stress test for AI language infrastructure. It is a living, engineered,
phonemically precise script used by Manding-language communities, but modern AI systems
treat it like noise because their data, tokenizers, and benchmarks largely exclude it.
Our work measures that exclusion inside LLM activations, builds script-native ASR
systems, and argues that N'Ko CER is a more scientifically meaningful metric for
Manding speech than Latin WER.
Medium version
Most AI language work treats writing systems as interchangeable surfaces. N'Ko shows
that this assumption fails. In LLMs, N'Ko is not merely low-resource; it is structurally
underrepresented in learned representations. In ASR, N'Ko's one-to-one
phoneme-to-character mapping makes character error rate interpretable as a proxy for
phonemic accuracy, while Latin Bambara WER mixes speech recognition errors with
orthographic convention errors.
The archived 20.57% CER checkpoint indicates that script-native
N'Ko ASR can reach a meaningful technical threshold on a large Bambara corpus. The
result is not the end of the story: a later strict audit did not finish, and controlled
Latin-vs-N'Ko benchmarking remains unresolved. But the result is strong enough to
anchor a research agenda: build and evaluate African language technology in the
scripts that encode the language structure most faithfully.
One-sentence thesis
N'Ko reveals that AI's failure on many indigenous languages is not only a data problem;
it is a representation, script, and metric problem.
What To Fix Before Public Release
Must fix
- Paper 4 abstract and conclusion must stop implying that the strict audit completed.
- Paper 4 must label the April 22 matrix as `lr=0.0001` and non-comparable to the
20.57% CER checkpoint (`lr=0.0003`).
- Paper 5 must keep all deployment results clearly marked as historical/provisional.
- README needs a clear "evidence levels" section.
- Artifact manifest needs hashes and paths.
Should fix
- Compile Paper 1 and Paper 3 PDFs.
- Merge Paper 1 and Paper 3 into one flagship paper.
- Write Paper 8 from the WER/CER argument.
- Add a short "How to cite the 20.57% CER result" note.
Do not do now
- Do not restart Vast.
- Do not run another partial audit.
- Do not publish ASR benchmark claims as if the controlled comparison is closed.
Exact Standing Today
- Research program: closed for paid compute as of 2026-05-03.
- Vast: no running instances.
- Strongest scientific contribution: N'Ko script invisibility in LLMs.
- Strongest ASR artifact: archived N'Ko trajectory checkpoint at 20.57% CER.
- Weakest open point: strict reproduction and matched Latin-vs-N'Ko ASR comparison did
not complete.
- Best public path: flagship script-invisibility preprint, then metric-position paper,
then ASR technical report.
Promotion Decision
Attach run IDs, datasets, metrics, and reproduction commands.
Source Anchor
nko-brain-scanner/docs/handoffs/nko-publication-readiness-and-narrative-2026-05-03.md