N'Ko as Computational Infrastructure — Program Map
**N'Ko is not a decorative or interchangeable rendering of Manding. For machine-learning systems it is computational infrastructure.** A designed, bijective, tone-marking script changes four things that are usually treated as fixed:
Full Public Reader
N'Ko as Computational Infrastructure — Program Map
> Front door for the whole research line. One thesis, several pillars, one release plan.
> Last updated: 2026-05-28.
The thesis
N'Ko is not a decorative or interchangeable rendering of Manding. For
machine-learning systems it is computational infrastructure. A designed,
bijective, tone-marking script changes four things that are usually treated as
fixed:
1. What models can represent — tokenizers and hidden circuits behave
differently on N'Ko than on Latin Bambara.
2. How acoustic evidence aligns to symbols — a phoneme-transparent target
changes CTC decoding.
3. Whether a reported error rate is meaningful — N'Ko CER is phonemically
interpretable in a way Latin WER is not.
4. How sound itself can be encoded — the same script that makes recognition
interpretable makes a featural acoustic code possible.
Everything below is a pillar of that one claim.
---
Architecture (the unifying figure)
The trajectory geometry `z_t` is reused at three levels — decode, govern, correct.
This is the same figure that anchors the FAC tone-resolution paper.
N'KO AS COMPUTATIONAL INFRASTRUCTURE
one featural codebook · one trajectory geometry · many signals
signal (mic / mocap) ─► front-end features ─► N'Ko featural codebook
(3,024 tonal syllables)
┌───────────────────────────────────────────────────────────────┐
│ 1. DECODE Anticipatory Transformer + CTC │
│ features ─► z_t (7-dim trajectory) ─► TONELESS N'Ko (~20% CER)│
│ z_t = commitment·uncertainty·transition·novelty·… │
└───────────────────────────────┬───────────────────────────────┘
│ toneless inscription
┌────────────────────────────────▼──────────────────────────────┐
│ 2. GOVERN (AGP) same z_t as a correction policy │
│ partition spans → stable / boundary / uncertain / novelty │
│ trust gate: train / review / exclude │
└───────────────────────────────┬───────────────────────────────┘
│ eligible spans
┌────────────────────────────────▼──────────────────────────────┐
│ 3. CORRECT conservative, AGP-gated │
│ tone = text prior (Paper 8) × acoustic register (FAC) │
│ ↑ "what tone fits" ↑ F0 read · 99% non-contour │
│ ─► TONED N'Ko │
└───────────────────────────────────────────────────────────────┘
SELF-IMPROVING LOOP
OCR lessons ─► VLM ─► toned corpus ─► AGP partition ─► stable pairs
▲ │
└────────────── retrain ◄── lower CER ◄───────────────┘
Two arrows on the codebook: ANALYSIS (signal→N'Ko, the stack above) and
SYNTHESIS (N'Ko→signal). FAC's decoder proves the synthesis arrow for sound;
it is the template for N'Ko-as-score → body motion (computational choreography),
which is a distinct generator, not a front-end swap.The pillars
The program is three linguistic pillars plus an orthogonal systems track.
Read the linguistic pillars as the science; read the systems track as "how we
trained it cheaply." Do not mix them in the narrative.
### Pillar 1 — Representation: can a model see N'Ko?
Whether N'Ko is visible inside model internals (tokenization, activation geometry,
circuit duplication).
- Dead Circuits — N'Ko invisibility inside LLM activations. `[consolidated]`
- Script Invisibility Is Structural — the pattern holds across model families.
`[consolidated]`
- Artifact line: brain-scan activation profiling, custom 512-merge N'Ko BPE
(2.75x compression), vocabulary-extension surgery.
### Pillar 2 — Recognition + Measurement: can a model hear N'Ko, and is the metric real?
The core of the program. Script-native audio-to-N'Ko ASR, plus the argument that
its error metric is scientifically meaningful.
- Living Speech — construction of the script-native N'Ko ASR stack.
`[consolidated]`
- Does Script Design Matter? / Paper 4 — Script Advantage in CTC-ASR —
trajectory-conditioned CTC; N'Ko's bijective script gives a ~5.25pp CER
advantage over Latin. `[written]`
- Paper 5 — Generalization & Speaker Adaptation — unseen-word generalization,
per-speaker TTT, Djoko consensus. `[written]`
- Paper 6 — Trajectory Attention Residuals (TAR) — is depth-wise attention
routing script-dependent? `[training]`
- Transparent-Script Proposition — if the normalized script map is bijective
over the phoneme inventory, character edit distance preserves phoneme-edit
structure (up to normalization). Latin digraphs + optional tone + spelling
variation do not. This makes N'Ko CER a phonemically interpretable metric.
`[the measurement thesis — the strongest single idea]`
### Pillar 3 — Reconstruction + Tone: can a model render N'Ko and resolve its tone?
The generative dual of Pillar 2, and the place the program's biggest open hedge
(tone) gets closed.
- Paper 8 — Contextual Tone Resolution — a language model trained on
OCR-extracted toned N'Ko resolves tone from text context. `[planned]`
- FAC — Featural Acoustic Coding (`Desktop/nko-acoustic-coding/`) — resolves
tone from acoustic F0, and treats the N'Ko syllable codebook as a
sound-encoding substrate (sound → readable, editable, tone-native N'Ko →
sound). `[proposed; pitch-channel pilot done]`
- These two are halves of one problem. Paper 8 is the prior (what tone makes
linguistic sense); FAC is the evidence (what pitch the speaker produced). Tone
resolution = prior + evidence.
### Systems track (orthogonal enablers — not the linguistic thesis)
- Paper 7 — Neural Engine Offload — M4 ANE runs Whisper-scale projections at
11.6 TFLOPS via reverse-engineered private APIs. `[spike done]`
- Paper 9 — Distributed Training on Apple Neural Engine — mesh training across
Macs/iPhones/iPads. `[future]`
- Paper 10 — TurboQuant for Low-Resource Retrieval — 4-bit VQ replacing ANN
indexes for RAG++. `[future]`
---
The canonical ASR anchor (state it with its provenance)
The strongest retained ASR artifact is an archived checkpoint on a
Bambara corpus snapshot (232,476 / 29,060 / 29,060 train/val/test; lr 3e-4,
batch 32, dropout 0.1, seed 42) reporting **test CER ≈ 20
public anchor for "the 20
Honesty bounds (carry these everywhere, as the canonical paper does):
- It is not a fresh strict reproduction.
- It is not a result for later internal decoder variants (other checkpoints
exist at different CER on different corpora; e.g. a 297K trajectory CTC at
~27
- It is not a closed proof that N'Ko beats Latin under every matched setting.
- N'Ko CER still depends on normalization, reference quality, and tone/diacritic
policy — which is exactly the hole Pillar 3 fills.
The value is the measurement thesis around the anchor, not a leaderboard number.
---
The ASR architecture, and how FAC attaches
Anticipatory Transformer CTC (the central ASR object):
frozen Whisper large-v3 features `h_{1:T}` → project to 768d + temporal downsample
→ `u_{1:T'}` → 6-layer Transformer CTC head → N'Ko characters (toneless, 65
classes) → FSM syllable validation. A trajectory module estimates a 7-dim
state `z_t ∈ [0,1]^7` (commitment, uncertainty, transition pressure, recovery
margin, phase stiffness, novelty, stability) injected as an attention-logit
bias `B_ij(z_i, z_j)` before CTC emission.
Three ways FAC can attach. Choose deliberately.
- ✗ Do not swap the CTC target for the full FAC featural codebook (~3,640 tonal
syllables + diacritics). It is data-starved and would raise CER — the same
"richer target needs more data" tension found with morpheme-constrained BPE.
- ✓ Build first — parallel acoustic head (low risk). Same shared encoder, two
heads. Head A = the existing toneless CTC (the 20
Head B = an acoustic tone/register predictor trained on F0. Multitask loss
`L = L_CTC + λ · L_FAC`. Head B's tone posteriors feed the Paper-8 fusion
(text-LM prior + acoustic evidence → toned output). This attacks the tone hedge
without risking the anchor.
- 🚀 Write next — trajectory-state fusion (the new-architecture paper). Augment
`z_t` with FAC's interpretable acoustic coordinates (F0 register, pitch
contour, spectral centroid, onset transient, harmonicity). The attention bias
`B_ij` then conditions on named acoustic geometry rather than only learned latent
dynamics. This grounds the trajectory in interpretable features — the same spirit
as the transparent-script thesis — and literally fuses FAC into the anticipatory
transformer. Uniquely ours because we own both halves.
---
Why FAC matters to CER (the honest causal path)
FAC does not lower CER by hacking the decoder. It lowers CER through the
self-improving loop, whose bottleneck is tone:
train ASR → label 253K AfVoices audio → OCR toned N'Ko (Paper 8)
→ tone model fixes labels ← FAC adds ACOUSTIC tone evidence here
→ retrain on 300K+ clean → lower CER → repeatBetter tone resolution (text prior + acoustic evidence) → cleaner labels → better
retrain → lower CER. That is the flywheel, and FAC is a new input to its weakest
step.
Caveat from the real-speech pilot (parents-audio, Manding): conversational tone is
register-dominated with only ~40-cent within-syllable excursions, so expect gains
in tone-diacritic accuracy, not a dramatic CER drop. The decisive test needs
tone-bearing material (lesson-video / sung tone), not conversation.
---
Release strategy (nothing is released yet)
1. Flagship, release now: the canonical consolidation — *N'Ko as Computational
Infrastructure* (`paper/current/paper_canonical_nko_agp_20cer.tex`). Mature,
carefully hedged, self-contained around the 20
Do not gate it on FAC. This defines the program on arXiv.
2. Companion focused papers, by venue:
- Recognition (Papers 4/5/6) → Interspeech / ICASSP.
- Representation (Dead Circuits / Script Invisibility) → ACL / EMNLP.
- Tone + Reconstruction (Paper 8 fused with FAC) → the reconstruction/tone
paper, after the tone-seam experiment.
- Systems (Paper 7) → an efficiency / MLSys venue.
3. FAC is not a standalone flagship. Empirically thin today (small real-speech
pitch effect, ties a contour-augmented lexical baseline, no synthesizer, H1
unrun). The "FAC beats LAC" framing is a section or workshop note, not a
headline. FAC ships as the Reconstruction pillar + acoustic half of tone once
the tone channel earns its place.
Labels. Program: N'Ko as Computational Infrastructure. Pillars by function:
Representation · Recognition · Measurement · Governance · Reconstruction.
Systems track named separately.
---
Status snapshot
| Pillar | Papers | Status |
|---|---|---|
| Representation | Dead Circuits, Script Invisibility | consolidated; releasable |
| Recognition | 4 (written), 5 (written), 6 (training) | 20 |
| Measurement | transparent-script proposition | written; the spine |
| Governance | AGP / Deployment | consolidated |
| Reconstruction | Paper 8 (planned), FAC (proposed, pilot done) | next build |
| Systems | 7 (spike), 9/10 (future) | orthogonal |
## Next moves
1. Tone-seam experiment — acoustic F0 → tone cues fused with the Paper-8 text
LM, measured on lesson-video audio with OCR'd toned N'Ko as ground truth.
Metric: Tone-Diacritic Error Rate (TDER) + toned CER. Cheap v0: pyin F0 →
register-relative tone classifier, no training. (`nko-acoustic-coding/experiments/`)
2. Release the canonical consolidation as the flagship arXiv paper.
3. Parallel FAC acoustic head on the shared encoder (Build-first option above).
4. Trajectory-state fusion writeup (the new-architecture paper).
See also: `Desktop/nko-acoustic-coding/` (FAC paper + pitch-channel experiments),
`PIPELINE.md` (full paper roadmap + self-improving loop), `paper/current/`
(canonical consolidation).
Promotion Decision
Attach run IDs, datasets, metrics, and reproduction commands.
Source Anchor
nko-brain-scanner/PROGRAM.md
Detected Structure
Method · Evaluation · Figures · Code Anchors · Architecture