N'Ko Research Program Closeout - 2026-05-03
Decision
Stop paid compute now. The Vast instance used for the strict Paper 4 anchor audit was
destroyed on 2026-05-03, `vastai show instances` was empty afterward, and the
`monitor-nko-anchor-audit` automation was deleted. No further cloud training should be
started unless Mohamed explicitly reopens the project later with a full-run budget and
artifact-download plan.
This closeout evaluates the paper set as a research program, not as a launch plan for
more experiments.
Executive Conclusion
The project produced a real body of research, but the evidence is uneven.
The strongest, most defensible contribution is the script-invisibility line:
activation profiling shows that N'Ko is effectively absent from current LLM internal
circuits, and cross-model evidence supports the claim that this is structural rather
than a single-model accident.
The ASR/script-advantage line is promising but not fully closed. The archived N'Ko
trajectory checkpoint reporting 20.57% test CER is preserved locally and remains the best
direct ASR anchor, but the strict May 2026 audit did not complete before billing ended.
The April 22 same-snapshot matrix cannot validate or refute the 20.57% anchor because it
used `lr=0.0001`, while the anchor used `lr=0.0003`.
The honest final position is:
- Papers 1 and 3 are the strongest research products and can move forward without more
paid compute after packaging, citation cleanup, and artifact indexing.
- Paper 8, the proposed WER/CER position paper, is the best next writing project that
needs no paid compute: it relies on theory and existing evidence rather than new training.
- Papers 2, 4, and 5 should be frozen as technical reports or heavily caveated
preprints.
- Papers 6, 7, and 9 should remain future work, not active commitments.
Paper Inventory
Paper 1 - Dead Circuits
File: `[home]/Desktop/nko-brain-scanner/paper/current/paper1_dead_circuits.tex`
Title: `Dead Circuits: Activation Profiling and Script Invisibility in Large Language Models`
Status: strongest finished paper.
What it contributes:
- First activation-profile framing for N'Ko script invisibility.
- Qwen3-8B evidence of translation tax, entropy gap, sparsity, kurtosis deficit, and
missing N'Ko-favorable circuit configurations.
- Mechanistic explanation: data and tokenizer starvation, not right-to-left direction
alone.
- LoRA remediation result showing that circuits can be revived with targeted data.
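The profiling quantities named above (entropy, sparsity, kurtosis) can be sketched in a few lines. The definitions here are assumptions for illustration, not the formulas used in `paper1_dead_circuits.tex`: sparsity is the fraction of hidden activations whose magnitude falls below a hypothetical threshold `eps`, kurtosis is the population excess kurtosis of one activation vector, and entropy is the Shannon entropy of the softmax over next-token logits. Comparing these statistics between matched N'Ko and English prompts is one way a sparsity gap, kurtosis deficit, or entropy inflation could surface.

```python
import math
from typing import Sequence

def sparsity(acts: Sequence[float], eps: float = 1e-3) -> float:
    """Fraction of activations with |a| < eps (eps is a hypothetical threshold)."""
    return sum(1 for a in acts if abs(a) < eps) / len(acts)

def excess_kurtosis(acts: Sequence[float]) -> float:
    """Population excess kurtosis m4 / m2**2 - 3; a Gaussian scores about 0."""
    n = len(acts)
    mean = sum(acts) / n
    m2 = sum((a - mean) ** 2 for a in acts) / n
    if m2 == 0:
        raise ValueError("kurtosis is undefined for a constant vector")
    m4 = sum((a - mean) ** 4 for a in acts) / n
    return m4 / m2 ** 2 - 3.0

def softmax_entropy(logits: Sequence[float]) -> float:
    """Shannon entropy (nats) of softmax(logits), shifted by the max for stability."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return -sum((e / total) * math.log(e / total) for e in exps if e > 0)

def profile(acts: Sequence[float], logits: Sequence[float]) -> dict:
    """Bundle the three hypothetical statistics for one prompt position."""
    return {
        "sparsity": sparsity(acts),
        "excess_kurtosis": excess_kurtosis(acts),
        "entropy": softmax_entropy(logits),
    }
```

In practice `acts` would be one layer's hidden state and `logits` the output row at the same position, both taken from a forward pass that this sketch leaves out.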
How to use it:
- Treat as the flagship paper.
- Submit or post after final reproducibility packaging, code/data links, and table
audit.
- Avoid overclaiming universality inside Paper 1 alone; use Paper 3 for the broader
cross-model claim.
Final recommendation: keep and prioritize.
Paper 2 - Living Speech
File: `[home]/Desktop/nko-brain-scanner/paper/current/paper2_living_speech.tex`
Title: `Living Speech: Script-Native Automatic Speech Recognition for N'Ko`
Status: useful systems narrative, but not a clean SOTA paper.
What it contributes:
- Documents the progression from BiLSTM CTC to Transformer CTC to Whisper LoRA.
- Gives a real account of building audio-to-N'Ko ASR.
- Includes bridge/FSM work and practical lessons on N'Ko output constraints.
- Correctly includes important caveats: V3 metrics are validation metrics, V4 WER
improvement is not statistically significant, and WER after N'Ko-to-Latin conversion
includes bridge error.
Main risks:
- 29.4
- The V4 per-sample WER result is 20 wins, 19 losses, 11 ties, with sign-test
`p=0.44`; that cannot be framed as a decisive performance win.
- Comparisons to Latin ASR systems use different data, metrics, and scripts.
How to use it:
- Freeze as a systems/lessons-learned technical report.
- If submitted, make the contribution "building and evaluating a script-native N'Ko ASR
stack", not "state of the art".
Final recommendation: keep, but tone down.
Paper 3 - Script Invisibility Across Architectures
File: `[home]/Desktop/nko-brain-scanner/paper/current/paper3_cross_model.tex`
Title: `Script Invisibility Is Structural: Activation Profiling Across Three LLM Families`
Status: strong companion to Paper 1.
What it contributes:
- Cross-model evidence across Qwen3-8B, Qwen2.5-7B, and Mistral-7B.
- Shows translation tax, weaker N'Ko activations, sparsity, kurtosis deficit, and
entropy inflation across model families.
- Strengthens the argument that script invisibility is structural and data-driven.
How to use it:
- Either submit as a standalone short paper or merge into Paper 1 as an expanded
cross-model version.
- The cleanest path is probably one stronger combined paper: Paper 1 methodology plus
Paper 3 cross-model validation.
Final recommendation: keep and prioritize, likely merged with Paper 1.
Paper 4 - Script Design Affects ASR
File: `[home]/Desktop/nko-brain-scanner/paper/current/paper4_script_advantage.tex`
Title: `Does Script Design Matter? Phonetic Transparency and CTC Decoding for N'Ko Automatic Speech Recognition`
Status: important but unresolved.
What it contributes:
- Formal argument that bijective scripts make CER more phonemically meaningful than
many-to-many Latin orthographies.
- Preserves the archived N'Ko trajectory anchor: 20.57% test CER on the
snapshot with train/val/test split `232476/29060/29060`, `lr=0.0003`, batch size
`32`, dropout `0.1`, seed `42`, best validation loss `0.6358872798606507`, and 47
trained epochs.
- Documents the later same-snapshot low-LR ablations:
  - N'Ko baseline: 31.38%
  - N'Ko TAR: 31.69%
  - N'Ko trajectory+TTT: 31.12%
  - Latin baseline: 31.66%
  - Latin trajectory: 32.81%
Main risks:
- The strict May 2026 audit did not finish, so it produced no final CER.
- The April 22 ablations used `lr=0.0001`; the anchor used `lr=0.0003`.
- The original five-run matrix omitted the plain `nko_trajectory_290596` run needed for
a direct anchor comparison.
- A contaminated vocab path and mixed-shape feature tensors were discovered and fixed
late, but the strict corrected audit was incomplete.
How to use it:
- Keep the 20.57% N'Ko trajectory result as an archived anchor.
- Do not claim that the strict audit reproduced, contradicted, or improved the anchor.
- Do not claim a fully controlled N'Ko-vs-Latin script advantage under matched
hyperparameters.
- Use Paper 4 as a technical report and theory paper unless future funding completes
the strict audit.
Final recommendation: freeze with caveats.
Paper 5 - Deployment Properties
File: `[home]/Desktop/nko-brain-scanner/paper/current/paper5_deployment.tex`
Title: `Beyond Controlled Comparison: Deployment Properties of Script-Aware ASR for N'Ko`
Status: deployment vision, not a finished empirical paper.
What it contributes:
- Assembles deployment-relevant claims around out-of-domain behavior, Djoko-style
deployment, TTT, compositional generalization, and actionability.
- Anchors the narrative to the verified 20.57% N'Ko trajectory anchor.
- Correctly labels several deployment figures as historical/provisional.
Main risks:
- Deployment claims are not rerun as an artifact-complete current-snapshot bundle.
- TTT/domain-transfer evidence is historical and cannot carry a final empirical paper
without rerun artifacts.
- The strongest value here is framing, not final measurement.
How to use it:
- Keep as a whitepaper, roadmap, or future-work appendix.
- Do not submit as a primary empirical paper in its current state.
Final recommendation: freeze as vision/future work.
Proposed Papers 6-9
Source: `[home]/Desktop/nko-brain-scanner/paper/paper_proposals_april2026.md`
Paper 6 - Trajectory Scalars / TAR Negative Result
Status: do not proceed now.
Reason:
- It depends on a reproduction result that did not complete.
- The framing is scientifically useful (trajectory scalars may matter while TAR adds
little), but without completed strict artifacts the paper is not ready.
Final recommendation: retire for now.
Paper 7 - Embodied Vocabulary
Status: future creative research, not part of this closeout package.
Reason:
- It requires human validation and vocabulary-engine evidence.
- It is interesting, but it is downstream of the current ASR evidence and should not be
used to conclude this phase.
Final recommendation: archive as future work.
Paper 8 - Against WER
Status: best no-cost next paper.
Reason:
- It can be argued from script theory, metric validity, N'Ko bijectivity, and the
existing evidence base.
- It does not require another Vast run.
- It converts the strongest safe insight from Paper 4 into a clean position:
Latin WER is a weak metric for tonal Manding ASR, while N'Ko CER is phonemically
interpretable under a bijective transcription function.
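The metric contrast can be made concrete with a plain edit-distance sketch. Both metrics below are unit-cost Levenshtein distance normalized by reference length; CER runs over characters (spaces included, which is an assumption) and WER over whitespace-separated tokens. The strings are placeholders, not Bambara or N'Ko data: one spelling variant inside a word costs a full word under WER but a single character under CER.

```python
from typing import Sequence

def levenshtein(ref: Sequence, hyp: Sequence) -> int:
    """Unit-cost edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute, or match at cost 0
            ))
        prev = curr
    return prev[-1]

def cer(ref: str, hyp: str) -> float:
    """Character error rate over the raw strings, spaces included."""
    return levenshtein(ref, hyp) / len(ref)

def wer(ref: str, hyp: str) -> float:
    """Word error rate over whitespace-separated tokens."""
    ref_words = ref.split()
    return levenshtein(ref_words, hyp.split()) / len(ref_words)

# Placeholder pair: one spelling variant ("bb" vs "bx") in a three-word reference.
print(wer("aa bb cc", "aa bx cc"))  # 0.3333333333333333
print(cer("aa bb cc", "aa bx cc"))  # 0.125
```

The same one-symbol disagreement is a 33% WER hit but a 12.5% CER hit, which is the asymmetry the position paper would argue from.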
Required adjustment:
- Replace older supporting numbers such as 27.50% with the archived 20.57% anchor, and
present ASR figures as contextual, not decisive, evidence.
Final recommendation: write this if one more no-cost paper is desired.
Paper 9 - Machine and Human Compositional Generalization
Status: future work.
Reason:
- Requires human learning data and likely a control group.
- Good idea, but outside the stop-spend conclusion.
Final recommendation: archive as future work.
Evidence Ledger
Direct local anchor
Path:
`[home]/Desktop/nko-brain-scanner/local_results_cache/paper4_reproduction_35205256/results.json`
Result:
- Script: N'Ko
- Mode: trajectory
- Test CER: 20.57%
- LR: `0.0003`
- Batch size: `32`
- Dropout: `0.1`
- Seed: `42`
- Split: `232476/29060/29060`
- Best validation loss: `0.6358872798606507`
- Epochs trained: `47`
Interpretation:
- This is the strongest ASR anchor retained locally.
- It supports saying "the repository contains an archived N'Ko trajectory checkpoint
reporting 20.57% test CER."
Non-comparable April 22 low-LR matrix
Path root:
`[home]/Desktop/nko-brain-scanner/local_results_cache/paper4_same_snapshot_20260422_safe_lr1e4`
Results:
- `nko_baseline_290596`: 31.38%
- `nko_tar_290596`: 31.69%
- `nko_trajectory_ttt_290596`: 31.12%
- `latin_baseline_290596`: 31.66%
- `latin_trajectory_290596`: 32.81%
Interpretation:
- Useful as exploratory same-snapshot evidence.
- Not a validation or refutation of the 20.57% anchor, because it used `lr=0.0001`
rather than the anchor's `lr=0.0003`.
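The comparability rule in this ledger can be expressed as a settings diff against the anchor. The sketch hard-codes the anchor values recorded above; the `RunSettings` type and `mismatches` helper are hypothetical, and the April 22 example assumes, for illustration only, that every setting except `lr` matched the anchor. Any non-empty diff means the run cannot validate or refute the anchor.

```python
from dataclasses import dataclass, fields, replace
from typing import Tuple

@dataclass(frozen=True)
class RunSettings:
    """Hypothetical container for the settings that define the ASR anchor."""
    lr: float
    batch_size: int
    dropout: float
    seed: int
    split: Tuple[int, int, int]  # train/val/test pair counts

# Values recorded for the archived N'Ko trajectory anchor.
ANCHOR = RunSettings(lr=0.0003, batch_size=32, dropout=0.1, seed=42,
                     split=(232476, 29060, 29060))

def mismatches(run: RunSettings, anchor: RunSettings = ANCHOR) -> dict:
    """Map each differing field to (run value, anchor value); empty means comparable."""
    return {
        f.name: (getattr(run, f.name), getattr(anchor, f.name))
        for f in fields(anchor)
        if getattr(run, f.name) != getattr(anchor, f.name)
    }

# Illustrative April 22 run: assumed identical to the anchor except for the learning rate.
april22 = replace(ANCHOR, lr=0.0001)
print(mismatches(april22))  # {'lr': (0.0001, 0.0003)}
print(sum(ANCHOR.split))    # 290596
```

The split total is 290,596, which appears to match the `_290596` suffix in the run IDs; that correspondence is inferred from the numbers, not a documented naming convention.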
Incomplete strict May 2026 audit
Closeout:
`[home]/Desktop/nko-brain-scanner/docs/handoffs/paper4-research-closeout-2026-05-03.md`
Artifacts:
- `[home]/Desktop/nko-brain-scanner/docs/handoffs/train_vastai_tar_ttt_anchor_audit_20260502.py`
- `[home]/Desktop/nko-brain-scanner/docs/handoffs/vast-anchor-audit-strict-nko-2026-05-02.sh`
Interpretation:
- Corrected the important gotchas: plain trajectory run, `lr=0.0003`, `patience=8`,
strict N'Ko vocab hash, split/hash checks, feature-shape normalization.
- Did not complete before Vast credit ran out.
- Produces no final CER and should not be cited as a result.
Claims Safe To Keep
- N'Ko script invisibility is visible in activation profiles and is not merely a
right-to-left rendering issue.
- Data/tokenizer starvation is the most plausible mechanism behind missing N'Ko
circuits in the tested LLMs.
- Targeted fine-tuning can partially revive N'Ko handling.
- The repository preserves an archived N'Ko trajectory ASR result reporting 20.57%
CER under the recorded parameters.
- The April 22 low-LR ASR matrix behaves differently from the 20.57 anchor and should
be treated as non-comparable evidence.
- CER on a bijective script is more phonemically interpretable than WER on inconsistent
Latin Bambara orthography.
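The bijectivity claim can be stated as a short derivation under an idealized assumption: the N'Ko transcription is modeled as a symbol-by-symbol bijection $f$ from a phoneme inventory $\Sigma_P$ (tone counted as part of each symbol) onto a character set $\Sigma_C$, extended to strings letter by letter. Real orthographic details such as optional diacritics would weaken the equality, so read it as the limiting case the argument relies on.

```latex
% Unit-cost Levenshtein recurrence on p = p_1 \dots p_m and q = q_1 \dots q_n
d_{i,j} = \min\bigl( d_{i-1,j} + 1,\; d_{i,j-1} + 1,\; d_{i-1,j-1} + [\, p_i \neq q_j \,] \bigr),
\qquad d_{i,0} = i,\quad d_{0,j} = j.

% f injective  =>  [\, f(p_i) \neq f(q_j) \,] = [\, p_i \neq q_j \,] for all i, j,
% so the recurrence for (f(p), f(q)) is term-by-term identical, and |f(p)| = |p|:
\mathrm{CER}\bigl(f(p), f(q)\bigr)
  = \frac{d\bigl(f(p), f(q)\bigr)}{\lvert f(p) \rvert}
  = \frac{d(p, q)}{\lvert p \rvert}
  = \mathrm{PER}(p, q).
```

A many-to-many Latin orthography admits no such $f$: two accepted spellings of one phoneme string differ in characters, so the distance mixes spelling disagreement with recognition error.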
Claims To Downgrade Or Remove
- Do not claim the May 2026 strict audit validated the 20.57% anchor.
- Do not claim TAR or TTT improves the anchor.
- Do not claim a completed controlled proof that N'Ko beats Latin under matched
hyperparameters.
- Do not claim deployment readiness.
- Do not compare N'Ko CER and Latin WER as if they are the same metric.
- Do not use old 27.50% figures as current-snapshot evidence unless the exact artifact
chain is separately recovered and verified.
Final Publication Path Without More Money
Package A - Strongest research submission
Combine Paper 1 and Paper 3 into one stronger paper:
`Dead Circuits: Script Invisibility Across Large Language Model Families`
Core thesis:
- N'Ko is not just underperforming; it is structurally absent from learned
representations.
- Cross-model activation profiling supports the diagnosis.
- Targeted data and tokenizer-aware training can revive the script without destroying
English performance.
This is the most defensible publishable outcome from the project.
Package B - Short position paper
Write Paper 8:
`Against WER: Why Character Error Rate on Bijective Scripts Is the Correct Metric for Manding ASR Evaluation`
Core thesis:
- Latin Bambara WER measures agreement with unstable spelling conventions.
- N'Ko CER is phonemically interpretable because the script is bijective.
- The ASR experiments motivate the metric argument, but the paper does not depend on
another training run.
This is the best no-cost follow-on.
Package C - Technical report bundle
Freeze Papers 2, 4, and 5 as:
`Script-Native ASR for N'Ko: Systems, Metrics, and Deployment Lessons`
Core thesis:
- We built multiple N'Ko ASR systems and learned where the evidence is strong and where
it is not.
- The 20.57% N'Ko trajectory result is an archived anchor, not a reproduced benchmark.
- The strict reproduction is incomplete.
- Deployment claims remain provisional.
This should not be framed as a final benchmark paper.
If The Project Resumes Later
Only resume paid compute if all of these are true:
- The account has enough prepaid credit for the full run, not partial progress.
- The host has enough disk for the complete Hugging Face feature cache.
- The launch script verifies pair hash, vocab hash, split sizes, feature count, and
trainer hash before training.
- The run includes plain `nko_trajectory_290596` at `lr=0.0003`, `patience=8`.
- Artifacts are synced off-box during or immediately after training.
- The instance is destroyed only after results and checkpoint files are verified
locally.
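The hash and count checks above, plus the rule about verifying artifacts locally before destroying the instance, can be sketched as two guard functions. Everything here is hypothetical: the function names, the expected-hash dictionaries, and the feature-count arguments are placeholders, not the interface of `train_vastai_tar_ttt_anchor_audit_20260502.py`. Only the expected split sizes come from the anchor record.

```python
import hashlib
from pathlib import Path
from typing import Dict, List, Sequence

EXPECTED_SPLIT = (232476, 29060, 29060)  # anchor train/val/test sizes

def sha256_of(path: Path) -> str:
    """Stream a file through SHA-256 so large caches are not loaded at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def preflight(expected_hashes: Dict[Path, str], split_sizes: Sequence[int],
              feature_count: int, expected_features: int) -> List[str]:
    """Return a list of problems; start training only if it comes back empty."""
    problems = []
    for path, want in expected_hashes.items():  # e.g. pairs, vocab, trainer files
        if not path.exists():
            problems.append(f"missing: {path}")
        elif sha256_of(path) != want:
            problems.append(f"hash mismatch: {path}")
    if tuple(split_sizes) != EXPECTED_SPLIT:
        problems.append(f"split {tuple(split_sizes)} != {EXPECTED_SPLIT}")
    if feature_count != expected_features:
        problems.append(f"features {feature_count} != {expected_features}")
    return problems

def safe_to_destroy(manifest: Dict[str, str], local_dir: Path) -> bool:
    """True only if every manifest artifact exists locally with a matching hash."""
    for name, want in manifest.items():
        local = local_dir / name
        if not local.exists() or sha256_of(local) != want:
            return False
    return True
```

Returning a list of problems rather than raising on the first one means a single pre-flight pass reports every mismatch before any credit is spent.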
Minimum future experiment:
- Plain N'Ko trajectory, strict anchor settings.
- Optional N'Ko trajectory+TTT only after the plain anchor completes.
Do not rerun the whole matrix first.
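The ordering rule for a resumed run (plain anchor first, TTT only afterwards, never the full matrix up front) is simple enough to encode as a gate. The run names come from this closeout; the `next_run` helper and the idea of tracking locally verified runs in a set are hypothetical.

```python
from typing import Optional, Set

PLAIN_ANCHOR = "nko_trajectory_290596"
OPTIONAL_TTT = "nko_trajectory_ttt_290596"

def next_run(verified: Set[str], want_ttt: bool = False) -> Optional[str]:
    """Pick the next run; `verified` holds runs whose artifacts were checked locally."""
    if PLAIN_ANCHOR not in verified:
        return PLAIN_ANCHOR   # the strict anchor always comes first
    if want_ttt and OPTIONAL_TTT not in verified:
        return OPTIONAL_TTT   # TTT is unlocked only by a verified anchor
    return None               # nothing else is scheduled; no full matrix
```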
Final Research Stance
Conclude this phase as a successful exploratory research program with one strong
publishable thesis and one unresolved ASR benchmark thread.
The strongest sentence to stand behind:
`Modern language models do not merely perform poorly on N'Ko; activation evidence shows
that the script is structurally underrepresented, and this underrepresentation is
recoverable through targeted data and tokenizer-aware adaptation.`
The ASR sentence to use carefully:
`The repository preserves an archived N'Ko trajectory ASR checkpoint reporting 20.57%
test CER, but the strict May 2026 reproduction did not finish; therefore the result is
an anchor for future validation, not a newly reproduced benchmark.`
That is the clean conclusion for today.
Promotion Decision
Attach run IDs, datasets, metrics, and reproduction commands.
Source Anchor
nko-brain-scanner/docs/handoffs/nko-research-program-closeout-2026-05-03.md