Grand Diomande Research · Full HTML Reader
N'Ko Intelligence Pipeline — Full Architecture
Data Sources → Models → Papers
DATA SOURCES
============
┌─────────────┐  ┌──────────────┐  ┌──────────────┐  ┌───────────────┐
│ bam-asr-    │  │ AfVoices     │  │ Djoko        │  │ Parents'      │
│ early       │  │ 253K audio   │  │ YouTube      │  │ Voice Memos   │
│ 38K clean   │  │ Latin text   │  │ 5.5K audio   │  │ Malinke       │
│ N'Ko labels │  │ NO N'Ko      │  │ consensus    │  │ diarized      │
└──────┬──────┘  └──────┬───────┘  └──────┬───────┘  └───────┬───────┘
       │                │                 │                  │
       │       ┌────────┴────────┐        │                  │
       │       │ NEEDS TONE-AWARE│        │                  │
       │       │ TRANSLITERATION │        │                  │
       │       │ (blocked until  │        │                  │
       │       │ tone model)     │        │                  │
       │       └────────┬────────┘        │                  │
       │                │                 │                  │
       ▼                ▼                 ▼                  ▼
┌────────────────────────────────────────────────────────────────────┐
│                        CLEAN TRAINING DATA                         │
│  Phase 1: 44K (bam + Djoko) ← CURRENT                              │
│  Phase 2: +253K AfVoices (after tone resolution model)             │
│  Phase 3: +parents' Malinke (after dialect adaptation)             │
└─────────────────────────────────┬──────────────────────────────────┘
                                  │
                                  ▼
YOUTUBE OCR PIPELINE
====================
┌──────────────────┐
│ babamamadidiane  │  2,001 N'Ko tutorial videos
│ YouTube Channel  │  Teacher writes N'Ko on screen
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│ Frame Extraction │  Key frames where text appears
│ (already done)   │  Scene change detection
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│ Gemma 4 Scene    │  Visual description of each frame
│ Analysis         │  "Teacher writing ߓߊ on whiteboard"
│ (Task #31)       │  Context: what lesson, what word
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│ N'Ko OCR Model   │  Read N'Ko characters from frames
│ (Gemma VLM or    │  WITH tone marks (visible on screen)
│ fine-tuned       │  This is ground truth tonal N'Ko
│ mlx-vlm)         │
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│ N'Ko Text Corpus │  Thousands of correctly-written
│ WITH TONES       │  N'Ko sentences from video lessons
│                  │  = Language model training data
└────────┬─────────┘
         │
         ├─────────────────────────────────┐
         │                                 │
         ▼                                 ▼
┌──────────────────┐              ┌──────────────────┐
│ Contextual Tone  │              │ N'Ko Language    │
│ Resolution Model │              │ Model (LM)       │
│                  │              │                  │
│ Input: toneless  │              │ Perplexity-based │
│ N'Ko text        │              │ quality scoring  │
│ Output: toned    │              │ for ASR output   │
│ N'Ko text        │              │                  │
└────────┬─────────┘              └──────────────────┘
         │
         │  Unlocks AfVoices:
         │  audio → ASR (toneless) → tone model (toned) → clean label
         │
         ▼
┌────────────────────────────────────────────────────────────────────┐
│                         FULL TRAINING DATA                         │
│  44K clean + 253K tone-resolved AfVoices + YouTube OCR corpus      │
│  = 300K+ high-quality N'Ko speech-text pairs                       │
└─────────────────────────────────┬──────────────────────────────────┘
                                  │
                                  ▼
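
The Contextual Tone Resolution Model maps toneless N'Ko to toned N'Ko, so one plausible way to get supervised pairs from the OCR corpus is to strip the tone diacritics from each toned sentence. The sketch below assumes the tone marks are the Unicode combining characters U+07EB through U+07F1; the function names `strip_tones` and `make_pair` are hypothetical, and whether the project also treats the nasalization mark (U+07F2) or other signs as tonal is not stated in the source.

```python
# Assumption: N'Ko tone diacritics are the combining marks U+07EB..U+07F1.
NKO_TONE_MARKS = {chr(cp) for cp in range(0x07EB, 0x07F2)}


def strip_tones(toned: str) -> str:
    """Remove N'Ko tone diacritics, leaving base letters untouched."""
    return "".join(ch for ch in toned if ch not in NKO_TONE_MARKS)


def make_pair(toned: str) -> tuple[str, str]:
    """Build a (toneless input, toned target) training pair."""
    return strip_tones(toned), toned


# BA + A + combining short high tone (illustrative string, not project data).
toned_example = "\u07D3\u07CA\u07EB"
source_text, target_text = make_pair(toned_example)
```

Pairs built this way give the model the same kind of input it would see from toneless ASR output, with the OCR text as the target.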
MODELS
======
┌────────────────────────────────────────────────────────────────────┐
│                                                                    │
│  ┌────────────────┐    ┌────────────────┐    ┌───────────────┐     │
│  │ Frozen Whisper │───▶│ CTC Head       │───▶│ N'Ko Text     │     │
│  │ Encoder        │    │ + Trajectory   │    │ Output        │     │
│  │ (ANE offload)  │    │ + TAR depth    │    │               │     │
│  │                │    │ (GPU training) │    │               │     │
│  └────────────────┘    └────────────────┘    └───────────────┘     │
│                                                                    │
│  ┌────────────────┐    ┌────────────────┐                          │
│  │ N'Ko OCR       │───▶│ Tone Resolution│                          │
│  │ (Visual)       │    │ (Contextual)   │                          │
│  └────────────────┘    └────────────────┘                          │
│                                                                    │
│  ┌────────────────┐    ┌────────────────┐                          │
│  │ KARL Trajectory│───▶│ Cognitive Twin │                          │
│  │ Intelligence   │    │ (Fine-tuned)   │                          │
│  └────────────────┘    └────────────────┘                          │
│                                                                    │
└────────────────────────────────────────────────────────────────────┘
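
To make the last arrow in the ASR row concrete, here is a minimal sketch of standard greedy CTC decoding, which turns per-frame label ids from a CTC head into a character sequence. The blank id of 0 and the three-entry `vocab` are illustrative assumptions; the source does not say which decoding strategy or vocabulary the project uses.

```python
def ctc_greedy_decode(frame_ids, blank=0):
    """Collapse consecutive repeats, then drop blanks (standard CTC rule)."""
    decoded = []
    previous = None
    for label in frame_ids:
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


# Hypothetical vocabulary: index 0 is the CTC blank.
vocab = ["", "\u07D3", "\u07CA"]  # blank, NKO LETTER BA, NKO LETTER A
frame_ids = [0, 1, 1, 0, 2, 2, 0]
label_ids = ctc_greedy_decode(frame_ids)
text = "".join(vocab[i] for i in label_ids)
```

A blank between two identical labels keeps both, which is how CTC represents doubled characters.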
PAPERS
======
Paper 4: "Script Advantage in CTC-ASR" [WRITTEN]
─────────────────────────────────────────
Finding: N'Ko bijective script gives 5.25pp CER advantage
over Latin with trajectory bias. Graph attention is
script-dependent.
Data: 37K bam-asr-early
Status: Written, figures done, awaiting final numbers
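
For reference, character error rate (CER) and a percentage-point gap can be computed as in the sketch below. It uses plain Levenshtein distance over characters; how the paper normalizes text or counts tone marks is not stated in the source, and the two rates in the example are made-up values, not results.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance over characters (insert, delete, substitute)."""
    row = list(range(len(hyp) + 1))
    for i, ref_ch in enumerate(ref, start=1):
        diagonal, row[0] = row[0], i
        for j, hyp_ch in enumerate(hyp, start=1):
            cost = 0 if ref_ch == hyp_ch else 1
            diagonal, row[j] = row[j], min(
                row[j] + 1,        # deletion
                row[j - 1] + 1,    # insertion
                diagonal + cost,   # substitution or match
            )
    return row[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    return edit_distance(ref, hyp) / len(ref)


def pp_gap(rate_a: float, rate_b: float) -> float:
    """Difference between two rates, in percentage points."""
    return (rate_a - rate_b) * 100


example_cer = cer("kitten", "sitting")
example_gap = pp_gap(0.30, 0.25)  # hypothetical rates, not measured values
```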
Paper 5: "Generalization & Speaker Adaptation" [WRITTEN]
─────────────────────────────────────────
Finding: N'Ko generalizes better to unseen words (Exp F),
adapts faster per-speaker via TTT (Exp G), Djoko consensus
methodology.
Data: 37K + 5.5K Djoko
Status: Draft complete
Paper 6: "Trajectory Attention Residuals (TAR)" [TRAINING]
─────────────────────────────────────────
Question: Does depth-wise attention routing improve N'Ko
more than Latin? Is the effect script-dependent?
Architecture: TAR = trajectory scalars gate between
standard residual and depth attention over all preceding
layers.
Data: 44K clean (bam + Djoko)
Status: TAR experiment running on Vast.ai A100
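
One way to read the TAR description is as a scalar gate that interpolates between the usual residual input and an attention-weighted mix of all earlier layer outputs. The sketch below illustrates that reading only. The dot-product scoring, the use of the previous state as the query, and the name `tar_mix` are assumptions rather than the paper's formulation, and how the trajectory scalar is computed is left out.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def depth_attention(query, layer_states):
    """Attend over the outputs of all preceding layers (scaled dot product)."""
    scale = math.sqrt(len(query))
    weights = softmax(
        [sum(q * h for q, h in zip(query, state)) / scale for state in layer_states]
    )
    return [
        sum(w * state[d] for w, state in zip(weights, layer_states))
        for d in range(len(query))
    ]


def tar_mix(prev_state, layer_states, gate):
    """gate=0 keeps the standard residual input; gate=1 uses depth attention."""
    attended = depth_attention(prev_state, layer_states)
    return [(1 - gate) * p + gate * a for p, a in zip(prev_state, attended)]


# Three earlier layer outputs for a single position (illustrative values).
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
residual_only = tar_mix(states[-1], states, gate=0.0)
fully_routed = tar_mix(states[-1], states, gate=1.0)
```

With the gate at 0 the layer sees exactly what a standard residual stream would provide, so a script-dependent effect would show up as different learned gate values for N'Ko and Latin.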
Paper 7: "Neural Engine Offload for Consumer Training" [SPIKE DONE]
─────────────────────────────────────────
Finding: M4 ANE runs Whisper-scale projections at 11.6
TFLOPS (73.6% utilization) via reverse-engineered private
APIs with quantized inline weights.
Contribution: Three bridge fixes for macOS 26/M4,
BLOBFILE workaround, first Whisper-scale ANE benchmark.
Status: Spike proven, training loop next
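
The two figures in the finding imply a reference peak, which the arithmetic below recovers. The source does not state the peak value or the precision it refers to, so `implied_peak_tflops` is a derived quantity, not a quoted one.

```python
measured_tflops = 11.6   # from the finding above
utilization = 0.736      # 73.6% utilization, from the finding above

# utilization = measured / peak, so peak = measured / utilization
implied_peak_tflops = measured_tflops / utilization
```

This works out to roughly 15.8 TFLOPS as the reference peak behind the 73.6% figure.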
Paper 8: "Contextual Tone Resolution for N'Ko ASR" [PLANNED]
─────────────────────────────────────────
Question: Can a language model trained on OCR-extracted
N'Ko text resolve tonal ambiguity in ASR output?
Pipeline: YouTube OCR → N'Ko corpus → tone LM →
ASR post-processing
Data: babamamadidiane YouTube (2,001 videos) +
Gemma 4 scene analysis + N'Ko OCR
Unlocks: 253K AfVoices with correct tonal labels
Status: Gemma 4 scene analysis queued (Task #31)
Paper 9: "Distributed Training on Apple Neural Engine" [FUTURE]
─────────────────────────────────────────
Question: Can a mesh of Apple devices (Macs + iPhones +
iPads) collectively train models via ANE offload +
Thunder-Train gradient sync?
Architecture: ANE frozen forward, GPU adapter backward,
Thunderbolt gradient sync, CoreML on iOS devices
Status: Depends on Paper 7 training loop proof
Paper 10: "TurboQuant for Low-Resource Retrieval" [FUTURE]
─────────────────────────────────────────
Question: Can 4-bit vector quantization replace ANN
indexes for RAG++ at 332K scale?
Architecture: Hadamard rotation + scalar quantization,
Rust sidecar for Orbit, coarse-to-fine retrieval
Status: Library built, 0.993 cosine at 4-bit,
needs real embedding benchmark
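
The sketch below shows the general shape of "Hadamard rotation + scalar quantization" on a single random vector. It is not the TurboQuant library's code. The random sign flip before the transform, the per-vector min/max scaling, and all function names are assumptions, and the Gaussian test vector stands in for a real embedding.

```python
import math
import random


def fwht(vec):
    """Orthonormal fast Walsh-Hadamard transform; length must be a power of two."""
    out = list(vec)
    n = len(out)
    half = 1
    while half < n:
        for start in range(0, n, 2 * half):
            for i in range(start, start + half):
                a, b = out[i], out[i + half]
                out[i], out[i + half] = a + b, a - b
        half *= 2
    scale = 1.0 / math.sqrt(n)
    return [x * scale for x in out]


def quantize_4bit(vec):
    """Uniform scalar quantization to 16 levels using per-vector min/max."""
    lo, hi = min(vec), max(vec)
    step = (hi - lo) / 15 or 1.0
    codes = [min(15, round((x - lo) / step)) for x in vec]
    return codes, lo, step


def dequantize(codes, lo, step):
    return [lo + c * step for c in codes]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


random.seed(0)
signs = [random.choice((-1.0, 1.0)) for _ in range(64)]   # assumed random sign flip
vec = [random.gauss(0, 1) for _ in range(64)]             # stand-in for an embedding
rotated = fwht([s * x for s, x in zip(signs, vec)])
codes, lo, step = quantize_4bit(rotated)
# The orthonormal transform is its own inverse, so apply it again and undo the signs.
restored = [s * x for s, x in zip(signs, fwht(dequantize(codes, lo, step)))]
similarity = cosine(vec, restored)
```

The rotation spreads energy evenly across coordinates, which is what lets one scale per vector work for every dimension. On real embeddings the similarity would need to be measured, as the status line notes.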
THE SELF-IMPROVING LOOP
=======================
┌─────────────────────────────────────────────────────────┐
│                                                         │
│   Train ASR (44K clean)                                 │
│        │                                                │
│        ▼                                                │
│   ASR labels AfVoices audio (253K)                      │
│        │                                                │
│        ▼                                                │
│   OCR extracts toned N'Ko from YouTube (Paper 8)        │
│        │                                                │
│        ▼                                                │
│   Tone model fixes ASR labels                           │
│        │                                                │
│        ▼                                                │
│   Retrain ASR on 300K+ clean data                       │
│        │                                                │
│        ▼                                                │
│   Better ASR → better labels → better ASR → ...         │
│                                                         │
│   Each cycle: ASR improves, corpus grows,               │
│   tone model gets more training data                    │
│                                                         │
└─────────────────────────────────────────────────────────┘
Current Status (2026-04-05)
| Component | Status | Next Action |
|---|---|---|
| 44K clean data | READY | Training TAR now |
| 253K AfVoices features | EXTRACTED | Waiting for tone model |
| N'Ko trajectory CER | 27.39 | |
| TAR experiment | TRAINING | 4 runs on A100 |
| ANE spike | 11.6 TFLOPS proven | Wire into training loop |
| YouTube OCR (Gemma 4) | QUEUED | Task #31 on Vast.ai |
| Tone resolution model | PLANNED | Depends on OCR corpus |
| TurboQuant | BUILT | Needs real embedding test |
| Contextual tone paper | PLANNED | Depends on OCR + tone model |
| ANE distributed paper | PLANNED | Depends on training loop proof |
Promotion Decision
Promote into a technical note or architecture paper with implementation anchors.
Source Anchor
nko-brain-scanner/PIPELINE.md