Grand Diomande Research · Full HTML Reader

N'Ko Intelligence Pipeline — Full Architecture




Data Sources → Models → Papers

                           DATA SOURCES
                           ============

  ┌─────────────┐   ┌──────────────┐   ┌──────────────┐   ┌───────────────┐
  │ bam-asr-    │   │  AfVoices    │   │   Djoko      │   │  Parents'     │
  │ early       │   │  253K audio  │   │  YouTube     │   │  Voice Memos  │
  │ 38K clean   │   │  Latin text  │   │  5.5K audio  │   │  Malinke      │
  │ N'Ko labels │   │  NO N'Ko     │   │  consensus   │   │  diarized     │
  └──────┬──────┘   └──────┬──────┘   └──────┬──────┘   └───────┬───────┘
         │                  │                  │                   │
         │         ┌────────┴────────┐         │                   │
         │         │ NEEDS TONE-AWARE│         │                   │
         │         │ TRANSLITERATION │         │                   │
         │         │ (blocked until  │         │                   │
         │         │  tone model)    │         │                   │
         │         └────────┬────────┘         │                   │
         │                  │                  │                   │
         ▼                  ▼                  ▼                   ▼
  ┌──────────────────────────────────────────────────────────────────┐
  │                    CLEAN TRAINING DATA                           │
  │  Phase 1: 44K (bam + Djoko) ← CURRENT                          │
  │  Phase 2: +253K AfVoices (after tone resolution model)          │
  │  Phase 3: +parents' Malinke (after dialect adaptation)          │
  └──────────────────────────┬───────────────────────────────────────┘
                             │
                             ▼

                      YOUTUBE OCR PIPELINE
                      ====================

  ┌──────────────────┐
  │ babamamadidiane  │    2,001 N'Ko tutorial videos
  │ YouTube Channel  │    Teacher writes N'Ko on screen
  └────────┬─────────┘
           │
           ▼
  ┌──────────────────┐
  │ Frame Extraction │    Key frames where text appears
  │ (already done)   │    Scene change detection
  └────────┬─────────┘
           │
           ▼
  ┌──────────────────┐
  │ Gemma 4 Scene    │    Visual description of each frame
  │ Analysis         │    "Teacher writing ߓߊ on whiteboard"
  │ (Task #31)       │    Context: what lesson, what word
  └────────┬─────────┘
           │
           ▼
  ┌──────────────────┐
  │ N'Ko OCR Model   │    Read N'Ko characters from frames
  │ (Gemma VLM or    │    WITH tone marks (visible on screen)
  │  fine-tuned      │    This is ground truth tonal N'Ko
  │  mlx-vlm)        │
  └────────┬─────────┘
           │
           ▼
  ┌──────────────────┐
  │ N'Ko Text Corpus │    Thousands of correctly-written
  │ WITH TONES       │    N'Ko sentences from video lessons
  │                  │    = Language model training data
  └────────┬─────────┘
           │
           ├─────────────────────────────────┐
           │                                 │
           ▼                                 ▼
  ┌──────────────────┐            ┌──────────────────┐
  │ Contextual Tone  │            │ N'Ko Language    │
  │ Resolution Model │            │ Model (LM)       │
  │                  │            │                  │
  │ Input: toneless  │            │ Perplexity-based │
  │   N'Ko text      │            │ quality scoring  │
  │ Output: toned    │            │ for ASR output   │
  │   N'Ko text      │            │                  │
  └────────┬─────────┘            └──────────────────┘
           │
           │  Unlocks AfVoices:
           │  audio → ASR (toneless) → tone model (toned) → clean label
           │
           ▼
  ┌──────────────────────────────────────────────────────────────────┐
  │                 FULL TRAINING DATA                               │
  │  44K clean + 253K tone-resolved AfVoices + YouTube OCR corpus   │
  │  = 300K+ high-quality N'Ko speech-text pairs                    │
  └──────────────────────────┬───────────────────────────────────────┘
                             │
                             ▼
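
The perplexity-based quality scoring shown in the OCR pipeline above could be prototyped with a character-level bigram model before any neural LM exists. The following is a minimal sketch under that assumption: `CharBigramLM`, the add-one smoothing, and the two-line toy corpus are all hypothetical and not taken from the project code.

```python
import math
from collections import Counter


class CharBigramLM:
    """Character bigram LM with add-one smoothing (illustrative stand-in for the N'Ko LM)."""

    def __init__(self, corpus):
        self.unigrams = Counter()   # counts of each character as a left context
        self.bigrams = Counter()    # counts of (previous, current) character pairs
        self.vocab = set()
        for line in corpus:
            chars = ["<s>"] + list(line) + ["</s>"]
            self.vocab.update(chars)
            for prev, cur in zip(chars, chars[1:]):
                self.unigrams[prev] += 1
                self.bigrams[(prev, cur)] += 1

    def perplexity(self, text):
        """Lower values mean the text looks more like the training corpus."""
        chars = ["<s>"] + list(text) + ["</s>"]
        v = len(self.vocab) + 1  # +1 reserves probability mass for unseen characters
        log_prob = 0.0
        for prev, cur in zip(chars, chars[1:]):
            p = (self.bigrams[(prev, cur)] + 1) / (self.unigrams[prev] + v)
            log_prob += math.log(p)
        return math.exp(-log_prob / (len(chars) - 1))


# Toy usage: rank two ASR hypotheses, keeping the one the LM finds less surprising.
lm = CharBigramLM(["ߓߊ ߓߊ", "ߓߊ ߞߊ"])
hypotheses = ["ߓߊ", "zzzz"]
best = min(hypotheses, key=lm.perplexity)
```

In practice the threshold for rejecting an ASR output would have to be calibrated on held-out text; nothing in the source specifies one.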

                         MODELS
                         ======

  ┌──────────────────────────────────────────────────────────────────┐
  │                                                                  │
  │  ┌────────────────┐    ┌────────────────┐    ┌───────────────┐  │
  │  │ Frozen Whisper │───▶│ CTC Head       │───▶│ N'Ko Text     │  │
  │  │ Encoder        │    │ + Trajectory   │    │ Output        │  │
  │  │ (ANE offload)  │    │ + TAR depth    │    │               │  │
  │  │                │    │ (GPU training) │    │               │  │
  │  └────────────────┘    └────────────────┘    └───────────────┘  │
  │                                                                  │
  │  ┌────────────────┐    ┌────────────────┐                       │
  │  │ N'Ko OCR       │───▶│ Tone Resolution│                       │
  │  │ (Visual)       │    │ (Contextual)   │                       │
  │  └────────────────┘    └────────────────┘                       │
  │                                                                  │
  │  ┌────────────────┐    ┌────────────────┐                       │
  │  │ KARL Trajectory│───▶│ Cognitive Twin │                       │
  │  │ Intelligence   │    │ (Fine-tuned)   │                       │
  │  └────────────────┘    └────────────────┘                       │
  │                                                                  │
  └──────────────────────────────────────────────────────────────────┘
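
The last arrow in the first row of the diagram above (CTC head to N'Ko text) can be illustrated with standard greedy CTC decoding: take the best symbol per frame, collapse repeats, then drop blanks. This is a generic sketch, not the project's decoder; `ctc_greedy_decode`, the blank index of 0, and the `id_to_char` mapping are assumptions made for the example.

```python
def ctc_greedy_decode(frame_scores, id_to_char, blank_id=0):
    """Best-path CTC decoding.

    frame_scores: sequence of per-frame score lists, one score per symbol id.
    id_to_char:   mapping from non-blank symbol id to output character.
    """
    out = []
    prev = blank_id
    for scores in frame_scores:
        # Index of the highest-scoring symbol for this frame.
        best = max(range(len(scores)), key=scores.__getitem__)
        # Emit only when the symbol changes and is not the blank.
        if best != prev and best != blank_id:
            out.append(id_to_char[best])
        prev = best
    return "".join(out)
```

A blank between two identical symbols is what allows a doubled character to survive the collapse step.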


                         PAPERS
                         ======

  Paper 4: "Script Advantage in CTC-ASR"                    [WRITTEN]
  ─────────────────────────────────────────
  Finding: N'Ko's bijective script gives a 5.25 pp CER
  advantage over Latin with trajectory bias. The effect of
  graph attention is script-dependent.
  Data: 37K bam-asr-early
  Status: Written, figures done, awaiting final numbers
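
For reference, character error rate (CER) is edit distance divided by reference length, and a percentage-point (pp) gap is the absolute difference between two CERs expressed in percent. The sketch below assumes CER is computed over Unicode code points, so each N'Ko combining tone mark counts as its own character; the paper's exact normalization is not stated in this document, and `cer` and `pp_gap` are illustrative names.

```python
def cer(reference, hypothesis):
    """Character error rate: Levenshtein distance / number of reference characters."""
    ref, hyp = list(reference), list(hypothesis)
    if not ref:
        raise ValueError("reference must be non-empty")
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution (or match, cost 0)
            ))
        prev = cur
    return prev[-1] / len(ref)


def pp_gap(cer_a, cer_b):
    """Difference between two CERs (given as fractions) in percentage points."""
    return 100.0 * (cer_a - cer_b)
```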


  Paper 5: "Generalization & Speaker Adaptation"            [WRITTEN]
  ─────────────────────────────────────────
  Finding: N'Ko generalizes better to unseen words (Exp F)
  and adapts faster per speaker via TTT (Exp G). Also
  documents the Djoko consensus methodology.
  Data: 37K + 5.5K Djoko
  Status: Draft complete


  Paper 6: "Trajectory Attention Residuals (TAR)"           [TRAINING]
  ─────────────────────────────────────────
  Question: Does depth-wise attention routing improve N'Ko
  more than Latin? Is the effect script-dependent?
  Architecture: TAR = trajectory scalars gate between
  standard residual and depth attention over all preceding
  layers.
  Data: 44K clean (bam + Djoko)
  Status: TAR experiment running on Vast.ai A100
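
One way to read the TAR description above is as a convex mix of the usual skip connection and an attention-weighted summary of earlier layers, controlled by a gate derived from a trajectory scalar. The sketch below is an interpretation, not the paper's implementation: the sigmoid gate, the use of the latest hidden state as the attention query, and the function name `tar_residual` are all assumptions.

```python
import numpy as np


def tar_residual(layer_out, history, traj_scalar):
    """Gate between a standard residual and depth attention (illustrative only).

    layer_out:   (d,) output of the current sublayer.
    history:     list of (d,) hidden states from all preceding layers, oldest first.
    traj_scalar: float; assumed trajectory feature, squashed to a gate in (0, 1).
    """
    h_prev = history[-1]
    H = np.stack(history)                      # (L, d) stack of earlier hidden states
    scores = H @ h_prev / np.sqrt(H.shape[1])  # scaled dot-product, query = latest state
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                   # softmax over depth
    depth_ctx = weights @ H                    # (d,) attention summary of earlier layers
    gate = 1.0 / (1.0 + np.exp(-traj_scalar))  # sigmoid
    # gate -> 0 recovers the standard residual; gate -> 1 uses only depth attention.
    return layer_out + (1.0 - gate) * h_prev + gate * depth_ctx
```

With a single entry in `history` the two branches coincide, so the first layer behaves like an ordinary residual block regardless of the gate.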


  Paper 7: "Neural Engine Offload for Consumer Training"    [SPIKE DONE]
  ─────────────────────────────────────────
  Finding: M4 ANE runs Whisper-scale projections at 11.6
  TFLOPS (73.6% utilization) via reverse-engineered private
  APIs with quantized inline weights.
  Contribution: Three bridge fixes for macOS 26/M4,
  BLOBFILE workaround, first Whisper-scale ANE benchmark.
  Status: Spike proven, training loop next


  Paper 8: "Contextual Tone Resolution for N'Ko ASR"       [PLANNED]
  ─────────────────────────────────────────
  Question: Can a language model trained on OCR-extracted
  N'Ko text resolve tonal ambiguity in ASR output?
  Pipeline: YouTube OCR → N'Ko corpus → tone LM →
  ASR post-processing
  Data: babamamadidiane YouTube (2,001 videos) +
  Gemma 4 scene analysis + N'Ko OCR
  Unlocks: 253K AfVoices with correct tonal labels
  Status: Gemma 4 scene analysis queued (Task #31)
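
Supervised pairs for a tone resolution model can be derived from a toned corpus by stripping the tone diacritics and keeping the original as the target. A minimal sketch, assuming "toneless" means removing the seven N'Ko combining tone marks at U+07EB through U+07F1; the helper names are hypothetical, and whether the nasalization mark (U+07F2) should also be stripped is a design decision the source does not settle.

```python
# N'Ko combining tone marks in the Unicode NKo block: U+07EB .. U+07F1.
NKO_TONE_MARKS = {chr(cp) for cp in range(0x07EB, 0x07F2)}


def strip_tones(text):
    """Remove N'Ko combining tone marks, leaving base letters and other marks intact."""
    return "".join(ch for ch in text if ch not in NKO_TONE_MARKS)


def make_training_pairs(toned_sentences):
    """Build (toneless input, toned target) pairs from OCR-extracted sentences."""
    pairs = []
    for toned in toned_sentences:
        toneless = strip_tones(toned)
        if toneless != toned:  # skip sentences that carry no tone marks at all
            pairs.append((toneless, toned))
    return pairs
```

The same `strip_tones` function gives a cheap sanity check at inference time: stripping the model's output should reproduce its input exactly.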


  Paper 9: "Distributed Training on Apple Neural Engine"    [FUTURE]
  ─────────────────────────────────────────
  Question: Can a mesh of Apple devices (Macs + iPhones +
  iPads) collectively train models via ANE offload +
  Thunder-Train gradient sync?
  Architecture: ANE frozen forward, GPU adapter backward,
  Thunderbolt gradient sync, CoreML on iOS devices
  Status: Depends on Paper 7 training loop proof


  Paper 10: "TurboQuant for Low-Resource Retrieval"         [FUTURE]
  ─────────────────────────────────────────
  Question: Can 4-bit vector quantization replace ANN
  indexes for RAG++ at 332K scale?
  Architecture: Hadamard rotation + scalar quantization,
  Rust sidecar for Orbit, coarse-to-fine retrieval
  Status: Library built, 0.993 cosine at 4-bit,
  needs real embedding benchmark
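
The "Hadamard rotation + scalar quantization" recipe can be sketched in a few lines of NumPy. This is not the TurboQuant library: the random sign flip before the transform, the symmetric 16-level uniform quantizer, and all function names are assumptions chosen for illustration.

```python
import numpy as np


def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform; length must be a power of two."""
    x = np.asarray(x, dtype=np.float64).copy()
    n = x.shape[0]
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a positive power of two")
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            a = x[i:i + h].copy()
            b = x[i + h:i + 2 * h].copy()
            x[i:i + h] = a + b
            x[i + h:i + 2 * h] = a - b
        h *= 2
    return x / np.sqrt(n)  # scaling makes the transform its own inverse


def quantize_4bit(vec, signs):
    """Rotate with random signs + Hadamard, then map each coordinate to a code in 0..15."""
    rotated = fwht(vec * signs)
    peak = np.abs(rotated).max()
    scale = peak / 7.5 if peak > 0 else 1.0
    codes = np.clip(np.round(rotated / scale + 7.5), 0, 15).astype(np.uint8)
    return codes, scale


def dequantize_4bit(codes, scale, signs):
    """Invert the quantizer and the rotation to get an approximate vector back."""
    rotated = (codes.astype(np.float64) - 7.5) * scale
    return fwht(rotated) * signs


# Toy usage on a random vector; real embeddings may behave differently.
rng = np.random.default_rng(0)
v = rng.standard_normal(256)
signs = rng.choice([-1.0, 1.0], size=256)
codes, scale = quantize_4bit(v, signs)
approx = dequantize_4bit(codes, scale, signs)
cosine = float(v @ approx / (np.linalg.norm(v) * np.linalg.norm(approx)))
```

The rotation spreads energy evenly across coordinates, which is why a single per-vector scale is enough here; heavy-tailed real embeddings are exactly the case the pending benchmark would need to cover.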


                    THE SELF-IMPROVING LOOP
                    =======================

  ┌─────────────────────────────────────────────────────────┐
  │                                                         │
  │   Train ASR (44K clean)                                │
  │        │                                                │
  │        ▼                                                │
  │   ASR labels AfVoices audio (253K)                     │
  │        │                                                │
  │        ▼                                                │
  │   OCR extracts toned N'Ko from YouTube (Paper 8)       │
  │        │                                                │
  │        ▼                                                │
  │   Tone model fixes ASR labels                          │
  │        │                                                │
  │        ▼                                                │
  │   Retrain ASR on 300K+ clean data                      │
  │        │                                                │
  │        ▼                                                │
  │   Better ASR → better labels → better ASR → ...        │
  │                                                         │
  │   Each cycle: ASR improves, corpus grows,              │
  │   tone model gets more training data                    │
  │                                                         │
  └─────────────────────────────────────────────────────────┘
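
The loop above can be written as a short driver in which each stage is a caller-supplied function. This is a structural sketch only: `self_improving_loop`, its parameters, and the choice to retrain the tone model on its own resolved labels each cycle are assumptions, and a real pipeline would presumably filter pseudo-labels (for example by LM perplexity) before reusing them.

```python
def self_improving_loop(clean_pairs, unlabeled_audio, toned_corpus,
                        train_asr, train_tone_model, cycles=2):
    """Train ASR, pseudo-label audio, resolve tones, retrain; repeat `cycles` times.

    train_asr(pairs)         -> callable mapping audio to toneless text
    train_tone_model(corpus) -> callable mapping toneless text to toned text
    """
    tone_model = train_tone_model(toned_corpus)
    asr = train_asr(clean_pairs)
    for _ in range(cycles):
        # Label the unlabeled audio, then let the tone model fix the labels.
        pseudo = [(audio, tone_model(asr(audio))) for audio in unlabeled_audio]
        # Retrain ASR on clean data plus the tone-resolved pseudo-labels.
        asr = train_asr(clean_pairs + pseudo)
        # The corpus grows, so the tone model gets more training data.
        tone_model = train_tone_model(toned_corpus + [label for _, label in pseudo])
    return asr
```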

Current Status (2026-04-05)

Component	Status	Next Action
44K clean data	READY	Training TAR now
253K AfVoices features	EXTRACTED	Waiting for tone model
N'Ko trajectory CER	27.39%	-
TAR experiment	TRAINING	4 runs on A100
ANE spike	11.6 TFLOPS proven	Wire into training loop
YouTube OCR (Gemma 4)	QUEUED	Task #31 on Vast.ai
Tone resolution model	PLANNED	Depends on OCR corpus
TurboQuant	BUILT	Needs real embedding test
Contextual tone paper	PLANNED	Depends on OCR + tone model
ANE distributed paper	PLANNED	Depends on training loop proof

Promotion Decision

Promote into a technical note or architecture paper with implementation anchors.

Source Anchor

nko-brain-scanner/PIPELINE.md

Detected Structure

Method · Evaluation · Figures · Architecture