Grand Diomande Research · Full HTML Reader

N'Ko Intelligence Pipeline — Full Architecture

Data Sources → Models → Papers

                           DATA SOURCES
                           ============

  ┌─────────────┐   ┌──────────────┐   ┌──────────────┐   ┌───────────────┐
  │ bam-asr-    │   │  AfVoices    │   │   Djoko      │   │  Parents'     │
  │ early       │   │  253K audio  │   │  YouTube     │   │  Voice Memos  │
  │ 38K clean   │   │  Latin text  │   │  5.5K audio  │   │  Malinke      │
  │ N'Ko labels │   │  NO N'Ko     │   │  consensus   │   │  diarized     │
  └──────┬──────┘   └──────┬──────┘   └──────┬──────┘   └───────┬───────┘
         │                  │                  │                   │
         │         ┌────────┴────────┐         │                   │
         │         │ NEEDS TONE-AWARE│         │                   │
         │         │ TRANSLITERATION │         │                   │
         │         │ (blocked until  │         │                   │
         │         │  tone model)    │         │                   │
         │         └────────┬────────┘         │                   │
         │                  │                  │                   │
         ▼                  ▼                  ▼                   ▼
  ┌──────────────────────────────────────────────────────────────────┐
  │                    CLEAN TRAINING DATA                           │
  │  Phase 1: 44K (bam + Djoko) ← CURRENT                          │
  │  Phase 2: +253K AfVoices (after tone resolution model)          │
  │  Phase 3: +parents' Malinke (after dialect adaptation)          │
  └──────────────────────────┬───────────────────────────────────────┘
                             │
                             ▼

                      YOUTUBE OCR PIPELINE
                      ====================

  ┌──────────────────┐
  │ babamamadidiane   │    2,001 N'Ko tutorial videos
  │ YouTube Channel   │    Teacher writes N'Ko on screen
  └────────┬─────────┘
           │
           ▼
  ┌──────────────────┐
  │ Frame Extraction  │    Key frames where text appears
  │ (already done)    │    Scene change detection
  └────────┬─────────┘
           │
           ▼
  ┌──────────────────┐
  │ Gemma 4 Scene    │    Visual description of each frame
  │ Analysis         │    "Teacher writing ߓߊ on whiteboard"
  │ (Task #31)       │    Context: what lesson, what word
  └────────┬─────────┘
           │
           ▼
  ┌──────────────────┐
  │ N'Ko OCR Model   │    Read N'Ko characters from frames
  │ (Gemma VLM or    │    WITH tone marks (visible on screen)
  │  fine-tuned       │    This is ground truth tonal N'Ko
  │  mlx-vlm)        │
  └────────┬─────────┘
           │
           ▼
  ┌──────────────────┐
  │ N'Ko Text Corpus │    Thousands of correctly-written
  │ WITH TONES       │    N'Ko sentences from video lessons
  │                  │    = Language model training data
  └────────┬─────────┘
           │
           ├─────────────────────────────────┐
           │                                 │
           ▼                                 ▼
  ┌──────────────────┐            ┌──────────────────┐
  │ Contextual Tone  │            │ N'Ko Language     │
  │ Resolution Model │            │ Model (LM)        │
  │                  │            │                    │
  │ Input: toneless  │            │ Perplexity-based  │
  │   N'Ko text      │            │ quality scoring   │
  │ Output: toned    │            │ for ASR output    │
  │   N'Ko text      │            │                    │
  └────────┬─────────┘            └──────────────────┘
           │
           │  Unlocks AfVoices:
           │  audio → ASR (toneless) → tone model (toned) → clean label
           │
           ▼
  ┌──────────────────────────────────────────────────────────────────┐
  │                 FULL TRAINING DATA                               │
  │  44K clean + 253K tone-resolved AfVoices + YouTube OCR corpus   │
  │  = 300K+ high-quality N'Ko speech-text pairs                    │
  └──────────────────────────┬───────────────────────────────────────┘
                             │
                             ▼

                         MODELS
                         ======

  ┌──────────────────────────────────────────────────────────────────┐
  │                                                                  │
  │  ┌────────────────┐    ┌────────────────┐    ┌───────────────┐  │
  │  │ Frozen Whisper  │───▶│ CTC Head       │───▶│ N'Ko Text     │  │
  │  │ Encoder         │    │ + Trajectory   │    │ Output        │  │
  │  │ (ANE offload)   │    │ + TAR depth    │    │               │  │
  │  │                 │    │ (GPU training)  │    │               │  │
  │  └────────────────┘    └────────────────┘    └───────────────┘  │
  │                                                                  │
  │  ┌────────────────┐    ┌────────────────┐                       │
  │  │ N'Ko OCR       │───▶│ Tone Resolution│                       │
  │  │ (Visual)       │    │ (Contextual)   │                       │
  │  └────────────────┘    └────────────────┘                       │
  │                                                                  │
  │  ┌────────────────┐    ┌────────────────┐                       │
  │  │ KARL Trajectory│───▶│ Cognitive Twin │                       │
  │  │ Intelligence   │    │ (Fine-tuned)   │                       │
  │  └────────────────┘    └────────────────┘                       │
  │                                                                  │
  └──────────────────────────────────────────────────────────────────┘


                         PAPERS
                         ======

  Paper 4: "Script Advantage in CTC-ASR"                    [WRITTEN]
  ─────────────────────────────────────────
  Finding: N'Ko bijective script gives 5.25pp CER advantage
  over Latin with trajectory bias. Graph attention is
  script-dependent.
  Data: 37K bam-asr-early
  Status: Written, figures done, awaiting final numbers
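
The CER figures quoted for Paper 4 can be made concrete with a minimal sketch of character error rate. The source does not state how the paper normalizes text, so this version assumes edit distance over raw Unicode code points (each tone mark counts as its own character) and corpus-level aggregation. The function names are illustrative, not taken from the repository.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance over Unicode code points."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(refs, hyps) -> float:
    """Corpus-level CER in percent: total edits / total reference characters."""
    edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    chars = sum(len(r) for r in refs)
    return 100.0 * edits / chars


def advantage_pp(cer_a: float, cer_b: float) -> float:
    """Percentage-point gap between two CERs (positive when system A is lower)."""
    return cer_b - cer_a
```

A percentage-point advantage is a plain difference of two CER percentages, not a relative reduction, which is why `advantage_pp` subtracts rather than divides.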


  Paper 5: "Generalization & Speaker Adaptation"            [WRITTEN]
  ─────────────────────────────────────────
  Finding: N'Ko generalizes better to unseen words (Exp F),
  adapts faster per-speaker via TTT (Exp G), Djoko consensus
  methodology.
  Data: 37K + 5.5K Djoko
  Status: Draft complete


  Paper 6: "Trajectory Attention Residuals (TAR)"           [TRAINING]
  ─────────────────────────────────────────
  Question: Does depth-wise attention routing improve N'Ko
  more than Latin? Is the effect script-dependent?
  Architecture: TAR = trajectory scalars gate between
  standard residual and depth attention over all preceding
  layers.
  Data: 44K clean (bam + Djoko)
  Status: TAR experiment running on Vast.ai A100


  Paper 7: "Neural Engine Offload for Consumer Training"    [SPIKE DONE]
  ─────────────────────────────────────────
  Finding: M4 ANE runs Whisper-scale projections at 11.6
  TFLOPS (73.6% utilization) via reverse-engineered private
  APIs with quantized inline weights.
  Contribution: Three bridge fixes for macOS 26/M4,
  BLOBFILE workaround, first Whisper-scale ANE benchmark.
  Status: Spike proven, training loop next


  Paper 8: "Contextual Tone Resolution for N'Ko ASR"       [PLANNED]
  ─────────────────────────────────────────
  Question: Can a language model trained on OCR-extracted
  N'Ko text resolve tonal ambiguity in ASR output?
  Pipeline: YouTube OCR → N'Ko corpus → tone LM →
  ASR post-processing
  Data: babamamadidiane YouTube (2,001 videos) +
  Gemma 4 scene analysis + N'Ko OCR
  Unlocks: 253K AfVoices with correct tonal labels
  Status: Gemma 4 scene analysis queued (Task #31)


  Paper 9: "Distributed Training on Apple Neural Engine"    [FUTURE]
  ─────────────────────────────────────────
  Question: Can a mesh of Apple devices (Macs + iPhones +
  iPads) collectively train models via ANE offload +
  Thunder-Train gradient sync?
  Architecture: ANE frozen forward, GPU adapter backward,
  Thunderbolt gradient sync, CoreML on iOS devices
  Status: Depends on Paper 7 training loop proof


  Paper 10: "TurboQuant for Low-Resource Retrieval"         [FUTURE]
  ─────────────────────────────────────────
  Question: Can 4-bit vector quantization replace ANN
  indexes for RAG++ at 332K scale?
  Architecture: Hadamard rotation + scalar quantization,
  Rust sidecar for Orbit, coarse-to-fine retrieval
  Status: Library built, 0.993 cosine at 4-bit,
  needs real embedding benchmark
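
The architecture line above names two steps: a Hadamard rotation followed by scalar quantization. The sketch below shows that combination in pure Python on a synthetic Gaussian vector. It is not the TurboQuant library's code. The random sign flips, the min-max uniform quantizer, and every function name are assumptions made for illustration, and the cosine it reports on synthetic data says nothing about real embeddings.

```python
import math
import random


def fwht(vec):
    """Orthonormal fast Walsh-Hadamard transform; len(vec) must be a power of two."""
    out = list(vec)
    n = len(out)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [x * scale for x in out]


def rotate(vec, signs):
    """Randomized Hadamard rotation: flip signs, then transform."""
    return fwht([x * s for x, s in zip(vec, signs)])


def unrotate(vec, signs):
    """Inverse rotation: the orthonormal transform is its own inverse."""
    return [x * s for x, s in zip(fwht(vec), signs)]


def quantize(vec, bits=4):
    """Uniform min-max scalar quantization to 2**bits levels."""
    levels = (1 << bits) - 1
    lo, hi = min(vec), max(vec)
    step = (hi - lo) / levels or 1.0  # guard against a constant vector
    return [round((x - lo) / step) for x in vec], lo, step


def dequantize(codes, lo, step):
    return [lo + c * step for c in codes]


def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    return num / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


rng = random.Random(0)
dim = 256
signs = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
vector = [rng.gauss(0.0, 1.0) for _ in range(dim)]
codes, lo, step = quantize(rotate(vector, signs), bits=4)
restored = unrotate(dequantize(codes, lo, step), signs)
print(f"cosine after 4-bit round trip: {cosine(vector, restored):.4f}")
```

The rotation spreads energy evenly across coordinates, so a single shared quantization range wastes fewer levels on outlier dimensions. This is the usual motivation for rotating before scalar quantization.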


                    THE SELF-IMPROVING LOOP
                    =======================

  ┌─────────────────────────────────────────────────────────┐
  │                                                         │
  │   Train ASR (44K clean)                                │
  │        │                                                │
  │        ▼                                                │
  │   ASR labels AfVoices audio (253K)                     │
  │        │                                                │
  │        ▼                                                │
  │   OCR extracts toned N'Ko from YouTube (Paper 8)       │
  │        │                                                │
  │        ▼                                                │
  │   Tone model fixes ASR labels                          │
  │        │                                                │
  │        ▼                                                │
  │   Retrain ASR on 300K+ clean data                      │
  │        │                                                │
  │        ▼                                                │
  │   Better ASR → better labels → better ASR → ...        │
  │                                                         │
  │   Each cycle: ASR improves, corpus grows,              │
  │   tone model gets more training data                    │
  │                                                         │
  └─────────────────────────────────────────────────────────┘
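
A self-training loop like the one above can drift if poor ASR labels are fed back in. The pipeline diagram assigns the N'Ko language model the job of perplexity-based quality scoring, and the sketch below shows one way such a filter could work. The character bigram model with add-one smoothing is a deliberately small stand-in, since the source does not specify the LM architecture. `CharBigramLM`, `keep_confident`, and the threshold are hypothetical.

```python
import math
from collections import Counter


class CharBigramLM:
    """Character bigram model with add-one smoothing."""

    def __init__(self, corpus):
        self.context_counts = Counter()
        self.bigram_counts = Counter()
        self.vocab = set()
        for sentence in corpus:
            chars = ["<s>"] + list(sentence) + ["</s>"]
            self.vocab.update(chars)
            for a, b in zip(chars, chars[1:]):
                self.context_counts[a] += 1
                self.bigram_counts[(a, b)] += 1

    def perplexity(self, text: str) -> float:
        chars = ["<s>"] + list(text) + ["</s>"]
        v = len(self.vocab) + 1  # one extra slot reserves mass for unseen characters
        log_prob = 0.0
        for a, b in zip(chars, chars[1:]):
            p = (self.bigram_counts[(a, b)] + 1) / (self.context_counts[a] + v)
            log_prob += math.log(p)
        return math.exp(-log_prob / (len(chars) - 1))


def keep_confident(lm, hypotheses, max_perplexity):
    """Keep only ASR hypotheses the LM scores at or below the threshold."""
    return [h for h in hypotheses if lm.perplexity(h) <= max_perplexity]


# Toy Latin-letter corpus so the behavior is easy to follow by eye.
lm = CharBigramLM(["abab", "abab", "ab"])
print(keep_confident(lm, ["abab", "bbaa"], max_perplexity=4.0))
```

In each cycle the threshold trades corpus growth against label noise: a tight cutoff admits fewer pseudo-labels but keeps the retraining set closer to the clean 44K.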

Current Status (2026-04-05)

| Component | Status | Next Action |
|---|---|---|
| 44K clean data | READY | Training TAR now |
| 253K AfVoices features | EXTRACTED | Waiting for tone model |
| N'Ko trajectory CER | 27.39 | |
| TAR experiment | TRAINING | 4 runs on A100 |
| ANE spike | 11.6 TFLOPS proven | Wire into training loop |
| YouTube OCR (Gemma 4) | QUEUED | Task #31 on Vast.ai |
| Tone resolution model | PLANNED | Depends on OCR corpus |
| TurboQuant | BUILT | Needs real embedding test |
| Contextual tone paper | PLANNED | Depends on OCR + tone model |
| ANE distributed paper | PLANNED | Depends on training loop proof |

Promotion Decision

Promote into a technical note or architecture paper with implementation anchors.

Source Anchor

nko-brain-scanner/PIPELINE.md

Detected Structure

Method · Evaluation · Figures · Architecture