Mohamed Diomande

Extracted abstract or opening context

1. **No thin wrappers.** `nko_core/__init__.py` handles all imports from `Desktop/NKo/` via `sys.path`. There are no separate `phonetics.py`, `transliterate.py`, or `morphology.py` wrapper files: if `from nko_core import phonetics` works, no wrapper is needed.
2. **No premature release.** The HuggingFace upload happens only AFTER mode collapse is fixed and the model generates coherent N'Ko text. Not before.
3. **Architecture matches disk.** Every file listed below exists, and every number is current. If reality changes, this doc gets updated.

> **Serving / on-device decision:** research runs on the **GPU/standard Whisper** feature
> path (1500-frame, reproducible, where clean numbers are earned); on-device serving is the
> **Apple Neural Engine + TurboQuant** stack (Phase 2). One hard rule: a head must be served
> with the *same* feature extractor it was trained on (the GPU↔ANE 1500-vs-375-frame skew is
> what produced all-blank output on the anchor). See
> [`docs/adr/ADR-001-on-device-serving-and-quantization.md`](docs/adr/ADR-001-on-device-serving-and-quantization.md).
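The hard serving rule above can be enforced mechanically at model load time instead of being discovered as all-blank output. A minimal sketch, assuming each trained head records which feature path produced its training features; `FeatureSpec`, `check_serving_parity`, and the `GPU_WHISPER` / `ANE` constants are hypothetical names, not part of the project. Only the 1500 and 375 frame counts come from the decision above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureSpec:
    """Hypothetical record of the feature path a head was trained or served on."""
    name: str            # illustrative label, e.g. "gpu-whisper" or "ane"
    encoder_frames: int  # encoder output frames per input window


# Frame counts from the serving decision; the labels are assumptions.
GPU_WHISPER = FeatureSpec("gpu-whisper", 1500)
ANE = FeatureSpec("ane", 375)


def check_serving_parity(trained_on: FeatureSpec, serving_with: FeatureSpec) -> None:
    """Raise if a head would be served with a different feature extractor."""
    if trained_on != serving_with:
        raise ValueError(
            f"head trained on {trained_on.name} ({trained_on.encoder_frames} frames) "
            f"must not be served with {serving_with.name} "
            f"({serving_with.encoder_frames} frames)"
        )


# Usage: call once at model load, before any audio is decoded.
check_serving_parity(GPU_WHISPER, GPU_WHISPER)  # same path: passes silently
```

Comparing the whole spec rather than only the frame count means the check keeps working if more preprocessing fields (sample rate, mel bins) are added to the record later.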
| File | Purpose |
|------|---------|
| `asr/bambara_translator.py` | Tiered translation: greetings -> corpus -> dictionary -> NLLB -> Ollama |
| `asr/postprocess.py` | N'Ko syllable FSM validation and CTC output cleanup |
| `asr/train_whisper_lora.py` | Whisper LoRA fine-tuning with checkpoint/resume |
| `asr/train_nllb_lora.py` | NLLB-200 LoRA fine-tuning for Bambara translation |
| `asr/prepare_nllb_data.py` | Extract parallel pairs from 5 sources for NLLB training |
| `asr/eval_whisper_lora.py` | WER/CER evaluation harness (base vs LoRA comparison) |
| `asr/convert_lora_to_ggml.py` | LoRA -> merged -> GGML -> quantized -> iOS bundle pipeline |
| `asr/bridge_to_nko.py` | Latin Bambara -> N'Ko script conversion |
| `asr/audio_encoder.py` | Whisper encoder (frozen features, d=512) |
| `asr/joint_embedding.py` | Shared embedding space (d=512) + contrastive/retrieval loss |
| `asr/train_asr.py` | Multi-loss training loop (contrastive + retrieval) |
| `asr/speaker_diarizer.py` | pyannote speaker clustering + VADOnly fallback |
| `asr/scene_encoder.py` | SigLIP keyframe feature extraction (d=512) |
| `asr/syllable_retriever.py` | Codebook retrieval + FSM assembly |
| `asr/audio_pipeline.py` | YouTube -> audio extraction + VAD |
| `demo/realtime_asr.py` | Live demo server (HTTPS, CTC decode, Whisper encoder) |

### Completed

- [x] T1: Brain scan activation profiling (72B on A100 + 8B on M4)
- [x] T2: Three-stage training V1 (CPT + SFT + BPE, val loss 4.29)
- [x] T3: Constrained decoding FSM (100% syllable validity)
- [x] T4: BPE tokenizer (512 merges, 614 vocab)
- [x] T5: Morpheme-aware BPE tokenizer (158 merges, 206 vocab)
- [x] T6: Embedding extension pipeline (151,936 → 152,192 vocab)
- [x] T7: V2 LoRA trai
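The tiered order listed for `asr/bambara_translator.py` (greetings -> corpus -> dictionary -> NLLB -> Ollama) is a cheapest-first fallback chain: each tier either answers or defers to the next, more expensive one. A minimal sketch of that control flow, assuming each tier can be wrapped as a function returning a translation or `None`; `translate_tiered`, the toy lookup tables, and the placeholder lambdas are hypothetical and stand in for the real corpus search and NLLB/Ollama calls.

```python
from typing import Callable, Optional, Sequence, Tuple

# A tier is (name, lookup); lookup returns a translation or None to defer.
Tier = Tuple[str, Callable[[str], Optional[str]]]


def translate_tiered(text: str, tiers: Sequence[Tier]) -> Tuple[str, Optional[str]]:
    """Return (tier_name, translation) from the first tier that answers."""
    key = text.strip().lower()  # normalize once so every tier sees the same key
    for name, lookup in tiers:
        result = lookup(key)
        if result is not None:
            return name, result
    return "none", None


# Toy tables for illustration only; the real tiers load curated data.
GREETINGS = {"i ni ce": "hello"}
DICTIONARY = {"ji": "water"}

tiers = [
    ("greetings", GREETINGS.get),
    ("corpus", lambda s: None),     # placeholder: parallel-corpus lookup
    ("dictionary", DICTIONARY.get),
    ("nllb", lambda s: None),       # placeholder: NLLB-200 model call
    ("ollama", lambda s: None),     # placeholder: local LLM fallback
]

print(translate_tiered("I ni ce", tiers))  # ('greetings', 'hello')
```

Returning the tier name alongside the text makes it easy to log how often the expensive model tiers are actually reached.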
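The "CTC output cleanup" in `asr/postprocess.py` and the "CTC decode" in the demo server both rest on the standard greedy CTC rule: take the best token per frame, merge runs of identical tokens, then drop blanks. A minimal sketch of that rule, assuming the blank token is ID 0; `argmax_per_frame` and `ctc_greedy_collapse` are illustrative names, not the project's functions, and the N'Ko syllable FSM check that follows in the real pipeline is not shown.

```python
from itertools import groupby
from typing import List, Sequence

BLANK_ID = 0  # assumption: the CTC blank token sits at index 0


def argmax_per_frame(logits: Sequence[Sequence[float]]) -> List[int]:
    """Pick the highest-scoring token ID for every frame."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


def ctc_greedy_collapse(frame_ids: Sequence[int], blank_id: int = BLANK_ID) -> List[int]:
    """Merge runs of identical IDs, then drop blanks.

    Order matters: collapsing first means a blank between two identical
    tokens keeps them distinct (e.g. a doubled letter).
    """
    return [tok for tok, _run in groupby(frame_ids) if tok != blank_id]


# Usage: 8 frames over a toy vocabulary.
frames = [0, 3, 3, 0, 3, 5, 5, 0]
print(ctc_greedy_collapse(frames))  # [3, 3, 5]
```

A frame sequence that is blank everywhere collapses to an empty list, which is exactly the all-blank symptom the serving rule warns about.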

Promotion decision

What has to happen next

Promote into a technical note or architecture paper with implementation anchors.

Why this is not always a full paper yet

Corpus pages are public-safe readers for discovered workspace artifacts. They are not automatically final papers. A corpus item becomes a polished paper only after the editable source, evidence checkpoints, references, figures, render path, and release status are attached through the paper schema.
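The promotion gate described above is a completeness check over a fixed set of attachments. A minimal sketch, assuming each attachment is simply "present or not"; `PaperAttachments`, `missing_attachments`, and `is_promotable` are hypothetical names, and the actual paper schema is not shown here and may carry richer fields.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class PaperAttachments:
    """Hypothetical view of a corpus item; None means not yet attached."""
    editable_source: Optional[str] = None
    evidence_checkpoints: Optional[List[str]] = None
    references: Optional[List[str]] = None
    figures: Optional[List[str]] = None
    render_path: Optional[str] = None
    release_status: Optional[str] = None


def missing_attachments(item: PaperAttachments) -> List[str]:
    """List the attachment fields that are still unset, in schema order."""
    return [f.name for f in fields(item) if getattr(item, f.name) is None]


def is_promotable(item: PaperAttachments) -> bool:
    """A corpus item becomes a paper only when nothing is missing."""
    return not missing_attachments(item)


# Usage: an item with only its source attached stays a corpus reader page.
draft = PaperAttachments(editable_source="note.md")
print(missing_attachments(draft))
```

Reporting the missing fields, not just a boolean, tells the author exactly which attachment blocks promotion.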