N'Ko Speech Search, Diarization, and TTS Architecture

Extracted abstract or opening context

Turn the existing N'Ko ASR + AGP stack into a speaker-aware speech system with three outputs:

1. **Provenance-first search** over N'Ko audio, transcripts, papers, and corrections.
2. **Improved diarization** for Djoko and future Bambara/Malinke broadcast corpora.
3. **N'Ko TTS / voice generation**, but only from a high-precision subset with explicit speaker boundaries and alignment confidence.

This is not a generic web-search play. It is a vertical system for **Manding speech understanding, correction, retrieval, and eventually synthesis**.

Current stack:

- **PyTorch/Whisper trajectory ASR on Vast**
- **Gemma/AGP corrective layer on Mac4/Mac5**

Current data artifacts:

- `djoko_speakers.json`
  - `7` weak speaker clusters
  - `6,625` diarized segments
  - `5` eligible speakers for adaptation experiments
- `djoko_transcriptions.jsonl`
  - historical first-pass N'Ko transcriptions
  - quality is still noisy; output often collapses into repeated characters
- `consensus_pairs.jsonl`
  - filtered subset with confidence and text-quality metadata
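The text-quality filtering behind `consensus_pairs.jsonl` and the high-precision TTS gate could be sketched roughly as follows. This is a minimal illustration, not the existing pipeline: it assumes hypothetical JSONL fields (`text`, `confidence`, `speaker`, `alignment_confidence`), hypothetical speaker labels such as `spk_2`, and placeholder thresholds. Repeated-character collapse is detected with two simple heuristics assumed here (not taken from the stack): the longest run of a single character and the share of the most frequent character.

```python
import json
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional

# Placeholder thresholds (hypothetical); calibrate on hand-checked N'Ko segments.
MIN_CONFIDENCE = 0.8            # consensus/ASR confidence floor
MIN_ALIGNMENT_CONFIDENCE = 0.9  # text-to-audio alignment confidence floor
MAX_RUN_LENGTH = 4              # longest allowed run of one repeated character
MAX_TOP_CHAR_SHARE = 0.4        # max share of a single character in a transcript...
MIN_CHARS_FOR_SHARE = 10        # ...checked only once the transcript is this long


@dataclass
class Segment:
    # Hypothetical record shape; the real consensus_pairs.jsonl fields may differ.
    text: str
    confidence: float
    speaker: Optional[str]  # None when diarization left the segment unassigned
    alignment_confidence: float


def longest_run(text: str) -> int:
    """Longest run of one repeated character; whitespace is skipped, so 'a a a' counts as 3."""
    best = run = 0
    prev = None
    for ch in text:
        if ch.isspace():
            continue
        run = run + 1 if ch == prev else 1
        prev = ch
        best = max(best, run)
    return best


def is_collapsed(text: str) -> bool:
    """Flag first-pass transcripts that degenerate into repeated characters."""
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return True  # empty output is treated as a failed transcription
    if longest_run(text) > MAX_RUN_LENGTH:
        return True
    if len(chars) >= MIN_CHARS_FOR_SHARE:
        top_count = Counter(chars).most_common(1)[0][1]
        return top_count / len(chars) > MAX_TOP_CHAR_SHARE
    return False


def tts_eligible(seg: Segment) -> bool:
    """High-precision gate: assigned speaker, confident text, confident alignment, no collapse."""
    return (
        seg.speaker is not None
        and seg.confidence >= MIN_CONFIDENCE
        and seg.alignment_confidence >= MIN_ALIGNMENT_CONFIDENCE
        and not is_collapsed(seg.text)
    )


def load_segments(lines: Iterable[str]) -> Iterator[Segment]:
    """Parse JSONL lines into Segments; missing or null scores become 0.0 (rejected)."""
    for line in lines:
        if not line.strip():
            continue
        rec = json.loads(line)
        yield Segment(
            text=rec.get("text") or "",
            confidence=float(rec.get("confidence") or 0.0),
            speaker=rec.get("speaker"),
            alignment_confidence=float(rec.get("alignment_confidence") or 0.0),
        )


# Illustrative rows, not corpus data.
GOOD = "ߞߏ ߓߍ ߘߌ ߣߌ ߡߊ"
COLLAPSED = "ߊ" * 8
rows = [
    json.dumps({"text": GOOD, "confidence": 0.93, "speaker": "spk_2", "alignment_confidence": 0.95}),
    json.dumps({"text": COLLAPSED, "confidence": 0.91, "speaker": "spk_2", "alignment_confidence": 0.97}),
    json.dumps({"text": GOOD, "confidence": 0.88, "speaker": None, "alignment_confidence": 0.96}),
]
kept = [seg for seg in load_segments(rows) if tts_eligible(seg)]
```

Requiring all four conditions deliberately trades recall for precision, which matches the requirement that voice generation draw only on segments with explicit speaker boundaries and trusted alignment.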

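For the provenance-first search output, the key property is that a hit is never a bare string: it carries where the text came from (an audio segment, a first-pass transcript, an AGP correction, or a paper) so the result can be checked against its source. The sketch below is a toy in-memory inverted index under stated assumptions: `Provenance`, `ProvenanceIndex`, the layer names, and the file names are all hypothetical, and tokenization is plain whitespace splitting rather than N'Ko-aware normalization (for example of tone marks).

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class Provenance:
    # Hypothetical fields describing where a piece of text came from.
    source: str               # audio file, transcript file, or paper ID
    start_s: Optional[float]  # segment start in seconds (None for non-audio sources)
    end_s: Optional[float]    # segment end in seconds
    speaker: Optional[str]    # diarization cluster label, if any
    layer: str                # e.g. "asr_first_pass", "agp_correction", "paper"


@dataclass(frozen=True)
class Document:
    doc_id: int
    text: str
    provenance: Provenance


class ProvenanceIndex:
    """Toy inverted index: every hit is returned with its provenance attached."""

    def __init__(self) -> None:
        self._docs: Dict[int, Document] = {}
        self._postings: Dict[str, Set[int]] = defaultdict(set)

    def add(self, text: str, provenance: Provenance) -> int:
        doc_id = len(self._docs)
        self._docs[doc_id] = Document(doc_id, text, provenance)
        for token in text.split():  # naive whitespace tokenization
            self._postings[token].add(doc_id)
        return doc_id

    def search(self, query: str) -> List[Document]:
        """Return documents containing every query token, in insertion order."""
        tokens = query.split()
        if not tokens:
            return []
        ids = set.intersection(*(self._postings.get(t, set()) for t in tokens))
        return [self._docs[i] for i in sorted(ids)]


# Hypothetical sources; file names and IDs are illustrative only.
index = ProvenanceIndex()
index.add("ߞߏ ߓߍ", Provenance("djoko_ep01.wav", 12.0, 15.5, "spk_2", "asr_first_pass"))
index.add("ߞߏ ߓߍ ߘߌ", Provenance("djoko_ep01.wav", 12.0, 15.5, "spk_2", "agp_correction"))
index.add("ߘߌ ߣߌ", Provenance("paper_003", None, None, None, "paper"))

for hit in index.search("ߞߏ ߓߍ"):
    p = hit.provenance
    print(p.layer, p.source, p.start_s, p.end_s, p.speaker, hit.text)
```

Indexing the first-pass ASR text and its AGP correction as separate documents that share timestamps, instead of overwriting one with the other, lets a result show both versions and which layer produced each.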
Promotion decision

What has to happen next

Promote into a technical note or architecture paper with implementation anchors.

Why this is not always a full paper yet

Corpus pages are public-safe readers for discovered workspace artifacts. They are not automatically final papers. A corpus item becomes a polished paper only after the editable source, evidence checkpoints, references, figures, render path, and release status are attached through the paper schema.