Working paper · Preprint structure candidate · Score 90

Retrieval-Centric ASR for N'Ko: Exploiting Script Structure to Beat Sequence-to-Sequence

Extracted abstract or opening context

We present a retrieval-centric automatic speech recognition (ASR) architecture for Bambara that targets N'Ko script output directly rather than routing through Latin transcription. The central insight is structural: N'Ko enforces a strict 1:1 phoneme-to-grapheme mapping, marks tone with explicit diacritics, and has a mathematically complete syllable inventory of 3,024 entries (all V, VN, CV, and CVN patterns across five tones). This finite, well-structured output space makes retrieval a better fit than sequence-to-sequence decoding. Our pipeline freezes a Whisper encoder to extract audio embeddings, projects them into a shared 512-dimensional space alongside N'Ko syllable embeddings, and retrieves the nearest codebook entry at each step. During assembly, a 4-state finite-state machine (FSM) encoding N'Ko phonotactics constrains beam search, guaranteeing that every output token sequence forms a valid N'Ko syllable chain. Training data comes from two YouTube sources: 1,461 episodes of Djoko dialogue (audio plus FarmRadio Whisper transcriptions, bridged to N'Ko) and 532 babamamadidiane teaching videos (dynamic scene detection plus Gemini 3 Flash OCR to extract on-screen N'Ko text). The current best published result for Bambara ASR is MALIBA-AI bambara-asr-v3 at 45.73% WER on Latin-script output; our architecture bypasses Latin script entirely. Quantitative results on real audio will be reported once training completes.

**Keywords:** low-resource ASR, N'Ko, Bambara, Manding, retrieval-augmented, finite-state machine, CTC, script-structure exploitation
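A complete syllable inventory of the kind described in the abstract can be enumerated mechanically. The sketch below assumes that every V, VN, CV, and CVN pattern combines freely with every tone. The letter sets, the single nasal `n`, and the tone labels are hypothetical placeholders, not the actual N'Ko characters, so the count it prints applies to the toy sets only and does not reproduce the 3,024 figure.

```python
from itertools import product

# Hypothetical toy inventories: placeholders, NOT the real N'Ko letter sets.
VOWELS = ["a", "e", "i"]
CONSONANTS = ["b", "k"]
TONES = ["1", "2", "3", "4", "5"]  # five tone labels, matching the abstract's tone count
NASAL = "n"                        # assumed single syllable-final nasal

def syllable_inventory(vowels, consonants, tones, nasal):
    """Enumerate every V, VN, CV and CVN syllable, each paired with every tone."""
    inventory = []
    for v, t in product(vowels, tones):
        inventory.append((v, t))                 # V
        inventory.append((v + nasal, t))         # VN
    for c, v, t in product(consonants, vowels, tones):
        inventory.append((c + v, t))             # CV
        inventory.append((c + v + nasal, t))     # CVN
    return inventory

inv = syllable_inventory(VOWELS, CONSONANTS, TONES, NASAL)
print(len(inv))  # → 90
```

Because the space is a Cartesian product of small closed sets, the whole output vocabulary can be materialized once and embedded as a fixed codebook.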
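The retrieval step (project an audio embedding into the shared space, then take the nearest codebook entry) can be illustrated with a pure-Python sketch. The function names, the use of a single linear projection, and cosine similarity as the distance are assumptions made for illustration. The toy 3-dimensional frames and 2-dimensional shared space stand in for the frozen Whisper encoder output and the 512-dimensional space.

```python
import math

def project(frame, weight):
    """Hypothetical linear projection: weight is d_out rows of d_in values each."""
    return [sum(w * x for w, x in zip(row, frame)) for row in weight]

def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec

def nearest_codebook_entry(frame, weight, codebook):
    """Return (index, cosine similarity) of the codebook row closest to the projected frame."""
    query = l2_normalize(project(frame, weight))
    best_idx, best_sim = -1, -math.inf
    for idx, entry in enumerate(codebook):
        sim = sum(q * e for q, e in zip(query, l2_normalize(entry)))
        if sim > best_sim:
            best_idx, best_sim = idx, sim
    return best_idx, best_sim

# Toy example: 3-d "encoder frames" projected into a 2-d shared space.
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
CODEBOOK = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]  # stand-ins for syllable embeddings
frames = [[0.2, 0.9, 5.0], [-3.0, 0.1, 0.0]]
print([nearest_codebook_entry(f, W, CODEBOOK)[0] for f in frames])  # → [1, 2]
```

Normalizing both sides turns the dot product into a cosine similarity, so the ranking depends on direction in the shared space rather than on embedding magnitude.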
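FSM-constrained beam search can be sketched as follows. The four states, the token classes (consonant, vowel, tone diacritic, nasal coda), and their transitions are a hypothetical reading of a (C)V(T)(N) syllable shape, not the machine used in the paper. Tone marks are treated as optional, each step emits exactly one token, and CTC blanks are ignored for simplicity. Extensions the FSM rejects are pruned before the beam is ranked, and only hypotheses that end in an accepting state are returned.

```python
import math

# Hypothetical 4-state phonotactic FSM over token classes:
# C = consonant, V = vowel, T = tone diacritic, N = syllable-final nasal.
START, ONSET, NUCLEUS, TONED = 0, 1, 2, 3
TRANSITIONS = {
    START:   {"C": ONSET, "V": NUCLEUS},
    ONSET:   {"V": NUCLEUS},
    NUCLEUS: {"T": TONED, "N": START, "C": ONSET, "V": NUCLEUS},
    TONED:   {"N": START, "C": ONSET, "V": NUCLEUS},
}
ACCEPTING = {START, NUCLEUS, TONED}  # a syllable may end after V, V+T, or N

def constrained_beam_search(step_log_probs, token_classes, beam_width=3):
    """step_log_probs: one list of per-token log-probs per step.
    token_classes: token id -> class label ('C', 'V', 'T', 'N').
    Returns (token ids, final state, score) of the best valid hypothesis, or None."""
    beams = [((), START, 0.0)]  # (token ids, fsm state, cumulative log-prob)
    for log_probs in step_log_probs:
        candidates = []
        for tokens, state, score in beams:
            for tok, lp in enumerate(log_probs):
                nxt = TRANSITIONS[state].get(token_classes[tok])
                if nxt is None:
                    continue  # the FSM forbids this token here, so prune it
                candidates.append((tokens + (tok,), nxt, score + lp))
        candidates.sort(key=lambda b: b[2], reverse=True)
        beams = candidates[:beam_width]
    finished = [b for b in beams if b[1] in ACCEPTING]
    return max(finished, key=lambda b: b[2]) if finished else None

VOCAB = ["b", "a", "1", "n"]    # toy tokens; "1" stands in for a tone diacritic
CLASSES = ["C", "V", "T", "N"]
steps = [[math.log(p) for p in row] for row in (
    [0.3, 0.2, 0.1, 0.4],
    [0.1, 0.6, 0.2, 0.1],
    [0.1, 0.1, 0.2, 0.6],
)]
best = constrained_beam_search(steps, CLASSES, beam_width=3)
print("".join(VOCAB[t] for t in best[0]))  # → ban
```

On this toy input an unconstrained per-step argmax would emit `nan`, which opens with a syllable-final nasal; the FSM rules out that first `n`, so the search returns `ban`.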

Promotion decision

What has to happen next

Convert into the standard paper schema, add citations, and render a draft PDF.

Why this is not always a full paper yet

Corpus pages are public-safe readers for discovered workspace artifacts. They are not automatically final papers. A corpus item becomes a polished paper only after the editable source, evidence checkpoints, references, figures, render path, and release status are attached through the paper schema.