research note, experiment writeup candidate, score 26

Living Speech: Building the First N'Ko ASR for $14



Extracted abstract or opening context

Every speech recognition system for Bambara outputs Latin characters. French colonial characters designed for French colonial administrators. Not for the 40 million people who actually speak and read these languages.

This is the story of how a $14 experiment on consumer hardware produced a speech recognition system that competes with models 38 times its size. And why Solomana Kante's 1949 design decisions turned out to be a machine learning advantage nobody predicted.

There is no N'Ko speech corpus. Every Bambara audio dataset has Latin transcriptions. The FarmRadio datasets, the MALIBA-AI models, the Bayelemabaga corpus. All Latin output. If you want to build a system that listens to Bambara and writes N'Ko, you first need to solve a problem that nobody has needed to solve before: convert thousands of Latin Bambara transcriptions into valid N'Ko.

Latin Bambara was designed by French colonial linguists in the 20th century. It reflects French phonological conventions, not Manding phonological reality.
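
To make the conversion problem concrete, here is a minimal sketch of a character-level Latin-to-N'Ko mapper. It is an illustration only, not the pipeline used in the experiment: the names `LATIN_TO_NKO`, `DIGRAPHS`, and `transliterate` are hypothetical, the table covers only a subset of letters, and it ignores tone marks, vowel length, and nasalization (a syllable-final "n" is mapped as a plain consonant here). The code points come from the Unicode N'Ko block (U+07C0 to U+07FF); text is stored in logical order and displayed right to left.

```python
# Hypothetical, incomplete mapping from Latin Bambara letters to N'Ko letters.
# Escapes are used so the source stays readable in left-to-right editors.
LATIN_TO_NKO = {
    "a": "\u07ca", "e": "\u07cb", "i": "\u07cc", "\u025b": "\u07cd",  # a, e, i, ɛ
    "u": "\u07ce", "o": "\u07cf", "\u0254": "\u07d0",                 # u, o, ɔ
    "b": "\u07d3", "p": "\u07d4", "t": "\u07d5", "j": "\u07d6",
    "c": "\u07d7", "d": "\u07d8", "r": "\u07d9", "s": "\u07db",
    "f": "\u07dd", "k": "\u07de", "l": "\u07df", "m": "\u07e1",
    "\u0272": "\u07e2", "n": "\u07e3", "h": "\u07e4",                 # ɲ, n, h
    "w": "\u07e5", "y": "\u07e6",
}

# Two-letter Latin sequences that correspond to a single N'Ko letter.
DIGRAPHS = {
    "gb": "\u07dc",  # NKO LETTER GBA
    "ny": "\u07e2",  # NKO LETTER NYA (older spelling of ɲ)
}


def transliterate(text: str) -> str:
    """Map Latin Bambara text to N'Ko letter by letter (simplified sketch).

    Characters without an entry in the tables are passed through unchanged
    so that gaps in the mapping stay visible in the output.
    """
    text = text.lower()
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if len(pair) == 2 and pair in DIGRAPHS:
            out.append(DIGRAPHS[pair])
            i += 2
        else:
            out.append(LATIN_TO_NKO.get(text[i], text[i]))
            i += 1
    return "".join(out)
```

A table like this is only the starting point. Producing valid N'Ko also requires decisions that a flat lookup cannot make, such as how to write tone and nasal vowels, which the Latin transcriptions often leave unmarked.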

Promotion decision

What has to happen next

Attach run IDs, datasets, metrics, and reproduction commands.

Why this is not always a full paper yet

Corpus pages are public-safe readers for discovered workspace artifacts. They are not automatically final papers. A corpus item becomes a polished paper only after the editable source, evidence checkpoints, references, figures, render path, and release status are attached through the paper schema.