working paper · preprint render candidate · score 100

Against WER: Phonemic Evaluation, Orthographic Transparency, and the Script Advantage for Manding ASR

Extracted abstract or opening context

Automatic speech recognition for Manding languages is usually reported through Latin-script word error rate. This paper argues that the metric is scientifically weak for the research question at hand. If the goal is to evaluate whether an ASR system recognizes Bambara, Maninka, Dioula, or related Manding speech, then the scoring units should preserve the acoustic-phonemic distinctions carried by the language. Latin Bambara orthography is useful and socially real, but it is not a lossless measurement interface: it uses digraphs for single phonemes, leaves tone unmarked or inconsistently represented, and allows convention-dependent variation. N'Ko, by contrast, was designed for Manding phonology and gives the ASR system a more transparent character target.

The core contribution is a metric argument. I formalize the difference between a transparent script map $f_N:\Phi \rightarrow \Sigma_N$ from phonemic units to script units and a variable-length Latin transcription relation $R_L \subset \Phi^* \times \Sigma_L^*$. Under normalization assumptions, edit distance over a bijective or near-bijective script preserves phoneme-edit structure more directly than word error rate over a many-to-many transcription convention. N'Ko character error rate does not thereby become a perfect phoneme error rate: tone policy, diacritics, punctuation, Unicode normalization, reference quality, and scorer granularity still matter. But N'Ko CER is more interpretable for Manding ASR than Latin WER because a character substitution is closer to a sound-symbol substitution, while a Latin word error can mix acoustic error, digraph segmentation, spelling convention, and tokenization.

The paper also defines the claim boundary needed for the 20.57% CER anchor used in the broader project. The anchor is meaningful because it is a direct N'Ko ASR number over script-native output; it should not be translated into a Latin WER leaderboard claim or used to assert that N'Ko beats Latin under every matched condition. The rigorous conclusion is narrower and stronger: for Manding ASR, N'Ko CER is the better primary measurement target when the scientific object is phonemic speech recognition rather than agreement with a Latin orthographic convention.
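
The metric argument can be illustrated with a small sketch. Everything in it is an assumption made for illustration, not the paper's scorer or data: the four-phoneme toy inventory, the placeholder phoneme label `NJ` (standing in for a single nasal phoneme), the one-symbol-per-phoneme table `TRANSPARENT` (a stand-in for $f_N$), and the two Latin-style tables that spell `NJ` either as a digraph or as a single letter (a stand-in for $R_L$). The code uses plain unit-cost Levenshtein distance and ignores tone, normalization, and punctuation.

```python
from typing import Dict, List, Sequence

def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Unit-cost Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (r != h)))   # substitution or match
        prev = curr
    return prev[-1]

def error_rate(ref: Sequence, hyp: Sequence) -> float:
    """Edit distance normalized by reference length."""
    return edit_distance(ref, hyp) / len(ref)

def spell(words: List[List[str]], table: Dict[str, str]) -> List[str]:
    """Spell each word (a list of phoneme labels) with a script table."""
    return ["".join(table[p] for p in word) for word in words]

def flatten(items: List) -> List[str]:
    """Concatenate words into one unit sequence (spaces are not scored)."""
    return [unit for item in items for unit in item]

# Hypothetical toy tables; none of these are real orthographic rules.
TRANSPARENT = {"NJ": "J", "n": "n", "a": "a", "m": "m"}        # one symbol per phoneme
LATIN_DIGRAPH = {"NJ": "ny", "n": "n", "a": "a", "m": "m"}     # digraph convention
LATIN_SINGLE = {"NJ": "\u0272", "n": "n", "a": "a", "m": "m"}  # single-letter convention

ref_words = [["NJ", "a"], ["m", "a"]]   # reference phonemes, two words
hyp_words = [["n", "a"], ["m", "a"]]    # one acoustic substitution: NJ -> n

# Phoneme error rate, the quantity we would like the metric to track.
per = error_rate(flatten(ref_words), flatten(hyp_words))

# CER over the transparent script equals the phoneme error rate.
cer_transparent = error_rate(flatten(spell(ref_words, TRANSPARENT)),
                             flatten(spell(hyp_words, TRANSPARENT)))

# CER over the digraph spelling is normalized by a longer string.
cer_latin = error_rate(flatten(spell(ref_words, LATIN_DIGRAPH)),
                       flatten(spell(hyp_words, LATIN_DIGRAPH)))

# WER charges the whole word for a single phoneme substitution.
wer_latin = error_rate(spell(ref_words, LATIN_DIGRAPH),
                       spell(hyp_words, LATIN_DIGRAPH))

# Same phonemes, different spelling convention: WER is still nonzero.
wer_convention = error_rate(spell(ref_words, LATIN_DIGRAPH),
                            spell(ref_words, LATIN_SINGLE))

print(per, cer_transparent, cer_latin, wer_latin, wer_convention)
```

In this toy case the transparent-script CER matches the phoneme error rate exactly, the digraph CER drifts because the denominator counts letters rather than phonemes, and the last WER value is nonzero even though the hypothesis contains no phonemic error. That is the sense in which a word-level Latin score can mix acoustic error with spelling convention; how large the effect is on real Manding data is not something this sketch can show.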

Promotion decision

What has to happen next

Compile/render the source, verify references and figures, then add to the curated atlas.
