← Writing
Language/Writing Systems as Infrastructure·5 min read

The Error Rate Is Not Neutral

Character error rate depends on the orthography you measure in. For N'Ko, that changes what the number means.

Every speech recognition paper reports an error rate, and almost none of them ask the question underneath it: an error against what? The answer is always an orthography, a convention for writing the language down. If that convention is a poor fit for the language, the error rate measures agreement with the convention, not recognition of the speech.

For most well-resourced languages this distinction is invisible because the orthography is settled and the training data agrees with itself. For the Manding languages it is the whole problem. Manding has been written in Latin script, in Arabic script, and in N'Ko, and the Latin conventions vary by country, by era, and by transcriber. A model trained on that mixture is being graded against a moving target.

The measurement thesis

The script you evaluate in changes what your numbers mean. A character error rate is only as meaningful as the writing system behind it.

N'Ko changes the measurement. It was designed for these languages: one symbol per sound, tone marked explicitly, vowel length and nasality carried in the orthography. When a recognizer is evaluated in N'Ko, a character error corresponds to a phonemic error. The metric becomes interpretable: you can say which sound was misheard, not just which glyph mismatched a transcriber's habit.

This is why my flagship paper treats N'Ko as computational infrastructure rather than a display choice. The script determines what tokenizers can represent, how acoustic evidence aligns to symbols, and whether a reported number measures speech recognition at all. Two systems can report the same error rate and one of them can be measuring something real while the other measures orthographic luck.

The practical consequence: before comparing models for a low-resource language, ask what they were scored against. If the references disagree with each other, the leaderboard is noise. Fixing the measurement is not preliminary work before the science. It is the science.