Does Script Design Matter? Phonetic Transparency and CTC Decoding for N'Ko Automatic Speech Recognition
% Does Script Design Matter? Phonetic Transparency and CTC Decoding for N'Ko ASR
% Target: Interspeech 2026 / ICASSP 2027
\noindent\textbf{Provenance note.} The fully verified artifact bundle currently archived in this repository is a fresh reproduction of the N'Ko trajectory-biased decoder on the current 290,596-pair corpus snapshot (232,476 train / 29,060 validation / 29,060 test; seed 42), which achieves 20.57\% test CER. We also completed four A100 ablations on the same snapshot under a stabilized safe rerun profile: N'Ko baseline (31.38\%), Latin baseline (31.66\%), Latin trajectory (32.81\%), and N'Ko TAR (31.69\%). All four underperform the 20.57\% N'Ko trajectory anchor, which therefore remains the strongest verified configuration. Earlier N'Ko/Latin ablation numbers from an 8-run internal comparison are retained where noted because they motivated the script-dependent trajectory hypothesis, but the complete artifact bundle for all eight runs has not yet been restored locally. Those historical figures should therefore be read as provisional background evidence, not as the primary benchmark.
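The split sizes quoted in the provenance note are consistent with an 80/10/10 partition of the 290,596 pairs. The following is a minimal sketch of how such a partition could be produced, assuming a seeded shuffle followed by contiguous slicing. The helper name \texttt{split\_indices} and the rounding rule are illustrative assumptions, not the repository's actual split code.

```python
import random

def split_indices(n_pairs, seed=42):
    """Hypothetical helper: seeded shuffle, then an 80/10/10 slice.

    Assumption: train size is floored, validation size is rounded to the
    nearest integer, and the test set takes whatever remains.
    """
    indices = list(range(n_pairs))
    random.Random(seed).shuffle(indices)   # deterministic for a fixed seed

    n_train = n_pairs * 8 // 10            # floor(0.8 * n_pairs)
    n_val = (n_pairs + 5) // 10            # round(0.1 * n_pairs)

    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test

train, val, test = split_indices(290_596)
print(len(train), len(val), len(test))
```

Under these assumed rounding rules the three sizes sum to the full snapshot, and no index appears in more than one subset.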
\begin{abstract} Connectionist Temporal Classification (CTC) decoders must learn to align acoustic frames with output characters. We argue that the design of the target script measurably affects how well this alignment can be learned, and we ground that claim in two layers of current evidence: a fully verified N'Ko trajectory reproduction and a completed ablation bundle, both on the current 290,596-pair corpus snapshot.
N'Ko, a West African alphabetic script with a strict one-to-one phoneme-to-character mapping, produces a CTC output space of 66 classes. Latin Bambara, which encodes the same language, requires the decoder to learn digraph compositions (\texttt{ny}, \texttt{ng}, \texttt{gb}) and context-dependent character values, and its output labels carry no tonal information. Theoretical considerations therefore predict that N'Ko should provide a cleaner alignment target for CTC-style decoders, especially when architectural mechanisms exploit phoneme-aligned boundaries.
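To make the digraph point concrete, the sketch below counts how many character labels a CTC decoder would have to emit for a Latin-script string compared with the number of units obtained when the three digraphs named above are treated as single symbols. It is an illustration under stated assumptions: the function \texttt{segment\_units} is hypothetical, the input strings are examples only, and the greedy left-to-right match is a simplification that cannot tell a true digraph from two adjacent single letters.

```python
DIGRAPHS = ("ny", "ng", "gb")  # the digraphs named in the text

def segment_units(text, digraphs=DIGRAPHS):
    """Hypothetical helper: greedily merge known digraphs into single units."""
    units = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in digraphs:
            units.append(pair)      # two characters, one unit
            i += 2
        else:
            units.append(text[i])   # one character, one unit
            i += 1
    return units

for word in ("nyama", "gban"):
    units = segment_units(word)
    print(word, "characters:", len(word), "units:", len(units), units)
```

In a character-level Latin output space, each digraph costs two labels that must be emitted in order, whereas a script with a one-to-one mapping needs a single label per unit.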
The strongest artifact-complete result in this repository is a fresh reproduction of the N'Ko trajectory-biased decoder on 290,596 Bambara speech pairs (232,476/29,060/29,060 split; seed 42). This reproduced model reaches \textbf{20.57\% test CER}, with a best validation loss of 0.6359 at epoch 38 and early stopping at epoch 46 on an A100 80GB GPU. We then ran four matched ablations on the same snapshot under a stabilized safe profile, after rejecting an earlier non-finite run: N'Ko baseline (31.38\%), Latin baseline (31.66\%), Latin trajectory (32.81\%), and N'Ko TAR (31.69\%). All four underperform the N'Ko trajectory anchor, so the current best verified configuration remains plain N'Ko trajectory without TAR or TTT.
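For readers unfamiliar with the metric, the sketch below shows one common way to compute character error rate (CER): total Levenshtein edit distance divided by total reference length. It assumes corpus-level (micro-averaged) aggregation; the source does not state which averaging the reported figures use, and the function names are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings (unit cost per edit)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def corpus_cer(refs, hyps):
    """Assumed aggregation: summed edits over summed reference characters."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_chars = sum(len(r) for r in refs)
    return total_edits / total_chars

print(corpus_cer(["abcd"], ["abed"]))  # one substitution over four characters
```

Because the denominator counts reference characters, CER values for scripts with different string lengths for the same utterance are not strictly on the same scale, which is worth keeping in mind when comparing N'Ko and Latin figures.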
Earlier internal Apr