proposal · experiment writeup candidate · score 32

Mixture of Anticipatory Orthogonal Experts for N'Ko ASR



Extracted abstract or opening context

The MoVE paper is directionally useful because it demonstrates a disciplined way to avoid a monolithic speech model: keep a pretrained audio-language model fixed, train specialized LoRA experts, and learn a router that blends or selects those experts by speech-token state. The useful idea is not the exact vocalization target. Their target is expressive speech-to-speech translation with emotion and non-verbal vocalization preservation. Our target is N'Ko ASR correction with acoustic authority, graph admissibility, inscription consistency, and bounded improvement in character error rate (CER).

The correct N'Ko analogue is a Mixture of Anticipatory Orthogonal Experts. "Anticipatory" means each expert is activated by trajectory scalars that describe where the decoder is going: stable, crossing a boundary, uncertain, recovering, or encountering novelty. "Orthogonal" means the experts are not all trying to make the same correction. Each expert owns a different axis of authority: acoustic preservation, boundary completion, uncertain repair, recovery context, or novelty quarantine. The router is not a generic MoE router optimizing token likelihood. It is an admissibility router deciding which authority is allowed to act.

This becomes paper-worthy if we can show that expert activation lowers CER without increasing accepted-worse edits, that is, accepted corrections that leave the transcript worse than the acoustic hypothesis. A normal ASR rescoring paper says a language model improves transcription. This architecture says the language model is only allowed to act inside specific anticipation partitions and must pass an external admissibility gate. That makes the contribution less like "add an LM" and more like "formalize when a language prior is authorized to modify acoustic evidence."

The novelty claim is strongest for N'Ko because N'Ko is phonetically transparent and culturally tied to inscription. A correction is not just a better token. It is a transition between sound, glyph, trajectory, and canonical writing. Stable acoustic evidence should remain sovereign. Boundary evidence can accept one-glyph completion. Uncertain evidence can consult a corrective prior. Recovery can use context, but only locally. Novelty should be preserved for corpus growth instead of normalized into familiar text. That is a different scientific object from ordinary speech-to-text postprocessing.

MoVE confirms that specialized experts plus a learned router can be a better architecture than a single blended model when speech contains multiple latent regimes. It also supports the adapter-first path: train small expert adapters or heads instead of retraining the full model. For us, this strengthens the case for partition-specific N'Ko correction adapters, a small route/vitality/partition head, and a frozen or mostly frozen acoustic anchor.
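The partition rules and the external gate described above can be sketched in a few lines. This is a minimal illustration, not the proposal's implementation: the `Trajectory` fields, the thresholds in `assign_partition`, the `max_edits` bound, and the reading of "one glyph" as one appended code point are all assumptions made for the example, and each expert is modeled as a plain function from the acoustic hypothesis to a proposed string.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, Tuple


class Partition(Enum):
    STABLE = "stable"
    BOUNDARY = "boundary"
    UNCERTAIN = "uncertain"
    RECOVERING = "recovering"
    NOVELTY = "novelty"


@dataclass(frozen=True)
class Trajectory:
    """Hypothetical trajectory scalars for one decoded span, each in [0, 1]."""
    confidence: float   # acoustic confidence in the current hypothesis
    boundary: float     # closeness to a boundary crossing
    novelty: float      # how unlike known corpus text the span looks
    recovering: bool    # True if the decoder just left an uncertain span


def assign_partition(t: Trajectory) -> Partition:
    # Thresholds are placeholders, not values from the proposal.
    if t.novelty >= 0.8:
        return Partition.NOVELTY
    if t.confidence < 0.5:
        return Partition.UNCERTAIN
    if t.recovering:
        return Partition.RECOVERING
    if t.boundary >= 0.7:
        return Partition.BOUNDARY
    return Partition.STABLE


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance over code points."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def admissible(p: Partition, hypothesis: str, proposal: str, max_edits: int = 2) -> bool:
    """External gate: which edits a partition is authorized to accept."""
    if p in (Partition.STABLE, Partition.NOVELTY):
        # Acoustic evidence stays sovereign; novelty is preserved as-is.
        return proposal == hypothesis
    if p is Partition.BOUNDARY:
        # Assumption: "one-glyph completion" means one appended code point.
        return proposal == hypothesis or (
            len(proposal) == len(hypothesis) + 1 and proposal.startswith(hypothesis)
        )
    # UNCERTAIN and RECOVERING: bounded, local repair only (assumed bound).
    return edit_distance(hypothesis, proposal) <= max_edits


Expert = Callable[[str], str]


def route(t: Trajectory, hypothesis: str, experts: Dict[Partition, Expert]) -> Tuple[Partition, str]:
    """Pick the partition's expert, then keep its proposal only if the gate allows it."""
    p = assign_partition(t)
    expert = experts.get(p)
    if expert is None:
        return p, hypothesis
    proposal = expert(hypothesis)
    return (p, proposal) if admissible(p, hypothesis, proposal) else (p, hypothesis)
```

With placeholder Latin strings, `route(Trajectory(0.9, 0.8, 0.1, False), "abc", {Partition.BOUNDARY: lambda h: h + "d"})` lands in the boundary partition and accepts the one-character completion, while the same expert appending two characters is rejected and the acoustic hypothesis is returned unchanged. Keeping the gate separate from the experts is the point of the design: an expert may propose anything, but only the gate decides whether that authority is allowed to act.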

Promotion decision

What has to happen next

Attach run IDs, datasets, metrics, and reproduction commands.

Why this is not always a full paper yet

Corpus pages are public-safe readers for discovered workspace artifacts. They are not automatically final papers. A corpus item becomes a polished paper only after the editable source, evidence checkpoints, references, figures, render path, and release status are attached through the paper schema.