Grand Diomande Research

Mixture of Anticipatory Orthogonal Experts for N'Ko ASR

Date: 2026-04-23

Position

The MoVE paper is directionally useful because it demonstrates a disciplined way to avoid a monolithic speech model: keep a pretrained audio-language model fixed, train specialized LoRA experts, and learn a router that blends or selects those experts according to speech-token state. What transfers is this architecture, not the vocalization target. MoVE targets expressive speech-to-speech translation that preserves emotion and non-verbal vocalizations. Our target is N'Ko ASR correction with acoustic authority, graph admissibility, inscription consistency, and bounded CER improvement.

The correct N'Ko analogue is a Mixture of Anticipatory Orthogonal Experts. "Anticipatory" means each expert is activated by trajectory scalars that describe where the decoder is going: stable, crossing a boundary, uncertain, recovering, or encountering novelty. "Orthogonal" means the experts are not all trying to do the same correction. Each expert owns a different axis of authority: acoustic preservation, boundary completion, uncertain repair, recovery context, or novelty quarantine. The router is not a generic MoE router optimizing token likelihood. It is an admissibility router deciding which authority is allowed to act.
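To make "anticipatory" concrete, here is a minimal sketch of mapping trajectory scalars to a partition. Only the five partition names come from this note. The scalar names (`confidence`, `boundary_score`, `novelty_score`, `recovering`) and every threshold are hypothetical, because the note does not specify which scalars the decoder emits or how they are thresholded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrajectoryScalars:
    # Hypothetical per-segment scalars; the real decoder features may differ.
    confidence: float      # mean acoustic posterior over the segment, in [0, 1]
    boundary_score: float  # estimated probability of crossing a glyph/word boundary
    novelty_score: float   # distance from known corpus items, in [0, 1]
    recovering: bool       # assumed: True if the previous segment was routed as uncertain


def anticipate_partition(s: TrajectoryScalars) -> str:
    """Map trajectory scalars to one of the five anticipation partitions.

    Thresholds are illustrative. Novelty is checked first so that unfamiliar
    material is quarantined instead of being routed into a correction lane.
    """
    if s.novelty_score >= 0.8:
        return "novelty"
    if s.confidence >= 0.9:
        return "stable"
    if s.recovering:
        return "recovery"
    if s.boundary_score >= 0.5:
        return "boundary"
    return "uncertain"


# Usage: a confident, familiar segment stays with acoustic evidence.
print(anticipate_partition(TrajectoryScalars(0.95, 0.1, 0.1, False)))  # stable
```

Checking novelty before stability is one possible design choice. It keeps a confident but unfamiliar segment out of every correction lane.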

Why This Is Paper-Worthy

This becomes paper-worthy if we can show that expert activation improves CER without increasing accepted-worse edits, meaning corrections that the gate admits but that move the output further from the reference. A typical ASR rescoring paper claims that a language model improves transcription. This architecture claims that the language model may act only inside specific anticipation partitions and must pass an external admissibility gate. That makes the contribution less like "add an LM" and more like "formalize when a language prior is authorized to modify acoustic evidence."

The novelty claim is strongest for N'Ko because N'Ko is phonetically transparent and culturally tied to inscription. A correction is not just a better token. It is a transition between sound, glyph, trajectory, and canonical writing. Stable acoustic evidence should remain sovereign. Boundary evidence can accept one-glyph completion. Uncertain evidence can consult a corrective prior. Recovery can use context, but only locally. Novelty should be preserved for corpus growth instead of normalized into familiar text. That is a different scientific object from ordinary speech-to-text postprocessing.

What MoVE Confirms

MoVE confirms that specialized experts plus a learned router can be a better architecture than a single blended model when speech contains multiple latent regimes. It also supports the adapter-first path: train small expert adapters or heads instead of retraining the full model. For us, this strengthens the case for partition-specific N'Ko correction adapters, a small route/vitality/partition head, and a frozen or mostly frozen acoustic anchor.

MoVE also suggests that hard labels may be too crude. Their router can combine expert behavior continuously. Our first router is deterministic for auditability, but a later version can expose soft routing weights while still preserving the Rust admissibility boundary. The production rule should remain: soft neural routing may propose, deterministic admissibility decides.
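The rule "soft neural routing may propose, deterministic admissibility decides" can be sketched as two separate functions. This is a hypothetical Python illustration: the per-partition edit caps are assumed values, and the real gate lives in the Rust control plane and is not reproduced here. The key property is that the soft proposal can only tighten the edit budget, never widen it.

```python
import math

PARTITIONS = ["stable", "boundary", "uncertain", "recovery", "novelty"]

# Hypothetical per-partition edit caps (character-level edit distance).
# Stable and novelty have cap 0 because they are refusal lanes.
EDIT_CAP = {"stable": 0, "boundary": 1, "uncertain": 3, "recovery": 2, "novelty": 0}


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def propose(logits):
    """Soft router: returns its preferred partition and weight. Proposal only."""
    weights = softmax(logits)
    best = max(range(len(weights)), key=weights.__getitem__)
    return PARTITIONS[best], weights[best]


def admit(deterministic: str, proposed: str, asr_text: str, candidate: str) -> str:
    """Deterministic gate: use the stricter of the two caps; anything over
    the cap falls back to the ASR hypothesis."""
    cap = min(EDIT_CAP[deterministic], EDIT_CAP[proposed])
    return candidate if levenshtein(asr_text, candidate) <= cap else asr_text
```

If the deterministic partition is `stable`, `admit` returns the ASR text unchanged however strongly the soft router prefers `uncertain`.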

Where We Diverge

MoVE routes vocalization and emotional expression. AGP-N'Ko routes correction authority. Their failure mode is flattened expressiveness. Our failure mode is hallucinated writing: a fluent N'Ko output that was not acoustically spoken, or an inscription-normalized form that destroys a novel corpus item. That means our router needs refusal as a first-class success case. Stable and novelty partitions are not "missed opportunities"; they are safety wins.

MoVE blends LoRA experts inside an AudioLLM. Our current implementation is more modular: trajectory CTC ASR, Python bridge, AGP/Gemma proposal lane, Rust control plane, Graph Kernel-shaped admissibility token, RAG++/TurboQuant provenance, and future Core ML/ANE route heads. We should keep that modularity until the ablations prove the joined model is worth training.

Expert Lanes

The initial expert lanes are:

```text
stable    -> acoustic_preservation -> preserve ASR, emit witness
boundary  -> boundary_completion   -> allow one-glyph completion
uncertain -> uncertain_repair      -> AGP proposal + retrieval vote + Rust gate
recovery  -> recovery_context      -> retrieval-backed local repair with stricter edit cap
novelty   -> novelty_quarantine    -> preserve ASR, add to corpus/review index
```
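The lane table can be held as a small immutable registry, so that routing, evaluation, and training all read one definition. This is a hypothetical Python encoding. The `Action` enum, the `emits` field, and the assumption that the boundary lane does not consult retrieval are illustrative and are not taken from `expert_router.py`.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    PRESERVE = "preserve"          # keep the ASR output unchanged
    COMPLETE = "complete"          # one-glyph completion only
    REPAIR = "repair"              # AGP proposal + retrieval vote + Rust gate
    LOCAL_REPAIR = "local_repair"  # retrieval-backed, stricter edit cap


@dataclass(frozen=True)
class ExpertLane:
    name: str
    action: Action
    uses_retrieval: bool
    emits: str  # hypothetical: what the lane records besides the text itself


EXPERT_LANES = {
    "stable": ExpertLane("acoustic_preservation", Action.PRESERVE, False, "witness"),
    "boundary": ExpertLane("boundary_completion", Action.COMPLETE, False, "none"),
    "uncertain": ExpertLane("uncertain_repair", Action.REPAIR, True, "none"),
    "recovery": ExpertLane("recovery_context", Action.LOCAL_REPAIR, True, "none"),
    "novelty": ExpertLane("novelty_quarantine", Action.PRESERVE, False, "review_index"),
}


def is_refusal_lane(partition: str) -> bool:
    """Refusal lanes never change the text; success means preserving it."""
    return EXPERT_LANES[partition].action is Action.PRESERVE
```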

The "orthogonal" property is that each lane has a different job, not merely a different prompt. This matters for evaluation. Stable should maximize refusal accuracy. Boundary should maximize small completion gains. Uncertain should maximize CER improvement under bounded edit distance. Recovery should maximize context repair without broad rewrite. Novelty should maximize preservation and review signal.
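Because each lane has a different objective, results should be reported per lane rather than pooled. Here is a minimal sketch of the refusal side. It assumes each evaluated row is a dict carrying its `partition`, the `asr` text, and the final `output`. These field names are hypothetical.

```python
from collections import defaultdict

REFUSAL_PARTITIONS = {"stable", "novelty"}


def refusal_accuracy(rows):
    """For refusal lanes, success means the output equals the ASR hypothesis.

    Returns {partition: fraction of rows left untouched}; correction lanes
    are skipped because they are scored by CER, not by refusal.
    """
    kept = defaultdict(int)
    total = defaultdict(int)
    for row in rows:
        partition = row["partition"]
        if partition not in REFUSAL_PARTITIONS:
            continue
        total[partition] += 1
        kept[partition] += row["output"] == row["asr"]
    return {p: kept[p] / total[p] for p in total}
```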

TurboQuant and ANE Placement

TurboQuant belongs in retrieval and provenance, not in the acoustic model. The current sidecar already shows a viable compressed vector path for candidate retrieval and exact-rerank preparation. In this architecture it should serve the uncertain and recovery experts first: retrieve similar N'Ko contexts, candidate spellings, or prior correction traces using compressed vectors, then let the Rust gate decide whether the retrieved prior can influence the output.
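A compressed path for candidate retrieval followed by exact reranking is a two-stage pattern. The sketch below uses plain symmetric int8 scalar quantization as a stand-in. It is not TurboQuant's algorithm, and the toy vectors, `k`, and `shortlist` values are assumptions. It only shows the shape: cheap approximate scores over compressed codes, then exact scores over a shortlist.

```python
def quantize(vec, scale):
    """Symmetric int8 scalar quantization: a simple stand-in, not TurboQuant."""
    return [max(-127, min(127, round(x / scale))) for x in vec]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def retrieve(query, corpus, k=2, shortlist=3):
    """Two-stage retrieval over {key: float vector}.

    Stage 1 scores int8 codes (cheap, approximate); stage 2 reranks the
    shortlist with the original float vectors (exact).
    """
    scale = max(abs(x) for v in corpus.values() for x in v) / 127 or 1.0
    # In practice these codes would be precomputed once, not per query.
    codes = {key: quantize(v, scale) for key, v in corpus.items()}
    q_code = quantize(query, scale)
    approx = sorted(codes, key=lambda key: dot(q_code, codes[key]), reverse=True)
    exact = sorted(approx[:shortlist], key=lambda key: dot(query, corpus[key]), reverse=True)
    return exact[:k]
```

Whatever `retrieve` returns is only evidence for the uncertain or recovery expert. The Rust gate still decides whether it may change the output.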

Apple Neural Engine (ANE) should not be claimed as a proven speedup yet. The private MIL probe compiles, but evaluation fails locally. The production-safe lane is to export small route/vitality/partition heads through Core ML, parity-test them, and only then claim accelerator use. The ANE target is not Whisper large-v3 or Gemma. It is the small router that decides partition weights, route vitality, correction risk, and expert budget.
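"Parity-test them" can be made concrete as a row-by-row comparison between the reference route head and the exported one on the same inputs. In this hypothetical sketch both heads are plain Python callables that return partition weights. In practice one would wrap the training-framework head and the other a Core ML prediction. The tolerance and agreement thresholds are assumptions.

```python
def parity_report(reference_head, exported_head, inputs):
    """Return (max absolute weight difference, fraction of rows with the same argmax)."""
    max_diff = 0.0
    agree = 0
    for x in inputs:
        ref = reference_head(x)
        out = exported_head(x)
        if len(ref) != len(out):
            raise ValueError("heads disagree on the number of partitions")
        max_diff = max(max_diff, max(abs(a - b) for a, b in zip(ref, out)))
        agree += ref.index(max(ref)) == out.index(max(out))
    return max_diff, agree / len(inputs)


def passes_parity(reference_head, exported_head, inputs, atol=1e-3, min_agree=1.0):
    """Export is accepted only if weights match within atol and routing agrees."""
    max_diff, agreement = parity_report(reference_head, exported_head, inputs)
    return max_diff <= atol and agreement >= min_agree
```

Passing parity shows the export is faithful. It does not show which compute unit ran the model, so an accelerator claim still needs separate evidence.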

Experiments

The first experiment is deterministic router validation. Given ASR bridge packets, every row must emit an expert lane, compute lane, TurboQuant mode, accelerator status, and safety contract. This is now implemented by `experiments/agp_mlx/asr_bridge/expert_router.py` and `evaluate_expert_router_v1.py`.
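The validation contract, which requires every row to emit an expert lane, compute lane, TurboQuant mode, accelerator status, and safety contract, amounts to a schema check. The field names, the `parity_passed` flag, and the `"ane_verified"` status below are hypothetical placeholders, not the actual output schema of `expert_router.py`.

```python
REQUIRED_FIELDS = ("expert_lane", "compute_lane", "turboquant_mode",
                   "accelerator_status", "safety_contract")
KNOWN_LANES = {"acoustic_preservation", "boundary_completion", "uncertain_repair",
               "recovery_context", "novelty_quarantine"}


def validate_router_rows(rows):
    """Return a list of (row_index, problem) pairs; an empty list means every row passed."""
    problems = []
    for i, row in enumerate(rows):
        for field in REQUIRED_FIELDS:
            if row.get(field) in (None, ""):
                problems.append((i, f"missing {field}"))
        lane = row.get("expert_lane")
        if lane and lane not in KNOWN_LANES:
            problems.append((i, f"unknown lane {lane}"))
        # Hypothetical rule: no row may claim ANE execution before parity is proven.
        if row.get("accelerator_status") == "ane_verified" and not row.get("parity_passed"):
            problems.append((i, "accelerator claim without parity"))
    return problems
```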

The second experiment is Paper 4 prediction/reference replay. Once the current Vast run emits same-snapshot predictions and references, convert those rows into ASR bridge packets, run the expert router, then run the existing bounded correction evaluator. CER before and after is necessary but not sufficient: the report must also count accepted-improved, accepted-neutral, accepted-worse, and rejected-would-improve edits, and report acceptance per partition.
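The report categories can be computed from per-row CER before and after correction. This minimal sketch assumes each row records `partition`, `reference`, `asr`, `proposal`, and `accepted`. These field names are hypothetical. CER here is character-level Levenshtein distance divided by reference length. The extra `rejected_other` bucket, meaning a correct refusal, is an addition so that every row is counted.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def cer(hyp: str, ref: str) -> float:
    return levenshtein(hyp, ref) / max(len(ref), 1)


def final_output(row) -> str:
    return row["proposal"] if row["accepted"] else row["asr"]


def classify(row) -> str:
    before = cer(row["asr"], row["reference"])
    after = cer(row["proposal"], row["reference"])
    if row["accepted"]:
        if after < before:
            return "accepted_improved"
        return "accepted_worse" if after > before else "accepted_neutral"
    return "rejected_would_improve" if after < before else "rejected_other"


def corpus_cer(rows, after=False) -> float:
    """Corpus-level CER: total edits over total reference characters."""
    edits = sum(levenshtein(final_output(r) if after else r["asr"], r["reference"]) for r in rows)
    return edits / max(sum(len(r["reference"]) for r in rows), 1)


def outcome_report(rows):
    """Return (outcome counts, {partition: acceptance rate})."""
    outcomes = Counter(classify(r) for r in rows)
    seen, accepted = Counter(), Counter()
    for r in rows:
        seen[r["partition"]] += 1
        accepted[r["partition"]] += r["accepted"]
    return outcomes, {p: accepted[p] / seen[p] for p in seen}
```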

The third experiment is expert adapter training. Build supervised rows shaped as `(partition, scalars, prefix, asr_candidate, n_best, reference)`. Train separate adapters or heads for boundary, uncertain, and recovery. Stable and novelty are refusal/control lanes, not correction lanes. Compare against a single generic correction adapter. If partition-specific experts reduce accepted-worse edits or improve CER at the same edit cap, the architecture has evidence.
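The supervised row shape `(partition, scalars, prefix, asr_candidate, n_best, reference)` maps directly onto a record type. This hypothetical sketch splits rows into one training set per correction lane. It pools the same rows for the single generic adapter baseline, so both arms of the comparison see identical examples. Stable and novelty rows are held out as a control set for measuring refusal. Holding them out of adapter training is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

CORRECTION_PARTITIONS = ("boundary", "uncertain", "recovery")


@dataclass(frozen=True)
class ExpertRow:
    partition: str
    scalars: Tuple[float, ...]
    prefix: str
    asr_candidate: str
    n_best: Tuple[str, ...]
    reference: str


def split_for_training(rows: List[ExpertRow]):
    """Return (per_expert, generic, control).

    per_expert: one training set per correction lane.
    generic:    the union of those sets, for the single-adapter baseline.
    control:    stable and novelty rows, held out to measure refusal.
    """
    per_expert: Dict[str, List[ExpertRow]] = {p: [] for p in CORRECTION_PARTITIONS}
    control: List[ExpertRow] = []
    for row in rows:
        if row.partition in per_expert:
            per_expert[row.partition].append(row)
        else:
            control.append(row)
    generic = [r for p in CORRECTION_PARTITIONS for r in per_expert[p]]
    return per_expert, generic, control
```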

The fourth experiment is soft router ablation. Replace deterministic partition activation with a small learned route head, ideally exported to Core ML after parity. The Rust control plane remains final authority. This tests whether MoVE-style soft routing adds value without giving up admissibility.
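A small learned route head can be as simple as one linear layer with a softmax over the five partitions. The sketch below uses hypothetical hand-set weights over three hypothetical scalars `(confidence, boundary_score, novelty_score)`. It only shows the shape of the head that would be trained and later exported. Its output is a proposal, and the gate still decides.

```python
import math

PARTITIONS = ("stable", "boundary", "uncertain", "recovery", "novelty")


class LinearRouteHead:
    def __init__(self, weights, bias):
        # weights: one row per partition, one column per input scalar
        self.weights = weights
        self.bias = bias

    def __call__(self, scalars):
        logits = [sum(w * x for w, x in zip(row, scalars)) + b
                  for row, b in zip(self.weights, self.bias)]
        m = max(logits)  # stabilize the softmax
        exps = [math.exp(z - m) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]


def proposed_partition(head, scalars):
    """Return the head's preferred partition and the full weight vector."""
    probs = head(scalars)
    return PARTITIONS[probs.index(max(probs))], probs


# Toy, hand-set weights for illustration only (not trained).
toy_head = LinearRouteHead(
    weights=[[4, 0, -4], [0, 4, 0], [-4, 0, 0], [0, 0, 0], [0, 0, 6]],
    bias=[0, 0, 1, 0, -3],
)
```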

Paper Name

Working title:

```text
Mixture of Anticipatory Orthogonal Experts: Admissible N'Ko ASR Correction Through Trajectory-Partitioned Language Priors
```

Short name:

```text
MAOE-N'Ko
```

Claim Boundary

Right now this is an architecture and a partially materialized experiment lane, not a finished empirical paper. It becomes a paper when same-snapshot Paper 4 replay shows all of the following outcomes:

```text
CER after < CER before
accepted_worse = 0 or statistically negligible
stable refusal remains high
novelty override remains blocked
partition-specific experts beat a single generic correction adapter
```
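The claim boundary can be written as an explicit checklist over a replay report, so the promotion decision can be checked mechanically. The report keys are hypothetical. The thresholds standing in for "statistically negligible" and "remains high" are also hypothetical and would need to be fixed before the replay runs. Following the third experiment, "beat" is read as lower CER or fewer accepted-worse edits.

```python
def claim_boundary(report, max_worse_rate=0.0, min_stable_refusal=0.98):
    """Return {criterion: passed} for each condition in the claim boundary."""
    worse_rate = report["accepted_worse"] / max(report["accepted_total"], 1)
    return {
        "cer_improves": report["cer_after"] < report["cer_before"],
        "accepted_worse_negligible": worse_rate <= max_worse_rate,
        "stable_refusal_high": report["stable_refusal"] >= min_stable_refusal,
        "novelty_override_blocked": report["novelty_overrides"] == 0,
        "experts_beat_generic": (
            report["expert_cer_after"] < report["generic_cer_after"]
            or report["expert_accepted_worse"] < report["generic_accepted_worse"]
        ),
    }


def ready_for_paper(report, **thresholds):
    """All criteria must hold at once; any single failure blocks the claim."""
    return all(claim_boundary(report, **thresholds).values())
```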

If those hold, the contribution is real: not just a better N'Ko ASR model, but a disciplined way to let language priors help endangered-script ASR without letting them erase acoustic and inscription authority.

Promotion Decision

Before promotion, attach run IDs, datasets, metrics, and reproduction commands.

Source Anchor

Comp-Core/docs/research/nko-mixture-of-anticipatory-orthogonal-experts-v1.md
