Mixture of Anticipatory Orthogonal Experts for N'Ko ASR

A Technical Architecture for Admissible Speech Correction, AGP Routing, TurboQuant Provenance, and Future ANE Reflex Heads

Date: 2026-04-23
Status: technical documentation / paper draft
Short name: MAOE-N'Ko
Primary codebase: `[home]/Desktop/Comp-Core`

Abstract

MAOE-N'Ko, the Mixture of Anticipatory Orthogonal Experts for N'Ko ASR, is a modular speech-language correction architecture that keeps the acoustic model sovereign while allowing language-prior intelligence to act only where it is admissible. The system begins with a verified N'Ko trajectory CTC acoustic model, currently anchored by the Paper 4 reproduction checkpoint with 20.57 percent CER on the locked N'Ko run. Instead of replacing that model with a monolithic audio-language system, MAOE-N'Ko routes each ASR chunk into an anticipation partition: stable, boundary, uncertain, recovery, or novelty. Each partition activates a different expert lane with a distinct authority contract: acoustic preservation, boundary completion, uncertain repair, recovery context, or novelty quarantine.

The architecture differs from a conventional mixture-of-experts model because the experts are orthogonal in authority, not merely parallel in capacity. A normal MoE router selects among experts that all try to improve token likelihood. MAOE-N'Ko selects among experts that are allowed to do fundamentally different things. Stable evidence is preserved. Boundary evidence can accept a small completion. Uncertain evidence can consult an AGP correction prior and TurboQuant-backed retrieval. Recovery evidence can use context, but under a stricter edit cap. Novel evidence is blocked from language-prior normalization and routed into review/corpus growth. The final accept/reject decision is not made by the neural model. It is made by a deterministic Rust control plane with an admissibility witness.

The result is not merely "ASR plus a language model." It is a layered authority system for endangered-script speech recognition. The acoustic model answers what was heard. AGP proposes what may be structurally plausible. TurboQuant compresses retrieval and provenance state. RAG++ and Graph Kernel-style witnesses preserve evidence chains. Rust enforces bounded correction. Future Core ML / Apple Neural Engine heads can run the small route, vitality, and partition classifiers; this does not imply that ANE replaces full transformer inference. The paper contribution becomes credible if a same-snapshot Paper 4 replay shows lower CER after expert routing while accepted-worse corrections remain zero or statistically negligible.

1. Motivation

The N'Ko ASR problem is not just a transcription problem. It is a speech-to-writing problem for a phonetic script that carries cultural, orthographic, and inscriptional structure. That makes the failure mode of a normal language-model corrector unusually dangerous. A fluent N'Ko string can be wrong if it overwrites acoustic evidence. A familiar-looking lexical form can be harmful if it erases a novel speaker, a dialectal item, or a corpus-expansion signal. For this reason, the language model must not be allowed to become the acoustic authority.

The current verified acoustic anchor is the trajectory-biased N'Ko CTC model. That model is valuable because it gives the system a measurable baseline and a stable acoustic interpretation layer. The correct next step is not to discard that anchor. The correct next step is to wrap it in a routing architecture that decides when an external prior is allowed to act.

This is where AGP enters. AGP, in this stack, is not the ASR model. It is a corrective and anticipatory language-prior substrate. It can score, propose, or refine N'Ko continuations, but only after the ASR telemetry has been partitioned and only before the Rust gate decides whether the proposal is admissible.

2. Relation to MoVE and Why This Is Different

The MoVE paper, "Mixture of Vocalization Experts for Expressive Speech-to-Speech Translation", is useful because it demonstrates the value of specialized speech experts and routing. MoVE freezes a pretrained AudioLLM, trains LoRA experts around expressive vocalization regimes, and learns a router that blends those experts so that speech-to-speech translation preserves expressive phenomena such as emotion and non-verbal vocalization.

MAOE-N'Ko borrows the architectural insight, not the objective. MoVE routes expressive vocalization. MAOE-N'Ko routes correction authority. MoVE asks which expressive expert should shape the generated speech. MAOE-N'Ko asks which system authority is allowed to modify a N'Ko transcript. This is a different scientific object.

The comparable part is expert routing. The divergence is the safety contract. MoVE's router is designed to preserve expressive fidelity. MAOE-N'Ko's router is designed to preserve acoustic sovereignty and inscription admissibility. In MoVE, experts are specialized by vocal style. In MAOE-N'Ko, experts are specialized by trajectory state and authority boundary.

This is why "orthogonal" matters. The experts are not five redundant correction models. They are five different lanes of permissible behavior:

```text
stable    -> acoustic preservation
boundary  -> boundary completion
uncertain -> uncertain repair
recovery  -> recovery context
novelty   -> novelty quarantine
```

MoVE therefore supports the broad direction taken here: modular experts and routers are preferable to a single opaque monolith when speech contains multiple latent regimes. But the MAOE-N'Ko implementation should remain stricter than MoVE, because the goal is not just generation quality. It is admissible correction under evidence constraints.

3. Layered Architecture

The complete MAOE-N'Ko architecture is a layered system. Each layer has a distinct role, and no layer is allowed to silently absorb the responsibility of another.

Layer 1. Acoustic Evidence

The acoustic layer receives audio and produces N'Ko hypotheses. The current intended stack is:

```text
audio
  -> Whisper large-v3 encoder features
  -> trajectory-biased CTC decoder
  -> N'Ko hypothesis and optional n-best candidates
```

This layer is the acoustic authority. It owns the question: what did the speaker likely say? It should remain stable until a better same-provenance ASR model is trained and verified.

Layer 2. Trajectory Scalars

The trajectory layer extracts scalar evidence describing the decoder state. The bridge currently uses fields such as confidence, entropy or uncertainty, boundary score, novelty, recovery margin, and stability. These values are not cosmetic metadata. They are the routing substrate. They say whether the decoder is stable, crossing a boundary, uncertain, recovering from a likely prior error, or encountering novelty.

Layer 3. Anticipation Partition Router

The partition router converts scalar telemetry into one of five regimes:

```text
stable
boundary
uncertain
recovery
novelty
```

This is implemented in:

```text
experiments/agp_mlx/asr_bridge/partition_policy.py
```

The rule is intentionally deterministic for the first experiments. A later learned route head can replace or augment this classifier, but deterministic routing gives the first paper a clean audit trail.
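A deterministic rule of this kind can be sketched as an ordered list of threshold checks. This is a minimal illustration only, not the contents of `partition_policy.py`: the field names follow the scalars listed for Layer 2, while every threshold value and the precedence order (novelty first, so novel evidence can never fall through to a correcting lane) are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrajectoryScalars:
    """Per-chunk decoder telemetry; field names mirror the Layer 2 description."""
    confidence: float
    entropy: float
    boundary_score: float
    novelty: float
    recovery_margin: float
    stability: float


def classify_partition(s: TrajectoryScalars) -> str:
    """Map scalars to one of the five partitions.

    All thresholds below are hypothetical placeholders, not tuned values.
    """
    if s.novelty >= 0.8:            # checked first: novelty must block correction
        return "novelty"
    if s.recovery_margin >= 0.5:    # decoder appears to be recovering from an error
        return "recovery"
    if s.boundary_score >= 0.6:     # chunk sits on a likely boundary
        return "boundary"
    if s.confidence < 0.7 or s.entropy > 1.0 or s.stability < 0.5:
        return "uncertain"
    return "stable"
```

Because the function is pure and has no learned parameters, the same scalars always produce the same partition, which is what makes the routing decision replayable in an audit.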

Layer 4. Orthogonal Expert Selection

The expert router maps partitions into expert lanes, compute budgets, TurboQuant modes, accelerator status, and safety contracts. This is implemented in:

```text
experiments/agp_mlx/asr_bridge/expert_router.py
experiments/agp_mlx/asr_bridge/evaluate_expert_router_v1.py
```

The current mapping is:

```text
stable    -> acoustic_preservation -> ASR only, Rust witness
boundary  -> boundary_completion   -> AGP proposal, route head, one-glyph completion
uncertain -> uncertain_repair      -> AGP proposal, TurboQuant retrieval, Rust gate
recovery  -> recovery_context      -> retrieval-backed repair under stricter edit cap
novelty   -> novelty_quarantine    -> ASR preservation and corpus review
```

This layer is the core MAOE contribution. It explicitly separates correction behaviors that are often blurred together in ordinary postprocessing.
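One way to make the authority contracts explicit is a static lookup table in which each lane declares what it is allowed to do. The sketch below is illustrative and is not taken from `expert_router.py`. The lane names and the AGP/retrieval permissions follow the mapping above; the `ExpertLane` type, the numeric edit caps for the uncertain and recovery lanes, and the fail-closed fallback are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExpertLane:
    name: str
    agp_allowed: bool        # may AGP propose a correction in this lane?
    retrieval_allowed: bool  # may TurboQuant-backed retrieval be consulted?
    max_edit_distance: int   # 0 means the ASR text must be preserved


# Boundary cap of 1 reflects "one-glyph completion"; 3 and 2 are hypothetical
# values chosen only so that recovery is stricter than uncertain.
LANES = {
    "stable": ExpertLane("acoustic_preservation", False, False, 0),
    "boundary": ExpertLane("boundary_completion", True, False, 1),
    "uncertain": ExpertLane("uncertain_repair", True, True, 3),
    "recovery": ExpertLane("recovery_context", True, True, 2),
    "novelty": ExpertLane("novelty_quarantine", False, False, 0),
}


def select_lane(partition: str) -> ExpertLane:
    """Return the lane for a partition, failing closed on unknown labels."""
    # Assumption: an unrecognized partition is treated like novelty,
    # so no language-prior edit can slip through a routing bug.
    return LANES.get(partition, LANES["novelty"])
```

Keeping the permissions in data rather than in branching code means the contract for each lane can be hashed, versioned, and cited in a decision record.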

Layer 5. AGP Proposal Lane

AGP supplies a bounded language-prior proposal. The current AGP topology has included a promoted corrective lane backed by Gemma-family MLX inference and a domain adapter. The AGP lane does not decide truth. It proposes a correction candidate or ranks plausible continuations.

The current design principle is:

```text
AGP may propose.
Rust decides.
Admissibility records why.
```

The AGP stack is used for boundary, uncertain, and recovery lanes. It is skipped for stable evidence and blocked for novelty.

Layer 6. TurboQuant Retrieval and Provenance Compression

TurboQuant is the compressed retrieval and packet-mobility layer. It does not replace the ASR model. It is used where retrieval or state transport becomes the bottleneck. The current sidecar lives at:

```text
core/retrieval/cc-turboquant-index/
benchmarks/agp-turboquant-ane/
```

Local sidecar measurements show that 4-bit and 8-bit compressed vector indexes can preserve useful top-k behavior while reducing memory footprint. In MAOE-N'Ko, TurboQuant belongs first in the uncertain and recovery expert lanes, where the system needs compact access to similar N'Ko contexts, prior correction traces, lexical neighbors, or provenance slices.
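The general idea of a low-bit vector index that keeps top-k ordering can be illustrated with plain uniform scalar quantization. This is a generic sketch and is not the TurboQuant algorithm or the `cc-turboquant-index` API; the function names and the toy vectors are assumptions made for the example.

```python
def quantize(vec, bits=4):
    """Uniformly quantize a vector to integer codes in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1
    lo, hi = min(vec), max(vec)
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((x - lo) / scale) for x in vec]
    # A real index would bit-pack the codes; Python ints are kept for clarity.
    return codes, lo, scale


def dequantize(codes, lo, scale):
    """Reconstruct an approximate vector from its codes."""
    return [lo + c * scale for c in codes]


def top_k(query, vectors, k):
    """Indices of the k vectors with the largest dot product against query."""
    scores = [sum(q * x for q, x in zip(query, v)) for v in vectors]
    return sorted(range(len(vectors)), key=lambda i: -scores[i])[:k]


vectors = [
    [1.0, 0.0, 0.0, 0.0],
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
query = [1.0, 0.0, 0.0, 0.0]
compressed = [dequantize(*quantize(v, bits=4)) for v in vectors]
same_ranking = top_k(query, vectors, 2) == top_k(query, compressed, 2)
```

On this toy data the two nearest neighbors are identical before and after 4-bit quantization. Whether that holds on real N'Ko context embeddings is an empirical question for the sidecar benchmarks.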

Layer 7. Rust Control Plane

The Rust control plane is the deterministic accept/reject boundary. It lives at:

```text
core/semantic/cc-agp-bridge/
```

Its job is not neural inference. Its job is to classify partitions, enforce edit bounds, reject unsafe proposals, and issue replayable decision records. Stable partitions preserve ASR. Novelty partitions block language-prior overrides. Boundary, uncertain, and recovery partitions can accept local edits only under strict distance budgets.

This is the layer that makes the architecture more than a language-model correction prompt.
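The accept/reject logic can be sketched as an edit-distance check against a per-partition cap. The control plane itself is written in Rust; the version below is a Python illustration of the same idea, not a port of `cc-agp-bridge`. The cap values other than the zero caps for stable and novelty and the one-glyph boundary cap are hypothetical, as are the function names and reason strings.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance over code points (not grapheme clusters)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


# Hypothetical caps: stable and novelty forbid edits, boundary allows one glyph.
EDIT_CAPS = {"stable": 0, "boundary": 1, "uncertain": 3, "recovery": 2, "novelty": 0}


def gate(partition: str, asr_text: str, proposal):
    """Return (final_text, reason). The ASR text wins unless the edit is in budget."""
    cap = EDIT_CAPS.get(partition, 0)  # unknown partitions fail closed
    if proposal is None or proposal == asr_text:
        return asr_text, "preserved"
    if cap == 0:
        return asr_text, "rejected: lane forbids edits"
    dist = levenshtein(asr_text, proposal)
    if dist > cap:
        return asr_text, f"rejected: distance {dist} exceeds cap {cap}"
    return proposal, f"accepted: distance {dist} within cap {cap}"
```

Note that N'Ko tone marks are combining code points, so a distance measured over code points may differ from one measured over visible glyphs. Which unit the edit budget should use is a design decision the sketch leaves open.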

Layer 8. Graph Kernel and RAG++ Admissibility

Every accepted or rejected correction carries a provisional Graph Kernel-shaped admissibility witness. The witness includes slice identity, graph snapshot hash, policy hash, query hash, decision hash, and an admissibility token. RAG++ provenance sits beside that witness as retrieval chain of custody. The purpose is not to improve CER directly. The purpose is to make CER improvements attributable. If a correction helped, we can replay why it was allowed.
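A witness with these fields can be sketched as a set of content hashes over canonical JSON. The field list follows the paragraph above. The choice of SHA-256, the canonicalization, and the rule that the admissibility token is a hash over the other fields are assumptions for illustration and do not describe the actual Graph Kernel format.

```python
import hashlib
import json


def sha256_hex(payload) -> str:
    """Hash a JSON-serializable payload in a canonical (sorted-key) form."""
    data = json.dumps(payload, sort_keys=True, ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


def build_witness(slice_id, graph_snapshot, policy, query, decision) -> dict:
    """Assemble a replayable decision record from its inputs."""
    witness = {
        "slice_id": slice_id,
        "graph_snapshot_hash": sha256_hex(graph_snapshot),
        "policy_hash": sha256_hex(policy),
        "query_hash": sha256_hex(query),
        "decision_hash": sha256_hex(decision),
    }
    # Assumption: the token binds all other fields, so changing any input
    # produces a different token.
    witness["admissibility_token"] = sha256_hex(witness)
    return witness
```

Replaying a decision then amounts to rebuilding the witness from the stored inputs and checking that the token matches.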

Layer 9. Core ML / Apple Neural Engine Reflex Heads

The Apple Neural Engine should not be described as running the full system today. The current private MIL path compiles but has not yet passed evaluation reliably in local runs. The production-safe target is narrower and more realistic: export small route, vitality, semantic, partition, and correction-risk heads through Core ML, test parity against CPU/MLX outputs, then use ANE for low-power repeated routing decisions.

In MAOE-N'Ko, ANE is a reflex substrate, not the full brain. MLX/GPU handles dense transformer and adapter computation. ANE should handle small projection-heavy routing heads once verified.

4. How AGP Was Trained and Promoted

The AGP lane is not a randomly prompted model. The local runtime has used a Gemma-family MLX backbone with adapter training and promoted corrective lanes. The documented promoted topology included:

```text
fast lane:       hidden/q8_0
corrective lane: summary/q8_0 promoted
fallback lane:   summary/q8_0
alternate lane:  summary/q8_0
```

The promoted corrective lane was backed by:

```text
model: mlx-community/gemma-4-e2b-4bit
adapter: experiments/agp_mlx/train/runs/gemma4_e2b_domain_thunder_stage1_seq512
corrective run: experiments/agp_mlx/transfer/runs/gap_focus_transfer_fm007_len32_v1_20260420_034935
```

The important point is that AGP is trained and promoted as a domain corrective substrate. It is not yet trained specifically as the final N'Ko ASR correction expert. For MAOE-N'Ko, the current AGP lane is the proposal layer and experiment substrate. The next training wave should create partition-specific correction data:

```text
partition
trajectory scalars
N'Ko prefix
ASR candidate
n-best candidates
reference N'Ko
admissibility outcome
```

The step after that is to train or fine-tune separate adapters for the boundary, uncertain, and recovery partitions. Stable and novelty do not get correction adapters. They are refusal/control lanes.
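The partition-specific correction data could be carried as one typed row per chunk, then bucketed by partition before adapter training. The sketch below is a hypothetical schema: the seven fields follow the list above, but the class name, field types, JSONL serialization, and helper names are assumptions, not the bridge schema used in the codebase.

```python
import json
from dataclasses import asdict, dataclass

# Only these partitions get correction adapters; stable and novelty are control lanes.
CORRECTION_PARTITIONS = ("boundary", "uncertain", "recovery")


@dataclass
class CorrectionRow:
    partition: str
    trajectory_scalars: dict
    nko_prefix: str
    asr_candidate: str
    nbest_candidates: list
    reference_nko: str
    admissibility_outcome: str


def to_jsonl(rows) -> str:
    """Serialize rows as JSON Lines, keeping N'Ko text unescaped."""
    return "\n".join(
        json.dumps(asdict(r), ensure_ascii=False, sort_keys=True) for r in rows
    )


def split_for_adapters(rows) -> dict:
    """Group rows by partition, dropping the refusal/control lanes."""
    buckets = {p: [] for p in CORRECTION_PARTITIONS}
    for row in rows:
        if row.partition in buckets:
            buckets[row.partition].append(row)
    return buckets
```

Rows from the stable and novelty partitions are still worth logging, but under this split they never reach a correction adapter's training set.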

5. Runtime Utilization

At runtime the system behaves as follows:

```text
1. Audio enters the ASR stack.
2. The trajectory CTC model emits a N'Ko hypothesis.
3. Telemetry scalars are extracted for the chunk.
4. The partition router classifies the chunk.
5. The expert router selects the admissible expert lane.
6. If permitted, AGP proposes a bounded correction.
7. If useful, TurboQuant retrieves compact context or prior traces.
8. Rust applies the final safety contract.
9. The final text is emitted with an admissibility witness.
10. Accepted, rejected, and novelty cases are logged for future training.
```

This produces a self-improving loop without letting the model rewrite everything. The system can learn from rejected would-improve cases, accepted neutral cases, and accepted improved cases separately. That matters because the failures are not all the same. A rejected would-improve case may indicate an edit cap that is too strict or a novelty heuristic that is too conservative. An accepted neutral case may indicate a proposal model that is not harmful but also not useful. An accepted worse case is the dangerous category and should drive gate tightening or adapter retraining.
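Steps 4 through 8 of the runtime sequence can be sketched as a single function that takes each component as an injected callable. This is a control-flow illustration only. The function name, the dictionary shape of a lane (`agp` and `retrieval` flags), and the record fields are assumptions, and the witness and logging steps are left to the caller.

```python
def run_chunk(asr_text, scalars, *, classify, select_lane, propose, retrieve, gate):
    """Route one ASR chunk through partitioning, proposal, retrieval, and gating."""
    partition = classify(scalars)                                  # step 4
    lane = select_lane(partition)                                  # step 5
    # Step 6: AGP is only consulted when the lane permits it.
    proposal = propose(asr_text) if lane["agp"] else None
    # Step 7: retrieval is only consulted when the lane permits it.
    context = retrieve(asr_text) if lane["retrieval"] else []
    # Step 8: the gate, not the proposer, decides the final text.
    final_text, reason = gate(partition, asr_text, proposal)
    return {
        "partition": partition,
        "asr_text": asr_text,
        "proposal": proposal,
        "retrieved_context": context,
        "final_text": final_text,
        "reason": reason,
    }  # steps 9-10 (witness, logging) are attached by the caller
```

Because the proposer and retriever are never called for lanes that forbid them, a stable or novelty chunk cannot be altered even if those components misbehave.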

6. Experiments

The current executable experiment lane has three stages.

First, deterministic router validation. This has already been implemented and smoke-tested. The smoke fixture emits expert lanes, compute lanes, TurboQuant modes, Core ML candidate status, and safety contracts.

Second, Paper 4 matrix replay. Once the current A100 training matrix emits paired `test_predictions.jsonl` and `test_references.jsonl`, the converter builds MAOE bridge rows:

```text
experiments/agp_mlx/asr_bridge/build_paper4_matrix_bridge.py
```

Then the system runs:

```text
evaluate_expert_router_v1.py
evaluate_bridge_policy_v1.py --oracle-proposal --rust-control-plane
```

This measures the upper-bound guardrail: if the correct answer were proposed, would the gate accept it safely?

Third, actual proposal evaluation. AGP or partition-specific adapters generate correction proposals, Rust gates them, and the report tracks:

```text
CER before
CER after
accepted improved
accepted neutral
accepted worse
rejected would improve
rejected safe
partition-level acceptance
latency
retrieval usage
admissibility token coverage
```

The system becomes paper-ready only if it improves CER while preserving refusal quality.
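The CER and outcome counts in that report can be computed from per-chunk rows as sketched below. This is an illustrative scorer, not the evaluation script in the codebase. It assumes every row carries an ASR text, a proposal, a reference, and the gate's accept flag, and it computes CER as total character edit distance divided by total reference length; the row keys and function names are hypothetical.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance over code points."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def classify_outcome(asr, proposal, reference, accepted) -> str:
    """Label one gated proposal by what it did (or would have done) to the error."""
    before = edit_distance(asr, reference)
    after = edit_distance(proposal, reference)
    if accepted:
        if after < before:
            return "accepted_improved"
        return "accepted_neutral" if after == before else "accepted_worse"
    return "rejected_would_improve" if after < before else "rejected_safe"


def build_report(rows) -> dict:
    """Aggregate CER before/after gating and the outcome counts."""
    counts = Counter()
    err_before = err_after = ref_chars = 0
    for row in rows:
        final = row["proposal"] if row["accepted"] else row["asr"]
        err_before += edit_distance(row["asr"], row["reference"])
        err_after += edit_distance(final, row["reference"])
        ref_chars += len(row["reference"])
        counts[classify_outcome(row["asr"], row["proposal"], row["reference"], row["accepted"])] += 1
    denom = max(ref_chars, 1)  # avoid division by zero on an empty set
    return {
        "cer_before": err_before / denom,
        "cer_after": err_after / denom,
        "outcomes": dict(counts),
    }
```

Tracking `accepted_worse` and `rejected_would_improve` separately matters because the first signals a gate that is too permissive and the second a gate that is too strict.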

7. Difference From Conventional Architectures

MAOE-N'Ko differs from a standard ASR-plus-language-model pipeline because the language model is not an unconditional postprocessor. It is a bounded proposal expert.

It differs from a conventional MoE because expert selection is not just capacity allocation. Expert selection is authority allocation.

It differs from full audio-language fusion because it keeps the acoustic model as an auditable anchor. Fusion is postponed until the modular ablations justify it.

It differs from generic retrieval-augmented generation because retrieval cannot directly rewrite the transcript. Retrieval can inform proposals or provenance, but the Rust gate decides.

It differs from accelerator-first architecture because TurboQuant and ANE are assigned specific roles. TurboQuant compresses retrieval and transfer. ANE, once verified, runs small routing heads. Neither is used as a decorative performance claim.

8. Claim Boundary

The current system demonstrates that the architecture has been materialized; it does not demonstrate final empirical superiority. It has:

```text
partition policy
expert router
bridge schema
AGP proposal adapter
Rust control plane
admissibility witness
TurboQuant sidecar
ANE/Core ML research lane
Paper 4 converter path
smoke reports
```

It does not yet prove that MAOE-N'Ko beats the verified ASR checkpoint on the full same-snapshot test set. That is the next experiment.

The paper claim becomes strong if the following hold:

```text
CER after routing < CER before routing
accepted_worse is zero or statistically negligible
stable refusal remains high
novelty override remains blocked
partition-specific experts beat a generic correction adapter
TurboQuant improves retrieval memory/latency without losing correction quality
Core ML / ANE route heads match CPU/MLX parity before any acceleration claim
```

9. Conclusion

MAOE-N'Ko is best understood as an admissible speech correction architecture for a culturally significant phonetic script. It uses AGP not as a replacement for ASR, but as a routed corrective prior. It uses TurboQuant not as a model, but as compressed retrieval and transfer infrastructure. It uses Rust not as glue, but as the authority boundary. It treats ANE not as magic acceleration, but as a future reflex engine for small verified heads.

The central idea is simple but strict: language priors should help N'Ko ASR only when the trajectory state says help is allowed, only in the form the partition permits, and only if an external gate can explain why the correction was admissible.

That is the difference between a model that writes plausible N'Ko and a system that respects what was actually spoken.
