AGP-MLX N'Ko ASR Bridge

The bridge keeps the acoustic model authoritative. It classifies ASR chunks into anticipation partitions, allows AGP-style correction only for non-stable regimes, and rejects corrections that exceed a bounded edit budget.

This directory is the first executable bridge between the verified N'Ko ASR model and AGP-MLX.

The edit budget uses both a relative limit and a small absolute floor. This matters for short N'Ko chunks where one missing vowel can be a large relative edit but still a safe local repair.
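
The interaction between the relative limit and the absolute floor can be sketched as plain arithmetic. This is a minimal illustration, not the bridge's actual code: the function name `edit_budget` and the default values `max_relative=0.2` and `absolute_floor=1` are assumptions chosen for the example.

```python
import math


def edit_budget(chunk_len: int, max_relative: float = 0.2, absolute_floor: int = 1) -> int:
    """Return the number of character edits a chunk of this length may absorb.

    Both defaults are hypothetical; the real thresholds live in the bridge policy.
    """
    # Relative limit: a fraction of the chunk length, rounded down.
    relative_edits = math.floor(max_relative * chunk_len)
    # Absolute floor: short chunks still get a small fixed allowance.
    return max(absolute_floor, relative_edits)


# A 4-character chunk: 20% of 4 rounds down to 0, so the floor of 1 applies.
short_budget = edit_budget(4)
# A 40-character chunk: the relative limit dominates.
long_budget = edit_budget(40)
print(short_budget, long_budget)
```

Without the floor, the 4-character chunk would admit no edits at all, which is exactly the short-chunk case described above.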

Every correction decision also carries a provisional Graph Kernel-style admissibility witness. That gives each accepted or rejected AGP correction a slice id, graph snapshot hash, policy hash, and 32-hex-character token so the bridge can be audited before the live kernel service is wired in.

Release Gate

The canonical release decision for this bridge lives in:

- `docs/research/agp-nko-live-corrector-release-gate-v1.md`

Use that document as the authoritative `GO` / `NO-GO` bar for the learned live
N'Ko correction path. Large `oracle_guardrail` replay wins are research-valid,
but they do not satisfy the learned live ship gate by themselves.

Current Pieces

`partition_policy.py` maps ASR telemetry into `stable`, `boundary`, `uncertain`, `recovery`, and `novelty`.
`admissibility.py` issues local provisional Graph Kernel-shaped admissibility witnesses.
`context_adapters.py` adapts ASR correction decisions into existing RAG++ retrieval provenance shape.
`rust_control_plane.py` delegates accept/reject decisions to the Rust `cc-agp-bridge` crate.
`agp_runtime.py` resolves and health-checks the promoted AGP topology.
`agp_text_proposal.py` is the bounded text-proposal adapter for Gemma/MLX-style correction candidates.
`expert_router.py` maps anticipation partitions into Mixture of Anticipatory Orthogonal Experts lanes.
`evaluate_expert_router_v1.py` emits the pre-neural expert-routing report for a bridge JSONL file.
`build_paper4_matrix_bridge.py` converts current Paper 4 prediction/reference artifacts into bridge rows.
`run_paper4_maoe_replay.py` runs the Paper 4 converter, MAOE router, and oracle Rust guardrail in one command.
`run_paper4_matrix_batch_replay.py` replays every locally collected same-snapshot Paper 4 run in one pass.
`build_gemma_nko_correction_sft.py` converts bridge rows into Gemma correction-SFT examples.
`schema.py` defines the typed packet, correction decision, CER aggregation, and AGP prompt shape.
`evaluate_bridge_policy_v1.py` evaluates the bounded correction policy over JSONL rows.

Expected Row Shape

json

{
  "audio_id": "sample-001",
  "chunk_index": 0,
  "asr_candidate": "NKO_TEXT",
  "reference": "REFERENCE_NKO_TEXT",
  "proposed_text": "OPTIONAL_AGP_PROPOSAL",
  "trajectory_scalars": {
    "confidence": 0.7,
    "uncertainty": 0.2,
    "transition_pressure": 0.4,
    "recovery_margin": 0.1,
    "novelty": 0.0,
    "stability": 0.8
  }
}
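
A row of this shape can be loaded and checked with a small typed parser. The sketch below is not the definition in `schema.py`; the class name `BridgeRow` and the function `parse_row` are hypothetical. It assumes `reference` and `proposed_text` may be absent, since prediction-only probes carry no gold reference and the proposal is marked optional above.

```python
import json
from dataclasses import dataclass
from typing import Optional

# The six trajectory scalars shown in the expected row shape.
SCALAR_KEYS = (
    "confidence",
    "uncertainty",
    "transition_pressure",
    "recovery_margin",
    "novelty",
    "stability",
)


@dataclass(frozen=True)
class BridgeRow:  # hypothetical name, for illustration only
    audio_id: str
    chunk_index: int
    asr_candidate: str
    reference: Optional[str]
    proposed_text: Optional[str]
    trajectory_scalars: dict


def parse_row(line: str) -> BridgeRow:
    """Parse one JSONL line and fail loudly if a trajectory scalar is missing."""
    raw = json.loads(line)
    scalars = raw["trajectory_scalars"]
    missing = [key for key in SCALAR_KEYS if key not in scalars]
    if missing:
        raise ValueError(f"missing trajectory scalars: {missing}")
    return BridgeRow(
        audio_id=raw["audio_id"],
        chunk_index=int(raw["chunk_index"]),
        asr_candidate=raw["asr_candidate"],
        reference=raw.get("reference"),
        proposed_text=raw.get("proposed_text"),
        trajectory_scalars={key: float(scalars[key]) for key in SCALAR_KEYS},
    )
```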

Smoke Run

bash

python3 experiments/agp_mlx/asr_bridge/evaluate_bridge_policy_v1.py \
  --input experiments/agp_mlx/asr_bridge/fixtures/policy_smoke.jsonl \
  --output-dir experiments/agp_mlx/asr_bridge/reports/policy_smoke \
  --oracle-proposal

The `--oracle-proposal` flag uses references as proposed corrections only to test the guardrails. It is not a deployable correction source.

Run the same guardrail through the Rust control plane:

bash

python3 experiments/agp_mlx/asr_bridge/evaluate_bridge_policy_v1.py \
  --input experiments/agp_mlx/asr_bridge/fixtures/policy_smoke.jsonl \
  --output-dir experiments/agp_mlx/asr_bridge/reports/policy_smoke_rust_gate \
  --oracle-proposal \
  --rust-control-plane

Generate proposal rows without loading MLX:

bash

python3 experiments/agp_mlx/asr_bridge/agp_text_proposal.py \
  --input experiments/agp_mlx/asr_bridge/fixtures/policy_smoke.jsonl \
  --output /tmp/agp_text_proposals_dry_run.jsonl \
  --dry-run

Generate proposal rows from the canonical promoted corrective lane over `/resume`:

bash

python3 experiments/agp_mlx/asr_bridge/agp_text_proposal.py \
  --input experiments/agp_mlx/asr_bridge/fixtures/policy_smoke.jsonl \
  --output /tmp/agp_text_proposals_resume.jsonl \
  --promoted-corrective-resume

The older `/propose` text adapter is still available for smoke comparisons:

bash

python3 experiments/agp_mlx/asr_bridge/agp_text_proposal.py \
  --input experiments/agp_mlx/asr_bridge/fixtures/policy_smoke.jsonl \
  --output /tmp/agp_text_proposals_http.jsonl \
  --promoted-corrective-http

Admissibility Witnesses

The witness mirrors the Comp-Core Graph Kernel provenance contract:

`slice_id`
`graph_snapshot_hash`
`policy_id`
`policy_params_hash`
`admissibility_token`
`query_hash`
`decision_hash`

The token uses the `admissibility_token_v2_hmac` shape and is truncated to 32 hex characters to match the Graph Kernel convention. It is marked `local_provisional`; production should replace this local issuer with a real Graph Kernel slice export.
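
A local issuer with these fields could look roughly like the sketch below. It is an illustration under stated assumptions, not the code in `admissibility.py`: the exact message layout of `admissibility_token_v2_hmac` is not given here, so the canonical-JSON message, the SHA-256 digests, the `issuer` key, and the function name `issue_witness` are all hypothetical. Only the field names and the 32-hex truncation come from the text above.

```python
import hashlib
import hmac
import json


def _sha256_hex(payload) -> str:
    """Hash a JSON-serializable payload in a stable, key-sorted form."""
    canonical = json.dumps(payload, sort_keys=True, ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def issue_witness(secret: bytes, slice_id: str, graph_snapshot_hash: str,
                  policy_id: str, policy_params: dict, query: dict, decision: dict) -> dict:
    """Build a provisional witness; the message layout is an assumption."""
    witness = {
        "slice_id": slice_id,
        "graph_snapshot_hash": graph_snapshot_hash,
        "policy_id": policy_id,
        "policy_params_hash": _sha256_hex(policy_params),
        "query_hash": _sha256_hex(query),
        "decision_hash": _sha256_hex(decision),
    }
    # Sign every field above, then truncate the hex digest to 32 characters.
    message = json.dumps(witness, sort_keys=True).encode("utf-8")
    token = hmac.new(secret, message, hashlib.sha256).hexdigest()[:32]
    witness["admissibility_token"] = token
    witness["issuer"] = "local_provisional"  # hypothetical key for the provisional marker
    return witness
```

Because the decision hash is part of the signed message, an accepted and a rejected decision for the same chunk yield different tokens.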

Mixture of Anticipatory Orthogonal Experts

The MoVE-style takeaway for this bridge is not emotional speech generation. It
is expert routing. For N'Ko, each anticipation partition becomes a different
authority lane:

text

stable    -> acoustic_preservation
boundary  -> boundary_completion
uncertain -> uncertain_repair
recovery  -> recovery_context
novelty   -> novelty_quarantine
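
The mapping above is small enough to express as a lookup table. The sketch below is not the code in `expert_router.py`; the names `EXPERT_LANES`, `route`, and `allows_correction` are hypothetical. The set of partitions that never admit a correction follows the bridge rules stated elsewhere in this document: stable preserves ASR, and novelty blocks language-prior overrides.

```python
# Partition -> expert lane, copied from the table above.
EXPERT_LANES = {
    "stable": "acoustic_preservation",
    "boundary": "boundary_completion",
    "uncertain": "uncertain_repair",
    "recovery": "recovery_context",
    "novelty": "novelty_quarantine",
}

# Partitions whose lane never admits a language-model correction.
CORRECTION_BLOCKED = frozenset({"stable", "novelty"})


def route(partition: str) -> str:
    """Return the expert lane for a partition, rejecting unknown labels."""
    try:
        return EXPERT_LANES[partition]
    except KeyError:
        raise ValueError(f"unknown anticipation partition: {partition!r}") from None


def allows_correction(partition: str) -> bool:
    """True when the partition's lane may forward a bounded proposal to the gate."""
    route(partition)  # validates the label
    return partition not in CORRECTION_BLOCKED
```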

Run the deterministic routing gate:

bash

python3 experiments/agp_mlx/asr_bridge/evaluate_expert_router_v1.py \
  --input experiments/agp_mlx/asr_bridge/fixtures/policy_smoke.jsonl \
  --output-dir experiments/agp_mlx/asr_bridge/reports/expert_router_smoke

This emits:

`expert_router_report.json`
`expert_routes.jsonl`

The report does not claim CER improvement. It proves that every ASR chunk has a
specific expert lane, compute budget, TurboQuant mode, accelerator status, and
safety contract before any neural correction proposal is allowed.

Once the same-snapshot A100 run emits paired predictions and references, build
the MAOE evaluation input with:

bash

python3 experiments/agp_mlx/asr_bridge/build_paper4_matrix_bridge.py \
  --predictions /path/to/test_predictions.jsonl \
  --references /path/to/test_references.jsonl \
  --output experiments/agp_mlx/asr_bridge/reports/paper4_matrix_bridge/paper4_bridge.jsonl

Then run both gates:

bash

python3 experiments/agp_mlx/asr_bridge/evaluate_expert_router_v1.py \
  --input experiments/agp_mlx/asr_bridge/reports/paper4_matrix_bridge/paper4_bridge.jsonl \
  --output-dir experiments/agp_mlx/asr_bridge/reports/paper4_matrix_expert_router

python3 experiments/agp_mlx/asr_bridge/evaluate_bridge_policy_v1.py \
  --input experiments/agp_mlx/asr_bridge/reports/paper4_matrix_bridge/paper4_bridge.jsonl \
  --output-dir experiments/agp_mlx/asr_bridge/reports/paper4_matrix_oracle_guardrail \
  --oracle-proposal \
  --rust-control-plane

Or run the full replay chain in one command:

bash

python3 experiments/agp_mlx/asr_bridge/run_paper4_maoe_replay.py \
  --run-dir /path/to/completed/paper4/run \
  --output-dir experiments/agp_mlx/asr_bridge/reports/paper4_maoe_replay \
  --rust-control-plane

Run the learned live replay path instead of the oracle proposal path. The
canonical live mode is the packet-based `/resume` path:

bash

python3 experiments/agp_mlx/asr_bridge/run_paper4_learned_replay.py \
  --run-dir /path/to/completed/paper4/run \
  --output-dir experiments/agp_mlx/asr_bridge/reports/paper4_maoe_learned_replay \
  --proposal-mode promoted-corrective-resume \
  --rust-control-plane

The older `/propose` adapter path is still available with
`--proposal-mode promoted-corrective-http`.

If multiple same-snapshot runs have already been collected locally, replay the
whole completed matrix in one pass:

bash

python3 experiments/agp_mlx/asr_bridge/run_paper4_matrix_batch_replay.py \
  --results-root Desktop/nko-brain-scanner/results/paper4_same_snapshot_20260422_safe_lr1e4 \
  --output-root experiments/agp_mlx/asr_bridge/reports/paper4_same_snapshot_batch_replay \
  --rust-control-plane

Run the learned batch replay across the same local matrix:

bash

python3 experiments/agp_mlx/asr_bridge/run_paper4_matrix_batch_learned_replay.py \
  --results-root Desktop/nko-brain-scanner/results/paper4_same_snapshot_20260422_safe_lr1e4 \
  --output-root experiments/agp_mlx/asr_bridge/reports/paper4_same_snapshot_batch_learned_replay \
  --proposal-mode promoted-corrective-resume \
  --rust-control-plane

Build a Gemma correction adapter dataset from the same bridge rows:

bash

python3 experiments/agp_mlx/asr_bridge/build_gemma_nko_correction_sft.py \
  --input experiments/agp_mlx/asr_bridge/reports/paper4_matrix_bridge/paper4_bridge.jsonl \
  --output-dir experiments/agp_mlx/asr_bridge/reports/gemma_nko_correction_sft

The reference text is used as the teacher only after the same bounded policy
decides whether that correction would be admissible. If the oracle correction is
blocked, the target repeats the ASR candidate. That teaches Gemma both how to
correct and when not to correct.
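
The target-selection rule can be sketched in a few lines. This is an illustration, not the code in `build_gemma_nko_correction_sft.py`: the function name `build_sft_example`, the output keys, and the `is_admissible` callable (standing in for the bounded policy) are all hypothetical.

```python
from typing import Callable


def build_sft_example(row: dict, is_admissible: Callable[[dict, str], bool]) -> dict:
    """Use the reference as the target only when the policy would admit it."""
    asr = row["asr_candidate"]
    reference = row["reference"]
    # The oracle correction is the reference itself; an unchanged text is never "admitted".
    admitted = reference != asr and is_admissible(row, reference)
    return {
        "input": asr,
        "target": reference if admitted else asr,  # blocked corrections repeat the ASR text
        "teacher_admitted": admitted,
    }


# Toy policy for demonstration only: block every correction on stable rows.
def toy_policy(row: dict, proposal: str) -> bool:
    return row["partition"] != "stable"


example = build_sft_example(
    {"asr_candidate": "abc", "reference": "abd", "partition": "stable"}, toy_policy
)
print(example["target"])  # the ASR candidate is repeated because the row is stable
```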

This does not improve CER directly. It improves replayability and authority boundaries: if CER improves, we can trace which ASR chunk, trajectory scalars, partition, policy parameters, and evidence slice allowed the correction.

RAG++ Fit

RAG++ is the instruction/language-context layer, not the acoustic authority. The useful existing pieces are:

`rag_plusplus.slice.client.SliceExport`
`rag_plusplus.slice.enforcing_client.RetrievalProvenance`
`rag_plusplus.retrieval.provenance.SliceInfo`
`rag_plusplus.retrieval.provenance.RetrievalProvenance`

The bridge now emits `retrieval_provenance` beside `admissibility`, using the RAG++ provenance class when importable. This lets future live AGP calls carry the same chain of custody as normal RAG++ slice-scoped retrieval: query hash, slice id, policy ref, graph snapshot, admissibility token, and result hash.

Current local state: the RAG++ Python modules are importable from `core/retrieval/cc-rag-plus-plus`, but the service is not healthy on `:8000`. The PyO3 `admissibility-kernel-py` package exists, but its native extension is not built in this environment and `maturin` is not installed, so the bridge keeps the local provisional issuer until that build path is restored.

AGP Runtime Boundary

The promoted `9442` corrective lane is healthy. It still exposes `/resume` for encoded AGP hidden-state packets, and it now also exposes `/propose` for bounded text proposal smoke tests when the backbone model is loaded.

The text endpoint is an adapter surface, not the canonical AGP packet path. Early smoke tests show the current Gemma adapter is conservative/echo-biased for N'Ko correction prompts: it tends to repeat the ASR candidate instead of repairing short boundary errors. Rust correctly rejects unchanged proposals as `no_effect`.

With the few-shot N'Ko correction prompt in `agp_text_proposal.py`, the live Mac5 proposal path now matches the oracle guardrail smoke:

json

{
  "total": 4,
  "accepted": 2,
  "rejected": 2,
  "cer_before": 0.14285714285714285,
  "cer_after": 0.047619047619047616,
  "rag_plusplus_provenance": 4
}

The meaningful behavior is not just the CER delta. The model over-repaired the novelty row, and Rust blocked it with `novelty_partition_blocks_language_prior`. That is the architecture working as intended: neural proposal, symbolic/trajectory gate, admissibility record.

Synthetic Supervised Stress

`build_synthetic_correction_set.py` creates a small controlled correction set from real N'Ko validation text. This is not a substitute for same-provenance ASR heldout audio. It is a pressure test for policy shape: stable rows should preserve ASR, boundary/uncertain rows should permit bounded local repairs, and novelty rows should not let the language model override acoustic authority.

Latest live Mac5 `/propose` + Rust gate stress:

json

{
  "total": 16,
  "accepted": 8,
  "rejected": 8,
  "accepted_improved": 3,
  "accepted_neutral": 5,
  "accepted_worse": 0,
  "rejected_would_improve": 3,
  "rejected_safe": 5,
  "cer_before": 0.13333333333333333,
  "cer_after": 0.1
}

The additional supervised audit counters are important. Runtime admissibility does not see the reference, so it cannot know whether an accepted proposal improved CER. The report can know that afterward. In this stress run, the gate avoided harmful accepted edits, but it also accepted several neutral edits and blocked three novelty edits that would have matched the synthetic reference. That is an explicit tradeoff: the current policy favors acoustic authority over language-prior completion in novelty regimes.
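
The audit counters can be computed after the fact from rows that carry a reference. The sketch below is one reading of the counter names, not the report code itself: it assumes CER is micro-averaged (total edit distance over total reference characters), that `rejected_safe` means a rejected proposal that would not have reduced the distance to the reference, and that each row carries a boolean `accepted` key. Those are assumptions for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def audit(rows: list) -> dict:
    """Post-hoc audit: compare gate decisions against references the gate never saw."""
    report = {key: 0 for key in (
        "accepted", "rejected", "accepted_improved", "accepted_neutral",
        "accepted_worse", "rejected_would_improve", "rejected_safe")}
    errors_before = errors_after = ref_chars = 0
    for row in rows:
        ref = row["reference"]
        before = levenshtein(row["asr_candidate"], ref)
        proposed = levenshtein(row["proposed_text"], ref)
        if row["accepted"]:
            report["accepted"] += 1
            if proposed < before:
                report["accepted_improved"] += 1
            elif proposed == before:
                report["accepted_neutral"] += 1
            else:
                report["accepted_worse"] += 1
            after = proposed
        else:
            report["rejected"] += 1
            key = "rejected_would_improve" if proposed < before else "rejected_safe"
            report[key] += 1
            after = before  # a rejected proposal leaves the ASR text in place
        errors_before += before
        errors_after += after
        ref_chars += len(ref)
    report["total"] = len(rows)
    report["cer_before"] = errors_before / max(ref_chars, 1)
    report["cer_after"] = errors_after / max(ref_chars, 1)
    return report
```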

For real CER improvement, the bridge still needs stronger proposal quality and a real heldout ASR error set:

- tune the N'Ko correction prompt or adapter so accepted boundary/uncertain proposals are more often helpful than neutral
- build the canonical ASR telemetry to AGP packet bridge that can call `/resume`
- keep `cc-agp-bridge` as the final authority for accepting or rejecting proposals

Prediction-Only Probe and Hard Edit Cap

`build_prediction_probe.py` builds deployment probes from prediction-only ASR JSONL. These probes do not have gold references, so they cannot report CER. Their purpose is to test whether the proposal model tries to rewrite live ASR text and whether the Rust gate blocks unsafe edits.

The local Djoko prediction probe exposed an important failure mode: after prompt and token-budget changes, the promoted corrective lane stopped truncating as aggressively, but it still proposed multi-character rewrites on longer N'Ko strings. Those rewrites can be below the relative edit threshold while still being too large for a deployable ASR correction.

The bridge now applies a hard bounded-edit policy in both Python and Rust:

- stable partitions preserve ASR
- novelty partitions block language-prior overrides
- one-character repairs can pass even when the relative ratio is high on short tokens
- proposals above the absolute edit cap are rejected regardless of relative distance
- proposals with more than one edit must also satisfy the relative edit threshold
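
The five rules above can be sketched as a single decision function. This is an illustration, not the code in `cc-agp-bridge` or its Python mirror. The thresholds `max_relative=0.2` and `absolute_cap=3`, the order of the checks, and the reason strings `stable_partition_preserves_asr`, `edit_too_large:relative`, and `accepted` are assumptions. The reason codes `no_effect`, `novelty_partition_blocks_language_prior`, and `edit_too_large:absolute:<n>` appear in this document, and the sketch assumes `<n>` is the observed edit count.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def gate(partition: str, asr: str, proposal: str,
         max_relative: float = 0.2, absolute_cap: int = 3) -> tuple:
    """Return (accepted, reason) for one proposal; thresholds are hypothetical."""
    if partition == "stable":
        return False, "stable_partition_preserves_asr"
    if partition == "novelty":
        return False, "novelty_partition_blocks_language_prior"
    distance = levenshtein(asr, proposal)
    if distance == 0:
        return False, "no_effect"
    if distance > absolute_cap:
        # Hard cap: rejected even when the relative ratio is small.
        return False, f"edit_too_large:absolute:{distance}"
    if distance == 1:
        # One-character repairs pass regardless of the relative ratio.
        return True, "accepted"
    if distance / max(len(asr), 1) > max_relative:
        return False, "edit_too_large:relative"
    return True, "accepted"


# Long-string case: 4 edits on 40 characters is only 10% relative, but exceeds the cap.
print(gate("boundary", "a" * 40, "b" * 4 + "a" * 36))
```

With these assumed thresholds, the final call reproduces the long-string failure mode: the proposal stays under the relative threshold yet is rejected by the absolute cap.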

Regression coverage lives in `core/semantic/cc-agp-bridge/src/lib.rs`, including a long-string case that rejects many small relative edits as `edit_too_large:absolute:4`.

Verified reports after the hardcap fix:

json

{
  "smoke": {
    "total": 4,
    "accepted": 2,
    "rejected": 2,
    "accepted_improved": 2,
    "accepted_worse": 0,
    "cer_before": 0.14285714285714285,
    "cer_after": 0.047619047619047616
  },
  "synthetic_stress": {
    "total": 16,
    "accepted": 8,
    "rejected": 8,
    "accepted_improved": 3,
    "accepted_neutral": 5,
    "accepted_worse": 0,
    "cer_before": 0.13333333333333333,
    "cer_after": 0.1
  },
  "prediction_only_probe": {
    "total": 16,
    "accepted": 0,
    "rejected": 16,
    "reference_rows": 0,
    "admissibility_tokens": 16,
    "rag_plusplus_provenance": 16
  }
}

This result should be read conservatively. The architecture is doing the right thing by refusing unsafe prediction-only rewrites, but the proposal lane is not yet proven as a CER-improving production corrector beyond the small reference-backed smoke and synthetic stress. The next real benchmark must use same-provenance ASR outputs with references.

Archived Reference-Backed ASR Eval Probe

`build_eval_results_bridge.py` converts `nko-brain-scanner/asr/eval-results-2026-03-20.json` into bridge packets. This artifact is an older ASR evaluation run, not the Paper 4 20.57

The file-head slice is mostly catastrophic ASR output. The hardcap gate rejects all proposals:

json

{
  "total": 12,
  "accepted": 0,
  "rejected": 12,
  "rejected_would_improve": 8,
  "cer_before": 1.7804878048780488,
  "cer_after": 1.7804878048780488
}

That is a useful negative result: the model can sometimes propose a reference-improving cleanup, but the edit is too broad to trust without a dedicated recovery policy.

The lower-CER archived slice gives a small positive result:

json

{
  "total": 5,
  "accepted": 1,
  "rejected": 4,
  "accepted_improved": 1,
  "accepted_worse": 0,
  "rejected_would_improve": 1,
  "cer_before": 0.7603686635944701,
  "cer_after": 0.7511520737327189
}

The accepted edit is a bounded one-character cleanup on an uncertain partition. This is the right current operating envelope: allow narrow local repairs, and block broad language-model recovery until the proposal model is trained and evaluated for that regime.

Source Anchor

Comp-Core/experiments/agp_mlx/asr_bridge/README.md
