Grand Diomande Research

AGP N'Ko Live Corrector Release Gate V1

Date: 2026-04-30

1.0 Purpose

1.1 Objective

1.1.1 This document defines the release gate for calling AGP a fully live learned N'Ko corrector at scale.

1.1.2 This gate exists to separate three different claims:

- The AGP runtime architecture works.

- The AGP guardrail policy works.

- The learned live N'Ko correction path works in production conditions.

1.2 Non-Goal

1.2.1 This document does not define a new training plan.

1.2.2 This document does not treat oracle replay success as deployment success.

1.2.3 This document does not allow open-horizon corrective continuation as a valid ship target.

2.0 Required Definition of "Live Learned Corrector"

2.1 Minimum Meaning

2.1.1 The shipped path must be:

- `ASR output -> learned AGP proposal -> Rust gate -> bounded accepted edit -> lower CER`
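To make the chain concrete, here is a minimal Python sketch of one correction step under stated assumptions. It is not the AGP implementation: `propose` is any callable standing in for the learned proposer, the Python `gate` function stands in for the Rust gate, and the `Proposal.margin` field and the `max_edit` / `min_margin` thresholds are hypothetical. CER is computed as character-level Levenshtein distance divided by reference length; this sketch counts Unicode code points, so it applies to N'Ko strings unchanged.

```python
from dataclasses import dataclass
from typing import Callable


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute ca -> cb
        prev = cur
    return prev[-1]


def cer(hypothesis: str, reference: str) -> float:
    """Character error rate over Unicode code points."""
    if not reference:
        raise ValueError("CER is undefined for an empty reference")
    return levenshtein(hypothesis, reference) / len(reference)


@dataclass(frozen=True)
class Proposal:
    text: str
    margin: float  # hypothetical confidence margin reported by the proposer


def gate(asr_text: str, proposal: Proposal,
         max_edit: int = 8, min_margin: float = 0.10) -> str:
    """Python stand-in for the Rust gate: accept only small, confident edits."""
    small = levenshtein(asr_text, proposal.text) <= max_edit
    confident = proposal.margin >= min_margin
    return proposal.text if (small and confident) else asr_text


def correct(asr_text: str, reference: str,
            propose: Callable[[str], Proposal]) -> tuple[str, float]:
    """ASR output -> proposal -> gate -> accepted text, plus its CER delta."""
    output = gate(asr_text, propose(asr_text))
    return output, cer(output, reference) - cer(asr_text, reference)
```

A negative delta means the accepted edit lowered CER. A rejected proposal returns the ASR text unchanged, so its delta is exactly `0`; the bound on edit size is what keeps a confident but sprawling rewrite from being accepted.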

2.1.2 The proposal source must be the learned live AGP path, not an oracle reference substitution.

2.1.3 The evaluation corpus must contain real ASR predictions and real references from the same provenance as the deployment path.

2.2 Excluded Cases

2.2.1 `oracle_guardrail` replay alone does not satisfy this definition.

2.2.2 Smoke tests alone do not satisfy this definition.

2.2.3 Synthetic supervised stress alone does not satisfy this definition.

2.2.4 Prediction-only probes without references do not satisfy this definition.
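These exclusions can be enforced mechanically before a run is admitted into the release evaluation. The sketch below assumes a hypothetical `EvalRun` record; its field names and string labels (`learned_live`, `oracle_guardrail`, `real`, `smoke`, `synthetic`) are illustrative and are not the schema of the AGP reports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalRun:
    run_id: str
    proposal_source: str   # hypothetical labels: "learned_live", "oracle_guardrail", ...
    data_kind: str         # hypothetical labels: "real", "smoke", "synthetic"
    has_references: bool   # real references paired with the ASR predictions


# Reason text for each known non-real data kind (Section 2.2).
_DATA_KIND_REASONS = {
    "smoke": "smoke test only (2.2.2)",
    "synthetic": "synthetic supervised stress only (2.2.3)",
}


def exclusion_reasons(run: EvalRun) -> list[str]:
    """Return every reason the run cannot count as live learned evidence."""
    reasons = []
    if run.proposal_source != "learned_live":
        reasons.append(f"proposal source {run.proposal_source!r} "
                       "is not the learned live path (2.1.2, 2.2.1)")
    if run.data_kind != "real":
        reasons.append(_DATA_KIND_REASONS.get(
            run.data_kind, f"data kind {run.data_kind!r} is not real ASR output (2.1.3)"))
    if not run.has_references:
        reasons.append("prediction-only probe without references (2.2.4)")
    return reasons


def admissible(run: EvalRun) -> bool:
    """A run is admissible only when no exclusion applies."""
    return not exclusion_reasons(run)
```

Returning every reason, rather than stopping at the first, makes it visible when a run fails the definition in more than one way.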

3.0 Canonical Evidence Sources

3.1 Authoritative Runtime Artifacts

3.1.1 Cross-host packet replay:

- `experiments/agp_mlx/runtime/reports/cross_host_packet_replay_v1/cross_host_packet_replay_report.json`

3.1.2 Bounded hybrid runtime:

- `experiments/agp_mlx/runtime/reports/hybrid_runtime_v1_tb5_margin010_pair9430_9434_reentry_len8_pc075_fm010/hybrid_runtime_report.json`

3.1.3 Invalid open-horizon comparison:

- `experiments/agp_mlx/runtime/reports/hybrid_runtime_v1_tb5_margin010_pair9430_9434_open_len8/hybrid_runtime_report.json`

3.2 Authoritative N'Ko Correction Artifacts

3.2.1 Live few-shot smoke and synthetic stress:

- `experiments/agp_mlx/asr_bridge/README.md`

3.2.2 Large same-snapshot replay summary:

- `experiments/agp_mlx/asr_bridge/reports/paper4_same_snapshot_batch_replay_final/batch_summary.md`

- `experiments/agp_mlx/asr_bridge/reports/paper4_same_snapshot_batch_replay_final/*/replay_summary.json`
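A small aggregation pass over the per-run summaries is enough to recompute the headline replay numbers. This sketch assumes each `replay_summary.json` exposes numeric `cer_delta` and `accepted_worse` fields; those key names are hypothetical and should be checked against the actual files. It also takes an unweighted mean across runs, which may differ from a character-weighted aggregate.

```python
import json
from pathlib import Path
from statistics import mean


def summarize_replays(report_dir: str) -> dict:
    """Aggregate every <run>/replay_summary.json directly under report_dir.

    The "cer_delta" and "accepted_worse" keys are hypothetical field names.
    """
    paths = sorted(Path(report_dir).glob("*/replay_summary.json"))
    if not paths:
        raise FileNotFoundError(f"no */replay_summary.json under {report_dir}")
    per_run = {p.parent.name: json.loads(p.read_text(encoding="utf-8")) for p in paths}
    return {
        "runs": len(per_run),
        # Unweighted mean of per-run deltas (an assumption about aggregation).
        "mean_cer_delta": mean(s["cer_delta"] for s in per_run.values()),
        "accepted_worse": sum(s["accepted_worse"] for s in per_run.values()),
        "cer_delta_by_run": {name: s["cer_delta"] for name, s in per_run.items()},
    }
```

Usage would be `summarize_replays("experiments/agp_mlx/asr_bridge/reports/paper4_same_snapshot_batch_replay_final")`, run from the repository root.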

3.2.3 Post-TTT authoritative refresh command:

- `python3 Desktop/nko-brain-scanner/scripts/paper4_post_ttt_refresh.py --wait`

3.3 Current Learned Adapter Status

3.3.1 Session state records for current learned correction branches:

- `[home-path]`

- At the current reference checkpoint, the conclusion is that the repaired partial-real adapter branch remains worse than base and is not deployable.

4.0 Hard Release Criteria

4.1 Learned Path Requirement

4.1.1 The winning evaluation must use the learned live proposal path.

4.1.2 If the measured CER gain depends on oracle proposals, the result is research-valid but release-invalid.

4.2 Corpus Requirement

4.2.1 The release decision must use the five-run post-TTT same-provenance corpus.

4.2.2 The corpus must include the final N'Ko runs and their references.

4.2.3 If the five-run corpus is not refreshed, release status is automatically `NO-GO`.
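The corpus requirement reduces to a few fail-closed checks. In this sketch the `Corpus` record and its `refreshed_post_ttt` and `same_provenance` flags are hypothetical, standing in for whatever the refresh step actually records; a four-run replay such as the one reported in Section 6.1.4 would be flagged by the run-count check.

```python
from dataclasses import dataclass, field

REQUIRED_RUNS = 5  # Section 4.2.1: five-run post-TTT corpus


@dataclass
class Corpus:
    refreshed_post_ttt: bool   # hypothetical flag recorded by the refresh step
    same_provenance: bool      # predictions and references share deployment provenance
    runs: dict[str, bool] = field(default_factory=dict)  # run_id -> has references


def corpus_problems(corpus: Corpus) -> list[str]:
    """An empty list means the corpus is usable for the release decision."""
    problems = []
    if not corpus.refreshed_post_ttt:
        problems.append("corpus not refreshed post-TTT: automatic NO-GO (4.2.3)")
    if not corpus.same_provenance:
        problems.append("predictions and references do not share deployment provenance (2.1.3)")
    if len(corpus.runs) != REQUIRED_RUNS:
        problems.append(f"expected {REQUIRED_RUNS} runs, found {len(corpus.runs)} (4.2.1)")
    missing = sorted(run_id for run_id, has_ref in corpus.runs.items() if not has_ref)
    if missing:
        problems.append(f"runs missing references: {missing} (4.2.2)")
    return problems
```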

4.3 CER Requirement

4.3.1 Aggregate mean CER delta across the authoritative five-run evaluation must be below `0`.

4.3.2 Every N'Ko heldout run must individually improve on its own pre-correction CER.

4.3.3 Improvement must come from accepted learned edits, not from oracle substitution.
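Both CER conditions are needed because an aggregate gain can hide a regressing run. The sketch below checks 4.3.1 and 4.3.2 from hypothetical per-run `(pre_correction_cer, post_correction_cer)` pairs; 4.3.3 cannot be verified from CER alone and relies on the run having been produced by the learned live path.

```python
from statistics import mean


def cer_problems(runs: dict[str, tuple[float, float]]) -> list[str]:
    """runs maps run_id -> (pre_correction_cer, post_correction_cer).

    Checks 4.3.1 (aggregate mean delta below 0) and 4.3.2 (every run improves).
    """
    if not runs:
        return ["no runs to evaluate"]
    deltas = {run_id: post - pre for run_id, (pre, post) in runs.items()}
    problems = []
    aggregate = mean(deltas.values())
    if not aggregate < 0:
        problems.append(f"aggregate mean CER delta {aggregate:+.6f} is not below 0")
    for run_id, delta in sorted(deltas.items()):
        if not delta < 0:
            problems.append(f"{run_id}: CER delta {delta:+.6f} does not improve "
                            "on pre-correction CER")
    return problems
```

For example, runs `a: (0.30, 0.28)` and `b: (0.25, 0.26)` give a negative aggregate delta, yet `b` regresses, so the pair still fails 4.3.2.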

4.4 Safety Requirement

4.4.1 `accepted_worse` must equal `0` on the authoritative heldout evaluation.

4.4.2 `stable` partitions must remain effectively no-rewrite zones.

4.4.3 `novelty` partitions must remain effectively no-rewrite zones unless an explicitly reviewed recovery regime is added and separately validated.

4.4.4 Accepted improvements must come primarily from `boundary` and `uncertain` partitions.
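The safety criteria can be read off per-partition counters. This is a sketch under assumptions: the `PartitionStats` counters are hypothetical, and because the gate does not quantify "effectively no-rewrite" or "primarily", the `max_guarded_rewrite_rate` default and the "more than half" reading of 4.4.4 are placeholders for thresholds that would need review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionStats:
    windows: int            # windows routed to this partition
    accepted_better: int    # accepted edits that lowered CER
    accepted_neutral: int   # accepted edits that left CER unchanged
    accepted_worse: int     # accepted edits that raised CER

    @property
    def rewrite_rate(self) -> float:
        accepted = self.accepted_better + self.accepted_neutral + self.accepted_worse
        return accepted / self.windows if self.windows else 0.0


def safety_problems(stats: dict[str, PartitionStats],
                    max_guarded_rewrite_rate: float = 0.001,  # placeholder threshold
                    novelty_recovery_validated: bool = False) -> list[str]:
    problems = []
    worse = sum(s.accepted_worse for s in stats.values())
    if worse:
        problems.append(f"accepted_worse={worse}, must be 0 (4.4.1)")
    # Novelty is only released from the no-rewrite rule by a validated recovery regime.
    guarded = ["stable"] if novelty_recovery_validated else ["stable", "novelty"]
    for name in guarded:
        if name in stats and stats[name].rewrite_rate > max_guarded_rewrite_rate:
            problems.append(f"{name} rewrite rate {stats[name].rewrite_rate:.4f} "
                            "exceeds limit (4.4.2/4.4.3)")
    better = sum(s.accepted_better for s in stats.values())
    primary = sum(stats[n].accepted_better for n in ("boundary", "uncertain") if n in stats)
    # "Primarily" is read here as "more than half of accepted improvements".
    if better and primary * 2 <= better:
        problems.append(f"only {primary}/{better} improvements from boundary/uncertain (4.4.4)")
    return problems
```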

4.5 Runtime Requirement

4.5.1 The deployed policy must remain bounded.

4.5.2 Open-horizon corrective continuation is not allowed.

4.5.3 The corrective live path must stay in the current tens-of-milliseconds latency class.

4.5.4 The mean corrective latency target is below `60 ms`.

4.5.5 The P95 corrective latency target is below `150 ms`.
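A latency check against both targets might look like the sketch below. It assumes a list of measured per-call corrective latencies in milliseconds and uses the nearest-rank definition of the 95th percentile; other percentile definitions exist and can give slightly different values on small samples.

```python
from statistics import mean


def p95_nearest_rank(samples_ms: list[float]) -> float:
    """95th percentile by the nearest-rank method (integer arithmetic for the rank)."""
    ordered = sorted(samples_ms)
    rank = max(1, -(-95 * len(ordered) // 100))  # ceil(0.95 * n), 1-based
    return ordered[rank - 1]


def latency_problems(samples_ms: list[float],
                     mean_limit_ms: float = 60.0,
                     p95_limit_ms: float = 150.0) -> list[str]:
    """Check Sections 4.5.4 and 4.5.5 on a sample of corrective latencies."""
    if not samples_ms:
        return ["no latency samples"]
    problems = []
    avg, p95 = mean(samples_ms), p95_nearest_rank(samples_ms)
    if not avg < mean_limit_ms:
        problems.append(f"mean {avg:.1f} ms is not below {mean_limit_ms:.0f} ms (4.5.4)")
    if not p95 < p95_limit_ms:
        problems.append(f"p95 {p95:.1f} ms is not below {p95_limit_ms:.0f} ms (4.5.5)")
    return problems
```

With 90 calls at 20 ms and 10 at 200 ms, the mean is 38 ms and passes, while the p95 is 200 ms and fails; this is why both targets are stated.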

4.6 Stability Requirement

4.6.1 The corrective lane and gate must survive a 24-hour shadow run without supervisor churn, silent drift, or repeated crash-restart behavior.

4.6.2 Logs must show stable partition rates, stable acceptance rates, and no unexplained rise in rejected-safe or accepted-neutral counts.
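One way to screen a shadow run is to compare hourly metrics against the first hour and flag anything outside a drift band. The gate gives no numeric thresholds, so the hourly granularity, `rate_tolerance`, `count_growth_limit`, and `max_restarts` below are hypothetical; the code only surfaces candidates, since deciding whether a rise is "unexplained" still needs a human.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HourlyMetrics:
    partition_rates: dict[str, float]  # share of windows routed to each partition
    acceptance_rate: float             # accepted edits / proposals
    rejected_safe: int
    accepted_neutral: int


def stability_findings(hours: list[HourlyMetrics], restarts: int,
                       rate_tolerance: float = 0.05,     # hypothetical drift band
                       count_growth_limit: float = 1.5,  # hypothetical growth ratio
                       max_restarts: int = 1) -> list[str]:
    """Flag anything in a shadow run that needs an explanation before GO."""
    if len(hours) < 24:
        return [f"shadow run covers {len(hours)} h, needs 24 h (4.6.1)"]
    findings = []
    if restarts > max_restarts:
        findings.append(f"{restarts} restarts exceeds {max_restarts} (4.6.1)")
    base = hours[0]
    for i, h in enumerate(hours[1:], start=1):
        for name, rate in h.partition_rates.items():
            if abs(rate - base.partition_rates.get(name, 0.0)) > rate_tolerance:
                findings.append(f"hour {i}: {name} partition rate drifted to {rate:.3f}")
        if abs(h.acceptance_rate - base.acceptance_rate) > rate_tolerance:
            findings.append(f"hour {i}: acceptance rate drifted to {h.acceptance_rate:.3f}")
    for counter in ("rejected_safe", "accepted_neutral"):
        first, last = getattr(base, counter), getattr(hours[-1], counter)
        # max(first, 1) keeps a zero baseline from hiding a rise.
        if last > max(first, 1) * count_growth_limit:
            findings.append(f"{counter} rose from {first} to {last} per hour (4.6.2)")
    return findings
```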

5.0 Decision Protocol

5.1 GO

5.1.1 Release status is `GO` only if every item in Section 4 passes.

5.2 NO-GO

5.2.1 Release status is `NO-GO` if any of the following is true:

- The learned live proposer does not beat base on the authoritative corpus.

- `accepted_worse > 0`.

- Stable or novelty partitions are rewriting at non-trivial rates.

- The live path regresses into open-horizon corrective behavior.

- Latency or uptime exits the allowed operating envelope.
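The GO/NO-GO rule is a conjunction over Section 4, and it should fail closed. In this sketch the check names are hypothetical keys, one per subsection 4.1 through 4.6, each assumed to be computed separately; a missing result is treated as a failure rather than a pass.

```python
REQUIRED_CHECKS = (
    "learned_path",  # 4.1
    "corpus",        # 4.2
    "cer",           # 4.3
    "safety",        # 4.4
    "runtime",       # 4.5
    "stability",     # 4.6
)


def release_decision(results: dict[str, bool]) -> tuple[str, list[str]]:
    """GO only if every Section 4 check is present and passed; otherwise NO-GO.

    A missing result counts as a failure, so the gate fails closed.
    """
    failed = [name for name in REQUIRED_CHECKS if results.get(name) is not True]
    return ("GO" if not failed else "NO-GO"), failed
```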

5.3 Staging Rule

5.3.1 Deployment must progress in this order:

- Offline heldout evaluation.

- Shadow-mode live traffic with no writeback.

- Bounded writeback only for approved partitions.

- Gradual rollout with rollback switch and audit sampling.
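The staging order can be enforced as a one-step-at-a-time state machine. This is a sketch under assumptions: the approved partition set (`boundary`, `uncertain`, taken from Section 4.4.4) is hypothetical until partitions are explicitly approved, and the rollback target of shadow mode is one reasonable reading of "rollback switch", not a documented behavior.

```python
from enum import IntEnum


class Stage(IntEnum):
    OFFLINE_HELDOUT = 0       # offline heldout evaluation
    SHADOW_NO_WRITEBACK = 1   # shadow-mode live traffic, no writeback
    BOUNDED_WRITEBACK = 2     # writeback for approved partitions only
    GRADUAL_ROLLOUT = 3       # gradual rollout with rollback switch and audit sampling


APPROVED_PARTITIONS = frozenset({"boundary", "uncertain"})  # hypothetical approval set


def advance(current: Stage, current_stage_passed: bool) -> Stage:
    """Move at most one stage forward, and only after the current stage passes."""
    if not current_stage_passed or current is Stage.GRADUAL_ROLLOUT:
        return current
    return Stage(current + 1)


def rollback(current: Stage) -> Stage:
    """Hypothetical rollback switch: fall back to shadow mode, disabling writeback."""
    return min(current, Stage.SHADOW_NO_WRITEBACK)


def writeback_allowed(stage: Stage, partition: str) -> bool:
    """Writeback needs both a writeback-enabled stage and an approved partition."""
    return stage >= Stage.BOUNDED_WRITEBACK and partition in APPROVED_PARTITIONS
```

Using an `IntEnum` keeps the order explicit, so "no skipping stages" and "no writeback before bounded writeback" become simple integer comparisons.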

6.0 Current Status on 2026-04-30

6.1 Proven Today

6.1.1 The AGP runtime architecture is proven enough to continue.

6.1.2 The cross-host hidden-state path preserved quality at `top1_match_rate=0.8571` with `mean_kl_to_teacher=0.7362`.

6.1.3 The bounded gate shape is validated.

6.1.4 The large same-snapshot replay shows a mean CER delta of `-0.004617` across four runs, with `accepted_worse=0` in the replay summaries.

6.2 Not Proven Today

6.2.1 The fully live learned N'Ko corrector at scale is not yet proven.

6.2.2 The strongest large replay result is still an `oracle_guardrail` result, not the final learned live proposal path.

6.2.3 The currently trained partial-real learned correction adapter remains worse than base on broader evaluation and is not a deployment candidate.

6.3 Current Decision

6.3.1 Current release status is `NO-GO`.

6.3.2 The next valid decision point is after the five-run post-TTT refresh and a learned-path heldout evaluation against that refreshed corpus.

7.0 Exit Condition

7.1 This document may be revised only when one of the following changes:

7.1.1 The authoritative corpus changes.

7.1.2 The live proposal model changes materially.

7.1.3 The Rust gate policy changes materially.

7.1.4 A dedicated recovery regime is added and validated.

7.2 If none of the above has changed, this document remains the canonical release gate.

Source Anchor

Comp-Core/docs/research/agp-nko-live-corrector-release-gate-v1.md
