Grand Diomande Research · Full HTML Reader

Runbook — Phase 1: regenerate proposals on the CLEAN anchor base (mac5)

**Why:** the acoustic-gate pilot's reference-dependent numbers (proposer hit rate 1.9%, flywheel harvest precision) were measured on the 297k model against **contaminated** ane references. To make them trustworthy, regenerate Gemma correction proposals against the **anchor's clean hypotheses + clean HF references**, then re-run the gate.


Status: staged, gated on mac5 availability. Input is built. Not yet run.

Inputs (ready)

- `proposer_input_anchor.jsonl` (1,381 rows): the anchor's clean hypotheses as
`asr_candidate`, clean HF references as `reference`. Already compatible with
`ASRBridgePacket.from_mapping`, which accepts `asr_candidate`/`reference`/`n_best`.
Trajectory scalars default to a neutral 0.0; that is acceptable, but for a fully
self-consistent packet recompute the anchor's `scalar_computer` outputs first.
- Proposer: `Desktop/Comp-Core/experiments/agp_mlx/asr_bridge/agp_text_proposal.py`
(+ sibling `schema.py`). Lives on mac1 — must be copied to mac5 with its package dir.
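
Before shipping the input to mac5, it can be worth confirming that the file still matches the description above. The following is a minimal sketch, not part of the repo: `check_proposer_input` is a hypothetical helper that uses only the standard library and does not call `ASRBridgePacket` itself. The field names and the 1,381 row count come from this runbook; treating `n_best` as a list is an assumption.

```python
import json
from pathlib import Path

REQUIRED_KEYS = ("asr_candidate", "reference")  # field names from this runbook
EXPECTED_ROWS = 1381                            # row count from this runbook


def check_proposer_input(path, expected_rows=EXPECTED_ROWS):
    """Return a list of problems found in the JSONL file (empty list = looks fine)."""
    problems = []
    rows = 0
    with Path(path).open(encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            rows += 1
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append(f"line {lineno}: invalid JSON ({exc.msg})")
                continue
            if not isinstance(record, dict):
                problems.append(f"line {lineno}: expected a JSON object")
                continue
            for key in REQUIRED_KEYS:
                if not isinstance(record.get(key), str):
                    problems.append(f"line {lineno}: {key!r} missing or not a string")
            # Assumption: n_best, when present, is a list of alternatives.
            if "n_best" in record and not isinstance(record["n_best"], list):
                problems.append(f"line {lineno}: 'n_best' present but not a list")
    if rows != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {rows}")
    return problems


if __name__ == "__main__":
    for problem in check_proposer_input("proposer_input_anchor.jsonl"):
        print(problem)
```

An empty result only says the file is structurally plausible; it does not prove that `from_mapping` will accept every row.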

Open decisions before running

1. Base vs LoRA. Task #10 trained a minimal-edit LoRA adapter (SFT from rejected
pairs). Decide: run base Gemma-3n-E2B-4bit (clean baseline) or the #10 adapter
(the intended corrector). Recommend running both to isolate the adapter's effect.
2. Gemma model path on mac5. Resolve the exact `--model` (Gemma-3n-E2B-4bit) and the
optional `--adapter-path`. Use the venv at `[home-path]` (mlx_lm 0.31.2): it is the
only venv that loads gemma-3n E2B; the system py3.9/mlx 0.29.3 does NOT.
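
Since the wrong interpreter fails only at model-load time, a quick preflight can catch it earlier. This is a sketch under assumptions: it takes the PyPI distribution name to be `mlx-lm`, and `installed_version` and `preflight` are hypothetical helpers. The runbook only states that 0.31.2 works, so any other version is reported as untested rather than broken.

```python
from importlib import metadata

EXPECTED_MLX_LM = "0.31.2"  # version this runbook reports as loading gemma-3n E2B


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def preflight(found, expected=EXPECTED_MLX_LM):
    """Describe whether the active interpreter matches the known-good venv."""
    if found is None:
        return "mlx-lm not installed in this interpreter (wrong venv?)"
    if found == expected:
        return f"ok: mlx-lm {found}"
    return f"untested: mlx-lm {found} (runbook verified {expected})"


if __name__ == "__main__":
    # Assumption: the distribution is published as "mlx-lm".
    print(preflight(installed_version("mlx-lm")))
```

Run it with the interpreter from the activated venv, before starting the proposer.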

Command skeleton (run on mac5)

```bash
# from mac1: ship proposer + input
rsync -a Desktop/Comp-Core/experiments/agp_mlx/ mac5:[home-path]
scp proposer_input_anchor.jsonl mac5:[home-path]

# on mac5
ssh mac5
source [home-path]
cd [home-path]
python agp_text_proposal.py \
  --input  proposer_input_anchor.jsonl \
  --output proposals_anchor_clean.jsonl \
  --model  <gemma-3n-E2B-4bit path> \
  [--adapter-path <#10 LoRA>] \
  --max-tokens 64
# (KMP_DUPLICATE_LIB_OK=TRUE not needed on mac5/MLX; that's a mac1/torch guard)
```
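
If both variants are run, as recommended under the open decisions, the skeleton can be parameterized so the two invocations differ only in the adapter flag and the output file. This is a sketch, assuming the proposer accepts exactly the flags shown in the skeleton. The output names `proposals_anchor_clean_base.jsonl` and `proposals_anchor_clean_lora.jsonl` are hypothetical, and the model and adapter paths stay as placeholders until resolved on mac5.

```python
import shlex


def build_proposer_cmd(model_path, output, adapter_path=None,
                       input_path="proposer_input_anchor.jsonl", max_tokens=64):
    """Build the argv list for one proposer run, mirroring the command skeleton."""
    cmd = [
        "python", "agp_text_proposal.py",
        "--input", input_path,
        "--output", output,
        "--model", model_path,
        "--max-tokens", str(max_tokens),
    ]
    if adapter_path is not None:
        cmd += ["--adapter-path", adapter_path]
    return cmd


def both_variants(model_path, adapter_path):
    """Return the base and LoRA commands; only the adapter and output differ."""
    return {
        # Hypothetical output names, chosen so the two runs do not overwrite each other.
        "base": build_proposer_cmd(model_path, "proposals_anchor_clean_base.jsonl"),
        "lora": build_proposer_cmd(model_path, "proposals_anchor_clean_lora.jsonl",
                                   adapter_path=adapter_path),
    }


if __name__ == "__main__":
    # Placeholders are kept as-is; substitute the resolved paths on mac5.
    for name, cmd in both_variants("<gemma-3n-E2B-4bit path>", "<#10 LoRA>").items():
        print(f"# {name}")
        print(shlex.join(cmd))
```

Keeping every other argument identical means any difference between the two outputs can be attributed to the adapter.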

Post-run (back on mac1)

1. `scp mac5:[home-path] .`
2. `reextract.py` → `proposals_anchor_clean_extracted.jsonl` (harness fix, clean N'Ko).
3. Re-run the gate against clean refs: adapt `robust_eval.py` to point at
`decoded_anchor_native.jsonl` + the clean proposals → the trustworthy 4-condition
table (baseline / raw+gate / clean+gate / clean+preserve+gate) with bootstrap CIs.
4. Compare to the contaminated-substrate numbers in `TECHNICAL-REPORT.md §8`. The honest
question this answers: **does the corrector help at all once the references are clean
and the base is the 20.57
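
Step 3 asks for bootstrap CIs on the 4-condition table. How `robust_eval.py` computes them is not shown here, so the following is only an illustrative percentile bootstrap over per-utterance values. `bootstrap_ci` is a hypothetical helper, and the resample count, alpha, and seed are arbitrary defaults.

```python
import random
from statistics import fmean


def bootstrap_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of per-utterance values.

    Returns (mean, lower, upper). Resamples utterances with replacement.
    """
    if not values:
        raise ValueError("need at least one value")
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(values)
    means = sorted(fmean(rng.choices(values, k=n)) for _ in range(n_resamples))
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return fmean(values), lower, upper


if __name__ == "__main__":
    # Toy per-utterance differences (condition minus baseline), not real results.
    toy_deltas = [0.0, -0.02, 0.01, -0.03, 0.0, -0.01, 0.02, -0.04]
    print(bootstrap_ci(toy_deltas))
```

For comparing two conditions on the same 1,381 utterances, feeding per-utterance differences (paired) into the function is usually tighter than bootstrapping each condition separately. An interval on the difference that excludes zero is the evidence the question above is looking for.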

Verified context (do not re-derive)

- Anchor = `UnifiedCTCHead(num_classes=66, use_trajectory=True, use_tar=False,
use_ttt=False)`; native features at `/Volumes/HD1/anchor_bam_feats` (1500-frame).
- Clean refs: HF `Diomande/bambara-whisper-features/corrected_pairs_290k.jsonl`
(== `pairs.jsonl` for bam).
- Split is seed-42 deterministic; reconstruction verified (232476/29060/29060). The 1,381
pilot utts are NOT all train — they scatter ~80/10/10. No memorization (held-out CER
0.3112 vs train 0.3081).
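
As a sanity check on the split counts above, the sketch below shows one 80/10/10 rounding rule that reproduces 232476/29060/29060 for 290,596 items. It is an assumption, not the verified splitter: the real code may use a different RNG or rounding, and `split_sizes` and `deterministic_split` are hypothetical names.

```python
import random


def split_sizes(n):
    """80/10/10 sizes: floor 80% for train, then halve the remainder."""
    n_train = n * 8 // 10
    n_val = (n - n_train) // 2
    return n_train, n_val, n - n_train - n_val


def deterministic_split(ids, seed=42):
    """Shuffle with a fixed seed and cut into train/val/test lists."""
    order = sorted(ids)                 # sort first so input order does not matter
    random.Random(seed).shuffle(order)  # assumption: Python's RNG, seed 42
    n_train, n_val, _ = split_sizes(len(order))
    return (order[:n_train],
            order[n_train:n_train + n_val],
            order[n_train + n_val:])


if __name__ == "__main__":
    print(split_sizes(290596))  # (232476, 29060, 29060)
```

With the three ID lists in hand, counting how many of the 1,381 pilot utterances fall into each one is a direct way to re-confirm the ~80/10/10 scatter if it is ever questioned.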

Promotion Decision

Attach run IDs, datasets, metrics, and reproduction commands.

Source Anchor

nko-brain-scanner/experiments/acoustic_gate/RUNBOOK-anchor-clean-regen.md
