Stage 3 Report — Bounded Edit-Op Corrector
**Status:** done. The edit-op interface is valid and much faster than full-string correction, but the trained proposer collapsed to COPY and does **not** improve clean-anchor CER.
Headline
Full-string correction was the wrong serving shape: Codex's clean LoRA finished training, but its
1,381-row generation run was stopped after 39+ minutes with no completed output. Stage 3 replaced
that with bounded edit operations (`COPY`, `SUB`, `INS`, `DEL`) scored by `featural_edit.py` and
applied deterministically by `edit_ops.py`.
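To make the interface concrete, here is a minimal sketch of applying a bounded op script deterministically. The op schema (dicts with `op`, `pos`, `char`), the one-edit-per-position restriction, and the name `apply_ops` are assumptions for illustration; the report does not show the actual format used by `edit_ops.py`.

```python
from typing import Dict, List

MAX_OPS = 3  # the dataset builder drops rows with more than 3 ops


def apply_ops(candidate: str, ops: List[Dict]) -> str:
    """Apply a bounded edit script to `candidate` and return the result.

    Hypothetical schema: {"op": "COPY"}, {"op": "DEL", "pos": i},
    {"op": "SUB", "pos": i, "char": c}, {"op": "INS", "pos": i, "char": c}.
    Positions index the original candidate; INS inserts before `pos`.
    """
    if len(ops) > MAX_OPS:
        raise ValueError(f"{len(ops)} ops exceeds the bound of {MAX_OPS}")
    edits = [o for o in ops if o["op"] != "COPY"]
    positions = [o["pos"] for o in edits]
    if len(set(positions)) != len(positions):
        raise ValueError("simplification: at most one edit per position")
    chars = list(candidate)
    # Apply right to left so edits never shift the positions still to come.
    for o in sorted(edits, key=lambda o: o["pos"], reverse=True):
        kind, pos = o["op"], o["pos"]
        limit = len(candidate) + (1 if kind == "INS" else 0)
        if not 0 <= pos < limit:
            raise ValueError(f"position {pos} out of range for {kind}")
        if kind == "SUB":
            chars[pos] = o["char"]
        elif kind == "DEL":
            del chars[pos]
        elif kind == "INS":
            chars.insert(pos, o["char"])
        else:
            raise ValueError(f"unknown op {kind!r}")
    return "".join(chars)
```

Because application is a pure function of the candidate and the script, a `COPY` script returns the ASR candidate unchanged, which is why a proposer that only emits `COPY` cannot move CER.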
The interface itself works only with a constrained opening-bracket decode prompt:

    ASR: <candidate>
    OPS: [

The generated tail is parsed as `[` + tail. Without this prefix, Gemma drifts into N'Ko phrase loops
instead of JSON. With the prefix, the full 1,381-row generation is valid and bounded, but every output
is `COPY`.
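The prefix trick can be sketched as follows. This is an illustration only: the helper names, the validation rules, and the choice to ignore text after the closing bracket are assumptions, not the behaviour of the actual generation script.

```python
import json
from typing import List, Optional

VALID_OPS = {"COPY", "SUB", "INS", "DEL"}
PROMPT_TEMPLATE = "ASR: {candidate}\nOPS: ["  # prompt ends with the forced '['


def build_prompt(candidate: str) -> str:
    """Build the decode prompt that already contains the opening bracket."""
    return PROMPT_TEMPLATE.format(candidate=candidate)


def parse_tail(tail: str) -> Optional[List[dict]]:
    """Re-attach the forced '[' and parse the script; None means invalid."""
    text = "[" + tail
    try:
        # raw_decode parses the first JSON value and ignores what follows,
        # so trailing tokens after the closing ']' do not break parsing.
        ops, _end = json.JSONDecoder().raw_decode(text)
    except json.JSONDecodeError:
        return None
    if not isinstance(ops, list) or not ops:
        return None
    for op in ops:
        if not isinstance(op, dict) or op.get("op") not in VALID_OPS:
            return None
    return ops
```

A tail such as `{"op": "COPY"}]` parses to a one-op script, while a tail of free N'Ko text returns `None` and would be counted as invalid.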
Dataset
Builder: `build_editop_sft.py`
Source: `experiments/acoustic_gate/datasets/clean_proposer_sft_v1`
Final output: `overnight/editop_sft_v2/`
| metric | value |
|---|---|
| rows in | 86,990 |
| rows kept | 86,043 |
| rows dropped, over 3 ops | 947 |
| median edit ops | 0 |
| max edit ops observed | 3 |
| median target chars | 10 |
| max target chars | 50 |
Operation counts:
| op | count |
|---|---|
| COPY | 51,031 |
| SUB | 25,939 |
| DEL | 9,837 |
| INS | 8,414 |
Latency target passed at the data level: targets are a few ops, not full transcripts.
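The "over 3 ops" drop rule can be sketched like this. It is a rough illustration that counts character-level edits with `difflib.SequenceMatcher`, which is not guaranteed to find a minimal edit script; the alignment that `build_editop_sft.py` actually uses is not shown in this report, and the function names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

MAX_OPS = 3  # rows needing more edits than this are dropped


def count_edit_ops(candidate: str, target: str) -> int:
    """Count character-level SUB/INS/DEL edits from candidate to target."""
    sm = SequenceMatcher(a=candidate, b=target, autojunk=False)
    total = 0
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag != "equal":
            # A replace of unequal lengths counts as SUBs plus INS or DEL.
            total += max(i2 - i1, j2 - j1)
    return total


def filter_rows(rows: List[Tuple[str, str]]):
    """Split (candidate, target) rows into kept and dropped lists."""
    kept, dropped = [], []
    for row in rows:
        (kept if count_edit_ops(*row) <= MAX_OPS else dropped).append(row)
    return kept, dropped
```

Rows with zero edits correspond to `COPY` targets, which is consistent with the median of 0 edit ops in the table above.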
Training
v1: verbose prompt
Adapter: `[home]/agp_pilot/adapter_editop_v1`
Full 600-iter training completed. Best validation was at iter 200:
| checkpoint | val loss |
|---|---|
| iter 1 | 3.763 |
| iter 100 | 0.673 |
| iter 200 | 0.605 |
| iter 300 | 0.620 |
| iter 400 | 0.631 |
| iter 500 | 0.693 |
| iter 600 | 0.625 |
Smoke generation failed schema validation: 3/20 outputs were valid JSON scripts and 17/20 were
invalid; all valid scripts were `COPY`. The invalid generations were N'Ko text, not edit syntax.
Diagnosis: the verbose prompt produced max-sequence truncation warnings, so the model likely did
not reliably see the JSON target during training.
v2: compact prompt
Adapter: `[home]/agp_pilot/adapter_editop_v2`
Prompt: `ASR: <candidate>\nOPS: `
Full 600-iter training completed:
| checkpoint | val loss |
|---|---|
| iter 1 | 3.417 |
| iter 100 | 1.143 |
| iter 200 | 1.064 |
| iter 300 | 1.085 |
| iter 400 | 1.066 |
| iter 500 | 1.064 |
| iter 600 | 1.032 |
Peak memory: 4.558 GB.
Plain compact generation still failed schema validation: 1/20 valid JSON scripts, 19/20 invalid N'Ko loops.
Constrained-prefix generation (`compact_open`) fixed the schema failures:
| eval | valid JSON | invalid | median sec/row | changed |
|---|---|---|---|---|
| 20-row smoke, `compact_open` | 20/20 | 0 | 0.319 | 0 |
| full 1,381, `compact_open` | 1,381/1,381 | 0 | 0.493 | 0 |
Full generation report:
- `accepted_by_cost`: 1,381
- `copy`: 1,381
- `changed`: 0
- `max_featural_cost`: 0.0
- output report: `overnight/editop_anchor_eval_v2/generation_report.json`
Clean-Anchor Robust Eval
Robust eval:

    KMP_DUPLICATE_LIB_OK=TRUE python3 experiments/acoustic_gate/robust_eval_anchor_clean.py \
      --raw experiments/acoustic_gate/overnight/editop_anchor_eval_v2/proposals_editop_v1.jsonl \
      --clean experiments/acoustic_gate/overnight/editop_anchor_eval_v2/proposals_editop_v1.jsonl \
      --output experiments/acoustic_gate/overnight/editop_anchor_eval_v2/clean_anchor_eval_editop_v2_report.json

Result:
| condition | CER | delta | accepted | better/same/worse |
|---|---|---|---|---|
| ASR baseline | 0.3514 | - | - | - |
| raw + acoustic gate | 0.3514 | +0.00pp | 0 | 0/0/0 |
| clean + acoustic gate | 0.3514 | +0.00pp | 0 | 0/0/0 |
| clean + preserve + acoustic | 0.3514 | +0.00pp | 0 | 0/0/0 |
Proposal diagnostics:
- raw: 0 changed / 0 better / 1,381 same / 0 worse
- clean: 0 changed / 0 better / 1,381 same / 0 worse
- preservation AUC remains 0.7397 on clean anchor labels.
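For reference, the zero delta follows directly from how CER is computed. The sketch below assumes corpus-level CER (total character edit distance divided by total reference characters); the report does not say whether `robust_eval_anchor_clean.py` aggregates this way or averages per row, and the function names are illustrative.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance with unit costs (two-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def corpus_cer(hyps: List[str], refs: List[str]) -> float:
    """Total edit distance over total reference length."""
    total_dist = sum(levenshtein(h, r) for h, r in zip(hyps, refs))
    total_ref = sum(len(r) for r in refs)
    return total_dist / total_ref
```

When every proposal is `COPY`, the corrected hypotheses are identical to the ASR outputs, so both conditions yield the same CER and the delta is exactly 0.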
Interpretation
This is a useful negative result, not a win. As an interface, bounded edit operations solve the
full-string latency failure (39+ minutes with no completed output), and constrained-prefix decoding
produced strictly parseable output on all 1,381 rows. But the Gemma LoRA learned the safe prior too
strongly and emits `COPY` for every row, so it cannot close the correction loop.
The next correction architecture should not ask Gemma to freely serialize both decision and edit.
Use a smaller structured head or two-step system instead:
1. classify `COPY` vs `EDIT` with calibrated confidence;
2. only when `EDIT`, predict a bounded op from a constrained candidate set;
3. score candidates by featural cost plus anchor acoustic support;
4. keep Gemma out of live token-by-token generation unless its output space is grammar-constrained.
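The steps above could be wired together roughly as follows. Everything here is a hypothetical sketch: the threshold, the additive scoring rule, and the `featural_cost` and `acoustic_support` callables are placeholders supplied by the caller, not existing components of this project.

```python
from typing import Callable, Iterable


def correct(
    candidate: str,
    p_edit: float,                               # calibrated P(EDIT) from step 1
    proposals: Iterable[str],                    # bounded candidate set from step 2
    featural_cost: Callable[[str, str], float],  # cost of candidate -> proposal
    acoustic_support: Callable[[str], float],    # higher means better supported
    edit_threshold: float = 0.8,                 # assumed value, needs tuning
    cost_weight: float = 1.0,                    # assumed trade-off weight
) -> str:
    """Return the ASR candidate or the best-scoring bounded edit."""
    # Step 1: default to COPY unless the classifier is confident.
    if p_edit < edit_threshold:
        return candidate
    # COPY is the baseline to beat: zero featural cost, its own support.
    best, best_score = candidate, acoustic_support(candidate)
    # Steps 2-3: score each bounded proposal by support minus featural cost.
    for prop in proposals:
        score = acoustic_support(prop) - cost_weight * featural_cost(candidate, prop)
        if score > best_score:
            best, best_score = prop, score
    return best
```

Keeping `COPY` as the explicit baseline means an edit is accepted only when its acoustic support outweighs its featural cost, and no free-form generation happens at serving time.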
Promotion Decision
Attach run IDs, datasets, metrics, and reproduction commands.
Source Anchor
nko-brain-scanner/experiments/acoustic_gate/overnight/STAGE3-REPORT.md