Grand Diomande Research · Full HTML Reader

Stage 3 Report — Bounded Edit-Op Corrector

**Status:** done. The edit-op interface is valid and much faster than full-string correction, but the trained proposer collapsed to COPY and does **not** improve clean-anchor CER.


Headline

Full-string correction was the wrong serving shape: Codex's clean LoRA trained successfully, but
generation over the 1,381-row eval set was stopped after 39+ minutes with no completed output.
Stage 3 replaced it with bounded edit operations (`COPY`, `SUB`, `INS`, `DEL`) that are scored by
`featural_edit.py` and applied deterministically by `edit_ops.py`.
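
As an illustration of what deterministic application of a bounded edit script can look like, here is a minimal sketch. It is not the code in `edit_ops.py`: the op schema (`{"op", "pos", "char"}` dicts with positions indexing the original string) and the names `apply_ops` and `MAX_OPS` are assumptions made for this example.

```python
from typing import Dict, List

MAX_OPS = 3  # assumed budget, matching the "max edit ops observed" in the dataset


def apply_ops(text: str, ops: List[Dict]) -> str:
    """Apply a bounded edit script; positions refer to the original string."""
    edits = [op for op in ops if op["op"] != "COPY"]
    if len(edits) > MAX_OPS:
        raise ValueError("edit script exceeds the op budget")
    chars = list(text)
    # Apply right-to-left so an edit never shifts the position of a later one.
    for op in sorted(edits, key=lambda o: o["pos"], reverse=True):
        kind, pos = op["op"], op["pos"]
        if kind == "SUB":
            chars[pos] = op["char"]
        elif kind == "DEL":
            del chars[pos]
        elif kind == "INS":
            chars.insert(pos, op["char"])
        else:
            raise ValueError(f"unknown op: {kind}")
    return "".join(chars)
```

Under this hypothetical schema, a script that contains only `COPY` returns the ASR candidate unchanged, which is the behaviour the trained proposer ended up with on every row.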

The interface itself works only with a constrained opening-bracket decode prompt:

```text
ASR: <candidate>
OPS: [
```

The generated tail is parsed as `[` + tail. Without this prefix, Gemma drifts into N'Ko phrase loops
instead of JSON. With the prefix, the full 1,381-row generation is valid and bounded, but every output
is `COPY`.
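
The prefix trick can be sketched as follows. This is an illustrative reconstruction, not the project's decoder: `build_prompt`, `parse_tail`, and the `{"op": ...}` JSON schema are assumed names and shapes. The only detail taken from the report is that the prompt ends with `[` and the generated tail is parsed as `[` + tail.

```python
import json
from typing import List, Optional

VALID_OPS = {"COPY", "SUB", "INS", "DEL"}


def build_prompt(candidate: str) -> str:
    """Compact prompt that already opens the JSON list for the model."""
    return f"ASR: {candidate}\nOPS: ["


def parse_tail(tail: str) -> Optional[List[dict]]:
    """Re-attach the forced '[' and parse the first JSON value in the output."""
    text = "[" + tail
    try:
        # raw_decode stops at the end of the first JSON value, so any text the
        # model emits after the closing bracket is ignored.
        ops, _end = json.JSONDecoder().raw_decode(text)
    except json.JSONDecodeError:
        return None
    if not isinstance(ops, list):
        return None
    if not all(isinstance(op, dict) and op.get("op") in VALID_OPS for op in ops):
        return None
    return ops
```

A `None` result corresponds to an invalid row in the schema counts; anything else is a candidate edit script that can be passed on to cost scoring.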

Dataset

Builder: `build_editop_sft.py`

Source: `experiments/acoustic_gate/datasets/clean_proposer_sft_v1`

Final output: `overnight/editop_sft_v2/`

| metric | value |
| --- | --- |
| rows in | 86,990 |
| rows kept | 86,043 |
| rows dropped, over 3 ops | 947 |
| median edit ops | 0 |
| max edit ops observed | 3 |
| median target chars | 10 |
| max target chars | 50 |

Operation counts:

| op | count |
| --- | --- |
| COPY | 51,031 |
| SUB | 25,939 |
| DEL | 9,837 |
| INS | 8,414 |

The latency target is met at the data level: each training target is an edit script of at most 3 ops (median 0), not a full transcript.
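
To make the "over 3 ops" filter concrete, here is a rough sketch of how a candidate/target pair could be turned into an edit script and filtered. It is not `build_editop_sft.py`: it uses `difflib.SequenceMatcher` from the standard library as a stand-in aligner (which does not guarantee a minimal edit script), and `derive_ops`, `keep_row`, and the op schema are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, List

MAX_EDIT_OPS = 3  # rows needing more edits are dropped, as in the table above


def derive_ops(candidate: str, target: str) -> List[Dict]:
    """Turn a candidate -> target alignment into SUB / DEL / INS ops."""
    ops: List[Dict] = []
    matcher = SequenceMatcher(None, candidate, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        paired = min(i2 - i1, j2 - j1)  # characters that map one-to-one
        for k in range(paired):
            ops.append({"op": "SUB", "pos": i1 + k, "char": target[j1 + k]})
        for k in range(i1 + paired, i2):  # surplus candidate characters
            ops.append({"op": "DEL", "pos": k})
        for k in range(j1 + paired, j2):  # surplus target characters
            ops.append({"op": "INS", "pos": i2, "char": target[k]})
    return ops or [{"op": "COPY"}]


def keep_row(candidate: str, target: str) -> bool:
    """Keep a row only if its edit script fits the op budget."""
    n_edits = sum(op["op"] != "COPY" for op in derive_ops(candidate, target))
    return n_edits <= MAX_EDIT_OPS
```

With a median of 0 edit ops, more than half of the kept rows are pure `COPY` targets under any such scheme, which is consistent with the proposer later collapsing to `COPY`.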

Training

v1: verbose prompt

Adapter: `[home]/agp_pilot/adapter_editop_v1`

Full 600-iter training completed. Best validation was at iter 200:

| checkpoint | val loss |
| --- | --- |
| iter 1 | 3.763 |
| iter 100 | 0.673 |
| iter 200 | 0.605 |
| iter 300 | 0.620 |
| iter 400 | 0.631 |
| iter 500 | 0.693 |
| iter 600 | 0.625 |

Smoke generation failed the schema check: 3/20 outputs were valid JSON scripts and 17/20 were
invalid, and every valid script was `COPY`. Raw generations were N'Ko text, not edit syntax. Likely
diagnosis: the verbose prompt produced max-sequence truncation warnings, so the model probably did
not reliably see the JSON target.

v2: compact prompt

Adapter: `[home]/agp_pilot/adapter_editop_v2`

Prompt: `ASR: <candidate>\nOPS: `

Full 600-iter training completed:

| checkpoint | val loss |
| --- | --- |
| iter 1 | 3.417 |
| iter 100 | 1.143 |
| iter 200 | 1.064 |
| iter 300 | 1.085 |
| iter 400 | 1.066 |
| iter 500 | 1.064 |
| iter 600 | 1.032 |

Peak memory: 4.558 GB.

Plain compact generation still failed schema: 1/20 valid JSON scripts, 19/20 invalid N'Ko loops.

Constrained-prefix generation fixed schema:

| eval | valid JSON | invalid | median sec/row | changed |
| --- | --- | --- | --- | --- |
| 20-row smoke, `compact_open` | 20/20 | 0 | 0.319 | 0 |
| full 1,381, `compact_open` | 1,381/1,381 | 0 | 0.493 | 0 |

Full generation report:

  • `accepted_by_cost`: 1,381
  • `copy`: 1,381
  • `changed`: 0
  • `max_featural_cost`: 0.0
  • output report: `overnight/editop_anchor_eval_v2/generation_report.json`
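
For readers who want to reproduce the shape of this report, the counters can be tallied roughly as below. This is a sketch under assumptions: the row layout (`ops` as a parsed script or `None`, `cost` as the featural cost) and the `cost_budget` threshold are hypothetical, and the real `generation_report.json` may contain other fields.

```python
from typing import Dict, List


def generation_report(rows: List[Dict], cost_budget: float = 1.0) -> Dict:
    """Tally schema validity, cost acceptance, and COPY vs changed outputs."""
    report = {
        "valid": 0,
        "invalid": 0,
        "accepted_by_cost": 0,
        "copy": 0,
        "changed": 0,
        "max_featural_cost": 0.0,
    }
    for row in rows:
        ops = row["ops"]
        if ops is None:  # output did not parse as an edit script
            report["invalid"] += 1
            continue
        report["valid"] += 1
        cost = row["cost"]
        report["max_featural_cost"] = max(report["max_featural_cost"], cost)
        if cost <= cost_budget:
            report["accepted_by_cost"] += 1
        if all(op["op"] == "COPY" for op in ops):
            report["copy"] += 1
        else:
            report["changed"] += 1
    return report
```

A run in which `copy` equals the number of valid rows and `max_featural_cost` is 0.0, as reported here, means the proposer never suggested an edit that the cost model had to judge.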

Clean-Anchor Robust Eval

Robust eval:

```bash
KMP_DUPLICATE_LIB_OK=TRUE python3 experiments/acoustic_gate/robust_eval_anchor_clean.py \
  --raw experiments/acoustic_gate/overnight/editop_anchor_eval_v2/proposals_editop_v1.jsonl \
  --clean experiments/acoustic_gate/overnight/editop_anchor_eval_v2/proposals_editop_v1.jsonl \
  --output experiments/acoustic_gate/overnight/editop_anchor_eval_v2/clean_anchor_eval_editop_v2_report.json
```

Result:

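| condition | CER | delta | accepted | better/same/worse |
| --- | --- | --- | --- | --- |
| ASR baseline | 0.3514 | - | - | - |
| raw + acoustic gate | 0.3514 | +0.00pp | 0 | 0/0/0 |
| clean + acoustic gate | 0.3514 | +0.00pp | 0 | 0/0/0 |
| clean + preserve + acoustic | 0.3514 | +0.00pp | 0 | 0/0/0 |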

Proposal diagnostics:

  • raw: 0 changed / 0 better / 1,381 same / 0 worse
  • clean: 0 changed / 0 better / 1,381 same / 0 worse
  • preservation AUC remains 0.7397 on clean anchor labels.
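
For reference, character error rate and the better/same/worse split can be computed along these lines. This is a generic sketch, not `robust_eval_anchor_clean.py`: it assumes corpus-level (micro-averaged) CER over Levenshtein distance, which may differ from the averaging the eval script uses, and the function names are illustrative.

```python
from typing import List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance with unit costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def corpus_cer(hyps: List[str], refs: List[str]) -> float:
    """Total edit distance divided by total reference characters."""
    errors = sum(levenshtein(h, r) for h, r in zip(hyps, refs))
    return errors / sum(len(r) for r in refs)


def better_same_worse(base: List[str], new: List[str], refs: List[str]) -> Tuple[int, int, int]:
    """Count rows where the new hypothesis is closer to, as close as, or further from the reference."""
    better = same = worse = 0
    for b, n, r in zip(base, new, refs):
        d_base, d_new = levenshtein(b, r), levenshtein(n, r)
        better += d_new < d_base
        same += d_new == d_base
        worse += d_new > d_base
    return better, same, worse
```

When every proposal is `COPY`, the new hypotheses equal the ASR baseline, so every row lands in "same" and the CER delta is exactly zero, as in the table.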

Interpretation

This is a useful negative result, not a win. Bounded edit operations solve the 39+ minute full-string
latency failure as an interface, and constrained-prefix decoding gives strict parseability. But
the Gemma LoRA learned the safe prior too strongly and emits universal `COPY`, so it cannot close the
correction loop.

The next correction architecture should not ask Gemma to freely serialize both decision and edit.
Use a smaller structured head or two-step system instead:

1. classify `COPY` vs `EDIT` with calibrated confidence;
2. only when `EDIT`, predict a bounded op from a constrained candidate set;
3. score candidates by featural cost plus anchor acoustic support;
4. keep Gemma out of live token-by-token generation unless its output space is grammar-constrained.
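
The proposed two-step flow could be wired together roughly as follows. Everything here is a hypothetical sketch of an architecture the report only proposes: the `Candidate` fields, the `edit_threshold` and `acoustic_weight` parameters, and the linear combination of featural cost and acoustic support are illustrative choices, not measured or implemented components.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    text: str                # string obtained by applying one bounded op
    featural_cost: float     # lower means a more plausible edit
    acoustic_support: float  # higher means better agreement with the audio anchor


def correct(
    asr: str,
    p_edit: float,
    candidates: List[Candidate],
    edit_threshold: float = 0.8,
    acoustic_weight: float = 1.0,
) -> str:
    """Step 1: gate on calibrated P(EDIT). Steps 2-3: pick the best-scoring candidate."""
    if p_edit < edit_threshold or not candidates:
        return asr  # COPY: keep the ASR output unchanged

    def score(c: Candidate) -> float:
        # Lower is better: cheap edits with strong acoustic support win.
        return c.featural_cost - acoustic_weight * c.acoustic_support

    return min(candidates, key=score).text
```

The point of this design is that no language model generates tokens on the live path: the gate is a single classification and the edit is a choice among a small enumerated candidate set.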

Promotion Decision

Attach run IDs, datasets, metrics, and reproduction commands.

Source Anchor

nko-brain-scanner/experiments/acoustic_gate/overnight/STAGE3-REPORT.md
