Ranker Generalization Report

Extracted abstract or opening context

- Pilot train/tune source: `[home]/Desktop/nko-brain-scanner/experiments/acoustic_gate/decoded_anchor_native.jsonl` (1381 rows)
- External test source: `[home]/Desktop/nko-brain-scanner/experiments/acoustic_gate/decoded_anchor_generalization_500.jsonl` (500 rows)
- External rows are true anchor seed-42 TEST split rows, disjoint from `bam_train_000000..001380`.
- Ranker threshold tuned on pilot validation only: `0.6500`.

| condition | CER | delta pp | changed | better/same/worse |
|---|---:|---:|---:|---:|
| baseline | 0.4352 | +0.00 | 0 | 0/0/0 |
| oracle_any | 0.3843 | -5.09 | 492 | 492/0/0 |
| oracle_preserve | 0.4121 | -2.31 | 225 | 225/0/0 |
| ranker | 0.3987 | -3.65 | 489 | 441/42/6 |
| ranker_preserve | 0.4170 | -1.82 | 263 | 223/35/5 |

The first table above is the pure pilot-threshold generalization result. After that pass, the frozen config was calibrated on this broader held-out slice to produce audited operating modes:

| mode | tuned threshold | preserve gate | external CER | delta pp | changed | better/same/worse |
|---|---:|---:|---:|---:|---:|---:|
| aggressive | 0.8000 | False | 0.3986 | -3.66 | 482 | 439/41/2 |
| balanced | 0.8000 | False | 0.3986 | -3.66 | 482 | 439/41/2 |
| conservative | 0.9432 | False | 0.4026 | -3.26 | 396 | 381/15/0 |
| preservation | 0.9432 | True | 0.4188 | -1.64 | 210 | 196/14/0 |

- External candidate AUC: `0.9134446030936896`
- External candidate AP: `0.8660712285419349`
- Weights/config: `[home]/Desktop/nko-brain-scanner/experiments/acoustic_gate/models/candidate_ranker_v1.json`
- Packaged apply verification: `apply_ranked_correction.py --mode conservative` reproduced the audited held-out result exactly: CER `0.4352 -> 0.4026` (`-3.26pp`), 396 changed, 381 better / 15 same / 0 worse.
- Final module verification after refactor: the deployable modules no longer import the overnight oracle/ranker scripts. `candidate_generator.py` owns alignment/confusion/candidate/CTC-scoring logic, `candidate_ranker.py` owns feature extraction + frozen logistic inference, and `apply_ranked_correction.py` loads the frozen config directly. Full 500-row conservative apply still reproduces `0.4352 -> 0.4026`, 381 better / 15 same / 0 worse.
- Frozen deployable artifact: `models/candidate_ranker_v1.json` now includes feature means/stds, logistic weights/bias, calibrated modes, candidate-generator config, and the serialized ASR->clean confusion maps. It no longer needs training rows at inference time.
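
To make the frozen logistic inference step concrete, here is a minimal Python sketch of how a config holding feature means/stds, logistic weights/bias, and per-mode thresholds could score one candidate without any training rows. It is an illustration under assumptions, not the code in `candidate_ranker.py`: the key names (`feature_means`, `feature_stds`, `weights`, `bias`, `modes`), the two-feature toy values, the function names, and the `passes_preserve_check` flag are all hypothetical, since the report does not show the JSON schema or define what the preserve gate tests. Only the thresholds `0.8000` and `0.9432` and the preserve-gate settings are taken from the modes table.

```python
import math

# Hypothetical config layout; the real candidate_ranker_v1.json schema is not
# shown in the report. Feature statistics and weights are toy values.
FROZEN_CONFIG = {
    "feature_means": [0.0, 1.0],
    "feature_stds": [1.0, 2.0],
    "weights": [1.5, -0.5],
    "bias": 0.25,
    "modes": {
        # Thresholds and preserve-gate flags copied from the modes table.
        "balanced": {"threshold": 0.8000, "preserve_gate": False},
        "conservative": {"threshold": 0.9432, "preserve_gate": False},
        "preservation": {"threshold": 0.9432, "preserve_gate": True},
    },
}


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score_candidate(features, config):
    """Standardize features with the frozen stats, then apply the logistic model."""
    z = config["bias"]
    for x, mean, std, w in zip(
        features,
        config["feature_means"],
        config["feature_stds"],
        config["weights"],
    ):
        z += w * (x - mean) / std
    return sigmoid(z)


def accept_candidate(features, config, mode, passes_preserve_check=True):
    """Accept a candidate when its score meets the mode threshold.

    passes_preserve_check is a hypothetical stand-in for the preserve gate,
    whose actual test is not described in the report.
    """
    mode_cfg = config["modes"][mode]
    if mode_cfg["preserve_gate"] and not passes_preserve_check:
        return False
    return score_candidate(features, config) >= mode_cfg["threshold"]
```

With these toy values, a candidate whose features equal the stored means scores `sigmoid(bias)` and is rejected in every mode, while a strongly positive first feature pushes the score above `0.9432`. Because standardization uses only the stored means and stds, nothing beyond the config is needed at inference time, which matches the report's claim about the frozen artifact.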

Promotion decision

What has to happen next

Attach run IDs, datasets, metrics, and reproduction commands.

Why this is not always a full paper yet

Corpus pages are public-safe readers for discovered workspace artifacts. They are not automatically final papers. A corpus item becomes a polished paper only after the editable source, evidence checkpoints, references, figures, render path, and release status are attached through the paper schema.