Ranker Generalization Report
Slice
- Pilot train/tune source: `[home]/Desktop/nko-brain-scanner/experiments/acoustic_gate/decoded_anchor_native.jsonl` (1381 rows)
- External test source: `[home]/Desktop/nko-brain-scanner/experiments/acoustic_gate/decoded_anchor_generalization_500.jsonl` (500 rows)
- External rows are true anchor seed-42 TEST split rows, disjoint from `bam_train_000000..001380`.
- Ranker threshold tuned on pilot validation only: `0.6500`.
External Held-Out Conditions
| condition | CER | delta pp | changed | better/same/worse |
|---|---|---|---|---|
| baseline | 0.4352 | +0.00 | 0 | 0/0/0 |
| oracle_any | 0.3843 | -5.09 | 492 | 492/0/0 |
| oracle_preserve | 0.4121 | -2.31 | 225 | 225/0/0 |
| ranker | 0.3987 | -3.65 | 489 | 441/42/6 |
| ranker_preserve | 0.4170 | -1.82 | 263 | 223/35/5 |
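The `delta pp` and `better/same/worse` columns can be checked directly from the other columns. The following is a minimal sketch, not part of the packaged modules: the helper names `delta_pp` and `check_row` are illustrative, and it assumes `delta pp` is the absolute CER change against the baseline expressed in percentage points.

```python
# Values copied from the table above; helper names are illustrative only.
BASELINE_CER = 0.4352

ROWS = {
    # condition: (CER, changed, better, same, worse)
    "oracle_any": (0.3843, 492, 492, 0, 0),
    "oracle_preserve": (0.4121, 225, 225, 0, 0),
    "ranker": (0.3987, 489, 441, 42, 6),
    "ranker_preserve": (0.4170, 263, 223, 35, 5),
}


def delta_pp(cer: float, baseline: float = BASELINE_CER) -> float:
    """Absolute CER change in percentage points (negative = improvement)."""
    return round((cer - baseline) * 100, 2)


def check_row(cer, changed, better, same, worse):
    """Return the delta in pp and whether the tallies sum to `changed`."""
    return delta_pp(cer), better + same + worse == changed


for name, row in ROWS.items():
    delta, consistent = check_row(*row)
    print(f"{name}: {delta:+.2f} pp, tallies consistent: {consistent}")
```

Under that assumption, every row in the table is internally consistent: the deltas match the CER column and each `better/same/worse` triple sums to the `changed` count.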
Operating Modes
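The External Held-Out Conditions table above is the pure pilot-threshold
generalization result: its threshold (`0.6500`) was tuned on pilot validation only.
After that pass, the frozen config was calibrated on this broader held-out slice to
produce audited operating modes, so the figures below are measured on the same 500
rows used to set the thresholds: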
| mode | tuned threshold | preserve gate | external CER | delta pp | changed | better/same/worse |
|---|---|---|---|---|---|---|
| aggressive | 0.8000 | False | 0.3986 | -3.66 | 482 | 439/41/2 |
| balanced | 0.8000 | False | 0.3986 | -3.66 | 482 | 439/41/2 |
| conservative | 0.9432 | False | 0.4026 | -3.26 | 396 | 381/15/0 |
| preservation | 0.9432 | True | 0.4188 | -1.64 | 210 | 196/14/0 |
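How a mode might turn a ranker score into a keep-or-replace decision can be sketched as follows. This is an illustration under stated assumptions, not the logic in `apply_ranked_correction.py`: the `Mode` class, the `choose_output` function, and the `likely_good` flag are hypothetical, the comparison is assumed to be `score >= threshold`, and the preserve gate is assumed to block any change on rows flagged as likely good. Only the thresholds and gate flags come from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mode:
    threshold: float
    preserve_gate: bool


# Thresholds and gate flags copied from the table above.
MODES = {
    "aggressive": Mode(0.8000, False),
    "balanced": Mode(0.8000, False),
    "conservative": Mode(0.9432, False),
    "preservation": Mode(0.9432, True),
}


def choose_output(baseline: str, candidate: str, score: float,
                  likely_good: bool, mode: Mode) -> str:
    """Return the candidate only if it clears the mode's gate and threshold."""
    if mode.preserve_gate and likely_good:
        # Assumed gate behaviour: never touch rows judged likely good.
        return baseline
    if score >= mode.threshold:
        return candidate
    return baseline


# Placeholder strings stand in for a decoded row and its top-ranked candidate.
print(choose_output("base", "cand", 0.90, False, MODES["balanced"]))
print(choose_output("base", "cand", 0.90, False, MODES["conservative"]))
```

With these assumptions, a candidate scored `0.90` is applied in `balanced` mode but left alone in `conservative` mode, which matches the smaller `changed` counts at the higher threshold.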
Candidate Classifier
- External candidate AUC: `0.9134446030936896`
- External candidate AP: `0.8660712285419349`
- Weights/config: `[home]/Desktop/nko-brain-scanner/experiments/acoustic_gate/models/candidate_ranker_v1.json`
- Packaged apply verification: `apply_ranked_correction.py --mode conservative`
reproduced the audited held-out result exactly: CER `0.4352 -> 0.4026`
(`-3.26pp`), 396 changed, 381 better / 15 same / 0 worse.
- Final module verification after refactor: the deployable modules no longer
import the overnight oracle/ranker scripts. `candidate_generator.py` owns
alignment/confusion/candidate/CTC-scoring logic, `candidate_ranker.py` owns
feature extraction + frozen logistic inference, and `apply_ranked_correction.py`
loads the frozen config directly. Full 500-row conservative apply still
reproduces `0.4352 -> 0.4026`, 381 better / 15 same / 0 worse.
- Frozen deployable artifact: `models/candidate_ranker_v1.json` now includes feature means/stds,
logistic weights/bias, calibrated modes, candidate-generator config, and the
serialized ASR->clean confusion maps. It no longer needs training rows at
inference time.
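Frozen logistic inference from stored feature means/stds and weights/bias can be sketched in a few lines. The config layout below is hypothetical: the key names (`feature_means`, `feature_stds`, `weights`, `bias`) and the toy numbers are placeholders chosen for illustration, not the contents of `candidate_ranker_v1.json`, and `score_candidate` is not a function from `candidate_ranker.py`.

```python
import json
import math

# Hypothetical layout with toy values; the real file's keys and numbers may differ.
EXAMPLE_CONFIG = json.loads("""
{
  "feature_means": [0.0, 1.0],
  "feature_stds": [1.0, 2.0],
  "weights": [0.5, -0.25],
  "bias": 0.1
}
""")


def score_candidate(features, config):
    """Standardize features with the frozen stats, then apply the logistic model."""
    standardized = [
        (x - mean) / std
        for x, mean, std in zip(
            features, config["feature_means"], config["feature_stds"]
        )
    ]
    logit = config["bias"] + sum(
        w * z for w, z in zip(config["weights"], standardized)
    )
    return 1.0 / (1.0 + math.exp(-logit))


# A feature vector equal to the stored means standardizes to zeros,
# so the score depends only on the bias term.
print(score_candidate([0.0, 1.0], EXAMPLE_CONFIG))
```

Because the means, standard deviations, weights, and bias are all read from the config, scoring in this style needs no training rows at inference time.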
Interpretation
This is the first broader-slice test of the deterministic bounded candidate ranker. The oracle rows select candidates using the references, so they are only a ceiling, not a deployable result. The deployable rows use deterministic candidates, anchor CTC candidate features, featural/op features, and a tiny logistic ranker trained and tuned only on the original clean-anchor pilot split.
The correction gain generalizes: at the pilot-tuned threshold (`0.65`) the ranker
lowers external CER by 3.65pp. That threshold was not a zero-worse guarantee
externally (6 of 489 changed rows got worse), so the frozen deployable config now
carries audited held-out modes. Use `conservative` for zero-worse automatic
correction (`-3.26pp`, 381 better / 0 worse). Use `preservation` where "do not
touch likely-good rows" matters more than maximum gain (`-1.64pp`, 196 better /
0 worse). Use `aggressive`/`balanced` for offline corpus improvement or
human-review queues (`-3.66pp`, 439 better / 2 worse).
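One way a zero-worse threshold could be derived from audited rows is to take the lowest observed score that sits above every row that got worse. The report does not describe the actual calibration procedure, so the sketch below is an assumption: `zero_worse_threshold` is a hypothetical helper and the scored rows are toy data, not values from the 500-row slice.

```python
def zero_worse_threshold(scored_rows):
    """Lowest observed score t such that no row with score >= t is 'worse'.

    `scored_rows` is a list of (score, outcome) pairs, where outcome is
    'better', 'same', or 'worse'. Returns None if no such threshold exists.
    """
    if not scored_rows:
        return None
    worse_scores = [score for score, outcome in scored_rows if outcome == "worse"]
    if not worse_scores:
        # Nothing got worse, so every observed score is already safe.
        return min(score for score, _ in scored_rows)
    ceiling = max(worse_scores)
    safe = [score for score, _ in scored_rows if score > ceiling]
    return min(safe) if safe else None


# Toy audit data for illustration only.
toy_rows = [
    (0.99, "better"),
    (0.95, "better"),
    (0.90, "worse"),
    (0.85, "better"),
    (0.70, "same"),
]
print(zero_worse_threshold(toy_rows))
```

A threshold chosen this way is zero-worse only on the rows it was fitted to; a slice the threshold has never seen would be needed to confirm that it holds more broadly.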
Promotion Decision
Attach run IDs, datasets, metrics, and reproduction commands.
Source Anchor
nko-brain-scanner/experiments/acoustic_gate/RANKER-GENERALIZATION-REPORT.md