Grand Diomande Research · Full HTML Reader

Speech Calibration and Acoustic Improvement v0

The Speech Inscription Bridge v0 changed the failure mode. The harness no longer treats unstable connectionist temporal classification (CTC) output as language. The next stage is calibration: collect short Malinke recordings, attach expected labels, sort the evidence by failure type, and build evaluation or training candidates without poisoning the corpus.

Status: Active next-stage spec

The core invariant is still evidence first. A live packet is not a transcript. A user-supplied expected phrase is not automatically ground truth. A model output is never a label. The calibration compiler only admits a packet for acoustic training when the packet validates and the label is explicitly marked `human_verified`. Phrases supplied in the app text field or through `--nko-headless-expected-phrase` are `operator_expected`: they are useful for evaluation and triage, but they are not training labels until verified.
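A minimal sketch of that admissibility rule follows. The helper names `can_train_acoustic_model` and `can_evaluate` and the `manifest_valid` flag are hypothetical; only the `labelStatus` values come from the text above, and the real compiler applies further checks on top of this.

```python
from typing import Optional

# Label statuses named in the text above.
HUMAN_VERIFIED = "human_verified"
OPERATOR_EXPECTED = "operator_expected"


def can_train_acoustic_model(manifest_valid: bool, label: Optional[dict]) -> bool:
    """Hypothetical gate: admit a packet only if it validates and its label is human_verified."""
    if not manifest_valid or label is None:
        return False
    return label.get("labelStatus") == HUMAN_VERIFIED


def can_evaluate(manifest_valid: bool, label: Optional[dict]) -> bool:
    """Operator-expected phrases are enough for evaluation and triage, never for training."""
    if not manifest_valid or label is None:
        return False
    return label.get("labelStatus") in {HUMAN_VERIFIED, OPERATOR_EXPECTED}
```

Under this sketch, a phrase typed into the app text field can support evaluation, but it cannot reach training until a separate verification step changes its status.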

The current tool is:

```text
experiments/acoustic_gate/build_speech_calibration_set_v0.py
```

It scans one or more copied `NKOLiveCalibration` roots, validates each `manifest.json` with the Speech Inscription validator, joins optional JSONL labels by `packetId`, `manifestSha256`, or `archiveRef`, and writes a replayable calibration index. The default input is the latest live proof copy:

```text
/tmp/nko_speech_inscription_live_proof_current/NKOLiveCalibration
```
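The label join described above can be sketched as follows. The three join keys come from the text; the function names, the priority order among keys, and the choice to skip blank lines and `#` comment lines in the JSONL input are assumptions made for illustration.

```python
import json
from typing import Iterable, Optional

# Join keys named in the text, tried in this (assumed) priority order.
JOIN_KEYS = ("packetId", "manifestSha256", "archiveRef")


def load_label_rows(lines: Iterable[str]) -> list[dict]:
    """Parse JSONL label rows, skipping blank lines and '#' comment lines (assumed convention)."""
    rows = []
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            rows.append(json.loads(line))
    return rows


def find_label(example: dict, labels: list[dict]) -> Optional[dict]:
    """Return the first label row that shares any join-key value with the example."""
    for key in JOIN_KEYS:
        value = example.get(key)
        if not value:
            continue
        for row in labels:
            if row.get(key) == value:
                return row
    return None
```

A packet with no matching row simply stays unlabeled; the join never falls back to recognizer output.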

The output directory contains:

```text
speech_calibration_index.json
speech_calibration_examples.jsonl
speech_calibration_buckets.json
expected_label_template.jsonl
buckets/<bucket>.jsonl
```

Buckets are derived from typed transcript decisions, not from visual inspection:

```text
accepted_transcript -> accepted
rejected_overfire   -> overfire
rejected_low_audio  -> low_audio
rejected_non_nko    -> non_nko
rejected_unstable   -> unstable
needs_label         -> needs_label
invalid manifest    -> invalid_manifest
```
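The mapping above is small enough to express as a lookup table. In this sketch the field names `manifestValid` and `decisionStatus` are hypothetical, and raising on an unknown status is an assumed fail-closed choice rather than documented behavior.

```python
from collections import defaultdict
from typing import Optional

# Typed transcript decision -> bucket, copied from the table above.
DECISION_TO_BUCKET = {
    "accepted_transcript": "accepted",
    "rejected_overfire": "overfire",
    "rejected_low_audio": "low_audio",
    "rejected_non_nko": "non_nko",
    "rejected_unstable": "unstable",
    "needs_label": "needs_label",
}


def bucket_for(manifest_valid: bool, decision_status: Optional[str]) -> str:
    """Map a typed decision to its bucket; invalid manifests get their own bucket."""
    if not manifest_valid:
        return "invalid_manifest"
    if decision_status not in DECISION_TO_BUCKET:
        # Assumption: an unknown status is an error, not a silently created bucket.
        raise ValueError(f"unknown transcript decision: {decision_status!r}")
    return DECISION_TO_BUCKET[decision_status]


def group_into_buckets(examples: list[dict]) -> dict[str, list[dict]]:
    """Group calibration examples by bucket (field names are hypothetical)."""
    buckets = defaultdict(list)
    for example in examples:
        bucket = bucket_for(example["manifestValid"], example.get("decisionStatus"))
        buckets[bucket].append(example)
    return dict(buckets)
```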

Each calibration example preserves the manifest path, packet directory, manifest hash, source kind, creation time, validation result, audio evidence records, replay requirements, transcript-decision statistics, expected-label claim, label comparison when meaningful, FAC target placeholders, tone-fusion readiness, and admissibility decision.

For label comparison, the calibration compiler treats the recognizer output as a hypothesis, not as truth. If a manifest contains `acceptedText`, the character error rate (CER) is computed against that accepted transcript hypothesis. If the app correctly emits `needs_label` with `acceptedText=null` and a bounded `candidateText`, CER can still be computed against `candidateText` when an expected label is available. This makes phrase-ladder captures evaluation-ready without turning the candidate into a label. A candidate-only run can measure recognition error; it still cannot train the acoustic model unless a separate `human_verified` label row passes the promotion gate.
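A minimal sketch of that comparison, assuming CER is the Levenshtein distance over Unicode code points divided by the length of the expected text. The source does not state how text is normalized before scoring, so that choice and the function names are assumptions.

```python
from typing import Optional


def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance over code points, using a rolling row."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, 1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, 1):
            current.append(min(
                previous[j] + 1,                           # deletion
                current[j - 1] + 1,                        # insertion
                previous[j - 1] + (ref_char != hyp_char),  # substitution or match
            ))
        previous = current
    return previous[-1]


def label_cer(decision: dict, expected_text: Optional[str]) -> Optional[dict]:
    """Score the recognizer hypothesis against the expected label, recording which field was used."""
    if not expected_text:
        return None
    if decision.get("acceptedText") is not None:
        source = "acceptedText"
    elif decision.get("candidateText") is not None:
        source = "candidateText"
    else:
        return None
    cer = edit_distance(expected_text, decision[source]) / len(expected_text)
    return {"source": source, "cer": cer}
```

The expected text is always the reference and the recognizer text is always the hypothesis, so a low CER says something about the model and nothing about the label.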

Expected labels use JSONL rows like:

```json
{"packetId":"live-2026-06-04T135116-717Z","expectedText":"ߒ߬","language":"malinke","script":"nko","phraseId":"malinke-smoke-001","speakerId":"mohamed","labelStatus":"human_verified"}
```

FAC targets can be attached without claiming that the current model predicts them:

```json
{"packetId":"live-1","expectedText":"ߒ߬","labelStatus":"human_verified","facTargets":{"tone":"low","nasality":"absent"}}
```

This lets FAC begin as scaffolded target fields. Real feature heads for voicing, place, manner, nasality, tone, length, vowel height, vowel backness, and rounding can later train against the same replayable packet archive. Until those heads exist, the examples say exactly that: targets may be supplied, but acoustic FAC prediction is not implemented.

The immediate calibration loop is:

```text
record short Malinke phrase
  -> run ASR
  -> save governed live packet
  -> copy NKOLiveCalibration from the iPhone
  -> import app-container packet evidence into a local calibration root
  -> discover current copied or snapshotted evidence roots
  -> snapshot copied packet evidence to a durable archive
  -> build calibration set
  -> inspect buckets
  -> human-verify expected labels
  -> evaluate acoustic changes against the same packet hashes
```

Copied packet roots should not be treated as durable merely because they exist under `/tmp`. If packets already exist in the installed iPhone harness app container, import them with:

```text
experiments/acoustic_gate/import_speech_calibration_device_packets_v0.py
```

The device packet importer uses the same `devicectl` app-data-container copy primitive as `run_speech_inscription_live_proof.py`, but it does not launch ASR. It copies `Documents/NKOLiveCalibration` into a local import root, validates the copied packet root, runs the evidence discovery indexer, and writes:

```text
speech_calibration_device_packet_import.json
speech_calibration_evidence_discovery.json
speech_calibration_evidence_roots.jsonl
```

With `--snapshot-after-import`, the importer also archives the copied root through the normal snapshot command and records the `run_speech_calibration_snapshot_cycle_v0.py --snapshot-report` replay command. Without snapshotting, it records `run_speech_calibration_cycle_v0.py --packet-root` and `snapshot_speech_calibration_evidence_v0.py --packet-root` commands. This is a device-to-evidence bridge only. It does not launch ASR, create labels, accept transcripts, classify FAC/tone, or train.

Before deciding what to replay or archive, index current evidence with:

```text
experiments/acoustic_gate/discover_speech_calibration_evidence_v0.py
```

The discovery command scans the default live-proof root, the phrase-ladder collection root, `Documents/NKOSpeechCalibrationEvidence`, and any additional `--search-root` path supplied by the operator. It finds loose `NKOLiveCalibration` roots, direct packet directories, and durable `speech_calibration_evidence_snapshot.json` reports. It writes:

```text
speech_calibration_evidence_discovery.json
speech_calibration_evidence_roots.jsonl
```

Each discovered row reports packet counts, valid and invalid manifest counts, newest manifest time, packet ids, transcript-decision statuses, and ready-to-run commands. Loose packet roots get both `run_speech_calibration_cycle_v0.py --packet-root` and `snapshot_speech_calibration_evidence_v0.py --packet-root` commands. Durable snapshot reports get `run_speech_calibration_snapshot_cycle_v0.py --snapshot-report` commands and their archived `snapshotPacketRoot`. Discovery is evidence indexing only. It does not create labels, accept transcripts, classify FAC/tone, or train.

Once loose packet roots are identified, copy them into a timestamped archive with:

```text
experiments/acoustic_gate/snapshot_speech_calibration_evidence_v0.py
```

The snapshot command scans one or more copied `NKOLiveCalibration` roots or direct packet directories, copies each packet directory into `Documents/NKOSpeechCalibrationEvidence/<snapshot>/NKOLiveCalibration`, validates the archived manifest, and writes:

```text
speech_calibration_evidence_snapshot.json
speech_calibration_evidence_packets.jsonl
```

The snapshot report includes packet ids, source and archived paths, manifest hashes, validation status, copied byte counts, and a ready-to-run calibration-cycle command pointing at the durable packet root. It is archived evidence only. It does not label, promote, accept transcripts, classify tone/FAC, or train.

For replay, prefer the snapshot-report wrapper:

```text
experiments/acoustic_gate/run_speech_calibration_snapshot_cycle_v0.py
```

The wrapper reads `speech_calibration_evidence_snapshot.json`, verifies that it is an evidence snapshot, uses the archived `snapshotPacketRoot`, and then runs the normal calibration cycle. It writes:

```text
speech_calibration_snapshot_cycle_summary.json
```

This makes the replay root auditable from the snapshot metadata instead of a manually retyped `/tmp` path. Like the normal cycle, it does not create labels, accept transcripts, classify FAC/tone, or train.

Collection can be orchestrated with:

```text
experiments/acoustic_gate/collect_speech_calibration_ladder_v0.py
```

It accepts either a JSONL phrase ladder or repeated inline phrases:

```bash
python3 collect_speech_calibration_ladder_v0.py \
  --phrase 'malinke-smoke-001|<expected phrase you will say>' \
  --phrase 'malinke-smoke-002|<next expected phrase>' \
  --snapshot-after-collection
```

The collector launches the same headless iPhone proof path for each phrase, writes `operator_expected_labels.jsonl`, and then calls the calibration builder. With `--snapshot-after-collection`, it also archives the copied packet roots into the durable snapshot layout and records the replay command in `speech_calibration_ladder_summary.json`. Its labels are intentionally `operator_expected`, not `human_verified`. That makes a phrase-ladder collection evaluation-ready while preserving the training poison barrier.

After a phrase-ladder capture, `operator_expected_labels.jsonl` can be routed through `import_speech_calibration_operator_expected_verification_v0.py`. This command creates or reads `operator_expected_verification_decisions.tsv`, joins each row back to the captured packet manifest, rejects machine-hypothesis columns such as `candidateText`, rejects Latin or non-N'Ko expected text, requires an explicit human confirmation decision plus `verifiedBy` and `verificationBasis`, and writes `operator_expected_checked_labels_filled.jsonl` only for confirmed rows. The output is still a checked-label file, not a `human_verified` label file. It must pass the normal checked-label preflight and promotion gate before it can train anything. If the captured operator labels are missing, the manifest cannot be found, the manifest is invalid, the sheet is poisoned, or the human fields are incomplete, the importer writes a comment-only output and an `operator_expected_verification_import_report.json` with `readyForPreflight=false`. It does not infer speech, trust recognizer output, create `human_verified` labels, train, or translate.

Before any live capture, the collector now writes:

```text
phrase_ladder_validation.json
```

This report is the pre-capture intake gate. Draft phrase ladders can still be loaded with generated ids and Latin text for planning, but actual capture should run with:

```bash
python3 collect_speech_calibration_ladder_v0.py \
  --phrases phrase_ladder.jsonl \
  --require-capture-ready \
  --snapshot-after-collection
```

With `--require-capture-ready`, every phrase row must have an explicit, stable `phraseId` and a non-empty `expectedText` written in N'Ko script. Latin or English text can serve as notes or review context, but it is not accepted as the live N'Ko reference. The validator also rejects rows that carry recognizer output fields such as `candidateText`, `acceptedText`, `recognizerHypothesis`, `transcriptDecision`, or `labelComparison`. Those fields are machine hypotheses, not expected labels. It also rejects pre-filled promotion fields such as `confirmedSpoken`, `verifiedBy`, and `verificationBasis`; those belong in a row only after a reviewer has listened to packet audio. This keeps the collection step from pretending that a planned phrase is already verified truth.
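The capture-ready checks can be sketched as a row-level validator. The Unicode N'Ko block is U+07C0 through U+07FF; treating text as N'Ko-script when every non-whitespace character falls in that block is an assumption, and the issue codes and function names below are hypothetical rather than the validator's real output.

```python
# Field names taken from the text above.
MACHINE_FIELDS = {"candidateText", "acceptedText", "recognizerHypothesis",
                  "transcriptDecision", "labelComparison"}
PROMOTION_FIELDS = {"confirmedSpoken", "verifiedBy", "verificationBasis"}


def is_nko_text(text: str) -> bool:
    """True when every non-whitespace character is in the N'Ko block (U+07C0..U+07FF)."""
    chars = [c for c in text if not c.isspace()]
    return bool(chars) and all("\u07c0" <= c <= "\u07ff" for c in chars)


def capture_ready_issues(row: dict) -> list[str]:
    """Return hypothetical issue codes; an empty list means the row is capture-ready."""
    issues = []
    if not str(row.get("phraseId") or "").strip():
        issues.append("missing_phrase_id")
    text = str(row.get("expectedText") or "")
    if not text.strip():
        issues.append("missing_expected_text")
    elif not is_nko_text(text):
        issues.append("expected_text_not_nko")
    # Machine hypotheses and pre-filled promotion fields are both rejected at intake.
    for field in sorted(MACHINE_FIELDS.intersection(row)):
        issues.append(f"machine_field_present:{field}")
    for field in sorted(PROMOTION_FIELDS.intersection(row)):
        issues.append(f"promotion_field_present:{field}")
    return issues
```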

The same validation report governs FAC scaffold intake. `facTargets` may be attached to a phrase row, but only by using the known v0 slots: `voicing`, `place`, `manner`, `nasality`, `tone`, `length`, `vowelHeight`, `vowelBackness`, and `rounding`. Capture-ready FAC targets are accepted only as `provided_v0` scaffold labels with non-empty values. Unknown slots are rejected, and proxy or trained statuses are rejected because those belong to downstream acoustic evidence, not to the phrase plan. This matters because FAC should become a trainable feature surface later, not an unbounded note field.
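A sketch of slot validation under those rules. The slot names and the `provided_v0` status come from the text, and the three issue codes are the ones the document attributes to the promotion gate; the `{"value", "status"}` object shape and the function name are assumptions for illustration.

```python
# Known v0 FAC slots, as listed in the text.
FAC_SLOTS = ("voicing", "place", "manner", "nasality", "tone",
             "length", "vowelHeight", "vowelBackness", "rounding")
ALLOWED_STATUS = "provided_v0"


def canonicalize_fac_targets(raw: dict) -> tuple[dict, list[str]]:
    """Return (canonical targets, issue codes). The object shape is an assumed one."""
    canonical, issues = {}, []
    for slot, target in raw.items():
        if slot not in FAC_SLOTS:
            issues.append("unknown_fac_target_slot")
            continue
        if isinstance(target, dict):
            value = target.get("value")
            status = target.get("status", ALLOWED_STATUS)
        else:
            # A bare value such as "low" is treated as an operator-provided scaffold.
            value, status = target, ALLOWED_STATUS
        if status != ALLOWED_STATUS:
            issues.append("fac_target_status_not_allowed")
            continue
        if not str(value or "").strip():
            issues.append("fac_target_value_missing")
            continue
        canonical[slot] = {"value": str(value).strip(), "status": ALLOWED_STATUS}
    return canonical, issues
```

Keeping the slot list closed is what lets these fields later become training targets for feature heads instead of free-form notes.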

Checked labels are promoted through:

```text
experiments/acoustic_gate/promote_speech_calibration_labels_v0.py
```

The promotion gate requires a valid packet join key, non-empty N'Ko expected text, `confirmedSpoken=true`, `verifiedBy`, and `verificationBasis`. It writes `human_verified` rows only after those checks pass. Latin or English text belongs in notes or reviewer context, not in the expected transcript field.

Checked-label FAC targets are also validated here: unknown slots are rejected with `unknown_fac_target_slot`, downstream statuses that a label cannot claim are rejected with `fac_target_status_not_allowed`, and blank values are rejected with `fac_target_value_missing`. Valid checked-label FAC targets are canonicalized into `provided_v0` target objects before promotion.

The calibration builder repeats this boundary. Even if a hand-edited label file bypasses promotion and sets `labelStatus=human_verified`, non-N'Ko expected text is neither evaluation-ready nor training-admissible, and the builder requires `verifiedBy`, `verifiedAt`, and `verificationBasis` before a `human_verified` row becomes training-admissible. A hand-edited label file therefore cannot become training truth merely by setting `labelStatus` to `human_verified`. If malformed FAC targets bypass promotion, the text label may still be acoustic-training-admissible, but FAC-head training remains blocked and the row records `facTargetIssues`.
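The text-label side of the promotion gate can be sketched as below. The required fields come from the text; the issue codes and function names are hypothetical, the N'Ko check assumes the Unicode block U+07C0 through U+07FF, and this sketch only checks that a join key is present, whereas the real gate ties it back to a valid packet manifest.

```python
from typing import Optional

JOIN_KEYS = ("packetId", "manifestSha256", "archiveRef")


def is_nko_text(text: str) -> bool:
    """True when every non-whitespace character is in the N'Ko block (U+07C0..U+07FF)."""
    chars = [c for c in text if not c.isspace()]
    return bool(chars) and all("\u07c0" <= c <= "\u07ff" for c in chars)


def promote_checked_row(row: dict) -> tuple[Optional[dict], list[str]]:
    """Return (promoted row or None, hypothetical issue codes)."""
    issues = []
    if not any(str(row.get(key) or "").strip() for key in JOIN_KEYS):
        issues.append("missing_join_key")
    if not is_nko_text(str(row.get("expectedText") or "")):
        issues.append("expected_text_not_nko")
    if row.get("confirmedSpoken") is not True:
        issues.append("not_confirmed_spoken")
    for field in ("verifiedBy", "verificationBasis"):
        if not str(row.get(field) or "").strip():
            issues.append(f"missing_{field}")
    if issues:
        return None, issues
    promoted = dict(row)
    # The status is written only after every check has passed.
    promoted["labelStatus"] = "human_verified"
    return promoted, []
```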

Before promotion, checked-label files can be preflighted with:

```text
experiments/acoustic_gate/preflight_speech_calibration_checked_labels_v0.py
```

The preflight report is written as:

```text
checked_label_preflight.json
```

This is the reviewed-label intake guard. It reuses the same packet join validation, N'Ko-only expected text rule, confirmation rule, verifier metadata rule, manifest validation path, and FAC target validation as the promotion gate, but it does not write `human_verified` labels. It reports `readyForPromotion`, `promotableCount`, `rejectedCount`, `duplicateJoinKeyCount`, and `facTargetSlotCounts`. Promotable rows carry an `expectedTextFingerprint` rather than raw text in the summary, so the report can prove a checked row exists without turning the report itself into a transcript surface. A checked-label file with duplicate packet join keys is not ready for promotion, even if both rows would individually pass. This prevents two conflicting human labels from quietly entering the same packet evidence.
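A sketch of the summary fields named above. It assumes the fingerprint is a SHA-256 of the UTF-8 expected text and that rows are keyed by `packetId` only; neither detail is stated in the source. How rejected rows affect `readyForPromotion` is also not stated, so this sketch lets only duplicates and an empty promotable set block readiness.

```python
import hashlib
from collections import Counter


def expected_text_fingerprint(text: str) -> str:
    """Assumed fingerprint: hex SHA-256 of the UTF-8 encoded expected text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def preflight_summary(promotable_rows: list[dict], rejected_count: int) -> dict:
    """Build a summary that proves rows exist without echoing their raw text."""
    key_counts = Counter(row["packetId"] for row in promotable_rows)
    duplicate_keys = sum(1 for count in key_counts.values() if count > 1)
    return {
        "promotableCount": len(promotable_rows),
        "rejectedCount": rejected_count,
        "duplicateJoinKeyCount": duplicate_keys,
        # Duplicates block promotion even when each row would pass on its own.
        "readyForPromotion": bool(promotable_rows) and duplicate_keys == 0,
        "promotable": [
            {
                "packetId": row["packetId"],
                "expectedTextFingerprint": expected_text_fingerprint(row["expectedText"]),
            }
            for row in promotable_rows
        ],
    }
```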

The preflight also preserves the central poison boundary: it does not trust `recognizerHypothesis`, does not write `human_verified` labels, and does not turn model output into ground truth. A recognizer hypothesis can remain reviewer context, but a row is not promotable unless the human-reviewed expected text is explicit, N'Ko-script, confirmed as spoken, and tied back to a valid packet manifest.

The review package also writes:

```text
review_contract.json
```

This is the machine-readable worklist for label creation. It lists each packet's stable join keys, reviewable WAV references, required promotion fields, allowed FAC target slots, preflight/rerun command templates, and recognizer-hypothesis fingerprints. It deliberately fingerprints recognizer text instead of treating it as an expected transcript. The contract is a review instruction over evidence; it is not a transcript, not a translation, not a tone label, and not training data. Its purpose is to make the human-label step explicit enough that labels can be filled without reading raw HTML or guessing which metadata matters.

Minimal reviewer answers can be compiled with:

```text
experiments/acoustic_gate/compile_speech_calibration_review_answers_v0.py
```

The compiler reads `review_contract.json` plus reviewer answer JSONL, rejects rows that carry recognizer hypothesis fields, rejects non-reviewable packets, requires N'Ko expected text plus `confirmedSpoken=true`, verifier, and verification basis for label rows, and writes checked-label JSONL plus `review_answer_compile_report.json`. It also normalizes checked-label FAC targets through the same canonical slot and `provided_v0` status rules used by the promotion gate. Reviewer rows may alternatively carry `reviewDisposition.status=not_labelable` with a reason such as `english_test`, `not_malinke`, `wrong_language`, `noise`, `clipped`, `too_short`, `silence`, `uncertain`, or `other`. Those disposition rows are recorded in `dispositionRows`, but they are skipped from checked-label output and cannot become `human_verified` training data. This still does not promote labels. A compiled label row is only ready for the existing preflight gate; it is not `human_verified`, not a transcript acceptance decision, and not training data.

Readiness is audited by:

```text
experiments/acoustic_gate/audit_speech_calibration_goal_v0.py
```

The audit separates static tooling readiness from real evidence readiness. Static readiness means the builder, collector, promoter, and documentation are wired. Completion still requires valid replayable packets, at least one evaluation-ready expected label, and at least one training-admissible human-verified label. This prevents the project from calling the calibration stage complete just because the scaffolding exists.

The current operator-facing summary is prepared by:

```text
experiments/acoustic_gate/prepare_speech_calibration_status_pack_v0.py
```

It writes:

```text
speech_calibration_status_pack.json
speech_calibration_status_pack.md
```

The status pack reads the audit, phrase collection pack, FAC scaffold audit, failure queue, review-progress report, full review-audio verification report, review-priority report, focused handoff report, focused answer validation report, next-capture plan, handoff report, next-capture orchestration report, operator-expected verification report, and current calibration index when present.

It chooses an explicit next action such as `fix_review_audio_evidence`, `prepare_focused_review_handoff`, `fill_focused_review_answers`, `fix_focused_review_answers`, `run_focused_postreview`, `refresh_review_progress`, `fix_review_answers`, `fix_checked_label_preflight`, `continue_review_or_capture_more_phrases`, `run_review_loop`, `fill_next_capture_phrases`, `fix_phrase_fac_scaffold`, `run_governed_handoff`, `run_capture_after_governed_handoff`, `verify_operator_expected_labels`, `preflight_and_promote_operator_checked_labels`, or `rerun_snapshot_cycle_and_evaluate`. Its summary includes `nextCommandKey` and `nextCommand`, so the operator-facing report carries the exact safe command for the current state rather than forcing the operator to infer one from a command list.

Its command map exposes `prepareFocusedHandoff`, `serveFocusedReviewSheet`, `openFocusedReviewSheet`, `openFocusedDecisionTsv`, `openCleanReviewSheet`, `openLocalReviewServer`, `refreshReviewAudioVerification`, `refreshReviewPriority`, `refreshReviewProgress`, `runFocusedPostReview`, `runReviewLoop`, and `refreshPhraseEntryStatus`. `refreshReviewAudioVerification` checks the full clean review queue WAV references and writes `speech_calibration_review_audio_verification.json`. `refreshReviewPriority` writes the safe listening order to `speech_calibration_review_priority.json`, `speech_calibration_review_priority_rows.jsonl`, and `speech_calibration_review_priority.md`. `prepareFocusedHandoff` runs the short focused handoff gate and writes `focused_review_handoff.json`. `refreshReviewProgress` compiles and preflights filled review answers without promoting labels. `runFocusedPostReview` runs the focused post-review path only after focused answer validation has accepted the saved focused rows or safe non-label dispositions. `runReviewLoop` is only the post-review fail-closed path that can replay after checked labels are ready.

The routing rules are explicit. If review audio exists but no focused handoff is ready, the status pack routes to `prepare_focused_review_handoff`. If the focused handoff is ready with pending rows and no accepted focused answer validation exists, it routes to `fill_focused_review_answers`. If `focused_review_answer_validation.json` says `ready_for_focused_compile` or `review_dispositions_only`, it routes to `run_focused_postreview`. If that validation report says `focused_audio_verification_failed`, `review_answers_invalid_json`, `review_answers_poisoned`, `focused_answers_mismatch`, or `focused_answers_invalid`, it routes back to `fix_focused_review_answers` through the focused localhost review sheet. If the focused decision TSV already has ready human-entered rows, it routes to `run_governed_handoff`.

The summary now includes review audio queue row count, checked WAV count, expected WAV count, issue count, missing WAV count, hash mismatch count, review-priority status, ready row count, top-tier row count, duplicate row count, output unsafe-field count, focused handoff readiness, focused selected/pending/ready row counts, focused machine leak count, focused handoff FAC target slot counts, focused answer validation status, focused answer row count, focused label and disposition counts, focused answer leak and content issue counts, focused answer audit readiness, review answer row count, review hypothesis-leak count, compiled checked-label count, non-label disposition count, compile readiness, preflight readiness, and snapshot-rerun readiness alongside the FAC scaffold status, requested/provided slot counts, tone coverage request count, unknown FAC columns, and issue count. This is now part of the static readiness surface because progress has to be explainable from reports, not from memory or screenshots.

The status pack does not listen to audio, infer speech, trust recognizer output, write expected labels, promote labels, capture audio, train, or translate. It exists to keep the operator from confusing a generated candidate, a green UI state, a focused handoff, a feature note, a review disposition, a playable HTML audio tag, a priority rank, a saved but invalid focused answer file, an FAC target count, or an old `/tmp` artifact with knowledge.
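The focused-review routing rules can be sketched as a small decision function. The action names and validation statuses come from the text; the function signature, its parameter names, and above all the precedence among the rules are assumptions, since the source lists the conditions without ordering them.

```python
from typing import Optional

# Validation statuses named in the text.
READY_VALIDATION = {"ready_for_focused_compile", "review_dispositions_only"}
FAILED_VALIDATION = {
    "focused_audio_verification_failed",
    "review_answers_invalid_json",
    "review_answers_poisoned",
    "focused_answers_mismatch",
    "focused_answers_invalid",
}


def focused_next_action(
    review_audio_exists: bool,
    handoff_ready: bool,
    pending_rows: int,
    validation_status: Optional[str],
    decision_tsv_ready_rows: int,
) -> Optional[str]:
    """Pick a next action for the focused review lane (rule precedence is assumed)."""
    if review_audio_exists and not handoff_ready:
        return "prepare_focused_review_handoff"
    if validation_status in READY_VALIDATION:
        return "run_focused_postreview"
    if validation_status in FAILED_VALIDATION:
        return "fix_focused_review_answers"
    if handoff_ready and pending_rows > 0:
        return "fill_focused_review_answers"
    if decision_tsv_ready_rows > 0:
        return "run_governed_handoff"
    return None
```

Returning a single action name, rather than a list of possibilities, mirrors the `nextCommandKey` idea: the report states one safe step for the current state.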

The portable focused packet also has a one-command TSV bridge, `run_speech_calibration_focused_packet_postreview_v0.py`, for the offline path where the human opens `focused_review_packet_sheet.html`, listens to the copied verified WAVs, downloads or edits `focused_review_answer_decisions.tsv`, and then returns that TSV to the governed pipeline. The runner reads `focused_review_packet.json`, imports the TSV through the existing focused answer importer, validates the imported JSONL through the existing focused answer validator, and runs `run_speech_calibration_focused_postreview_v0.py` only when validation reaches `ready_for_focused_compile` or `review_dispositions_only`. Its live status is currently `awaiting_packet_decisions`: the packet has three decision rows, three pending decisions, zero imported answers, zero label answers, zero disposition answers, zero invalid decisions, `readyForPostreview=false`, and `postreviewRan=false`. That is the intended fail-closed state until a person has listened and supplied N'Ko labels or non-label dispositions. The packet bridge does not listen, infer labels, trust recognizer output, create `human_verified` rows, validate FAC or tone proxies as truth, train, translate, or change the acoustic model.

Packet triage now has a consolidated failure queue:

```text
experiments/acoustic_gate/prepare_speech_calibration_failure_queue_v0.py
```

This report reads built calibration examples, evidence discovery, CTC-overfire analysis, review-disposition analysis, and the phrase-entry refresh report, then writes `speech_calibration_failure_queue.json`, `speech_calibration_failure_queue_rows.jsonl`, and `speech_calibration_failure_queue.md`. It sorts rows into lanes such as `discovered_not_built`, `ctc_overfire_failure`, `unlabeled_failure`, `evaluation_only`, `training_admissible`, and `invalid_packet`, with next actions such as building the calibration set from discovered packets, prioritizing CTC overfire probes, recapturing short phrases, focused listening, operator-expected verification, or training export after human verification. The queue intentionally suppresses recognizer and label text fields including `candidateText`, `acceptedText`, `displayText`, `hypothesisText`, and `expectedText`; if those keys leak into the queue, the status becomes `failure_queue_text_leak_blocked`. The status pack now reads this queue and exposes failure-queue status, row count, bucket counts, lane counts, next-action counts, and text-leak count. The queue does not listen, infer labels, accept transcripts, promote labels, train, or translate. It is the bridge between raw packet collection and safe acoustic-improvement work: it tells the operator what kind of evidence exists without deciding what the speech says.
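The text-leak guard can be sketched as a recursive key scan. The suppressed key names and the blocked status come from the text; the function names and the `failure_queue_ready` status used for the clean case are hypothetical.

```python
# Recognizer and label text keys that must never appear in the queue.
SUPPRESSED_TEXT_KEYS = {"candidateText", "acceptedText", "displayText",
                        "hypothesisText", "expectedText"}


def count_text_leaks(node) -> int:
    """Count suppressed keys anywhere in a nested structure of dicts and lists."""
    if isinstance(node, dict):
        return sum(
            (1 if key in SUPPRESSED_TEXT_KEYS else 0) + count_text_leaks(value)
            for key, value in node.items()
        )
    if isinstance(node, list):
        return sum(count_text_leaks(item) for item in node)
    return 0


def queue_status(rows: list[dict]) -> str:
    """Fail closed: any leaked text key blocks the whole queue."""
    if count_text_leaks(rows) > 0:
        return "failure_queue_text_leak_blocked"
    return "failure_queue_ready"  # hypothetical name for the clean state
```

Scanning keys rather than values keeps the check independent of what the text says, which matches the queue's role of describing evidence without reading it.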

Short phrase entry can now be done through a local file-backed editor:

```text
experiments/acoustic_gate/serve_speech_calibration_phrase_entry_v0.py
```

The server exposes a browser page, `/api/phrase-plan`, and `/api/refresh-status` over localhost. On startup it initializes or appends missing blank rows in `next_capture_phrase_plan.tsv` without overwriting existing human text. A save writes the governed TSV only after the candidate rows pass the same next-capture handoff inspection used by the import path. Latin or English `expectedText` is rejected before the TSV is touched, and unknown or machine-hypothesis fields such as `candidateText`, `recognizerHypothesis`, `acceptedText`, or `transcriptDecision` are rejected before save. The refresh endpoint runs the same no-capture phrase-entry status refresh described below, returning whether the current sheet is pending, invalid, or `ready_for_capture_without_recording` while preserving `captureRan=false` and `replayRan=false`.

The same local server now also exposes `/review/` as a safe static view of the existing `clean_review_sheet.html` and files under the active review directory. Path traversal is rejected so the server cannot become a general filesystem browser. The browser page includes an `Open Human Review` control for the clean review sheet and a `Refresh Review Progress` control backed by `/api/review-progress`.

The clean review sheet can also post filled human rows to `/api/review-answers`, which recursively rejects recognizer-hypothesis fields before touching disk, validates the submitted rows through the existing review-answer compiler, and writes only `review_answers_filled.jsonl` if the rows compile into at least one checked N'Ko label or a valid non-label review disposition. That endpoint still does not promote labels to `human_verified`, does not train, does not translate, and does not accept recognizer output as truth.

`/api/review-progress` first verifies the full `review_session_queue.jsonl` audio surface when it exists, writing `speech_calibration_review_audio_verification.json` with readable WAV, SHA-256, byte count, sample-rate, and duration checks for each captured/prepared audio reference. It then calls the existing review-progress monitor with compile and preflight enabled, so saved human answers can become checked-label rows and be preflighted from the same localhost surface only after the audio evidence gate is clean. The output remains a readiness report: `focused_audio_verification_failed`, `awaiting_review_answers`, `needs_answer_compile`, `needs_checked_label_preflight`, `checked_label_preflight_failed`, or `ready_for_snapshot_rerun`. Only the later replay command with checked labels may invoke the promotion gate, and only after checked rows pass the existing preflight rules.
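The per-file audio checks can be sketched with the standard-library `wave` and `hashlib` modules. The report keys, the issue codes, and the function name are hypothetical; only the list of checks (readable WAV, SHA-256, byte count, sample rate, duration) comes from the text.

```python
import hashlib
import wave
from pathlib import Path
from typing import Optional


def verify_wav(path: Path, expected_sha256: Optional[str] = None) -> dict:
    """Check one review WAV reference; report keys and issue codes are hypothetical."""
    report = {"path": str(path), "readable": False, "issues": []}
    try:
        data = path.read_bytes()
    except OSError:
        report["issues"].append("missing_wav")
        return report
    report["byteCount"] = len(data)
    report["sha256"] = hashlib.sha256(data).hexdigest()
    if expected_sha256 and report["sha256"] != expected_sha256:
        report["issues"].append("hash_mismatch")
    try:
        with wave.open(str(path), "rb") as wav:
            sample_rate = wav.getframerate()
            frame_count = wav.getnframes()
    except (wave.Error, EOFError):
        report["issues"].append("unreadable_wav")
        return report
    report["readable"] = True
    report["sampleRate"] = sample_rate
    report["durationSeconds"] = frame_count / sample_rate if sample_rate else 0.0
    return report
```

A playable audio tag in a browser proves none of this, which is why the gate reads the bytes itself before any review answer is compiled.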

The server can therefore reduce operator friction without weakening the poison boundary. It still does not create labels by itself, promote rows, capture audio, train, or translate. A saved N'Ko phrase plan must still pass governed handoff and explicit capture before it becomes packet evidence. A filled review answer must still pass compile, checked-label preflight, promotion, and replay before it becomes `human_verified` training data.

The phrase-entry plumbing can be rehearsed without touching the real phrase plan with:

```text
experiments/acoustic_gate/rehearse_speech_calibration_phrase_entry_flow_v0.py
```

This command copies the current next-capture template into an isolated rehearsal directory, fills a rehearsal TSV with synthetic N'Ko text, and runs the normal handoff plus next-capture gates with live capture disabled. A successful rehearsal proves that the system would become `ready_for_capture` once real N'Ko expected text is present. It is not evidence of recognition quality and it is not a shortcut around human labels. The rehearsal report records the source TSV hash before and after the run and fails if the real `next_capture_phrase_plan.tsv` changed. Its output is explicitly `rehearsalOnly`, `realDataReady=false`, and `completionReady=false`.

Pre-capture FAC and tone coverage can be audited with:

```text
experiments/acoustic_gate/audit_speech_calibration_phrase_fac_scaffold_v0.py
```

This command reads the next-capture phrase template and the editable phrase TSV, then writes `phrase_fac_scaffold_audit.json` and `phrase_fac_scaffold_rows.jsonl`. It reports which FAC slots were requested by the template, which optional FAC values were typed by the operator, which rows have N'Ko-ready phrase text, and whether any FAC values were attached to blank or invalid phrase rows. Unknown `fac*` columns fail closed as `fac_scaffold_invalid`; blank phrase rows remain `awaiting_nko_phrase_text`; ready phrase rows with no FAC values are only `fac_scaffold_requested`; and ready rows with provided values become `fac_scaffold_provided`. This is intentionally metadata auditability, not acoustic recognition. Provided FAC values are operator scaffolds only. They are not tone classifications, not FAC predictions, not evidence that the current model heard those features, not training labels, and not translation. The purpose is to let FAC/tone collection become inspectable before capture without letting feature notes or recognizer output masquerade as knowledge.

Acoustic changes are evaluated by:

```text
experiments/acoustic_gate/evaluate_speech_calibration_v0.py
```

The evaluator reports valid packet coverage, failure buckets, label readiness, FAC target coverage, tone-fusion readiness, decode statistics, audio duration statistics, and label CER when comparable labels exist. When given a baseline examples file and a candidate examples file, it only compares rows matched by captured-audio SHA-256. This is the replayability rule: an acoustic improvement claim must be measured on the same evidence, not on a different recording.

The evaluator also reports whether comparable CER came from `acceptedText` or `candidateText`. This distinction is intentional. `acceptedText` is a transcript the harness accepted under its current gates. `candidateText` is a recognizer hypothesis preserved for calibration. Both can be compared to an expected phrase, but only the expected or verified label is a target. The model output remains evidence about the model, never ground truth.

Acoustic experiment comparisons are gated by:

```text
experiments/acoustic_gate/compare_speech_calibration_experiments_v0.py
```

This command accepts either normal cycle summaries or snapshot-cycle summaries, resolves the underlying `speech_calibration_examples.jsonl`, and compares baseline versus candidate only by captured-audio SHA-256. It writes:

```text
speech_calibration_experiment_comparison.json
```

The comparison reports `claimAdmissible=true` only when every baseline and candidate example matches by captured-audio SHA-256. Partial overlap and different evidence are reported explicitly and cannot support an acoustic improvement claim. This is the experiment discipline for later decoder, FAC, tone, or acoustic-model changes: changed results count only when the evidence stayed fixed. The command does not create labels, accept transcripts, classify FAC/tone, or train.
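The same-evidence rule reduces to set comparison on audio hashes. In this sketch `claimAdmissible` is the only name taken from the text; the `capturedAudioSha256` field name, the count keys, and the decision to treat an empty comparison as inadmissible are assumptions.

```python
def compare_by_audio_sha(baseline: list[dict], candidate: list[dict]) -> dict:
    """Match baseline and candidate examples strictly by captured-audio SHA-256."""
    base_hashes = {row["capturedAudioSha256"] for row in baseline}
    cand_hashes = {row["capturedAudioSha256"] for row in candidate}
    matched = base_hashes & cand_hashes
    baseline_only = base_hashes - cand_hashes
    candidate_only = cand_hashes - base_hashes
    return {
        "matchedCount": len(matched),
        "baselineOnlyCount": len(baseline_only),
        "candidateOnlyCount": len(candidate_only),
        # Admissible only when both sides cover exactly the same recordings.
        "claimAdmissible": bool(matched) and not baseline_only and not candidate_only,
    }
```

Partial overlap is still reported through the counts, so a rejected claim shows exactly which side carried unmatched evidence.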

Training candidates are exported by:

text
experiments/acoustic_gate/export_speech_acoustic_training_manifest_v0.py

This command reads `speech_calibration_examples.jsonl` and trusts only the existing builder admissibility flags. Rows enter the acoustic training manifest only when `canTrainAcousticModel=true`. Rows enter the FAC-head training manifest only when `canTrainFACHeads=true`. It writes:

text
speech_acoustic_training_manifest.json
speech_acoustic_training_rows.jsonl
speech_fac_training_rows.jsonl
speech_training_excluded_rows.jsonl

The normal calibration cycle now writes this manifest on every replay. In the current empty or unverified state, the manifest is expected to be empty or exclusion-only. That is the desired failure mode: the system has a visible training surface, but it cannot train from operator-expected phrases, recognizer hypotheses, Latin labels, malformed FAC targets, or labels missing verification metadata. The command does not create labels, accept transcripts, classify FAC/tone, or train.
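A minimal sketch of the flag-only split follows. It assumes each example row carries the two builder flags as JSON booleans and that a row is "excluded" when neither flag is true; the function name is hypothetical.

```python
def split_training_rows(rows):
    """Partition example rows using only the builder's admissibility flags."""
    acoustic = [r for r in rows if r.get("canTrainAcousticModel") is True]
    fac = [r for r in rows if r.get("canTrainFACHeads") is True]
    excluded = [
        r for r in rows
        if r.get("canTrainAcousticModel") is not True
        and r.get("canTrainFACHeads") is not True
    ]
    return acoustic, fac, excluded
```

The `is True` comparison is deliberate: a string such as `"true"` or a truthy placeholder does not admit a row, so a malformed flag fails closed into the excluded set.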

The full post-capture cycle is run by:

text
experiments/acoustic_gate/run_speech_calibration_cycle_v0.py

This command is the calibration lane's replay button. It validates an optional phrase plan, optionally promotes checked labels, rebuilds `speech_calibration_examples.jsonl`, and refreshes the downstream reports: `speech_calibration_evaluation.json`, `speech_fac_features.json`, `speech_fac_alignment_analysis.json`, `speech_tone_fusion_analysis.json`, `speech_ctc_overfire_analysis.json`, `speech_decoder_calibration_proposal.json`, `speech_decoder_calibration_simulation.json`, the human review package, and the goal audit. When no checked labels are provided, it deliberately runs the promotion gate against the empty checked-label template to produce a fail-closed `label_promotion_report.json`. The command can therefore finish successfully while the goal remains incomplete. That is intentional. A successful cycle means the evidence and reports were refreshed; it does not turn recognizer output into labels and does not imply recognition, FAC classification, tone classification, training readiness, or translation.

When checked labels are supplied to the cycle runner, the runner first writes `checked_label_preflight.json`. If the checked-label file is not ready for promotion, the cycle fails before promotion and before any `human_verified` label file is written. When no checked labels are supplied, the runner still preflights the generated review template after export, so the incomplete state is visible as a negative preflight report rather than an implicit absence of labels.

Replayable acoustic proxy evidence is extracted by:

text
experiments/acoustic_gate/extract_speech_fac_features_v0.py

This script reads `speech_calibration_examples.jsonl`, resolves hash-backed Float32 audio files from each packet, verifies the captured/prepared audio SHA-256 when present, and writes:

text
speech_fac_features.json
speech_fac_features.jsonl

The extractor computes deterministic acoustic measurements that can be audited without rerunning the full recognizer: duration, RMS, peak amplitude, zero-crossing rate, frame energy, autocorrelation voicing ratio, and an F0 proxy trace. The F0 trace can produce `observed_pitch_proxy_v0` tone evidence for calibration triage, and the voicing trace can produce `proxy_only_v0` voicing evidence. It also emits `music_note_proxy_v0`, a conventional equal-tempered note proxy that quantizes stable F0 frames against `A4=440Hz` and records fields such as `nearestMedianNote`, note histograms, and semitone span. This is useful for connecting speech pitch evidence to musical-note scaffolds, but it is explicitly not a speech tone classifier, not a transcript, and not an N'Ko music notation claim. These are not trained FAC heads. They are replayable acoustic facts attached to packet hashes. Place, manner, nasality, length, vowel height, vowel backness, and rounding remain `not_implemented_v0` until real heads exist.
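The kinds of measurements listed above can be sketched in plain Python. This is an illustration only: the function names, the 75 to 400 Hz search band, and the `0.3` voicing threshold are assumptions, and the real extractor's framing and smoothing are not shown. The note mapping uses the standard equal-tempered relation `midi = 69 + 12 * log2(f0 / 440)`.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples)) if samples else 0.0


def zero_crossing_rate(samples):
    if len(samples) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(samples) - 1)


def f0_autocorr(frame, sample_rate, fmin=75.0, fmax=400.0, voicing_threshold=0.3):
    """Return an F0 proxy in Hz for one frame, or None when it looks unvoiced."""
    energy = sum(s * s for s in frame)
    lo = int(sample_rate / fmax)
    hi = min(int(sample_rate / fmin), len(frame) - 1)
    if energy == 0 or hi <= lo:
        return None
    best_lag, best = 0, 0.0
    for lag in range(lo, hi + 1):
        corr = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if corr > best:
            best, best_lag = corr, lag
    if best_lag == 0 or best / energy < voicing_threshold:
        return None
    return sample_rate / best_lag


def nearest_note(f0_hz, a4_hz=440.0):
    """Quantize a frequency to the nearest equal-tempered note name."""
    midi = round(69 + 12 * math.log2(f0_hz / a4_hz))
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"
```

A note name produced this way is pitch context only; it says nothing about which Malinke tone, if any, was spoken.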

The current real feature report is:

text
/tmp/nko_speech_calibration_v0_current/speech_fac_features.json

On the current copied evidence, the feature report contains hash-verified live mic feature rows plus missing-evidence rows from old invalid v1 manifests. The valid rows have acoustic proxy evidence, including pitch and note proxies when enough voiced frames exist, but they remain overfire failure packets with no expected labels. Therefore they can help characterize failure acoustics, but they still cannot train the recognizer and still cannot prove transcription accuracy.

FAC-target alignment is audited by:

text
experiments/acoustic_gate/analyze_speech_fac_alignment_v0.py

This script joins `speech_calibration_examples.jsonl` with `speech_fac_features.json` by captured-audio SHA-256, manifest SHA-256, or packet id. It writes:

text
speech_fac_alignment_analysis.json
speech_fac_alignment_analysis.jsonl

The report is the broad FAC readiness layer. For every FAC slot, it checks whether a valid packet has an evaluation-ready N'Ko text prior, a scaffolded `facTargets.<slot>` value with `provided_v0`, and an acoustic proxy or head output for that slot. In v0 only `voicing` and `tone` have proxy evidence. Place, manner, nasality, length, vowel height, vowel backness, and rounding remain `not_implemented_v0`, so a provided target for one of those slots is counted as useful supervision but not as acoustically aligned yet. This is not trained FAC classification, not transcript acceptance, not translation, and not training-data promotion.

The current real FAC-alignment report is:

text
/tmp/nko_speech_calibration_v0_current/speech_fac_alignment_analysis.json

On the current copied evidence, `acousticProxySlotCounts` shows voicing and tone proxies for the valid live packets, but `targetSlotCounts` and `proxyAlignmentReadySlotCounts` remain empty because no expected labels or FAC scaffold targets have been attached. This is the correct bridge state: the acoustic evidence is available for later FAC experiments, but the supervised side is still missing.

Tone-fusion readiness is audited by:

text
experiments/acoustic_gate/analyze_speech_tone_fusion_v0.py

This script joins `speech_calibration_examples.jsonl` with `speech_fac_features.json` by captured-audio SHA-256, manifest SHA-256, or packet id. It writes:

text
speech_tone_fusion_analysis.json
speech_tone_fusion_analysis.jsonl

The report is deliberately narrow. A row becomes `fusion_ready` only when the packet is valid, an evaluation-ready text prior exists, the `facTargets.tone` slot is present as `provided_v0`, and the acoustic feature row contains `observed_pitch_proxy_v0` tone evidence. It also carries `musicNoteEvidenceV0` as conventional pitch context, including the nearest median note when available. This is readiness only. It is not a tone classification, not transcript acceptance, not translation, and not a training-admissibility decision. A music-note proxy may help compare pitch movement across replayable packets, but it does not make the system understand Malinke tone.
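The four readiness conditions can be written as a single predicate. In this sketch only `facTargets`, `provided_v0`, `observed_pitch_proxy_v0`, and `fusion_ready` come from the text; the other field names (`packetValid`, `evaluationReadyTextPrior`, `toneEvidence`, `kind`, `status`) and the intermediate status strings are hypothetical.

```python
def tone_fusion_status(example: dict, features: dict) -> str:
    """Return 'fusion_ready' only when all four conditions hold."""
    if not example.get("packetValid"):
        return "invalid_packet"
    if not example.get("evaluationReadyTextPrior"):
        return "missing_text_prior"
    tone_target = (example.get("facTargets") or {}).get("tone") or {}
    if tone_target.get("status") != "provided_v0":
        return "missing_tone_target"
    tone_evidence = (features or {}).get("toneEvidence") or {}
    if tone_evidence.get("kind") != "observed_pitch_proxy_v0":
        return "missing_tone_proxy"
    return "fusion_ready"
```

Returning the first failed condition rather than a bare boolean keeps the status buckets informative when the ready count is zero.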

The current real tone-fusion report is:

text
/tmp/nko_speech_calibration_v0_current/speech_tone_fusion_analysis.json

On the current copied evidence, the expected result is conservative: acoustic tone proxies and music-note proxies exist for the valid live packets, but `fusionReadyCount` remains `0` because no expected labels and no FAC tone targets have been attached. This is the honest state. The report proves that the acoustic side is available for later comparison; it does not prove that any output is correct.

The consolidated status path now surfaces this FAC/tone evidence directly instead of leaving it hidden in side reports. `speech_calibration_status_pack.json`, `speech_calibration_status_pack.md`, `phrase_entry_status_refresh.json`, and `phrase_entry_status_refresh.md` carry FAC feature extraction counts, voicing proxy counts, tone proxy counts, music-note proxy counts, FAC-alignment text-prior counts, tone-fusion target counts, tone-fusion ready counts, and tone-fusion status buckets. On the current live evidence this means `18` extracted feature rows, `18` voicing proxies, `18` tone proxies, `18` music-note proxies, `2` text-prior rows, and `0` fusion-ready rows because `facToneTargetCount` is still `0`. This is progress because the acoustic side is measurable and visible to the operator; it is also a boundary because visible acoustic proxies are not labels, not FAC classifications, not Malinke tone recognition, not transcript acceptance, not training data, and not translation.

The CTC overfire failure itself is analyzed by:

text
experiments/acoustic_gate/analyze_speech_ctc_overfire_v0.py

This script reads the same calibration examples, resolves `head_logits_f32.bin` and `head_argmax_i32.bin`, verifies the declared hashes, infers the CTC frame and class shape, and measures the nonblank/blank structure of the saved head evidence. It writes:

text
speech_ctc_overfire_analysis.json
speech_ctc_overfire_analysis.jsonl

The analyzer reports frame count, class count, blank index, argmax/logit agreement, raw nonblank frame ratio, collapsed CTC scalar count, unique nonblank class count, top nonblank class distribution, margin-to-blank statistics, blank-bias sweeps, and margin-gate sweeps. The sweep rows are decoder probes only. They can show whether a possible decoder threshold would suppress overfire, but they do not accept a transcript and they do not create labels.

The current real CTC report is:

text
/tmp/nko_speech_calibration_v0_current/speech_ctc_overfire_analysis.json

On the current copied evidence, the valid live packet has `375` CTC frames, `66` classes, blank index `65`, and `365` nonblank frames. That is a `0.9733` nonblank frame ratio, with `365` collapsed scalars and `45` unique nonblank classes. The saved argmax path matches the logits. A blank-bias or margin-threshold probe of `20.0` reduces the collapsed scalar count to `96`, but this remains a rejection/tuning signal, not a recognized utterance. The correct conclusion is that live ASR calibration must address severe CTC nonblank dominance before translation or transcript acceptance.
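The overfire statistics and a margin-gate probe can be sketched as follows. This assumes the standard CTC collapse (merge repeats, then drop blanks) and one plausible reading of the margin gate: a nonblank frame is kept only when its logit exceeds the blank logit by at least the threshold. The analyzer's exact gate definition is not stated in the text, and the function and key names are hypothetical.

```python
def ctc_overfire_stats(logits, blank_index, margin_gate=None):
    """logits: list of frames, each a list of per-class scores."""
    path = []
    for frame in logits:
        best = max(range(len(frame)), key=frame.__getitem__)
        if margin_gate is not None and best != blank_index:
            # Probe only: demote weakly supported nonblank frames to blank.
            if frame[best] - frame[blank_index] < margin_gate:
                best = blank_index
        path.append(best)

    collapsed, prev = [], None
    for cls in path:
        if cls != prev and cls != blank_index:
            collapsed.append(cls)
        prev = cls

    nonblank = [cls for cls in path if cls != blank_index]
    return {
        "frameCount": len(path),
        "nonblankFrameCount": len(nonblank),
        "nonblankFrameRatio": round(len(nonblank) / len(path), 4) if path else 0.0,
        "collapsedScalarCount": len(collapsed),
        "uniqueNonblankClassCount": len(set(nonblank)),
    }
```

With the reported counts, `365 / 375` rounds to `0.9733`, matching the ratio above. The output of a gated run is a probe statistic, never a transcript.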

Safe decoder calibration proposals are generated by:

text
experiments/acoustic_gate/propose_speech_decoder_calibration_v0.py

This script consumes the CTC overfire report and the calibration evaluation report. It writes:

text
speech_decoder_calibration_proposal.json

The proposal layer exists because there are two different actions that must not be confused. A rejection gate can be used before labels exist, because it only prevents unstable output from being displayed, recycled, or promoted. A decoder probe cannot be used for transcript acceptance until same-evidence expected labels exist. In the current evidence state the proposal mode is `reject_only_unlabeled`. It permits a `ctc_overfire_nonblank_dominance` reject gate, but blocks `canUseDecoderProbeForTranscriptAcceptance` because the calibration set has no evaluation-ready labels and no training-admissible human-verified labels.

The current real decoder proposal is:

text
/tmp/nko_speech_calibration_v0_current/speech_decoder_calibration_proposal.json

For the valid live packet, the proposal says the reject gate would fire and the only available decoder experiment is a margin gate at `20.0`, producing `96` collapsed scalars in the probe. That number is not a transcript quality score. It is a bounded next experiment: once phrase-ladder labels exist, the same packet-hash machinery can test whether this threshold improves CER, merely hides errors, or destroys real speech.

The decoder proposal can be replayed without mutating the evidence by:

text
experiments/acoustic_gate/simulate_speech_decoder_calibration_v0.py

This writes:

text
speech_decoder_calibration_simulation.json
speech_decoder_calibration_simulated_examples.jsonl

The simulation is deliberately annotation-only. It reads `speech_calibration_examples.jsonl` and `speech_decoder_calibration_proposal.json`, attaches a `decoderCalibrationSimulation` object to each matching packet, and preserves the original transcript decision, bucket, accepted text, and expected-label state. A matching overfire packet may receive a candidate decision such as `rejected_overfire_suppressed_probe`, but that candidate decision is not promoted into `transcriptDecision`, does not set `acceptedText`, and does not create a label.
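Annotation-only behavior can be sketched as a copy-then-attach step. The `bucket` and `rejectGateBucket` fields and the inner keys of the simulation object are assumptions made for illustration; only `decoderCalibrationSimulation`, `transcriptDecision`, `acceptedText`, and the candidate decision string come from the text.

```python
import copy


def annotate_simulation(example: dict, proposal: dict) -> dict:
    """Attach a simulation block without touching any original field."""
    annotated = copy.deepcopy(example)
    gate_fires = example.get("bucket") == proposal.get("rejectGateBucket")
    annotated["decoderCalibrationSimulation"] = {
        "rejectGateWouldFire": gate_fires,
        "candidateDecision": "rejected_overfire_suppressed_probe" if gate_fires else None,
    }
    return annotated
```

Because the input is deep-copied and only one new key is added, a simple equality check over the remaining keys is enough to assert that the original decision, bucket, accepted text, and label state survived.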

The current real simulation is:

text
/tmp/nko_speech_calibration_v0_current/speech_decoder_calibration_simulation.json

On the current copied evidence, the simulation covers three valid live packets. The reject gate would fire for all three, accepted transcript count remains `0`, label changes remain `0`, original decision changes remain `0`, and the only available bounded margin-gate probe reduces one packet's collapsed scalar count from `365` to `96`, a reduction of `269` scalars or about `73.7%`.

Human review packages are exported by:

text
experiments/acoustic_gate/export_speech_calibration_review_package_v0.py

This script reads `speech_calibration_examples.jsonl`, verifies each reviewable packet's captured audio hash, converts captured and prepared Float32 audio into ordinary mono WAV files, and writes:

text
speech_calibration_review_package.json
review_queue.jsonl
checked_labels_template.jsonl
review_answers_template.jsonl
review_contract.json
review_sheet.html
audio/<packetId>/capturedAudio.wav
audio/<packetId>/preparedAudio.wav
README.md

The review package exists to close the gap between collected evidence and checked labels. A valid packet with replayable audio can become reviewable, but it still does not become a label. The `review_queue.jsonl`, `checked_labels_template.jsonl`, and local `review_sheet.html` rows include a `recognizerHypothesis` block when the recognizer produced `acceptedText` or `candidateText`. That block is reviewer context only. It can help a reviewer see what the model thought it heard, but it is not copied into `expectedText`, and the promotion gate ignores it as a label source.

For the safer compiler path, the package also writes `review_answers_template.jsonl`. That file contains the join keys, audio references, empty confirmation fields, optional operator-expected text when present, and no `recognizerHypothesis` block. Recognizer hypotheses are intentionally omitted from reviewer-answer rows. A reviewer can fill this answer template after listening, then run `compile_speech_calibration_review_answers_v0.py` against `review_contract.json`. The compiler rejects any answer row that reintroduces recognizer output fields before writing checked-label JSONL.

The `checked_labels_template.jsonl` rows default to `confirmedSpoken=false`, with empty verifier and verification-basis fields. The HTML sheet can play the exported WAVs and generate checked-label JSONL locally, but that output is still only a promotion input. It now emits only rows where the reviewer entered expected N'Ko text; unfilled packets are omitted so partial review sessions can still produce a strict preflight-ready file. If entered text is Latin or otherwise not detected as N'Ko, the row is marked invalid before download. The row must be filled after listening to the WAV evidence, and then `promote_speech_calibration_labels_v0.py` validates the packet manifest again before writing a `human_verified` label.
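A script check of the kind described above can be approximated with the Unicode N'Ko block, which spans `U+07C0` to `U+07FF`. The rule used here (at least one N'Ko character and no Latin letters) is an assumption for illustration; the sheet's actual detector may be stricter or looser.

```python
import unicodedata

NKO_BLOCK_START, NKO_BLOCK_END = 0x07C0, 0x07FF


def is_nko_char(ch: str) -> bool:
    return NKO_BLOCK_START <= ord(ch) <= NKO_BLOCK_END


def looks_like_nko(text: str) -> bool:
    """True when the text has N'Ko characters and no Latin letters."""
    significant = [ch for ch in text if not ch.isspace()]
    if not significant:
        return False
    has_nko = any(is_nko_char(ch) for ch in significant)
    has_latin = any("LATIN" in unicodedata.name(ch, "") for ch in significant)
    return has_nko and not has_latin
```

For example, `looks_like_nko("\u07d2\u07de\u07cf")` accepts a short N'Ko string, while a Latin transliteration or a mixed-script entry is rejected.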

The current real review package is:

text
/tmp/nko_speech_calibration_review_v0_current/speech_calibration_review_package.json

On the current imported iPhone snapshot, nine valid packets are reviewable. The two old v1 manifests remain non-reviewable. This produces nine checked-label template rows, nine review-answer template rows, and eighteen WAV renderings for the valid packets. This still does not complete calibration because no one has confirmed what was spoken in those packets. It only makes the next human verification step concrete, auditable, and tied to packet hashes.

The review handoff can be condensed with:

text
experiments/acoustic_gate/prepare_speech_calibration_review_session_v0.py

This command reads `speech_calibration_review_package.json`, `review_contract.json`, and `review_answers_template.jsonl`, then writes:

text
speech_calibration_review_session.json
speech_calibration_review_brief.md
review_session_queue.jsonl
clean_review_sheet.html

The session report is the reviewer-facing operating layer. It reports `readyForHumanReview`, the current gate `needs_human_review_answers`, packet counts, pending answer counts, bucket counts, preseeded expected-label counts, unique review packet counts, duplicate review row counts, and the exact commands to compile reviewer answers, preflight the resulting checked labels, and rerun the calibration or snapshot cycle with `--checked-labels`. The queue prioritizes reviewable packet audio without copying raw recognizer output into the answer path. The current queue contract is intentionally stricter than the older package contract: it does not include `recognizerHypothesis`, `candidateText`, `acceptedText`, `answerSeed`, or `expectedText` fields. If a prior expected label exists, the queue records only `expectedLabelState.labelPresent=true` and a `labelFingerprint`; the actual expected label text is withheld from the queue and from the clean sheet input until the reviewer listens and types it. The queue now also records `duplicateState` as triage metadata. The first occurrence of a packet remains the primary review row, while later duplicate rows receive a priority penalty and are marked `skip_until_primary_reviewed`, so the human review surface starts with unique short evidence instead of asking the reviewer to process the same packet repeatedly. This removes bias paths where preseeded labels, duplicate rows, or recognizer-side context could quietly become reviewer output.
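The duplicate handling can be sketched as a single pass over the queue. The penalty value, the `priority` field, and the inner keys of `duplicateState` are hypothetical; only the `duplicateState` name and the `skip_until_primary_reviewed` marker come from the text.

```python
def assign_duplicate_state(rows, penalty=100):
    """Mark first occurrences as primary and push later duplicates down."""
    seen = {}
    out = []
    for row in rows:
        packet_id = row["packetId"]
        ordinal = seen.get(packet_id, 0)
        seen[packet_id] = ordinal + 1
        is_primary = ordinal == 0
        out.append({
            **row,
            "priority": row.get("priority", 0) - (0 if is_primary else penalty),
            "duplicateState": {
                "isPrimary": is_primary,
                "ordinal": ordinal,
                "action": "review" if is_primary else "skip_until_primary_reviewed",
            },
        })
    # sorted() is stable, so equal-priority rows keep their queue order.
    return sorted(out, key=lambda r: -r["priority"])
```

Duplicates are demoted rather than dropped, so the evidence stays in the queue and remains auditable.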

The same command also writes `clean_review_sheet.html`. This is the preferred label-entry surface. It plays the captured and prepared WAV files, lets the reviewer enter expected N'Ko text and verification metadata, records which WAV was actually used as `heardAudio`, displays only safe priority and duplicate metadata such as the primary/duplicate ordinal, and exports `review_answers_filled.jsonl`. `heardAudio` carries the selected `audioKey`, relative WAV path, SHA-256, and duration when present, so later checked labels and review dispositions are tied back to concrete audio evidence rather than only to a packet id. When served through the local phrase-entry server, the same sheet can save those rows directly through `/api/review-answers`; when opened as a standalone file, the download and copy paths remain available. The sheet also exposes non-label review dispositions for cases such as `english_test`, `not_malinke`, `wrong_language`, `noise`, `clipped`, `too_short`, `silence`, `uncertain`, or `other`. Those rows record that the audio was reviewed, but they intentionally omit `expectedText`, set `confirmedSpoken=false`, and do not carry FAC targets, so they can triage failures without creating checked labels. Unlike the original debug-oriented `review_sheet.html`, the clean sheet does not include raw recognizer hypotheses, `candidateText`, `acceptedText`, or transcript-decision text. It creates reviewer answer rows only. Those rows still have to pass `compile_speech_calibration_review_answers_v0.py`, checked-label preflight, promotion, and snapshot replay before they can affect evaluation or training.

The normal replay cycle now prepares this review session after exporting the review package. Snapshot replay refreshes it again after writing `speech_calibration_snapshot_cycle_summary.json`, so the rerun command can remain bound to the durable snapshot report. This makes the current human label task explicit without weakening the poison boundary. The session command does not create labels, accept transcripts, classify FAC/tone, or train.

For the current packet volume, the focused review batch is the better operating surface:

text
experiments/acoustic_gate/prepare_speech_calibration_focused_review_batch_v0.py

This command reads `speech_calibration_review_session.json` and `review_session_queue.jsonl`, then writes `focused_review_batch.json`, `focused_review_queue.jsonl`, `focused_review_answers_template.jsonl`, `focused_review_sheet.html`, and `focused_review_brief.md`. It selects a small pending batch rather than forcing the reviewer through the full clean sheet. The default policy prefers live microphone rows, covers priority buckets such as overfire and accepted output first, and then fills the remaining slots with the shortest replayable captured-audio packets. It also respects the full-session `duplicateState`: primary review rows are preferred, non-primary duplicate rows are excluded from the focused batch while primary rows exist, and duplicate rows are used only as a fallback for older or partial queues that contain no primary rows. The focused queue is now sanitized at write time: even if an older source queue still contains `recognizerHypothesis`, `candidateText`, `acceptedText`, `hypothesisText`, `answerSeed`, or raw expected-label text, the focused queue emits only safe review metadata, `reviewSeed`, audio references, duplicate state, and join keys. The focused report exposes `selectedPrimaryRowCount`, `selectedDuplicateRowCount`, and each selected row's safe duplicate metadata, so the batch remains auditable without exposing transcript text. When acoustic side reports are available, selection is also active-learning aware: the batch records `selectionEvidence`, `activeLearningScore`, `selectionReasons`, and `selectionEvidenceMatchedCount` from FAC proxy features, tone proxy evidence, music-note proxy evidence, and CTC overfire analysis. These fields are review triage signals only. They do not include transcript text, do not create labels, do not prove recognition, and do not make a packet training-admissible.
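Write-time sanitization is easiest to reason about as an allowlist. In the sketch below the forbidden names are the ones listed above, while the allowlisted field names other than `reviewSeed` and `duplicateState` are assumptions about what "safe review metadata, audio references, and join keys" look like.

```python
FORBIDDEN_FIELDS = {
    "recognizerHypothesis", "candidateText", "acceptedText",
    "hypothesisText", "answerSeed", "expectedText",
}
# Hypothetical allowlist: join keys, audio references, and triage metadata.
ALLOWED_FIELDS = {
    "packetId", "manifestSha256", "archiveRef", "bucket",
    "reviewSeed", "audio", "duplicateState", "priority",
}


def sanitize_queue_row(row: dict) -> dict:
    """Emit only allowlisted fields, so unknown fields are dropped too."""
    clean = {k: v for k, v in row.items() if k in ALLOWED_FIELDS}
    assert not (FORBIDDEN_FIELDS & clean.keys()), "allowlist must exclude poison fields"
    return clean
```

An allowlist fails closed: a new machine-output field added upstream is dropped by default instead of leaking until someone remembers to add it to a denylist.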

The focused brief is the compact handoff for the same batch. It lists the selected packet ids, buckets, audio durations, captured WAV references, evidence-aware selection reasons, and the exact focused review-loop command. Like the focused answer template, it avoids raw recognizer hypothesis fields and does not become label evidence. Its purpose is operational: reduce the chance that the reviewer fills the wrong file, listens to the wrong packet, or runs the full-session replay command instead of the focused one.

The normal focused-review setup path is now:

text
experiments/acoustic_gate/prepare_speech_calibration_focused_handoff_v0.py

This one-command preparation gate runs the focused batch selector, focused WAV verifier, heard-audio draft generator, listening-session builder, decision-sheet initializer, and handoff-sheet inspector in sequence. It writes `focused_review_handoff.json` and `focused_review_handoff.md`, and it exists specifically to avoid out-of-order handoff artifacts such as a header-only editable TSV produced before the listening queue exists. The command creates `focused_review_answer_decisions.tsv` from the just-written listening queue when the sheet is missing, replaces an empty/header-only sheet, and replaces a stale blank sheet whose packet ids no longer match the listening queue. It preserves any sheet that appears to contain human edits, even if the packet set is stale, so real labels or dispositions are not overwritten. Its ready state means the reviewer has a small verified listening task with blank human-entry fields; it does not mean the system has labels. It does not import answers, listen, infer expected text, trust recognizer output, promote labels, train, or translate.
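The sheet replacement policy can be sketched as one decision function. The returned action names are hypothetical, and the set of human-entry columns is taken from the decision-sheet fields named elsewhere in this spec; `sheet_rows` is `None` when the file is missing and an empty list when it is header-only.

```python
HUMAN_FIELDS = ("decision", "expectedText", "reason", "verifiedBy", "verificationBasis")


def decide_sheet_action(sheet_rows, queue_packet_ids):
    """Decide what to do with an existing decision sheet before handoff."""
    if sheet_rows is None:
        return "create"
    if not sheet_rows:
        return "replace_header_only"
    has_human_edits = any(
        (row.get(field) or "").strip() for row in sheet_rows for field in HUMAN_FIELDS
    )
    if has_human_edits:
        # Never overwrite human work, even if the packet set is stale.
        return "preserve_human_edits"
    if {row.get("packetId") for row in sheet_rows} != set(queue_packet_ids):
        return "replace_stale_blank"
    return "keep"
```

The human-edit check runs before the staleness check on purpose, so a stale sheet with real labels or dispositions is preserved rather than replaced.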

The same handoff now writes an `actionPacket` and `sourceFingerprints` block. The action packet is the compact operator checklist: it reports the next safe action, pending row count, safe pending packet ids, review order, bucket, preferred verified-audio key, preferred audio SHA-256, focused-answer validation status, validation source hashes, audio checked/expected/issue counts, and only the commands that are safe in the current state. With empty or missing focused answers, the packet exposes `serveFocusedReviewSheet` and `refreshPhraseEntryStatus` only. `runFocusedPostReview` appears only when `focused_review_answer_validation.json` is already in a safe post-review state such as `ready_for_focused_compile` or `review_dispositions_only`. The packet intentionally omits `expectedText`, recognizer hypotheses, candidate text, accepted text, and translations. It is a handoff and provenance surface, not a label source or recognition claim.

The focused batch also exposes an acoustic-memory refresh command under `commands.buildAcousticMemory`:

text
experiments/acoustic_gate/build_speech_calibration_acoustic_memory_v0.py

This command reads `speech_calibration_examples.jsonl`, `speech_fac_features.json`, `speech_ctc_overfire_analysis.json`, `speech_tone_fusion_analysis.json`, and the focused batch report. It writes `speech_acoustic_memory_index.json`, `speech_acoustic_memory_vectors.jsonl`, and `speech_acoustic_memory_neighbors.jsonl`. The active backend is `json_cosine_v0`: every packet gets a small deterministic vector made from proxy acoustic values, tone/F0 evidence, music-note proxy span, and CTC overfire statistics, then nearest neighbors are computed for packet-similarity triage. The report explicitly records TurboVec as a future candidate backend, not as the recognizer. A later implementation can replace the JSON cosine backend with TurboVec `IdMapIndex` after a retrieval benchmark proves it improves review triage or candidate selection. The v0 command does not include transcript text, does not create labels, does not accept recognizer output, does not classify FAC/tone, does not train, and does not translate. Its value is memory: it lets the calibration system ask which prior packets have similar acoustic/decoder evidence without confusing similarity with correctness.
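A JSON-friendly cosine neighbor search of the kind `json_cosine_v0` suggests can be sketched in a few lines. The function names, the output shape, and the tie-breaking by packet id are assumptions; the vector contents are whatever proxy values the builder chooses.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbors(vectors: dict, k: int = 3) -> dict:
    """vectors maps packet id -> list of floats; returns top-k neighbors per id."""
    neighbors = {}
    for packet_id, vec in vectors.items():
        scored = sorted(
            ((cosine(vec, other), other_id)
             for other_id, other in vectors.items() if other_id != packet_id),
            key=lambda pair: (-pair[0], pair[1]),  # similarity desc, id asc for determinism
        )
        neighbors[packet_id] = [
            {"packetId": other_id, "cosine": round(score, 6)} for score, other_id in scored[:k]
        ]
    return neighbors
```

Brute-force search is quadratic in the packet count, which is acceptable at the current volume and gives a deterministic baseline for any later index backend.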

The focused batch also exposes `commands.evaluateAcousticMemoryRetrieval`:

text
experiments/acoustic_gate/evaluate_speech_acoustic_memory_retrieval_v0.py

This command reads `speech_acoustic_memory_index.json`, `speech_acoustic_memory_vectors.jsonl`, and `speech_acoustic_memory_neighbors.jsonl`, then writes `speech_acoustic_memory_retrieval_evaluation.json` and `speech_acoustic_memory_retrieval_evaluation.jsonl`. It evaluates retrieval without transcript text by measuring label-free organization signals such as `bucketTop1MatchRate`, `bucketAnyKMatchRate`, `sourceKindTop1MatchRate`, `bucketPairCounts`, `focusedQueryCount`, and safe label-state counts. These are not correctness metrics. They only say whether the acoustic-memory vectors cluster packets with similar failure buckets or source kinds well enough to help triage. The report explicitly marks `labelSensitiveEvaluationStatus=requires_human_verified_labels` until safe human labels exist. Once labels exist, label-sensitive comparison must still happen through a separate safe label join rather than by copying expected text into the retrieval artifact. This JSON-cosine report is the baseline any TurboVec `IdMapIndex` backend must beat on retrieval quality, safety, and latency before TurboVec belongs near the iPhone path. The command does not create labels, accept recognizer output, compare transcript text, train, or translate.
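The label-free bucket metrics can be sketched directly from a neighbor table. The input shapes and the `queryCount` key are hypothetical; the two rate names follow the report fields above. No transcript or expected text is read at any point.

```python
def bucket_match_rates(neighbors: dict, buckets: dict) -> dict:
    """neighbors: packet id -> ranked neighbor ids; buckets: packet id -> failure bucket."""
    top1 = any_k = total = 0
    for packet_id, ranked in neighbors.items():
        if not ranked:
            continue
        total += 1
        own = buckets[packet_id]
        neighbor_buckets = [buckets[n] for n in ranked]
        top1 += neighbor_buckets[0] == own
        any_k += own in neighbor_buckets
    return {
        "queryCount": total,
        "bucketTop1MatchRate": top1 / total if total else None,
        "bucketAnyKMatchRate": any_k / total if total else None,
    }
```

A high rate only means the vectors group packets by failure bucket; it does not say that any packet was recognized correctly.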

Focused review audio can be verified with:

text
experiments/acoustic_gate/verify_speech_calibration_focused_review_audio_v0.py

The focused batch exposes this as `verifyFocusedReviewAudio` and writes the report path `focused_review_audio_verification.json`. The verifier opens the selected captured and prepared WAV files, checks that each is readable mono 16-bit WAV evidence, compares exported WAV SHA-256 hashes, byte counts, sample rates, and durations against the focused queue metadata, and reports `readyForFocusedLabeling`. This is still an evidence gate only. It does not read recognizer hypotheses, does not create expected text, does not accept transcripts, does not classify FAC/tone, does not promote labels, does not train, and does not translate.
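The per-file checks can be sketched with the standard `wave` and `hashlib` modules. This is an illustration under assumptions: the metadata keys (`sha256`, `byteCount`, `sampleRate`, `durationSeconds`), the issue strings, and the 10 ms duration tolerance are hypothetical, and the helper that renders Float32 samples to WAV exists here only to make the example self-contained.

```python
import hashlib
import io
import struct
import wave


def float32_to_wav_bytes(samples, sample_rate):
    """Render float samples in [-1, 1] as mono 16-bit PCM WAV bytes."""
    pcm = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
    )
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as writer:
        writer.setnchannels(1)
        writer.setsampwidth(2)
        writer.setframerate(sample_rate)
        writer.writeframes(pcm)
    return buffer.getvalue()


def verify_wav(data: bytes, expected: dict) -> dict:
    """Compare one WAV file against the queue metadata; any issue fails the file."""
    issues = []
    if hashlib.sha256(data).hexdigest() != expected["sha256"]:
        issues.append("hash_mismatch")
    if len(data) != expected["byteCount"]:
        issues.append("byte_count_mismatch")
    try:
        with wave.open(io.BytesIO(data), "rb") as reader:
            if reader.getnchannels() != 1:
                issues.append("not_mono")
            if reader.getsampwidth() != 2:
                issues.append("not_16_bit")
            if reader.getframerate() != expected["sampleRate"]:
                issues.append("sample_rate_mismatch")
            duration = reader.getnframes() / reader.getframerate()
            if abs(duration - expected["durationSeconds"]) > 0.01:
                issues.append("duration_mismatch")
    except (wave.Error, EOFError):
        issues.append("unreadable_wav")
    return {"ready": not issues, "issues": issues}
```

A batch-level flag such as `readyForFocusedLabeling` would then be true only when every selected file verifies with no issues.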

The focused sheet exports `focused_review_answers_filled.jsonl`, not the full-session `review_answers_filled.jsonl`. The focused answer template is the non-HTML equivalent. It contains join keys, audio references, empty `expectedText`, `confirmedSpoken=false`, verifier fields, an optional `heardAudio` object where the reviewer can name the captured or prepared WAV path, SHA-256, and duration they actually listened to, and a `reviewDisposition` placeholder for packets that should be reviewed but not labeled. It contains no recognizer hypothesis fields. A reviewer can copy and complete this template after listening instead of using the browser sheet.

The focused batch also exposes `draftFocusedAnswers`, implemented by `draft_speech_calibration_focused_review_answers_v0.py`, which reads `focused_review_answers_template.jsonl` and `focused_review_audio_verification.json` and writes `focused_review_answers_draft.jsonl` plus `focused_review_answer_draft_report.json`. This draft path pre-fills only verified `heardAudio` metadata and an `afplay` listening command. It leaves `expectedText` empty unless the template already had operator-expected text, keeps `confirmedSpoken=false`, and does not infer speech.

The batch also exposes `prepareListeningSession`, implemented by `prepare_speech_calibration_focused_listening_session_v0.py`, which reads `focused_review_batch.json`, `focused_review_audio_verification.json`, and the draft file, then writes `focused_review_listening_session.json`, `focused_review_listening_session.md`, and `focused_review_listening_queue.jsonl`. This listening-session artifact packages only verified WAV references, `afplay` commands, preferred `heardAudio` suggestions, and the two allowed reviewer outcomes: real Malinke label entry with N'Ko `expectedText`, or safe `reviewDisposition.status=not_labelable` with a reason. It does not listen, infer expected text, expose recognizer text, create labels, train, or translate. Its purpose is to make the human listening step smaller and less error-prone while preserving the rule that only a human can turn audio into a label or a non-labelable disposition.

The focused batch also exposes a TSV answer-entry bridge through `import_speech_calibration_focused_review_answers_v0.py`. The `prepareAnswerDecisionSheet` command writes `focused_review_answer_decisions_template.tsv` from `focused_review_listening_queue.jsonl` and initializes the editable `focused_review_answer_decisions.tsv` when that sheet is absent. It never overwrites an existing filled decision sheet. The reviewer fills only explicit human fields: `decision`, `expectedText` or `reason`, `verifiedBy`, and `verificationBasis`. The `importFocusedAnswerDecisionSheet` command then writes `focused_review_answers_filled.jsonl` and `focused_review_answer_import_report.json`. Label rows must use `decision=labelable_malinke` and real N'Ko `expectedText`; non-label rows must use `decision=not_labelable` and one known reason such as `english_test`, `not_malinke`, `noise`, `clipped`, `too_short`, `silence`, `wrong_language`, `uncertain`, or `other`. The importer reads join keys and verified `heardAudio` from the listening queue rather than from free text, rejects machine-output columns such as `candidateText` or `recognizerHypothesis`, rejects Latin expected text, clears the output to a comment-only fail-closed file on import failure, and still leaves validation, compile, preflight, promotion, and replay to the existing gates. It does not listen, infer expected text, expose recognizer text, promote labels, train, or translate.
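The import rules can be sketched with the standard `csv` module. This is not the importer itself: the function name, error strings, output row shape, and the simplified N'Ko check are assumptions, `acceptedText` is added to the rejected columns as an extra precaution, and the real importer takes join keys and `heardAudio` from the listening queue rather than from the sheet as this sketch does.

```python
import csv
import io

KNOWN_REASONS = {
    "english_test", "not_malinke", "noise", "clipped", "too_short",
    "silence", "wrong_language", "uncertain", "other",
}
MACHINE_COLUMNS = {"candidateText", "recognizerHypothesis", "acceptedText"}


def is_nko_text(text: str) -> bool:
    has_nko = any(0x07C0 <= ord(ch) <= 0x07FF for ch in text)
    has_latin = any(ch.isascii() and ch.isalpha() for ch in text)
    return has_nko and not has_latin


def import_decisions(tsv_text: str):
    """Return (rows, errors); any error empties rows so the import fails closed."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    poisoned = MACHINE_COLUMNS & set(reader.fieldnames or [])
    if poisoned:
        return [], [f"machine_output_column:{name}" for name in sorted(poisoned)]
    rows, errors = [], []
    for line_no, raw in enumerate(reader, start=2):
        row = {k: (v or "").strip() for k, v in raw.items() if k}
        decision = row.get("decision", "")
        if not decision:
            continue  # still pending, not an error
        if not row.get("verifiedBy") or not row.get("verificationBasis"):
            errors.append(f"line {line_no}: missing verifier metadata")
        elif decision == "labelable_malinke":
            if not is_nko_text(row.get("expectedText", "")):
                errors.append(f"line {line_no}: expectedText is not N'Ko")
            else:
                rows.append({"packetId": row.get("packetId"), "expectedText": row["expectedText"]})
        elif decision == "not_labelable":
            if row.get("reason") not in KNOWN_REASONS:
                errors.append(f"line {line_no}: unknown reason")
            else:
                rows.append({"packetId": row.get("packetId"),
                             "reviewDisposition": {"status": "not_labelable", "reason": row["reason"]}})
        else:
            errors.append(f"line {line_no}: unknown decision")
    return ([], errors) if errors else (rows, [])
```

Returning an empty row list whenever any error exists mirrors the fail-closed output described above: a partially valid sheet never produces a partially trusted answers file.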

When `heardAudio` is filled on a label row, the compiler normalizes it into the checked-label row, preflight reports it as part of the promotable row, and promotion preserves it in the `human_verified` label. When `reviewDisposition.status=not_labelable` is filled instead, the compiler records the packet as reviewed failure evidence and does not write a checked-label row for it. The batch report exposes `monitorFocusedReviewProgress` and `runFocusedReviewLoop`, which target `focused_checked_labels_filled.jsonl`, `focused_review_answer_compile_report.json`, `focused_checked_label_preflight.json`, `focused_review_progress.json`, and `focused_review_loop.json`. It also exposes `analyzeReviewDispositions`, implemented by `analyze_speech_review_dispositions_v0.py`, which reads `focused_review_answer_compile_report.json`, joins `dispositionRows` back to `speech_calibration_examples.jsonl` and `focused_review_audio_verification.json` when available, and writes `focused_review_disposition_analysis.json` plus `focused_review_disposition_analysis.jsonl`. The disposition analysis report sorts reviewed non-label packets by `reasonCounts`, `failureFamilyCounts`, `bucketCounts`, `bucketReasonCounts`, and `heardAudioVerificationStatusCounts`, but it still does not create labels, accept transcripts, promote rows, train, or translate. Both focused commands include `--audio-verification-report focused_review_audio_verification.json`, so the focused lane fails closed with `focused_audio_verification_failed` if the selected WAV evidence is missing, unreadable, hash-mismatched, or otherwise not `readyForFocusedLabeling`. The focused batch does not create labels, accept transcripts, expose recognizer text as expected text, classify FAC/tone, promote to `human_verified`, or train. It is only a smaller human-listening queue bound to the same verified-audio, compiler, preflight, promotion, disposition-analysis, and replay gates.

The focused batch also exposes `inspectHumanHandoffSheets`, implemented by `inspect_speech_calibration_handoff_sheets_v0.py`. This command reads the editable `focused_review_answer_decisions.tsv` and `next_capture_phrase_plan.tsv` sheets, writes `human_handoff_sheet_status.json`, and reports whether each sheet is missing, still pending, invalid, partially ready, or ready for import. It uses the same N'Ko-script, verifier-metadata, disposition-reason, packet-id, phrase-id, and machine-field poison checks as the importers, but it does not import rows or mutate downstream JSONL. Its purpose is to let the operator check the handoff before running `importFocusedAnswerDecisionSheet` or `importPhrasePlanSheet`. It does not listen, infer text, expose or trust recognizer output, create labels, promote labels, capture audio, train, or translate.

The focused batch now also exposes `runGovernedHandoff`, implemented by `run_speech_calibration_handoff_v0.py`. This is the normal one-command bridge after either editable TSV has been filled. It always runs the read-only handoff inspector first, blocks all mutating stages if any present sheet is invalid, imports only sheets with ready human-entered rows, runs focused post-review when focused answers become validation-ready, and runs the next-capture orchestration when planned phrases become capture-ready. A missing focused-review sheet does not veto a ready next-capture phrase sheet, and a missing phrase sheet does not veto a ready focused-review sheet; if both sheets are missing, the command still reports that setup is missing. Next-capture recording still requires an explicit `--capture-next` flag; without it, the report can stop at `ready_for_capture` with `captureRan=false`. This command is orchestration only. It does not listen, infer text, trust recognizer output, create `human_verified` labels directly, train, or translate.
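
The gating order described above can be sketched as a pure function. This is a simplified illustration under stated assumptions: the sheet names, the four-value status vocabulary, and the returned status strings are hypothetical, and the real inspector distinguishes more states (for example, partially ready).

```python
def plan_governed_handoff(sheet_status: dict[str, str], capture_next: bool = False) -> dict:
    """Decide which sheets may be imported.

    sheet_status maps a sheet name to 'missing', 'pending', 'invalid', or 'ready'
    (a hypothetical, reduced vocabulary for this sketch).
    """
    present = {name: s for name, s in sheet_status.items() if s != "missing"}
    if not present:
        # Both sheets missing: nothing to hand off yet.
        return {"status": "setup_missing", "imports": [], "captureRan": False}
    if "invalid" in present.values():
        # Any invalid present sheet blocks every mutating stage.
        return {"status": "blocked_invalid_sheet", "imports": [], "captureRan": False}
    ready = sorted(name for name, s in present.items() if s == "ready")
    return {
        "status": "imported" if ready else "pending",
        "imports": ready,
        # Recording needs both a ready phrase plan and the explicit flag.
        "captureRan": capture_next and "phrase_plan" in ready,
    }
```

The point of the design is that a missing sheet is neutral while an invalid sheet is a veto, so one lane can progress without the other but a poisoned sheet stops both.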

Focused answer rows are also bound back to the selected focused batch before compile. The generated focused monitor and loop commands include `--focused-batch focused_review_batch.json` and `--focused-answer-audit-report focused_review_answer_audit.json`. The progress monitor writes a `focusedAnswerAudit` block and fails closed with `focused_answers_mismatch` when a filled answer row does not join one of the selected focused packet identities, duplicates a selected packet, or the focused batch report itself is missing or not ready. The same audit reports `answeredSelectedCount`, `missingSelectedCount`, `missingSelectedPacketIds`, and `completeForBatch`, so a single valid answer can move into compile, preflight, and replay without pretending that the full focused batch has been labeled. This is a binding and coverage gate only. It does not listen, infer expected text, promote labels, train, or translate.
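
A minimal sketch of the binding and coverage audit follows. The four report fields and the `focused_answers_mismatch` state come from the paragraph above; the function name, the `ok` status, and the use of plain packet-id strings as identities are assumptions made for illustration.

```python
from collections import Counter


def audit_focused_answers(selected_ids: list[str], answered_ids: list[str]) -> dict:
    """Bind answer rows to the selected focused batch and report coverage."""
    selected = set(selected_ids)
    counts = Counter(answered_ids)
    unknown = sorted(pid for pid in counts if pid not in selected)
    duplicated = sorted(pid for pid, n in counts.items() if n > 1)
    missing = [pid for pid in selected_ids if pid not in counts]
    mismatch = bool(unknown or duplicated)
    return {
        # 'ok' is a placeholder status for this sketch.
        "status": "focused_answers_mismatch" if mismatch else "ok",
        "answeredSelectedCount": len(selected & set(counts)),
        "missingSelectedCount": len(missing),
        "missingSelectedPacketIds": missing,
        "completeForBatch": not mismatch and not missing,
    }
```

With three selected packets and one valid answer, the audit passes with `completeForBatch` false, which is how a single answer can move forward without claiming the batch is done.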

Focused answers can also be validated directly before compile with:

text
experiments/acoustic_gate/validate_speech_calibration_focused_review_answers_v0.py

The focused batch exposes this as `validateFocusedAnswers` and writes `focused_review_answer_validation.json`. This standalone validator checks the filled focused answer JSONL, verified-audio readiness, selected-batch membership, hypothesis-field leaks, N'Ko-script expected text, `confirmedSpoken`, verifier metadata, optional FAC target syntax, non-label `reviewDisposition` rows, and any filled `heardAudio` claim before the compiler or replay loop is allowed to touch the rows. A filled `heardAudio` claim must match the selected packet's verified focused WAV evidence, otherwise the validator returns a content failure such as `heard_audio_not_verified`. Its clean label state is `ready_for_focused_compile`; a safe file containing only non-labelable dispositions returns `review_dispositions_only`; content failures become `focused_answers_invalid`, selected-batch failures remain `focused_answers_mismatch`, and recognizer-text leaks remain `review_answers_poisoned`. The validator also writes `sourceFingerprints` for the focused answer file, focused batch report, and focused audio verification report, including file existence, size, mtime, and SHA-256. The status pack and phrase-entry refresh surface those hashes as compact provenance fields such as `focusedAnswerValidationAnswersSha256`, `focusedAnswerValidationBatchSha256`, and `focusedAnswerValidationAudioSha256`. These hashes prove which evidence files were validated without exposing transcript text in summaries. The validator does not compile answers, promote labels, accept recognizer output, train, or translate. It exists so the reviewer can test a partial focused answer file safely before invoking the heavier monitor or review loop.
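
The `sourceFingerprints` idea (existence, size, mtime, SHA-256) can be illustrated with the standard library alone. The function name and the dictionary keys below are hypothetical; the validator's actual field layout is not shown in this document.

```python
import hashlib
import os


def fingerprint_file(path: str) -> dict:
    """Describe a file by existence, size, mtime, and SHA-256 without reading its text."""
    if not os.path.isfile(path):
        return {"path": path, "exists": False, "size": None, "mtime": None, "sha256": None}
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Hash in chunks so large WAV or JSONL files are not loaded whole.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    stat = os.stat(path)
    return {
        "path": path,
        "exists": True,
        "size": stat.st_size,
        "mtime": stat.st_mtime,
        "sha256": digest.hexdigest(),
    }
```

Because only the hash and file metadata are reported, a summary can prove which answer file was validated without carrying any transcript text.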

When the focused review sheet is served by the local phrase-entry server, its `Save to Local Server` control posts to `/api/focused-review-answers`, not the broad `/api/review-answers` endpoint. The broad clean sheet still writes only `review_answers_filled.jsonl` after the full review compiler accepts the rows. The focused endpoint writes only `focused_review_answers_filled.jsonl`, first checks the submitted rows for recognizer-hypothesis leaks, then validates a temporary candidate file against `focused_review_batch.json`, `focused_review_audio_verification.json`, and the focused answer audit. It accepts only `ready_for_focused_compile` or safe `review_dispositions_only` states, then writes the real focused answer file and refreshes `focused_review_answer_validation.json` plus `focused_review_answer_audit.json`. It does not compile focused answers, does not write `checked_labels_filled.jsonl`, does not touch broad `review_answers_filled.jsonl`, does not promote labels, does not train, and does not translate. The status pack now routes `fill_focused_review_answers` to `serveFocusedReviewSheet`, which starts `serve_speech_calibration_phrase_entry_v0.py --open-focused-review` and opens `http://[ip]:8787/review/focused_review_sheet.html`; this avoids the misleading raw-file path where browser-side save cannot reach localhost. This keeps the operator-facing focused sheet aligned with the status-pack recommendation without letting focused human input bypass the focused batch and verified-audio gates.

The served focused review sheet now also consumes a read-only `/api/focused-review-action` endpoint. The endpoint reads `focused_review_handoff.json`, rejects the response if the handoff action packet contains unsafe transcript-bearing fields such as `expectedText`, `recognizerHypothesis`, `candidateText`, `acceptedText`, `hypothesisText`, or `translation`, and otherwise returns the current gate, action packet, and source fingerprints. The focused HTML displays this as a compact gate panel: next safe action, pending rows, validation status, answer SHA-256, audio checked/issues, and allowed command keys. After a valid focused answer save, the server refreshes only the handoff `actionPacket` from the newly written `focused_review_answer_validation.json`, then refreshes the consolidated no-capture phrase-entry status when the browser route has phrase/template paths, and then the page reloads the read-only gate state. This makes `runFocusedPostReview` appear in both the browser gate and the status-pack-derived next command only after validation reaches `ready_for_focused_compile` or `review_dispositions_only`, without requiring the operator to regenerate the whole focused handoff first. This is deliberately display-only plus report refresh. It does not save answers beyond the already validated focused endpoint, import TSV rows, compile focused answers, promote labels, train, translate, or make the model's output true. Its purpose is to make the iPhone/browser review surface show the same governance state as the JSON handoff and consolidated status pack, so the operator can see that post-review is locked until focused answers validate.
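
The endpoint's rejection rule amounts to a recursive key scan over the action packet. The sketch below assumes the packet is ordinary parsed JSON; the function name and the path notation are illustrative, while the unsafe field names are the ones listed above.

```python
UNSAFE_KEYS = {
    "expectedText", "recognizerHypothesis", "candidateText",
    "acceptedText", "hypothesisText", "translation",
}


def find_unsafe_keys(value, path="$"):
    """Return the paths of every transcript-bearing key nested anywhere in value."""
    hits = []
    if isinstance(value, dict):
        for key, child in value.items():
            child_path = f"{path}.{key}"
            if key in UNSAFE_KEYS:
                hits.append(child_path)
            hits.extend(find_unsafe_keys(child, child_path))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            hits.extend(find_unsafe_keys(child, f"{path}[{index}]"))
    return hits
```

A read-only endpoint built this way would return the gate state only when the scan comes back empty and would otherwise reject the response.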

Once the server saves a focused answer file, `focused_review_answer_validation.json` becomes first-class status evidence instead of a side report. A clean label file with `ready_for_focused_compile` or a safe disposition-only file with `review_dispositions_only` moves the top-level next command to `runFocusedPostReview`, which runs `run_speech_calibration_focused_postreview_v0.py` to compile, preflight, refresh review progress, analyze dispositions, rebuild acoustic memory evidence, refresh reject-only decoder calibration proposal and simulation reports, and rerun the goal audit. Invalid focused answer states move the top-level next command back to `serveFocusedReviewSheet` so the operator fixes the focused rows rather than accidentally entering broad review, capture, training, or translation. Missing validation still leaves the workflow at `fill_focused_review_answers`. This closes the earlier gap in which a saved focused answer could be valid locally while the top-level status still behaved as if no focused answer state existed.
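
The routing just described reduces to a three-way decision. The sketch below is an assumption-laden simplification: the function name is hypothetical, and routing every status other than the two clean ones back to the review sheet is this sketch's fail-closed choice rather than documented behavior.

```python
from typing import Optional

# The two validation states that unlock post-review, per the paragraph above.
POSTREVIEW_STATES = {"ready_for_focused_compile", "review_dispositions_only"}


def next_top_level_command(validation_status: Optional[str]) -> str:
    """Map a focused answer validation status to the next operator step."""
    if validation_status is None:
        # No validation report yet: stay on the human fill step.
        return "fill_focused_review_answers"
    if validation_status in POSTREVIEW_STATES:
        return "runFocusedPostReview"
    # Assumption: anything else sends the operator back to the sheet.
    return "serveFocusedReviewSheet"
```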

The decoder-calibration refresh inside focused post-review is an acoustic-improvement artifact, not a recognition claim. It reads existing CTC-overfire analysis and, when present, evaluation readiness, then writes `speech_decoder_calibration_proposal.json`, `speech_decoder_calibration_simulation.json`, and `speech_decoder_calibration_simulated_examples.jsonl`. When no training-admissible same-evidence human labels exist, the proposal remains `reject_only_unlabeled`: it may count overfire reject gates and possible margin/blank-bias probes, but the simulation preserves the original transcript decisions, leaves accepted transcript count at zero, and changes no labels. This lets the system study how to suppress CTC overfire without quietly converting decoder probes into fake ASR success.

The consolidated status pack and phrase-entry refresh now surface those decoder-calibration numbers under acoustic improvement. They report focused post-review status, decoder proposal mode, reject-gate hits, available decoder probes, simulated annotations, accepted transcript count, label-change count, and acceptance blockers. These fields are intentionally operator-visible because acoustic improvement is allowed to proceed as reject-only analysis before stable recognition exists. The same fields also make the boundary auditable: a status report can show concrete overfire suppression evidence while still showing `acceptedTranscriptCount=0`, `labelsChangedCount=0`, and `completionReady=false`.

The focused batch also exposes a next-capture planner:

text
experiments/acoustic_gate/plan_speech_calibration_next_capture_v0.py

The batch command is `planNextCapture`, and it writes `focused_review_next_capture_plan.json`, `focused_review_next_capture_plan.md`, and `next_capture_phrase_ladder_template.jsonl`. The planner reads `focused_review_batch.json`, `focused_review_answer_validation.json`, `focused_review_loop.json`, `focused_review_disposition_analysis.json`, and the current `speech_calibration_index.json`. Its job is not to label anything. It decides whether the next operational action is to listen and fill focused answers, fix invalid focused answers, run the focused review loop, capture replacement Malinke phrases after non-label dispositions, or move on to evaluation/training once human-verified training-admissible rows exist. The phrase ladder template it writes is deliberately not capture-ready: `expectedText` is empty, `captureReady=false`, and the command it prints uses `collect_speech_calibration_ladder_v0.py --require-capture-ready` so the collector fails before recording until a real N'Ko expected phrase is supplied. By default the planner creates a minimum three-row calibration runway, or a larger batch when selected/reviewed evidence requires it; `--phrase-count` can raise that minimum without writing any language content. Each draft row can include `facCoverageRequest`, `toneCoverageRequest`, and `captureGoal` metadata with `request_only_v0` status. These fields request coverage for known FAC slots and tone evidence; they are not `facTargets`, not labels, not acoustic classifications, and not training data. The planner summary reports `requestedPhraseCount`, `facCoverageRequestSlotCounts`, and `toneCoverageRequestCount` so the next capture pass can be designed around voicing, place, manner, nasality, tone, length, vowel height, vowel backness, and rounding without inventing feature values. This closes the loop between reviewed failure evidence and the next capture pass without turning review metadata, recognizer output, or English test audio into labels.

The next-capture phrase plan can be filled through `import_speech_calibration_next_capture_phrases_v0.py`. The planner exposes `preparePhrasePlanSheet`, which writes `next_capture_phrase_plan_template.tsv` from `next_capture_phrase_ladder_template.jsonl` and initializes the editable `next_capture_phrase_plan.tsv` when that sheet is absent. If the editable sheet already exists, the command preserves existing human-entered rows and appends only missing blank phrase IDs from the current template; it reports the append count as `phraseSheetAppendedCount`. It never overwrites an existing filled phrase sheet. The `importPhrasePlanSheet` command reads `next_capture_phrase_plan.tsv`, writes `next_capture_phrase_ladder_filled.jsonl`, and reports to `next_capture_phrase_plan_import_report.json`. This bridge accepts only explicit human-planned N'Ko `expectedText` and optional known FAC scaffold columns such as `facTone`, `facNasality`, `facVoicing`, `facPlace`, `facManner`, `facLength`, `facVowelHeight`, `facVowelBackness`, and `facRounding`. It rejects recognizer-output columns such as `candidateText`, `acceptedText`, `recognizerHypothesis`, `transcriptDecision`, and `labelComparison`; it rejects Latin expected text; and it clears the filled ladder to a comment-only fail-closed file on import failure. A successful import still reruns `validate_phrase_ladder(..., require_capture_ready=True)` from `collect_speech_calibration_ladder_v0.py` before the plan can be treated as capture-ready. The planner also exposes `validateFilledPhraseLadder` and `captureFilledPhraseLadderWhenReady`, which point at the filled ladder rather than the empty request template. This importer does not listen, infer text, expose or trust recognizer output, create labels, promote labels, train, or translate.
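
The append-only behavior of `preparePhrasePlanSheet` can be sketched on already-parsed rows. The `phraseId` column name and the function name are assumptions; only `phraseSheetAppendedCount` is taken from the text above.

```python
def append_missing_phrase_rows(existing_rows: list[dict], template_rows: list[dict]) -> dict:
    """Keep every existing row untouched and append only template rows with new phrase IDs."""
    known = {row["phraseId"] for row in existing_rows}
    appended = []
    for row in template_rows:
        if row["phraseId"] not in known:
            appended.append(dict(row))  # copy so the template is not aliased
            known.add(row["phraseId"])
    return {
        "rows": list(existing_rows) + appended,
        "phraseSheetAppendedCount": len(appended),
    }
```

Existing rows are never compared against or replaced by template content, so a phrase the human has already typed cannot be blanked by regenerating the template.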

The planner also exposes `preparePhraseCollectionPack`, implemented by `prepare_speech_calibration_phrase_collection_pack_v0.py`. This read-only operator aid writes `short_phrase_collection_pack.json`, `short_phrase_collection_pack.md`, and `phrase_collection_sheet.html` from the current phrase template and editable phrase TSV. The HTML sheet is a local entry surface with right-to-left `expectedText` fields, optional FAC scaffold fields, a browser-side N'Ko-range validator, and a TSV download action for `next_capture_phrase_plan.tsv`. It summarizes each phrase slot, source packet, requested FAC/tone coverage, missing expected text, invalid Latin or poisoned rows, and the exact governed handoff, phrase-entry refresh, and explicit capture commands. It does not fill `expectedText`, infer Malinke, listen, trust recognizer output, create labels, promote labels, capture audio, train, or translate. Its purpose is to make the next human action smaller: type real short Malinke phrases in N'Ko, refresh the no-capture status, then run governed capture only if the report says the ladder is capture-ready, so that blank or garbage text never enters the capture ladder.

Post-entry refresh is handled by:

text
experiments/acoustic_gate/refresh_speech_calibration_phrase_entry_status_v0.py

This command exists because the phrase-entry server, FAC audit, next-capture import, failure queue, focused answer validation, and status pack can otherwise drift as separate reports. It runs the phrase collection pack, the FAC scaffold audit, `run_speech_calibration_next_capture_v0.py` with capture disabled, the failure queue, review priority, and, when focused-review artifacts exist, `validate_speech_calibration_focused_review_answers_v0.py` before regenerating the consolidated status pack. It writes `phrase_entry_status_refresh.json` plus `phrase_entry_status_refresh.md`. Its success state `ready_for_capture_without_recording` means only that the explicit human-entered N'Ko phrase plan passed import and collector validation. Its focused-answer transition state `run_focused_postreview` means only that the saved focused answer file has just passed the focused selected-batch, verified-audio, N'Ko-script, confirmation, disposition, and leak checks. Neither state means that audio was recorded, that the recognizer was correct, that the 20

The focused validation refresh is intentionally narrow. If `focused_review_batch.json` or `focused_review_answers_filled.jsonl` exists, the refresh command rewrites `focused_review_answer_validation.json` and `focused_review_answer_audit.json` before the status pack reads them. This prevents an old `awaiting_review_answers` or old `ready_for_focused_compile` report from controlling the next command after the underlying answer file changed. If no focused-review artifacts exist, the refresh does not invent a focused-answer state. This still does not compile focused answers, promote labels, accept recognizer output, train, or translate.

The same lane now has a one-command orchestration wrapper:

text
experiments/acoustic_gate/run_speech_calibration_next_capture_v0.py

The planner exposes this as `checkNextCaptureOrchestration` and `runNextCapture`, and the wrapper writes `next_capture_orchestration.json`. The check command imports the filled TSV, validates `next_capture_phrase_ladder_filled.jsonl` with the collector's capture-ready rules, and stops before recording unless `--capture` is present. The capture command then routes the filled ladder through `collect_speech_calibration_ladder_v0.py --snapshot-after-collection`, expects the collector to emit `operator_expected_labels.jsonl` and `speech_calibration_evidence_snapshot.json`, and immediately replays the durable snapshot through `run_speech_calibration_snapshot_cycle_v0.py` with the filled phrase plan and operator labels attached. This is still an evidence-ingestion and replay bridge, not an acceptance shortcut: it does not infer phrase text, does not trust recognizer output, does not create `human_verified` labels, does not train a model, and does not translate. If the TSV is missing, Latin, poisoned by recognizer fields, or not capture-ready, the orchestration report remains fail-closed with `captureRan=false` and `replayRan=false`.

The next-capture report also exposes `prepareOperatorExpectedVerificationSheet` and `importOperatorExpectedVerificationSheet`. These commands are the post-capture human-listening bridge for newly captured `operator_expected` rows. They prepare the TSV verification sheet, import only confirmed spoken rows into checked-label JSONL, and leave final promotion to `preflight_speech_calibration_checked_labels_v0.py` plus `promote_speech_calibration_labels_v0.py`. This closes the operational gap between live capture and training eligibility without pretending that the live ASR text is correct.

The review-progress state can be checked with:

text
experiments/acoustic_gate/monitor_speech_calibration_review_progress_v0.py

This command writes `speech_calibration_review_progress.json`. It reads the clean review session, the downloaded `review_answers_filled.jsonl`, the answer compiler report, and the checked-label preflight report. It can also read `--audio-verification-report speech_calibration_review_audio_verification.json` for the full clean review queue, or `--audio-verification-report focused_review_audio_verification.json`, `--focused-batch focused_review_batch.json`, and `--focused-answer-audit-report focused_review_answer_audit.json` for focused mode. In either audio mode it includes an `audioVerification` readiness block and fails closed before compile/preflight if the referenced WAV evidence is missing, unreadable, hash-mismatched, or otherwise not ready. With `--run-compile --run-preflight`, it can compile reviewer answers and preflight the resulting checked labels, but it still does not create labels, accept transcripts, classify FAC/tone, promote to `human_verified`, or train. Its main states are `focused_audio_verification_failed`, `focused_answers_mismatch`, `awaiting_review_answers`, `review_answers_invalid_json`, `review_answers_poisoned`, `review_dispositions_only`, `answer_compile_failed`, `checked_label_preflight_failed`, and `ready_for_snapshot_rerun`. Only the final state, `ready_for_snapshot_rerun`, means that the filled review answers and, when provided, the audio verification and focused answer audit are clean enough to rerun the snapshot or calibration cycle with `--checked-labels`. `review_dispositions_only` means the rows were reviewed but no N'Ko label is available for training or replay, and `focusedAnswerAudit.completeForBatch` separately reports whether every selected focused packet has an answer or disposition. Even `ready_for_snapshot_rerun` does not mean the labels have already been promoted.

The full review queue now also has a safe priority aid:

text
experiments/acoustic_gate/prepare_speech_calibration_review_priority_v0.py

This command reads `review_session_queue.jsonl`, `speech_calibration_review_audio_verification.json`, the failure queue, and the review-progress report, then writes `speech_calibration_review_priority.json`, `speech_calibration_review_priority_rows.jsonl`, and `speech_calibration_review_priority.md`. It ranks rows by verified audio state, short usable duration, duplicate packet count, failure bucket, failure lane, source kind, and pending-answer state. It deliberately omits recognizer hypotheses, candidate text, accepted text, expected text, answer seeds, checked labels, training labels, and translations. Legacy queue inputs may still contain those unsafe fields, so the priority generator continues to count `sourceUnsafeFieldCount` and strip them from output. In the current regenerated review queue, that source count is expected to be zero. The priority report is therefore triage evidence only: it can say which short verified packets are worth listening to first and which duplicate or long recordings should wait, but it cannot decide what was spoken and cannot make anything training-admissible.
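
The strip-then-rank behavior can be sketched as follows. Everything about the row schema here is an assumption: `audioVerified`, `duplicateCount`, and `durationSeconds` are placeholder field names, the sort uses only three of the listed ranking criteria, and counting one unit per unsafe field (rather than per row) is this sketch's choice.

```python
UNSAFE_FIELDS = {
    "recognizerHypothesis", "candidateText", "acceptedText",
    "expectedText", "translation",
}


def prioritize_review_rows(rows: list[dict]) -> dict:
    """Strip transcript-bearing fields, count them, and rank rows for listening."""
    unsafe_count = 0
    cleaned = []
    for row in rows:
        unsafe_count += len(UNSAFE_FIELDS.intersection(row))
        cleaned.append({k: v for k, v in row.items() if k not in UNSAFE_FIELDS})
    # Verified audio first, then fewer duplicates, then shorter recordings.
    cleaned.sort(key=lambda r: (
        not r.get("audioVerified", False),
        r.get("duplicateCount", 0),
        r.get("durationSeconds", float("inf")),
    ))
    return {"sourceUnsafeFieldCount": unsafe_count, "rows": cleaned}
```

The ranking never reads the stripped fields, so the order cannot depend on what the recognizer claimed was said.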

The post-review replay can then be run with:

text
experiments/acoustic_gate/run_speech_calibration_review_loop_v0.py

The review session exposes this as `runReviewLoop`. The loop writes `speech_calibration_review_loop.json`, calls the progress monitor, waits for `ready_for_snapshot_rerun`, and only then reruns either `run_speech_calibration_snapshot_cycle_v0.py` or `run_speech_calibration_cycle_v0.py` with `checked_labels_filled.jsonl`. It does not create reviewer answers, infer labels, or trust recognizer output. If reviewer-supplied checked labels pass compile and preflight, it may invoke the existing promotion gate during replay so the calibration cycle can produce `human_verified` rows and training manifests from real checked evidence.

The candidate-label path is covered end-to-end by:

text
experiments/acoustic_gate/test_speech_calibration_candidate_label_rehearsal.py

This regression creates a valid synthetic `needs_label` packet with a bounded `candidateText` and an expected N'Ko phrase. It then proves the intended lifecycle: the first calibration build is evaluation-ready but not training-admissible, CER is computed against the candidate hypothesis, the review package carries the candidate only as `recognizerHypothesis`, promotion requires explicit human confirmation, and the rebuilt calibration set becomes training-admissible only after the promoted `human_verified` row is joined back to the same packet evidence. This test is synthetic, so it does not prove real recognition quality. It proves the governance path is wired before real phrase-ladder collection.
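
For reference, a character error rate of the usual kind can be computed as Levenshtein distance divided by reference length. This is the standard textbook definition, assumed here rather than taken from the test's source, with the expected phrase as reference and the candidate hypothesis as the text being scored. It compares Unicode code points, so N'Ko combining marks count as separate characters.

```python
def character_error_rate(reference: str, hypothesis: str) -> float:
    """Levenshtein edit distance between the strings, divided by len(reference)."""
    if not reference:
        raise ValueError("reference must be non-empty")
    # previous[j] is the distance between the reference prefix seen so far
    # and the first j characters of the hypothesis.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1] / len(reference)
```

A CER computed this way is an evaluation number only; a low value against an `operator_expected` phrase does not make the hypothesis a label.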

Translation remains downstream. This stage is not English translation and not live Malinke recognition in the product sense. It is the governed evidence layer that makes those later systems trainable without letting garbage become knowledge.

As of 2026-06-10, the focused review sheet exposes the full approved v0 FAC scaffold as optional human-entered fields: voicing, place, manner, nasality, tone, length, vowel height, vowel backness, and rounding. The sheet builds those fields from an explicit slot table and serializes only non-empty values into `facTargets`; non-label disposition rows still omit FAC targets. The server-side focused save endpoint validates those optional values through the existing focused answer validator, compiler, and promotion FAC rules, so unknown slots, blank values, proxy statuses, and recognizer text remain rejected. This is a user-interface and provenance improvement only. It does not infer FAC values from audio, train FAC heads, accept ASR text, promote labels, or translate. The live regenerated handoff still has `currentGate=human_focused_review_answers`, `pendingRowCount=3`, `focusedAnswerValidation.status=awaiting_review_answers`, `postreviewAllowed=false`, and `machineLeakCount=0`; the allowed command keys remain `refreshPhraseEntryStatus` and `serveFocusedReviewSheet` until real human review answers or dispositions are saved.

The focused TSV handoff now mirrors that browser path. `focused_review_answer_decisions.tsv` and its template include the same optional FAC columns used by next-capture planning: `facVoicing`, `facPlace`, `facManner`, `facNasality`, `facTone`, `facLength`, `facVowelHeight`, `facVowelBackness`, and `facRounding`. `import_speech_calibration_focused_review_answers_v0.py` converts only non-empty known FAC columns into canonical `facTargets` objects with `status=provided_v0` and `evidence=human focused review decision TSV`; unknown `fac*` headers fail closed as `unknown_fac_tsv_column`. Labelable N'Ko rows may carry those FAC targets into the existing validator and compiler path. Non-label disposition rows fail closed if any FAC value is present, because a row marked English, noise, silence, or uncertain cannot simultaneously provide speech-feature targets. `inspect_speech_calibration_handoff_sheets_v0.py` reports focused FAC slot counts and per-row `facTargetSlots` without importing anything. The focused handoff initializer upgrades old blank decision TSV sheets that lack the FAC headers, but preserves human-edited sheets rather than overwriting user work. This keeps the browser and TSV review paths aligned while preserving the core boundary: FAC scaffolds are human-entered supervision requests or targets, not acoustic FAC predictions, not transcript acceptance, not training labels, and not translation.
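
The column-to-target conversion can be sketched like this. The nine column names, `status=provided_v0`, the evidence string, and the `unknown_fac_tsv_column` code are from the paragraph above; the slot key spellings, the shape of each target object, the `fac_value_on_non_label_row` code, and the function name are hypothetical.

```python
# TSV column -> slot name. Slot spellings are placeholders for this sketch.
FAC_COLUMNS = {
    "facVoicing": "voicing", "facPlace": "place", "facManner": "manner",
    "facNasality": "nasality", "facTone": "tone", "facLength": "length",
    "facVowelHeight": "vowelHeight", "facVowelBackness": "vowelBackness",
    "facRounding": "rounding",
}


def fac_targets_from_row(row: dict) -> list[dict]:
    """Convert non-empty known FAC columns of one decision row into target objects."""
    if any(col.startswith("fac") and col not in FAC_COLUMNS for col in row):
        raise ValueError("unknown_fac_tsv_column")
    targets = [
        {
            "slot": slot,
            "value": row[column].strip(),
            "status": "provided_v0",
            "evidence": "human focused review decision TSV",
        }
        for column, slot in FAC_COLUMNS.items()
        if (row.get(column) or "").strip()
    ]
    if targets and row.get("decision") == "not_labelable":
        # A non-label row cannot also carry speech-feature targets.
        raise ValueError("fac_value_on_non_label_row")
    return targets
```

Blank columns produce no target at all, which keeps an empty cell from being read as a feature value.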

The consolidated status surfaces now carry those focused TSV FAC counts forward as `focusedHandoffFacTargetSlotCounts` in both `speech_calibration_status_pack.json` and `phrase_entry_status_refresh.json`, and the Markdown briefs print the same field. This lets the operator see whether any focused human review row has provided voicing, place, manner, nasality, tone, length, vowel height, vowel backness, or rounding targets without opening raw TSV or inspector JSON. In the current live state all nine focused FAC target counts are zero, `focusedAnswerRowCount=0`, `focusedHandoffPendingRowCount=3`, `focusedHandoffMachineLeakCount=0`, `decoderCalibrationAcceptedTranscriptCount=0`, and `toneFusionReadyCount=0`. That is the correct governed state: the review lane is ready for human listening, but no transcript, FAC target, tone target, training row, or translation has been accepted.

The focused handoff can now also export a portable review packet with:

text
experiments/acoustic_gate/export_speech_calibration_focused_review_packet_v0.py

The export copies only verified focused-review WAV evidence into `focused_review_packet/audio/`, writes a blank `focused_review_answer_decisions.tsv` with the same FAC columns as the main handoff, writes `focused_review_packet_manifest.json`, writes a README, and now writes `focused_review_packet_sheet.html`. The static sheet plays the copied WAVs, lets a human fill decision, expected N'Ko text, disposition reason, reviewer fields, and optional FAC scaffold values, then downloads a TSV for the existing governed importer. It scans its generated manifest, TSV, README, and static sheet for unsafe recognizer-field names such as `candidateText`, `acceptedText`, `recognizerHypothesis`, `hypothesisText`, `labelComparison`, and `transcriptDecision`; any leak blocks the packet. The live packet currently has 3 rows, 6 copied WAVs, 0 issues, 0 machine-field leaks, and `reviewSheetReady=true`. This packet exists to make human listening easier. It still does not accept transcripts, save answers, import rows, create labels, infer FAC/tone, train, or translate.

Public Bambara/Manding speech corpora should be treated as governed acoustic and evaluation sources, not as replacements for live focused review. The current audit command is:

text
experiments/acoustic_gate/audit_public_manding_speech_corpora_v0.py

It reads Hugging Face metadata plus static public-source metadata for non-Hugging Face corpora and writes `public_manding_speech_corpora.json` plus `public_manding_speech_corpora.md`. The live audit currently tracks 16 available Bambara/Manding/N'Ko/Maninka candidates. Four are immediately broad-acoustic-training ready through the existing Hugging Face loader path: RobotsMali `bam-asr-early` (Bambara audio/transcription/French translation, about 38.7K audio files and about 37 hours, CC-BY-4.0), RobotsMali `jeli-asr` (about 33.6K files and about 32 hours, CC-BY-4.0), RobotsMali `afvoices` (African Next Voices Bambara ASR, CC-BY-4.0, with `human-corrected`, `model-annotated`, and `short` configs; Hugging Face describes it as 423 hours of segmented audio plus 612 hours of raw recordings), and RobotsMali `kunkado` (Bambara/code-switching audio and text with `human-reviewed`, `semi-first`, and `semi-second` configs, CC-BY-SA-4.0). Five are heldout-evaluation ready, because the four RobotsMali corpora can supply evaluation splits and `MALIBA-AI/bambara-asr-benchmark` is an open CC-BY-4.0 benchmark with an `eval` config. That benchmark is deliberately not broad-training ready; it is useful for measuring acoustic behavior without feeding the trainer. Two OpenSLR/Nicolingua sources are now tracked separately: `OpenSLR/106`, the West African VA ASR corpus with 10,083 utterances in French, Maninka, Pular, and Susu from 49 speakers under CC-BY-SA-4.0, and `OpenSLR/105`, the West African Radio Corpus with 17,090 30-second Guinean radio clips including Maninka under CC-BY-SA-4.0. These are real public-data leads, but they require an external OpenSLR loader before they can enter the governed materialization path; `OpenSLR/105` is also an unsupervised acoustic-representation candidate rather than supervised transcript truth. 
Nine Hugging Face candidates are still visible but blocked behind access/license/schema review: `asr-africa/ASRAfricaDataEfficiencyBenchmark`, `sudoping01/nko-asr`, `oza75/bambara-asr`, `djelia/bambara-audio`, `sudoping01/open-bambara-asr-dataset`, `djelia/bambara-asr-dataset-y`, `djelia/bambara-asr`, `djelia/bambara-asr-evaluation`, and `djelia/bambara-asr-v2`. The important direct-N'Ko lead remains `sudoping01/nko-asr`: the live metadata reports gated access and no license, so the audit records it as `nkoLabelCandidate=true` while keeping both `broadAcousticTraining=false` and `heldoutEvaluation=false` until access, terms, and schema are reviewed. The status pack now surfaces this as `publicCorporaDatasetCount=16`, `publicCorporaAvailableCount=16`, `publicCorporaBroadAcousticTrainingReadyCount=4`, `publicCorporaHeldoutEvaluationReadyCount=5`, `publicCorporaExternalLoaderCandidateCount=2`, `publicCorporaUnsupervisedAcousticRepresentationCandidateCount=1`, `publicCorporaNkoLabelCandidateCount=1`, `publicCorporaAccessOrLicenseReviewCount=9`, `publicCorporaLivePacketTruthCount=0`, and `publicCorporaDirectToneSupervisionCount=0`, and exposes `auditPublicMandingSpeechCorpora` as a refresh command.

Public corpus materialization is now planned by:

text
experiments/acoustic_gate/prepare_public_manding_corpus_ingest_plan_v0.py

This command reads the public-corpus audit and writes `public_manding_corpus_ingest_plan.json`, `public_manding_corpus_ingest_rows.jsonl`, and `public_manding_corpus_ingest_plan.md`. It does not download data. It classifies configs into explicit lanes: `supervised_acoustic_candidate`, `heldout_evaluation_candidate`, `weak_supervision_quarantine`, `smoke_ingest_candidate`, `schema_probe_candidate`, and `blocked_repo_review`. The live plan currently reports 16 available candidates, 5 repos ready for manifest ingest, 11 repos blocked for review, 57 relevant configs, 7 supervised acoustic candidates, 1 heldout-evaluation candidate, 3 weak-supervision quarantine configs, 1 smoke-ingest candidate, 2 schema probes, 43 blocked configs, `downloadRan=false`, and `trainingStarted=false`. The 11 blocked repos include the 9 access/license/schema review candidates plus `OpenSLR/106` and `OpenSLR/105`, which are blocked specifically because an external OpenSLR loader is required. The status pack reads the ingest plan and exposes `preparePublicMandingCorpusIngestPlan` as a refresh command while preserving the same truth boundary: public corpus ingest planning cannot label a live iPhone packet, cannot provide direct tone truth, cannot create `human_verified` labels, and cannot train without a later explicit loader/materialization gate.
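
The lane counts quoted above can be cross-checked by tallying the rows file. The following is a minimal sketch, not the planner's own code: it assumes each row of `public_manding_corpus_ingest_rows.jsonl` carries a `lane` key (a hypothetical field name, since the row schema is not shown here), and it assumes the 43 blocked configs map to the `blocked_repo_review` lane.

```python
import json
from collections import Counter

# Lane names as listed in the ingest plan description.
KNOWN_LANES = {
    "supervised_acoustic_candidate",
    "heldout_evaluation_candidate",
    "weak_supervision_quarantine",
    "smoke_ingest_candidate",
    "schema_probe_candidate",
    "blocked_repo_review",
}


def tally_lanes(jsonl_lines):
    """Count rows per lane and reject any lane outside the known set.

    Assumption: every JSONL row has a "lane" key (hypothetical name).
    """
    counts = Counter()
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the JSONL file
        lane = json.loads(line).get("lane")
        if lane not in KNOWN_LANES:
            raise ValueError(f"unknown lane: {lane!r}")
        counts[lane] += 1
    return counts


# Counts quoted in the live plan; together they cover the 57 relevant configs.
REPORTED = {
    "supervised_acoustic_candidate": 7,
    "heldout_evaluation_candidate": 1,
    "weak_supervision_quarantine": 3,
    "smoke_ingest_candidate": 1,
    "schema_probe_candidate": 2,
    "blocked_repo_review": 43,
}
assert sum(REPORTED.values()) == 57
```

Rejecting unknown lane names, rather than counting them silently, keeps a misspelled or newly introduced lane from slipping past the explicit-lane boundary.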

Schema-only materialization is now probed by:

text
experiments/acoustic_gate/probe_public_manding_corpus_loader_v0.py

This command reads the ingest plan, calls Hugging Face dataset-server metadata endpoints for eligible configs, and writes `public_manding_corpus_loader_probe.json`, `public_manding_corpus_loader_probe_rows.jsonl`, and `public_manding_corpus_loader_probe.md`. It stores split names, feature names, text/audio column candidates, and parquet counts. It deliberately does not store transcript strings, audio URLs, audio bytes, labels, translations, or model outputs. The live probe reports 57 config rows, 14 materialization-probe-ready rows, 7 broad-acoustic materialization-ready rows, 9 heldout-evaluation-ready rows, 3 weak-supervision quarantine rows, 2 schema-probe-only rows, and 43 blocked rows. The blocked rows include the direct `sudoping01/nko-asr` configs until access, license, and label schema are reviewed, plus the two OpenSLR/Nicolingua configs until an external loader is implemented. The existing research corpus already used `bam-asr-early` and `afvoices` to build roughly 290K Bambara/Manding speech pairs, with N'Ko produced through the deterministic Latin-to-IPA-to-N'Ko bridge where needed. These corpora can help retrain or adapt acoustic components, build stronger validation splits, test the Latin-to-N'Ko bridge, and reduce the low-resource acoustic-data bottleneck. They still do not prove what was spoken in a new iPhone live packet, do not provide direct tone supervision unless tone-marked labels are separately proven, and do not replace focused human review for live Malinke calibration. Live packets still need human-reviewed labels or dispositions before they become calibration truth.
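
The "schema only, no content" rule can be illustrated with a small helper that looks at feature declarations and keeps nothing but column names. This is a sketch under stated assumptions: the input is a feature mapping in the serialized form used by the Hugging Face `datasets` library (for example `{"_type": "Audio", ...}` and `{"dtype": "string", "_type": "Value"}`), and the output key names are hypothetical rather than the probe's real schema.

```python
def summarize_features(features):
    """Return column-name candidates only; row values are never touched.

    `features` maps column name -> serialized feature spec (assumed shape).
    """
    audio_cols, text_cols = [], []
    for name, spec in features.items():
        if not isinstance(spec, dict):
            continue  # nested or list-typed features are ignored in this sketch
        if spec.get("_type") == "Audio":
            audio_cols.append(name)
        elif spec.get("_type") == "Value" and spec.get("dtype") == "string":
            text_cols.append(name)
    return {
        "featureNames": sorted(features),              # hypothetical key
        "audioColumnCandidates": sorted(audio_cols),   # hypothetical key
        "textColumnCandidates": sorted(text_cols),     # hypothetical key
    }


example = summarize_features({
    "audio": {"_type": "Audio", "sampling_rate": 16000},
    "nko": {"dtype": "string", "_type": "Value"},
    "duration": {"dtype": "float32", "_type": "Value"},
})
```

Because the function only receives feature declarations, there is no code path through which a transcript string or audio URL could end up in the report.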

The loader probe now feeds a governed materialization manifest:

text
experiments/acoustic_gate/prepare_public_manding_corpus_materialization_manifest_v0.py

This command reads the ingest plan plus loader probe and writes `public_manding_corpus_materialization_manifest.json`, `public_manding_corpus_materialization_rows.jsonl`, and `public_manding_corpus_materialization_manifest.md`. It assigns each config to a future materialization lane: `broad_acoustic_supervised`, `heldout_eval_only`, `weak_supervision_quarantine`, `schema_probe_only`, or `blocked_not_materializable`. It records source URLs, licenses, selected split names, verified audio/text column names, parquet counts, and capped row budgets. It still does not download samples. The live manifest currently has 57 rows, 14 rows ready for a later explicit capped materializer, 7 broad-acoustic supervised rows, 9 heldout-evaluation-ready rows, 3 weak-supervision quarantine rows, 2 schema-probe rows, and 43 blocked rows. Its default caps are 5,000 train rows per supervised config, 500 eval rows per eval-capable config, 250 quarantine rows per weak-supervision config, and 25 schema-probe rows per schema-only config. The status pack exposes this as `publicCorpusMaterializationManifestStatus=public_corpus_materialization_manifest_ready`, `publicCorpusReadyForFutureCappedDownloadCount=14`, `publicCorpusMaterializationBroadAcousticTrainingReadyCount=7`, `publicCorpusMaterializationHeldoutEvaluationReadyCount=9`, `publicCorpusMaterializationBlockedCount=43`, `publicCorpusMaterializationDownloadRan=false`, and `publicCorpusMaterializationTrainingStarted=false`. It also exposes `preparePublicMandingCorpusMaterializationManifest` as the safe refresh command. This artifact cleanly answers which public data can become capped train/eval material later, while preserving the boundary that public corpora are not live packet truth and the direct N'Ko dataset lead remains blocked until review.
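
The default caps amount to a per-lane ceiling on how many rows a later materializer may request. A minimal sketch of that arithmetic follows. It simplifies by keying one cap per lane, whereas the manifest may apply both a train cap and an eval cap to a config that supports both, and the zero budget for blocked configs is an assumption consistent with the lane name.

```python
# Default caps quoted in the manifest description, keyed by materialization lane.
DEFAULT_CAPS = {
    "broad_acoustic_supervised": 5000,    # train rows per supervised config
    "heldout_eval_only": 500,             # eval rows per eval-capable config
    "weak_supervision_quarantine": 250,   # quarantine rows per weak-supervision config
    "schema_probe_only": 25,              # rows per schema-only config
    "blocked_not_materializable": 0,      # assumption: blocked configs get no budget
}


def capped_row_budget(lane, available_rows):
    """Return the number of rows a later materializer may request for a config."""
    cap = DEFAULT_CAPS.get(lane, 0)  # unknown lanes get no budget
    return max(0, min(available_rows, cap))
```

For example, a supervised config with roughly 38,700 available rows would be budgeted at 5,000, while a schema-probe config with only 10 rows would keep all 10.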

The next gate is the explicit capped-sample materializer:

text
experiments/acoustic_gate/materialize_public_manding_corpus_capped_samples_v0.py

By default this command is a dry-run planner. It reads `public_manding_corpus_materialization_manifest.json`, selects a capped subset of eligible configs and lanes, and writes `public_manding_corpus_capped_sample_materialization.json`, `public_manding_corpus_capped_sample_plan_rows.jsonl`, `public_manding_corpus_capped_sample_rows.jsonl`, and `public_manding_corpus_capped_sample_materialization.md`. Without `--allow-download`, it does not fetch dataset rows, store transcript strings, store audio URLs, store audio bytes, create labels, train, or touch live packet truth. The current live capped run used the explicit allowed path on 3 broad-acoustic configs with 2 rows per config. It fetched 6 public metadata rows, stored 6 transcript strings and 6 remote audio URLs, stored 0 audio bytes, started 0 training runs, created 0 live-packet truth rows, and created 0 direct tone-supervision rows. The rows come from `RobotsMali/bam-asr-early` and `RobotsMali/afvoices` and are marked as public-corpus supervision/evaluation material, not `human_verified` live calibration labels. The status pack exposes both `dryRunPublicMandingCorpusCappedSamples` and `allowPublicMandingCorpusCappedSamples`, making the difference between a no-fetch plan and an explicit capped row-metadata fetch visible. The readiness gate separates materialized broad-training rows from materialized heldout-evaluation rows, so a heldout-only source can no longer accidentally flip `publicRowsUsableForTrainingNow=true`.
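
The dry-run-by-default behavior is a common command-line pattern: the fetching path only runs when an explicit flag is present. The sketch below shows the pattern with `argparse`. Only `--allow-download` comes from the description above; `--max-configs`, `--rows-per-config`, the `mode` values, and the `plannedRowCount` key are hypothetical names chosen for illustration.

```python
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Capped-sample materializer sketch")
    # Absent flag means dry run: store_true defaults to False.
    parser.add_argument("--allow-download", action="store_true",
                        help="explicitly permit a capped row-metadata fetch")
    # Hypothetical options; the real script's other flags are not shown here.
    parser.add_argument("--max-configs", type=int, default=3)
    parser.add_argument("--rows-per-config", type=int, default=2)
    return parser.parse_args(argv)


def plan(args):
    """Describe what would happen; this sketch never fetches anything itself."""
    return {
        "mode": "capped_fetch" if args.allow_download else "dry_run",
        "downloadAllowed": args.allow_download,
        "plannedRowCount": args.max_configs * args.rows_per_config,
        "trainingStarted": False,  # neither mode starts training
    }


dry = plan(parse_args([]))
allowed = plan(parse_args(["--allow-download"]))
```

With the illustrative defaults of 3 configs and 2 rows per config, both plans cover 6 rows; only the second one is permitted to fetch them.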

The capped public rows now feed a guarded public acoustic manifest:

text
experiments/acoustic_gate/export_public_manding_acoustic_manifest_v0.py

This command reads `public_manding_corpus_capped_sample_rows.jsonl` and writes `public_manding_acoustic_manifest.json`, `public_manding_acoustic_training_rows.jsonl`, `public_manding_acoustic_heldout_rows.jsonl`, `public_manding_acoustic_excluded_rows.jsonl`, and `public_manding_acoustic_manifest.md`. It only admits rows whose poison guards remain intact: `mustNotJoinLivePackets=true`, `mustNotUseAsHumanVerifiedLabel=true`, `forbiddenUses.livePacketTruth=true`, `forbiddenUses.humanVerifiedLabelSubstitute=true`, a non-empty public label string, and a public audio reference. The current live manifest is `public_manding_acoustic_manifest_ready` with 6 sample rows, 6 broad-acoustic training rows, 6 heldout-evaluation rows, 0 excluded rows, 12 remote audio URL row references across training and heldout outputs, 0 local audio-byte rows, 0 stored audio bytes, `trainingStarted=false`, `livePacketTruthCount=0`, `humanVerifiedLabelSubstituteCount=0`, and `directToneSupervisionCount=0`. It is ready for a public acoustic experiment in the manifest sense, but not for local audio-byte training until a later guarded audio fetch or streaming loader exists.
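
The admission rule above can be expressed as a pure function over one row. This is a minimal sketch, not the exporter's implementation: the four guard names are taken from the text, while `publicLabel` and `publicAudioRef` are hypothetical field names standing in for the public label string and the public audio reference.

```python
def guard_failures(row):
    """Return the names of the guards a row fails; an empty list means admit."""
    failures = []
    for key in ("mustNotJoinLivePackets", "mustNotUseAsHumanVerifiedLabel"):
        if row.get(key) is not True:  # must be literally True, not merely truthy
            failures.append(key)
    forbidden = row.get("forbiddenUses")
    if not isinstance(forbidden, dict):
        forbidden = {}
    for key in ("livePacketTruth", "humanVerifiedLabelSubstitute"):
        if forbidden.get(key) is not True:
            failures.append(f"forbiddenUses.{key}")
    label = row.get("publicLabel")  # hypothetical field name
    if not isinstance(label, str) or not label.strip():
        failures.append("publicLabel")
    if not row.get("publicAudioRef"):  # hypothetical field name
        failures.append("publicAudioRef")
    return failures


def split_rows(rows):
    """Partition rows into admitted rows and excluded rows with reasons."""
    admitted, excluded = [], []
    for row in rows:
        failures = guard_failures(row)
        if failures:
            excluded.append({"row": row, "failedGuards": failures})
        else:
            admitted.append(row)
    return admitted, excluded
```

Recording which guard failed, rather than a bare exclusion, keeps the excluded-rows file useful as an audit trail.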

Blocked public corpus leads are now made explicit by:

text
experiments/acoustic_gate/prepare_public_manding_corpus_access_review_v0.py

This command reads the audit, ingest plan, loader probe, and materialization manifest, then writes `public_manding_corpus_access_review.json`, `public_manding_corpus_access_review_rows.jsonl`, and `public_manding_corpus_access_review.md`. It turns blocked repositories into a review queue rather than leaving them as vague failures. The current live report has 11 active review rows, 1 direct N'Ko-label candidate review row, 9 access-review rows, 9 license-review rows, 11 schema-review rows, and 43 blocked materialization configs. Its priority order puts `sudoping01/nko-asr` first because it is the direct N'Ko ASR candidate; the next gate for that row is `request_access_and_review_nko_schema`. The other review rows are `asr-africa/ASRAfricaDataEfficiencyBenchmark`, `djelia/bambara-asr`, `djelia/bambara-asr-dataset-y`, `djelia/bambara-asr-evaluation`, `djelia/bambara-asr-v2`, `djelia/bambara-audio`, `oza75/bambara-asr`, `sudoping01/open-bambara-asr-dataset`, `OpenSLR/106`, and `OpenSLR/105`. The two OpenSLR rows are not gated-license problems; they are external-loader/schema problems. This access-review packet still downloads nothing, stores no transcript strings or audio URLs, starts no training, creates no labels, and keeps `livePacketTruthCount=0` and `directToneSupervisionCount=0`. Its purpose is to make future public-data expansion auditable: license or terms must be recorded, access must be granted or public materialization confirmed, audio/text columns must be verified, label script must be checked, N'Ko labels must be confirmed or Latin-to-N'Ko derivation marked, external loaders must be implemented where required, and speaker/split leakage must be audited before any blocked source can move into capped materialization.

Direct N'Ko Hugging Face leads are now schema-reviewed by:

text
experiments/acoustic_gate/review_public_nko_hf_dataset_schema_v0.py

This command reads the access-review queue, fetches Hugging Face repository metadata only, and writes `public_nko_hf_dataset_schema_review.json`, `public_nko_hf_dataset_schema_review_rows.jsonl`, and `public_nko_hf_dataset_schema_review.md`. It does not download parquet rows, read transcript examples, store transcript strings, store audio URLs, create labels, train, translate, or verify live iPhone packets. The live Hugging Face metadata confirms that `sudoping01/nko-asr` is a real dataset: it is manually gated, is neither private nor disabled, exposes Audio/Text parquet metadata, and includes configs whose feature names contain `audio` and `nko`. The visible configs include `eval`, `jeli-asr`, `kunkado`, and `mali_pense`; together the metadata reports tens of thousands of examples, train/test splits where available, and audio sampling-rate metadata for some configs. That makes it the strongest direct public N'Ko-ASR supervision lead so far. It is still not training data today, because the repo is manually gated and the license is not exposed in the metadata. The schema review therefore reports a direct N'Ko ASR candidate, audio-column evidence, N'Ko-label-column evidence, and `usableForTrainingNowCount=0`. The next real gates are accepting and recording access terms, resolving the license and allowed use, confirming that the `nko` column is the intended target label, and checking splits and speaker leakage; only then can a capped materialization path be allowed. This is the direct speech-to-N'Ko data lane, but it must pass governance before it can affect the recognizer.

The direct N'Ko Hugging Face lead now has an explicit local access-decision packet:

text
experiments/acoustic_gate/prepare_public_nko_hf_access_decision_packet_v0.py

This command reads `public_nko_hf_dataset_schema_review.json` and writes `public_nko_hf_access_decision_packet.json`, `public_nko_hf_access_decision_rows.jsonl`, `public_nko_hf_access_decisions_template.jsonl`, and `public_nko_hf_access_decision_packet.md`. It does not accept Hugging Face terms, request access, download rows, read transcript examples, store transcript strings, store audio references, create labels, train, translate, or verify live iPhone packets. Its job is to turn the `sudoping01/nko-asr` metadata lead into an auditable decision form: access terms accepted or not, accepted-by timestamp, license or terms source, allowed and prohibited uses, N'Ko label-column confirmation, audio-column confirmation, split and speaker leakage review plans, and data-protection notes. The live packet is expected to report one direct N'Ko ASR access-decision row, one required access/license decision, zero materialization-ready rows, `downloadRan=false`, `trainingStarted=false`, `storedTranscriptStringCount=0`, `storedAudioReferenceCount=0`, `livePacketTruthCount=0`, and `directToneSupervisionCount=0`. Only after that template is truthfully filled can a later capped materialization gate consider the dataset. Until then, this public N'Ko lead is real evidence that useful data may exist, not evidence that the current live recognizer is correct.

Filled direct N'Ko Hugging Face access decisions are validated by:

text
experiments/acoustic_gate/validate_public_nko_hf_access_decisions_v0.py

This command reads `public_nko_hf_access_decision_packet.json` plus the decision JSONL, then writes `public_nko_hf_access_decision_validation.json`, `public_nko_hf_access_decision_validation_rows.jsonl`, and `public_nko_hf_access_decision_validation.md`. The generated blank template intentionally fails this validation because it still has empty access, license, allowed-use, label-column, audio-column, split-leakage, speaker-leakage, and data-protection fields, and it remains marked `templateOnly=true` and `notTrainingData=true`. A row becomes `ready_for_capped_materialization` only after access terms are explicitly accepted, reviewer/timestamp/license/source are recorded, allowed uses include research, training, evaluation, or ASR, the N'Ko label column and audio column are confirmed, split and speaker leakage review plans are present, and the row is no longer marked as a template or non-training placeholder. Even then the validation gate does not download, train, translate, or label live packets; it only unlocks the next capped materialization gate.
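
The readiness conditions above translate into a checklist over one decision row. The sketch below is illustrative only: `templateOnly`, `notTrainingData`, and `ready_for_capped_materialization` come from the text, while every other field name and the `not_ready` status are hypothetical placeholders for the real template schema.

```python
ACCEPTED_USES = {"research", "training", "evaluation", "asr"}

# Hypothetical field names for the recorded reviewer, timestamp, license, and plans.
REQUIRED_TEXT_FIELDS = (
    "reviewer",
    "acceptedAt",
    "license",
    "licenseSource",
    "splitLeakageReviewPlan",
    "speakerLeakageReviewPlan",
)


def validate_decision(row):
    """Return (status, problems) for one filled access-decision row."""
    problems = []
    if row.get("accessTermsAccepted") is not True:
        problems.append("accessTermsAccepted")
    for field in REQUIRED_TEXT_FIELDS:
        if not str(row.get(field) or "").strip():
            problems.append(field)
    # Assumes allowedUses is a list of strings.
    uses = {str(use).lower() for use in (row.get("allowedUses") or [])}
    if not uses & ACCEPTED_USES:
        problems.append("allowedUses")
    for field in ("nkoLabelColumnConfirmed", "audioColumnConfirmed"):
        if row.get(field) is not True:
            problems.append(field)
    for flag in ("templateOnly", "notTrainingData"):
        if row.get(flag):  # still marked as a template or placeholder
            problems.append(flag)
    status = "ready_for_capped_materialization" if not problems else "not_ready"
    return status, problems
```

A blank template fails every check at once, which matches the intent that the generated file cannot pass validation by accident.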

Validated direct N'Ko Hugging Face rows then pass through a capped materialization planner:

text
experiments/acoustic_gate/materialize_public_nko_hf_capped_samples_v0.py

This command reads `public_nko_hf_access_decision_validation.json` and writes `public_nko_hf_capped_materialization.json`, `public_nko_hf_capped_materialization_plan_rows.jsonl`, `public_nko_hf_capped_materialization_sample_rows.jsonl`, and `public_nko_hf_capped_materialization.md`. It is a dry-run gate by default. If access validation has zero rows marked `ready_for_capped_materialization`, the planner writes `public_nko_hf_capped_materialization_waiting_for_access_validation`, zero planned sources, zero materialized rows, `downloadRan=false`, `trainingStarted=false`, `storedTranscriptStringCount=0`, `storedAudioReferenceCount=0`, and `livePacketTruthCount=0`. If a row is later validated, this planner can create capped plan rows with target column `nko`, audio column `audio`, direct N'Ko supervision allowed for public-corpus use, and live-packet truth explicitly forbidden. It still does not fetch dataset rows or train. A separate explicit download/materialization gate must exist before transcript strings or audio references can enter any public-corpus sample artifact.

External public corpora now have their own loader-planning artifact:

text
experiments/acoustic_gate/prepare_public_external_corpus_loader_plan_v0.py

This command reads the public-corpus audit and writes `public_external_corpus_loader_plan.json`, `public_external_corpus_loader_plan_rows.jsonl`, and `public_external_corpus_loader_plan.md`. It is the bridge between "we found a real public Maninka source" and "we can safely materialize it." The current live plan has 2 external-loader candidates, both ready for external-loader implementation: `OpenSLR/106` and `OpenSLR/105`. `OpenSLR/106` is classified as `supervised_external_asr_candidate`, meaning it can later support broad acoustic training and heldout evaluation after an OpenSLR tarball loader, archive checksum, transcript manifest schema verification, language filtering for Maninka rows, split/speaker leakage checks, and explicit capped materialization. `OpenSLR/105` is classified as `unsupervised_acoustic_representation`, meaning it can later support acoustic world-model or representation learning, but not supervised transcript truth unless a separate gold manifest is proven. The live external-loader plan records `downloadRan=false`, `extractRan=false`, `trainingStarted=false`, `storedTranscriptStringCount=0`, `storedAudioUrlCount=0`, `livePacketTruthCount=0`, and `directToneSupervisionCount=0`. The status pack exposes this as `publicExternalCorpusLoaderCandidateCount=2`, `publicExternalCorpusReadyForLoaderImplementationCount=2`, `publicExternalCorpusSupervisedAsrCandidateCount=1`, and `publicExternalCorpusUnsupervisedAcousticRepresentationCandidateCount=1`, plus the `preparePublicExternalCorpusLoaderPlan` refresh command. This prevents the old failure mode where an external public dataset was either invisible or vaguely "blocked"; it is now a concrete next engineering lane without being treated as train-ready truth.

External public corpora also have a capped materialization dry-run gate:

text
experiments/acoustic_gate/materialize_public_external_corpus_capped_samples_v0.py

This command reads `public_external_corpus_loader_plan.json` and writes `public_external_corpus_capped_materialization.json`, `public_external_corpus_capped_plan_rows.jsonl`, `public_external_corpus_capped_sample_rows.jsonl`, and `public_external_corpus_capped_materialization.md`. In the current safe mode it plans the two OpenSLR/Nicolingua sources without fetching their archives: `OpenSLR/106` is planned as one supervised external ASR source with a capped supervised sample budget, while `OpenSLR/105` is planned as one unsupervised acoustic-representation source with a capped clip budget. The live report records `plannedSourceCount=2`, `plannedLaneCounts={"supervised_external_asr_candidate":1,"unsupervised_acoustic_representation":1}`, `downloadAllowed=false`, `downloadRan=false`, `extractAllowed=false`, `extractRan=false`, `archiveDownloadedCount=0`, `archiveExtractedCount=0`, `tarMemberManifestRowCount=0`, `materializedSampleRowCount=0`, `storedTranscriptStringCount=0`, `storedAudioUrlCount=0`, `storedArchiveByteCount=0`, `trainingStarted=false`, `livePacketTruthCount=0`, and `directToneSupervisionCount=0`. Passing `--allow-download` is deliberately blocked in this materializer until a separate archive download and member-manifest gate exists. That means OpenSLR has moved from "we should look into this" to a governed engineering lane, but it still cannot poison the recognizer, become a substitute for human-reviewed iPhone packets, or support a live translation claim.

External local archive member inspection is now handled by:

text
experiments/acoustic_gate/inspect_public_external_archive_members_v0.py

This command reads `public_external_corpus_loader_plan.json` plus a local-only `public_external_archive_local_archives.jsonl` map, lists tar/tgz members without extraction, and writes `public_external_archive_member_inspection.json`, `public_external_archive_local_archive_rows.jsonl`, `public_external_archive_member_manifest_rows.jsonl`, and `public_external_archive_member_inspection.md`. It is the missing bridge between "the OpenSLR archive URL is known" and "the preflight can prove that the archive has audio and transcript-manifest member paths." The command does not download archives. It does not extract files. It does not read transcript contents. It does not store audio bytes or archive bytes in the report. It does not create labels, verify live iPhone packets, provide direct tone truth, train, or translate. With no local archive map, the live status is `public_external_archive_member_inspection_waiting_for_local_archives`: 2 external sources, archive map loaded false, local archives found 0, archives scanned 0, member manifest rows 0, archive downloads 0, archive extracts 0, stored transcript strings 0, stored audio bytes 0, training false, live-packet truth 0, and direct tone-supervision rows 0. Once a local archive path is provided, the same command can produce member metadata rows that feed `prepare_public_external_archive_preflight_v0.py`; only that later preflight decides whether the archive structure is ready for capped materialization.
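
Listing archive members without extraction is possible with Python's standard `tarfile` module, which exposes member headers through `TarInfo` objects. The sketch below shows the idea; the output key names and the audio suffix list are assumptions for illustration, not the inspector's actual schema.

```python
import tarfile

# Assumed audio suffixes; the real corpora may use others.
AUDIO_SUFFIXES = (".wav", ".flac", ".mp3", ".ogg")


def list_archive_members(archive_path):
    """Return path-level metadata for regular files in a tar/tgz archive.

    Nothing is extracted to disk and no member payload is kept; only the
    member name and declared size are recorded.
    """
    rows = []
    # "r:*" lets tarfile detect plain or compressed archives transparently.
    with tarfile.open(archive_path, mode="r:*") as archive:
        for member in archive:  # yields TarInfo headers one at a time
            if not member.isfile():
                continue  # skip directories, links, and special entries
            rows.append({
                "memberPath": member.name,    # hypothetical key
                "sizeBytes": member.size,     # hypothetical key
                "looksLikeAudio": member.name.lower().endswith(AUDIO_SUFFIXES),
            })
    return rows
```

For compressed archives the stream still has to be decompressed to reach each header, so a scan can be slow on large files, but transcript and audio contents never leave the archive.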

External archives now have a preflight/member-manifest gate:

text
experiments/acoustic_gate/prepare_public_external_archive_preflight_v0.py

This command reads `public_external_corpus_loader_plan.json` and an optional `public_external_archive_member_manifest_rows.jsonl`, then writes `public_external_archive_preflight.json`, `public_external_archive_preflight_rows.jsonl`, and `public_external_archive_preflight.md`. Without a member manifest it still produces useful proof: the live report sees both OpenSLR archives, confirms that the archive URLs are known, and marks both rows as `awaiting_archive_member_manifest`. `OpenSLR/106` requires at least one audio member and one transcript-manifest member before it can become ready for the later capped materialization gate, while `OpenSLR/105` requires at least one audio member because its current use is unsupervised acoustic representation rather than supervised transcript truth. The live preflight status is `public_external_archive_preflight_waiting_for_member_manifest` with `externalSourceCount=2`, `preflightReadyCount=0`, `awaitingMemberManifestCount=2`, `memberManifestLoaded=false`, `memberManifestRowCount=0`, `archiveDownloadedCount=0`, `archiveExtractedCount=0`, `storedTranscriptStringCount=0`, `storedAudioUrlCount=0`, `storedArchiveByteCount=0`, `trainingStarted=false`, `livePacketTruthCount=0`, and `directToneSupervisionCount=0`. When a future archive-member manifest is supplied, this gate can certify path-level archive structure before any extraction, but it still does not inspect transcript contents, create labels, train, translate, or verify live iPhone packets.
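
The per-source requirements can be written as a small rule table keyed by lane. This is a hedged sketch: the lane names and `awaiting_archive_member_manifest` come from the text, but the `memberKind` field, its values, and the `preflight_ready` and `preflight_structure_incomplete` statuses are hypothetical.

```python
from collections import Counter

# Minimum member counts per lane, following the requirements described above.
REQUIREMENTS = {
    "supervised_external_asr_candidate": {"audio": 1, "transcript_manifest": 1},
    "unsupervised_acoustic_representation": {"audio": 1},
}


def preflight_status(lane, member_rows):
    """Decide a path-level preflight status for one external source."""
    if not member_rows:
        return "awaiting_archive_member_manifest"
    # Assumption: each member row carries a "memberKind" label.
    counts = Counter(row.get("memberKind") for row in member_rows)
    needed = REQUIREMENTS[lane]
    satisfied = all(counts[kind] >= minimum for kind, minimum in needed.items())
    return "preflight_ready" if satisfied else "preflight_structure_incomplete"
```

Under this rule an audio-only member manifest would satisfy the `OpenSLR/105` lane but leave the `OpenSLR/106` lane incomplete, since that lane also needs a transcript-manifest member.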

The public-corpus and live-calibration lanes are now summarized together by:

text
experiments/acoustic_gate/prepare_speech_acoustic_improvement_readiness_v0.py

This command reads the status pack, live training manifest, evaluation report, FAC alignment report, tone-fusion report, decoder-calibration proposal and simulation, public materialization manifest, capped-sample materialization report, and public access-review report. It writes `speech_acoustic_improvement_readiness.json` and `speech_acoustic_improvement_readiness.md`. Its current live status is `ready_for_evaluation_not_training`: there are 25 live packets, 18 valid packets, 2 evaluation-ready rows, 0 training-admissible live labels, 0 FAC training rows, and 0 accepted decoder transcripts. Public acoustic materialization is now past the first capped metadata gate because 6 public sample rows have been materialized from 3 broad-acoustic configs, but the system still reports `storedAudioByteCount=0` and `trainingStarted=false`. The public rows are useful for broad acoustic experiment staging and heldout evaluation planning; they do not make live calibration training ready and do not label any iPhone packet. The next gates are now `fill_focused_review_answers`, `resolve_public_access_license_schema_review`, and `collect_fac_tone_targets_after_labels`. This readiness report is intentionally conservative: it does not create labels, accept transcripts, download audio bytes, train models, classify FAC or tone, verify live iPhone packets, or translate. Its purpose is to prevent the system from mistaking the existence of public Malinke/Bambara/Manding audio, a direct N'Ko dataset lead, an operator-expected evaluation row, a decoder reject gate, or an acoustic proxy for actual training truth.
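
One way to read the conservative status is as a function of live evidence only, with public rows deliberately excluded from the decision. The sketch below is an interpretation, not the readiness script's logic: `ready_for_evaluation_not_training` is the status reported above, while `ready_for_training`, `not_ready`, and the parameter names are hypothetical.

```python
def readiness_status(evaluation_ready_rows, training_admissible_live_labels,
                     public_sample_rows=0):
    """Derive a coarse readiness status from live-calibration counts.

    `public_sample_rows` is accepted but intentionally ignored: public corpus
    rows must not make live calibration look training ready.
    """
    if training_admissible_live_labels > 0:
        return "ready_for_training"  # hypothetical status name
    if evaluation_ready_rows > 0:
        return "ready_for_evaluation_not_training"
    return "not_ready"  # hypothetical status name


# Counts from the current live report: 2 evaluation-ready rows,
# 0 training-admissible live labels, 6 public sample rows.
current = readiness_status(2, 0, public_sample_rows=6)
```

Keeping the public row count out of the branch conditions is the point of the sketch: no amount of public material can change the live status.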

Promotion Decision

Attach run IDs, datasets, metrics, and reproduction commands.

Source Anchor

nko-brain-scanner/experiments/acoustic_gate/SPEECH-CALIBRATION-ACOUSTIC-IMPROVEMENT-V0.md
