N'Ko Uncertainty Packet Execution Plan
Date: 2026-04-28
Goal
Define and roll out the exact packet that connects:
Whisper/trajectory ASR -> AGP/Gemma correction -> provenance search -> TTS subset selection
This packet is the operational difference between:
- a loose chain of scripts passing text around
- a real speech system with explicit uncertainty, provenance, and partition-aware routing
Why This Plan Exists
We already have:
- trajectory ASR on Vast
- AGP/Gemma correction on Mac4/Mac5
- ASR partitioning (`stable|boundary|uncertain|novelty`)
- a canonical segment corpus at `artifacts/corpus/segments.{jsonl,parquet}`
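What is still missing is a formal interface between these stages. Right now the stack is too text-centric: each stage hands the next one text, and the uncertainty and routing information behind that text does not travel with it. The next step is to make uncertainty and routing first-class.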
Design Principles
1. Audio evidence stays upstream
- Gemma does not replace the acoustic model.
2. Correction stays bounded
- AGP proposes; the gate decides.
3. Every row is attributable
- raw ASR, corrected text, and decisions remain inspectable.
4. Partitions are policy, not decoration
- `stable`, `boundary`, `uncertain`, and `novelty` drive downstream behavior.
5. Search and TTS consume different slices
- not every corrected utterance is valid TTS training data.
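Principles 1 and 2 can be made concrete as a small gate that sits between the AGP proposal and the corpus. The sketch below assumes the gate sees only the ASR text, the AGP proposal, and an AGP confidence, and it bounds correction by capping the character-level edit ratio. The thresholds, the reason codes, and the names `gate` and `levenshtein` are hypothetical, not taken from the existing stack.

```python
from dataclasses import dataclass

# Hypothetical thresholds: the plan does not specify numeric bounds.
MIN_CONFIDENCE = 0.8
MAX_EDIT_RATIO = 0.15  # at most 15% of the ASR characters may change


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


@dataclass(frozen=True)
class GateDecision:
    accepted: bool
    reason: str  # would populate `agp_reason`


def gate(asr_text: str, proposal: str, confidence: float) -> GateDecision:
    """Accept an AGP proposal only if it is confident AND a small edit of the ASR text."""
    if proposal == asr_text:
        return GateDecision(False, "no_change")
    if confidence < MIN_CONFIDENCE:
        return GateDecision(False, "low_confidence")
    edit_ratio = levenshtein(asr_text, proposal) / max(len(asr_text), 1)
    if edit_ratio > MAX_EDIT_RATIO:
        return GateDecision(False, "edit_too_large")
    return GateDecision(True, "accepted")
```

The edit-ratio cap is what keeps Gemma from rewriting freely: a confident but sweeping rewrite is still rejected.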
Packet Schema
A. Identity
- `feat_id`
- `audio_id`
- `audio_path`
- `episode_id`
- `segment_id`
- `split`
- `script`
- `mode`
B. Timing and speaker
- `start_ms`
- `end_ms`
- `duration_ms`
- `speaker_id`
- `speaker_confidence`
- `speaker_cluster_version`
C. Acoustic output
- `asr_text_raw`
- `asr_text_postprocessed`
- `reference_text`
- `ctc_confidence`
- `cer_edits`
- `reference_chars`
- `trajectory_scalars`
- `partition`
D. Local uncertainty summary
- `top_confusable_spans`
- list of spans with alternate characters/tokens and confidence deltas
- `n_best_hypotheses`
- top candidates with scores
- `char_posteriors_summary`
- compressed per-span posterior summary, not full frame dumps
- `uncertainty_score`
- normalized scalar for routing
E. AGP correction block
- `agp_prompt_version`
- `agp_model_id`
- `agp_proposal`
- `agp_confidence`
- `agp_accept_reject`
- `agp_reason`
- `agp_delta_spans`
F. Provenance/search block
- `final_text`
- `provenance_score`
- `sources_used`
- `transliteration_variants`
- `normalized_forms`
- `retrieval_tags`
G. TTS eligibility block
- `tts_eligible`
- `tts_exclusion_reason`
- `overlap_risk`
- `music_risk`
- `single_speaker_clean`
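One way to enforce the Phase 1 rule that no component invents ad hoc field names is to keep the schema above as a single registry and check every row against it. This is a minimal sketch, assuming rows are plain Python dicts (for example, parsed JSONL lines); `PACKET_SCHEMA`, `CANONICAL_FIELDS`, and `unknown_fields` are hypothetical names.

```python
# Field groups mirror sections A-G of the packet schema.
PACKET_SCHEMA: dict[str, tuple[str, ...]] = {
    "identity": ("feat_id", "audio_id", "audio_path", "episode_id",
                 "segment_id", "split", "script", "mode"),
    "timing_speaker": ("start_ms", "end_ms", "duration_ms", "speaker_id",
                       "speaker_confidence", "speaker_cluster_version"),
    "acoustic": ("asr_text_raw", "asr_text_postprocessed", "reference_text",
                 "ctc_confidence", "cer_edits", "reference_chars",
                 "trajectory_scalars", "partition"),
    "uncertainty": ("top_confusable_spans", "n_best_hypotheses",
                    "char_posteriors_summary", "uncertainty_score"),
    "agp": ("agp_prompt_version", "agp_model_id", "agp_proposal",
            "agp_confidence", "agp_accept_reject", "agp_reason",
            "agp_delta_spans"),
    "provenance": ("final_text", "provenance_score", "sources_used",
                   "transliteration_variants", "normalized_forms",
                   "retrieval_tags"),
    "tts": ("tts_eligible", "tts_exclusion_reason", "overlap_risk",
            "music_risk", "single_speaker_clean"),
}

# Flat set of every allowed field name.
CANONICAL_FIELDS = frozenset(f for group in PACKET_SCHEMA.values() for f in group)


def unknown_fields(row: dict) -> set[str]:
    """Field names present in `row` that the schema does not define."""
    return set(row) - CANONICAL_FIELDS
```

Under this check, corpus columns that the mapping section lists as already present but the schema does not name (`consensus_score`, `text_quality`, `char_diversity`) would be flagged, so the schema lock would need to either add them or rename them.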
Producer Responsibilities
Vast ASR producer
Must emit:
- identity fields
- acoustic output fields
- partition
- trajectory scalars
- n-best / confusable summaries if available
Primary source:
- `test_predictions.jsonl`
- `test_references.jsonl`
- `test_metrics_by_partition.json`
Corpus builder
Must:
- join Djoko transcriptions, speakers, consensus rows, and, later, ASR prediction dumps
- preserve raw and corrected text separately
- write canonical corpus rows
Current entry point:
- `asr/build_segment_provenance_corpus.py`
AGP producer
Must append:
- proposal
- accept/reject outcome
- rationale code
- corrected final text
AGP does not overwrite ASR fields. It adds a decision layer.
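The "decision layer, not overwrite" rule can be enforced by restricting which keys the AGP step may write. A minimal sketch, assuming dict rows and the hypothetical accept/reject values `"accept"` and `"reject"`; `AGP_WRITABLE` and `append_agp_layer` are illustrative names.

```python
import copy
from typing import Any

# Keys the AGP producer is allowed to write; everything else belongs to upstream producers.
AGP_WRITABLE = frozenset({
    "agp_prompt_version", "agp_model_id", "agp_proposal", "agp_confidence",
    "agp_accept_reject", "agp_reason", "agp_delta_spans", "final_text",
})


def append_agp_layer(row: dict[str, Any], agp_fields: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `row` with AGP fields added; refuse to touch anything else."""
    illegal = set(agp_fields) - AGP_WRITABLE
    if illegal:
        raise ValueError(f"AGP may not write {sorted(illegal)}")
    out = copy.deepcopy(row)  # never mutate the ASR row in place
    out.update(agp_fields)
    if "final_text" not in agp_fields:
        # Use the proposal only if the gate accepted it; otherwise fall back to ASR text.
        accepted = agp_fields.get("agp_accept_reject") == "accept"
        fallback = row.get("asr_text_postprocessed") or row.get("asr_text_raw")
        out["final_text"] = agp_fields.get("agp_proposal") if accepted else fallback
    return out
```

Because the ASR fields are never written, raw and corrected text stay side by side on the same row.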
Search/index producer
Must derive:
- transliteration variants
- normalized forms
- retrieval tags
- search-time embeddings
TTS filter producer
Must derive:
- TTS eligibility
- exclusion reason
- overlap/music risk
- single-speaker cleanliness
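The TTS filter can be written as an ordered list of checks that returns the first failure as `tts_exclusion_reason`. This sketch assumes first-pass TTS draws only from `stable` rows (per the partition policy) and treats missing values as failures, so unknown rows stay excluded. The numeric cut-offs and reason strings are hypothetical; the plan names the criteria but not the numbers.

```python
from typing import Any, Optional

# Hypothetical cut-offs; the plan names the criteria but not the numbers.
MIN_CTC_CONFIDENCE = 0.9
MAX_OVERLAP_RISK = 0.1
MAX_MUSIC_RISK = 0.1


def _value(row: dict[str, Any], key: str, default: float) -> float:
    """Read a numeric field, substituting a failing default when it is missing or null."""
    v = row.get(key)
    return default if v is None else v


def tts_exclusion_reason(row: dict[str, Any]) -> Optional[str]:
    """Return the first failed criterion, or None if the row is TTS-eligible."""
    if row.get("partition") != "stable":
        return "partition_not_stable"
    if row.get("single_speaker_clean") is not True:
        return "not_single_speaker"
    if _value(row, "overlap_risk", 1.0) > MAX_OVERLAP_RISK:
        return "overlap_risk"
    if _value(row, "music_risk", 1.0) > MAX_MUSIC_RISK:
        return "music_risk"
    if _value(row, "ctc_confidence", 0.0) < MIN_CTC_CONFIDENCE:
        return "low_confidence"
    # A proposal without a recorded gate outcome counts as unresolved AGP ambiguity.
    if row.get("agp_proposal") is not None and row.get("agp_accept_reject") is None:
        return "unresolved_agp"
    return None


def derive_tts_fields(row: dict[str, Any]) -> dict[str, Any]:
    reason = tts_exclusion_reason(row)
    return {"tts_eligible": reason is None, "tts_exclusion_reason": reason}
```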
Partition Policy
`stable`
Use for:
- search by default
- potential TTS candidate pool if speaker and audio quality also pass
`boundary`
Use for:
- AGP correction training
- search with provenance warning
- usually not first-pass TTS training
`uncertain`
Use for:
- AGP hard-case training
- manual review or deferred indexing
- excluded from initial TTS
`novelty`
Use for:
- error analysis
- vocabulary/domain expansion
- not for TTS until independently validated
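Treating partitions as policy means routing is a lookup, not a judgment made separately in each component. A minimal sketch, assuming dict rows; the destination names paraphrase the policy above and are hypothetical, and `bucket` is one possible shape for the corpus filter that emits search-ready, AGP-training, and TTS-seed rows.

```python
from collections import defaultdict
from typing import Any, Iterable

# Destinations paraphrase the Partition Policy; the names are hypothetical.
PARTITION_ROUTES: dict[str, frozenset[str]] = {
    "stable": frozenset({"search", "tts_candidate"}),
    "boundary": frozenset({"agp_training", "search_with_warning"}),
    "uncertain": frozenset({"agp_hard_case_training", "manual_review"}),
    "novelty": frozenset({"error_analysis", "vocabulary_expansion"}),
}


def route(row: dict[str, Any]) -> frozenset[str]:
    """Map a row to its downstream destinations; unknown partitions fail loudly."""
    partition = row.get("partition")
    if partition not in PARTITION_ROUTES:
        raise ValueError(f"unknown or missing partition: {partition!r}")
    return PARTITION_ROUTES[partition]


def bucket(rows: Iterable[dict[str, Any]]) -> dict[str, list[dict[str, Any]]]:
    """Group rows by destination; a row can land in several buckets."""
    buckets: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for row in rows:
        for destination in route(row):
            buckets[destination].append(row)
    return dict(buckets)
```

Here `tts_candidate` only nominates a row; speaker and audio-quality checks still decide actual TTS eligibility.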
Rollout Phases
Phase 1: schema lock
Deliverables:
- this plan
- stable canonical field list
- explicit mapping from current artifacts to target packet
Success check:
- no downstream component invents ad hoc field names
Phase 2: ASR dump upgrade
Tasks:
1. Ensure current Vast jobs emit prediction/reference rows in the expected format.
2. Add `partition` and `trajectory_scalars` to every row.
3. Add compact n-best/confusable summaries where feasible.
Deliverables:
- upgraded `test_predictions.jsonl`
- upgraded `test_references.jsonl`
- partition-aware row dumps
Success check:
- a single ASR row is sufficient to reconstruct the correction input
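The Phase 2 success check can be tested mechanically: parse one line of `test_predictions.jsonl` and try to build the correction input from it alone. This sketch assumes the required set is the identity key, raw text, `partition`, and `trajectory_scalars` (which Phase 2 adds to every row), with n-best and confusable summaries optional because they are added only "where feasible". The function and constant names are hypothetical.

```python
import json
from typing import Any

# Hypothetical split between required and best-effort fields.
REQUIRED_FOR_CORRECTION = ("segment_id", "asr_text_raw", "partition", "trajectory_scalars")
OPTIONAL_FOR_CORRECTION = ("top_confusable_spans", "n_best_hypotheses", "ctc_confidence")


def correction_input(jsonl_line: str) -> dict[str, Any]:
    """Build the AGP correction input from a single ASR row, or fail loudly."""
    row = json.loads(jsonl_line)
    missing = [k for k in REQUIRED_FOR_CORRECTION if k not in row]
    if missing:
        raise ValueError(f"row cannot seed correction; missing {missing}")
    packet = {k: row[k] for k in REQUIRED_FOR_CORRECTION}
    # Optional summaries are passed through as None when the ASR job did not emit them.
    packet.update({k: row.get(k) for k in OPTIONAL_FOR_CORRECTION})
    return packet
```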
Phase 3: corpus integration
Tasks:
1. Extend `build_segment_provenance_corpus.py` to ingest the upgraded ASR rows.
2. Join them onto Djoko segment rows by stable IDs.
3. Preserve current consensus and speaker joins.
Deliverables:
- enriched `artifacts/corpus/segments.parquet`
Success check:
- one row includes ASR text, partition, speaker data, consensus info, and AGP slots
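The Phase 3 join can be sketched as a left join on a stable ID in which existing segment fields always win, so current consensus and speaker joins are never clobbered by ASR dumps. This assumes `segment_id` is the shared key and rows are dicts; `join_asr_onto_segments` is a hypothetical helper, not the current API of `build_segment_provenance_corpus.py`.

```python
from typing import Any, Iterable

Row = dict[str, Any]


def join_asr_onto_segments(segments: Iterable[Row], asr_rows: Iterable[Row],
                           key: str = "segment_id") -> list[Row]:
    """Left-join ASR rows onto segment rows; ASR fields only fill gaps."""
    asr_by_id: dict[Any, Row] = {}
    for asr in asr_rows:
        if asr[key] in asr_by_id:
            raise ValueError(f"duplicate ASR row for {key}={asr[key]!r}")
        asr_by_id[asr[key]] = asr

    joined: list[Row] = []
    for seg in segments:
        out = dict(seg)  # copy so the caller's segment rows are untouched
        for field, value in asr_by_id.get(seg[key], {}).items():
            out.setdefault(field, value)  # existing segment/consensus/speaker fields win
        joined.append(out)
    return joined
```

ASR rows with no matching segment are silently dropped here; a real builder would likely want to report them.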
Phase 4: AGP writeback
Tasks:
1. Define AGP output schema precisely.
2. Append AGP proposals and gate outcomes back into the canonical corpus.
3. Version AGP prompt/model identifiers.
Deliverables:
- AGP-enriched corpus rows
Success check:
- every correction is attributable and reversible
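Attributable and reversible can both fall out of the writeback format: one JSONL line per decision, keyed by segment and carrying prompt and model versions, plus a revert that drops the AGP layer and restores the ASR text. A minimal sketch under those assumptions; `writeback_line` and `revert` are hypothetical names.

```python
import json
from typing import Any

AGP_KEYS = ("agp_prompt_version", "agp_model_id", "agp_proposal", "agp_confidence",
            "agp_accept_reject", "agp_reason", "agp_delta_spans")


def writeback_line(row: dict[str, Any]) -> str:
    """Serialize one AGP decision as a JSONL line keyed by segment_id."""
    record = {"segment_id": row["segment_id"]}
    record.update({k: row.get(k) for k in AGP_KEYS})
    if record["agp_prompt_version"] is None or record["agp_model_id"] is None:
        raise ValueError("AGP writeback must carry prompt and model versions")
    # ensure_ascii=False keeps N'Ko text readable in the JSONL file.
    return json.dumps(record, ensure_ascii=False, sort_keys=True)


def revert(row: dict[str, Any]) -> dict[str, Any]:
    """Drop the AGP layer and restore the pre-correction text."""
    out = {k: v for k, v in row.items() if not k.startswith("agp_")}
    out["final_text"] = row.get("asr_text_postprocessed") or row.get("asr_text_raw")
    return out
```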
Phase 5: search index
Tasks:
1. Build lexical + metadata retrieval over canonical rows.
2. Add transliteration and normalized-form enrichment.
3. Add reranking with Gemma or another compact judge.
Deliverables:
- first vertical search API over the N'Ko speech corpus
Success check:
- query by N'Ko, Latinized Bambara, episode, or speaker returns cited rows
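Before any index or reranker exists, the Phase 5 success check (and Gate 3) can be exercised with a linear scan over canonical rows. This sketch assumes NFC normalization plus casefolding is an acceptable first normalized form, matches text-like fields by substring and episode/speaker IDs exactly, and labels each hit as corrected or raw. It is not the planned lexical index, and `search` and `Hit` are hypothetical names.

```python
import unicodedata
from dataclasses import dataclass
from typing import Any, Iterable


def normalize(text: str) -> str:
    """NFC + casefold so composed/decomposed and upper/lower-case forms compare equal."""
    return unicodedata.normalize("NFC", text).casefold()


@dataclass(frozen=True)
class Hit:
    segment_id: str
    episode_id: str
    speaker_id: str
    text: str
    text_source: str  # "corrected" if an AGP proposal was accepted, else "raw"


def search(rows: Iterable[dict[str, Any]], query: str) -> list[Hit]:
    q = normalize(query)
    hits: list[Hit] = []
    for row in rows:
        text = row.get("final_text") or row.get("asr_text_raw") or ""
        # Text-like fields match by substring; episode/speaker IDs match exactly.
        texts = [text, *(row.get("transliteration_variants") or []),
                 *(row.get("normalized_forms") or [])]
        ids = {normalize(row.get(k) or "") for k in ("episode_id", "speaker_id")}
        if q in ids or any(q in normalize(t) for t in texts):
            hits.append(Hit(
                segment_id=row.get("segment_id", ""),
                episode_id=row.get("episode_id", ""),
                speaker_id=row.get("speaker_id", ""),
                text=text,
                text_source="corrected" if row.get("agp_accept_reject") == "accept" else "raw",
            ))
    return hits
```

Each hit carries its segment, episode, and speaker, so every result is citable back to a corpus row.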
Phase 6: TTS subset extraction
Tasks:
1. Mark TTS-eligible rows.
2. Exclude overlap/music/noisy or low-confidence rows.
3. Build a speaker-independent training subset first.
Deliverables:
- `artifacts/corpus/tts_seed_subset.jsonl`
Success check:
- subset rows are high-confidence, single-speaker, and correction-clean
Mapping From Current Artifacts
Already present
- `audio_path`
- `episode_id`
- `segment_id`
- `speaker_id`
- `speaker_cluster_version`
- `asr_text_raw`
- `asr_text_postprocessed`
- `final_text`
- `ctc_confidence`
- `consensus_score`
- `text_quality`
- `char_diversity`
- `sources_used`
- `provenance_score`
Present in handoff spec but not yet wired into corpus builder
- `feat_id`
- `audio_id`
- `split`
- `script`
- `mode`
- `reference_text`
- `cer_edits`
- `reference_chars`
- `trajectory_scalars`
- `partition`
Not yet produced anywhere cleanly
- `top_confusable_spans`
- `n_best_hypotheses`
- `char_posteriors_summary`
- `agp_confidence`
- `agp_delta_spans`
- `transliteration_variants`
- `normalized_forms`
- `retrieval_tags`
- `tts_eligible`
- `tts_exclusion_reason`
- `overlap_risk`
- `music_risk`
Immediate Task List
1. Extend ASR evaluation/export code to emit the row-level fields already defined in the handoff.
2. Upgrade the corpus builder to join those ASR dumps.
3. Define AGP writeback JSONL format.
4. Add a corpus filter that emits:
- search-ready rows
- AGP-training rows
- TTS-seed rows
Validation Gates
Gate 1: row completeness
For a sampled row, verify:
- stable identity
- speaker metadata
- raw and corrected text
- partition
- provenance score
Gate 2: correction auditability
For a corrected row, verify:
- raw ASR text remains preserved
- AGP delta is visible
- accept/reject decision is visible
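Gates 1 and 2 can be run as automated checks over sampled rows. This sketch assumes a particular choice of which fields satisfy each Gate 1 item (for example, that `segment_id`, `episode_id`, and `audio_path` together count as stable identity); that mapping and the function names are hypothetical.

```python
from typing import Any

# Hypothetical choice of which fields satisfy each Gate 1 item.
GATE1_REQUIREMENTS: dict[str, tuple[str, ...]] = {
    "stable identity": ("segment_id", "episode_id", "audio_path"),
    "speaker metadata": ("speaker_id", "speaker_cluster_version"),
    "raw and corrected text": ("asr_text_raw", "final_text"),
    "partition": ("partition",),
    "provenance score": ("provenance_score",),
}


def gate1_failures(row: dict[str, Any]) -> list[str]:
    """Names of Gate 1 items whose fields are missing or null."""
    return [item for item, fields in GATE1_REQUIREMENTS.items()
            if any(row.get(f) is None for f in fields)]


def gate2_failures(row: dict[str, Any]) -> list[str]:
    """Audit problems for a row that went through AGP correction."""
    problems = []
    if not row.get("asr_text_raw"):
        problems.append("raw ASR text not preserved")
    if "agp_delta_spans" not in row:
        problems.append("AGP delta not visible")
    if row.get("agp_accept_reject") not in ("accept", "reject"):
        problems.append("accept/reject decision not visible")
    return problems
```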
Gate 3: search readiness
For a search result, verify:
- the query returns the exact row
- cites speaker/episode/segment
- shows whether text is raw or corrected
Gate 4: TTS readiness
For a TTS candidate row, verify:
- single-speaker
- high-confidence
- low overlap/music risk
- no unresolved AGP ambiguity
Main Risks
1. ASR dump incompleteness
- if the acoustic job does not export row-level details, AGP training data remains mostly synthetic
2. Overcorrection
- if Gemma is allowed to rewrite too freely, provenance collapses
3. Weak diarization
- speaker labels are still weak supervision and must remain versioned
4. Premature TTS
- raw Djoko output is not a safe synthesis corpus
Bottom Line
This plan makes the uncertainty packet the backbone of the stack.
Once implemented, the same row can support:
- AGP correction
- provenance search
- speaker atlas growth
- high-precision TTS filtering
That is the right bridge between the current N'Ko ASR system and the broader speech platform you want to build.
Source Anchor
nko-brain-scanner/docs/handoffs/nko_uncertainty_packet_execution_plan_2026-04-28.md