Anticipation Geometry Partition: Row-Level Governance for Script-Native N'Ko ASR Deployment
Abstract
This paper defines the deployment layer of the N'Ko ASR project: Anticipation
Geometry Partition (AGP). AGP is not the acoustic model that produced the archived
20.57\% CER anchor. It begins after ASR. Its role is to convert trajectory and
uncertainty signals into row-level decisions about correction, provenance, corpus
admission, and deployment eligibility. The motivation is simple: a scalar CER
number is not enough to build a trustworthy transcript corpus or a production
speech system. A model can make local mistakes, a language model can propose fluent
but false corrections, and out-of-domain data can contain music, overlapping
speakers, dialect drift, visual context, and uncertain references.
AGP partitions transcript spans into stable, boundary, uncertain, and novelty
states. Stable spans should usually remain unchanged. Boundary spans may admit
local repairs when acoustic and textual evidence agree. Uncertain spans require
stronger evidence or abstention. Novelty spans should be treated as data-discovery
events rather than as opportunities for language-model rewriting. The formal unit is
a row containing ASR hypothesis, reference when available, edit counts, trajectory
summaries, partition labels, candidate correction, admissibility decision,
provenance metadata, and deployment gates.
The paper also consolidates the deployment evidence that motivated AGP. Historical
compositional generalization and vocabulary expansion experiments suggest that
N'Ko degrades less than Latin on unseen-word utterances and retains a smaller
residual gap after full-data training. Djoko soap-opera extraction created a much
harder deployment substrate: 1,124 downloaded videos out of a 2,001-video channel,
32,826 audio segments, an initial 8,985-segment batch, 269 high-confidence
consensus rows at a strict threshold, 6,625 speaker-labeled rows, 258 episodes,
seven speakers, and five eligible speakers. The domain gap was severe; the first
batch produced only about 8.8\% meaningful output.
The conclusion is that AGP is not optional polish. It is the governance structure
needed before script-native ASR output can safely become search data, correction
data, subtitle candidates, or TTS training material.
Introduction
The first three papers in this series establish the representation, metric, and ASR
anchor. This final paper asks what happens after the model emits text. That question
matters because ASR deployment is not simply a sequence of benchmark runs. A
deployed system ingests new speakers, new domains, background music, overlapping
dialogue, dialect variation, channel noise, spelling variation, and missing
references. CTC-style ASR [citation: graves2006connectionist] and frozen Whisper
features [citation: radford2023robust] can produce useful transcripts, but a scalar CER
cannot decide whether a particular row should enter a corpus, be corrected by a
language model, be routed to human review, or be excluded from TTS training.
AGP, the Anticipation Geometry Partition, is the project response. It turns the
trajectory geometry introduced in the ASR paper into a post-ASR governance layer.
The acoustic model produces a raw transcript and trajectory summaries. AGP assigns
states to rows and spans, evaluates proposed corrections, records provenance, and
decides what the row is safe for. It is deliberately conservative: improving
surface fluency is not enough. The correction must be admissible under local
evidence.
The paper has two goals. The first is formal: define AGP states, row contracts, and
admissibility. The second is practical: document the deployment substrates and
historical robustness experiments that made AGP necessary. These include unseen
vocabulary experiments, vocabulary expansion, and Djoko out-of-domain extraction.
None of these should be confused with the archived 20.57\% CER anchor. They address a
different question: how does a script-native ASR project move from a benchmark
number to a governed corpus-building system?
Research Questions and System Boundary
AGP is evaluated as a governance layer, not as an acoustic model. Its research
questions are therefore about decision quality, provenance, correction safety, and
deployment eligibility. A paper about AGP must not be scored only by asking whether
the final transcript looks more fluent. Fluency can hide unsupported rewrites. AGP
is successful when it admits improvements, rejects or abstains from risky changes,
and preserves enough evidence for later review.
Caption: Research questions for AGP.
| ID | Question | Evidence required |
|---|---|---|
| RQ1 | Can trajectory summaries partition rows into useful uncertainty states? | Per-state row counts, CER deltas, and accepted-regression rates. |
| RQ2 | Can correction be made conservative rather than merely fluent? | Accepted, rejected, and abstained edits with improvement/regression labels. |
| RQ3 | Can out-of-domain data be converted into inspectable review material? | Source metadata, speaker metadata, confidence gates, and exclusion reasons. |
| RQ4 | Can corpus and TTS eligibility be separated from search eligibility? | Deployment gates that distinguish review, search, correction, TTS, and exclusion. |
Caption: System boundary: what AGP is and is not.
| AGP is | AGP is not |
|---|---|
| A post-ASR row-governance layer. | The acoustic model that produced 20.57\% CER. |
| A correction-admissibility policy. | A guarantee that all accepted text is perfect. |
| A provenance and deployment gate. | A replacement for human orthographic authority. |
| A way to preserve uncertain evidence. | A license to train on every extracted row. |
From Trajectory Geometry to Governance
Caption: Operational AGP states.
| State | Governance interpretation |
|---|---|
| Stable | High commitment and stability, low uncertainty and transition pressure. Default action: preserve the transcript. |
| Boundary | Elevated transition pressure near a likely phoneme, syllable, word, or phrase boundary. Default action: allow constrained local repair. |
| Uncertain | High ambiguity or low confidence without a clear boundary explanation. Default action: require stronger evidence or abstain. |
| Novelty | Possible unseen word, speaker shift, domain shift, code-switch, or out-of-training-distribution pattern. Default action: preserve provenance and route to review or data discovery. |
AGP's central design choice is that novelty is not treated as error by default. In
low-resource ASR, novelty may be the most valuable part of the row: a new word, a
speaker-specific pronunciation, or a domain-specific expression. A generic
language-model correction layer might normalize it away. AGP should instead retain
the evidence and mark the risk.
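The state table above can be read as a small decision rule. The following is a minimal sketch under stated assumptions: the `TrajectorySummary` fields are taken from the trajectory-evidence names used in this paper, but their scaling to [0, 1], the threshold constants, and the check order are illustrative choices, not the project's calibrated partition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrajectorySummary:
    # Hypothetical per-row summaries, each assumed to be scaled to [0, 1].
    commitment: float
    stability: float
    uncertainty: float
    transition_pressure: float
    novelty: float


# Illustrative thresholds only; real values would need calibration on labeled rows.
NOVELTY_MIN = 0.7
PRESSURE_MIN = 0.6
UNCERTAINTY_MIN = 0.5
COMMITMENT_MIN = 0.5


def assign_state(t: TrajectorySummary) -> str:
    """Return one of 'novelty', 'boundary', 'uncertain', or 'stable'."""
    # Novelty is checked first so a possible new word or speaker shift
    # is never relabeled as an ordinary error further down.
    if t.novelty >= NOVELTY_MIN:
        return "novelty"
    # Elevated transition pressure gives the ambiguity a boundary explanation.
    if t.transition_pressure >= PRESSURE_MIN:
        return "boundary"
    # Ambiguity or low confidence without a boundary explanation.
    if (
        t.uncertainty >= UNCERTAINTY_MIN
        or t.commitment < COMMITMENT_MIN
        or t.stability < COMMITMENT_MIN
    ):
        return "uncertain"
    # High commitment and stability, low uncertainty and pressure.
    return "stable"
```

Checking novelty before anything else mirrors the design choice above: a row that looks both novel and uncertain is kept for data discovery rather than handed to a correction layer.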
Pipeline Formalization
Caption: AGP row lifecycle.
| Stage | Primary question | Output |
|---|---|---|
| ASR | What did the acoustic model emit? | Raw hypothesis and confidence signals. |
| Partition | What kind of uncertainty is present? | Stable, boundary, uncertain, or novelty state. |
| Correction | Is there a local candidate change? | Candidate string and changed-span metadata. |
| Admissibility | Is the candidate defensible? | Accept, reject, or abstain with reason code. |
| Deployment gate | What may this row be used for? | Search, review, correction, TTS, or exclusion status. |
Row Contract
The formal unit is not a paragraph or a final transcript. It is a row. Each row
should be inspectable from acoustic source to final decision.
Caption: AGP row contract.
| Block | Representative fields |
|---|---|
| Identity | Row id, feature id, audio source id, segment start/end, split, script, mode, model id. |
| ASR evidence | Raw hypothesis, reference when available, edit count, reference character count, character denominator, posterior summary. |
| Trajectory evidence | Commitment, uncertainty, transition pressure, recovery margin, phase stiffness, novelty, stability, derived AGP state. |
| Correction candidate | Proposed correction, source of proposal, changed spans, local confidence, prompt or rule version. |
| Admissibility | Accept, reject, or abstain; reason code; partition state; evidence thresholds; regression flag when reference exists. |
| Provenance | Retrieval sources, transliteration variants, normalized forms, speaker metadata, episode metadata, provenance score. |
| Deployment gate | Search eligibility, corpus-training eligibility, TTS eligibility, overlap risk, music risk, speaker cleanliness, exclusion reason. |
This contract prevents the common failure mode in which a correction layer silently
rewrites a transcript and only the final string is preserved. For scientific use,
the row must show what changed, why, and under which evidence.
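One way to make the contract concrete is a typed record that keeps the raw hypothesis and the candidate side by side. The sketch below is an assumption-laden illustration: the class name `AGPRow`, the field names, and the default values are hypothetical, and only a subset of the representative fields is shown.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class AGPRow:
    # Identity
    row_id: str
    audio_source_id: str
    segment_start_s: float
    segment_end_s: float
    model_id: str
    # ASR evidence
    raw_hypothesis: str
    reference: Optional[str] = None      # absent for most out-of-domain rows
    edit_count: Optional[int] = None
    # Trajectory evidence
    agp_state: str = "uncertain"
    # Correction candidate
    candidate: Optional[str] = None
    proposal_source: Optional[str] = None
    # Admissibility
    decision: str = "abstain"            # "accept", "reject", or "abstain"
    reason_code: str = "no_candidate"
    # Provenance
    speaker_id: Optional[str] = None
    episode_id: Optional[str] = None
    # Deployment gate
    search_eligible: bool = False
    tts_eligible: bool = False
    exclusion_reason: Optional[str] = None

    def final_text(self) -> str:
        """Text to expose downstream; the raw hypothesis is never overwritten."""
        if self.decision == "accept" and self.candidate is not None:
            return self.candidate
        return self.raw_hypothesis

    def to_json(self) -> str:
        """Serialize the whole row so the evidence travels with the string."""
        return json.dumps(asdict(self), ensure_ascii=False)
```

Because `final_text` is derived rather than stored, a reviewer can always recover what the acoustic model emitted, what was proposed, and why the proposal was or was not admitted.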
Admissibility
Caption: Admissibility policy by AGP state.
| State | Default decision | Accept only if |
|---|---|---|
| Stable | Reject or preserve | Correction fixes a clear normalization artifact and does not alter acoustic content. |
| Boundary | Consider local accept | Candidate is small, local, and supported by boundary evidence or reference-aligned confusion patterns. |
| Uncertain | Abstain | Multiple independent signals converge and no high-risk rewrite is introduced. |
| Novelty | Abstain or route to review | Candidate is provenance-backed and does not erase a possible new lexical item or speaker feature. |
The metric that matters most for AGP safety is not total accepted edits. It is
accepted regressions. A correction system that improves some rows while frequently
accepting worse edits is not safe for corpus construction. A full AGP benchmark
should report proposed edits, accepted edits, rejected edits, abstentions, accepted
improvements, accepted neutral edits, accepted regressions, rejected improvements,
and per-partition CER deltas.
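The admissibility table can be sketched as a function from state and candidate evidence to a decision plus reason code. Everything below is a hedged illustration: the `Candidate` fields, the reason-code strings, and the limits `MAX_LOCAL_EDIT` and `MIN_CONVERGING` are hypothetical stand-ins for whatever evidence thresholds a real gate would use.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Candidate:
    changed_chars: int            # size of the proposed change
    normalization_only: bool      # fixes a normalization artifact, not acoustic content
    boundary_supported: bool      # boundary evidence or known confusion pattern
    converging_signals: int       # number of independent signals that agree
    provenance_backed: bool       # supported by retrievable provenance
    erases_novel_form: bool       # would overwrite a possible new word or speaker feature


MAX_LOCAL_EDIT = 2   # assumed limit for a "small, local" change
MIN_CONVERGING = 2   # assumed minimum for "multiple independent signals"


def decide(state: str, c: Candidate) -> Tuple[str, str]:
    """Return (decision, reason_code) following the per-state defaults."""
    if state == "stable":
        if c.normalization_only:
            return ("accept", "normalization_repair")
        return ("reject", "stable_preserve")
    if state == "boundary":
        if c.changed_chars <= MAX_LOCAL_EDIT and c.boundary_supported:
            return ("accept", "local_boundary_repair")
        return ("reject", "not_local_or_unsupported")
    if state == "uncertain":
        if c.converging_signals >= MIN_CONVERGING and c.changed_chars <= MAX_LOCAL_EDIT:
            return ("accept", "converging_evidence")
        return ("abstain", "insufficient_evidence")
    if state == "novelty":
        if c.provenance_backed and not c.erases_novel_form:
            return ("accept", "provenance_backed")
        return ("abstain", "route_to_review")
    raise ValueError(f"unknown AGP state: {state!r}")
```

Every branch returns a reason code, so a rejected or abstained row still records why the gate acted as it did.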
Partition Scoring
AGP should be evaluated with a decision matrix rather than a single aggregate CER.
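For row $i$, let $e_i$ be the edit distance between the raw hypothesis and the
reference, and $e_i'$ the edit distance between the candidate and the reference.
Define $\Delta_i=e_i-e_i'$, so that positive values mean the candidate
improved the row. An accepted correction is an accepted improvement if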
$\Delta_i>0$, an accepted neutral edit if $\Delta_i=0$, and an
accepted regression if $\Delta_i<0$. Rejected candidates can be labeled the
same way when a reference is available, which exposes whether the gate is too
conservative. Rows without references should still carry provenance and partition
labels but should not be counted as metric improvements.
Caption: AGP benchmark accounting.
| Quantity | Interpretation |
|---|---|
| Proposed edits | How often the correction layer tries to change ASR output. |
| Accepted edits | How often AGP permits a change. |
| Accepted improvements | Corrections that reduce edit distance when references exist. |
| Accepted regressions | Corrections that make the row worse; primary safety failure. |
| Rejected improvements | Useful edits blocked by the gate; primary conservatism cost. |
| Abstentions | Cases where evidence is insufficient for a safe decision. |
| Per-state CER delta | Whether stable, boundary, uncertain, and novelty rows behave differently. |
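The accounting above can be sketched in a few lines. This is an illustrative implementation, not the project's scoring script: rows are assumed to be plain dictionaries with hypothetical keys `hypothesis`, `candidate`, `reference`, and `decision`, and edit distance is ordinary character-level Levenshtein distance.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def edit_distance(a: str, b: str) -> int:
    """Character-level Levenshtein distance with a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution or match
        prev = cur
    return prev[-1]


def outcome(hyp: str, cand: str, ref: str) -> str:
    """Label a candidate by the sign of delta = e - e'."""
    delta = edit_distance(hyp, ref) - edit_distance(cand, ref)
    if delta > 0:
        return "improvement"
    return "neutral" if delta == 0 else "regression"


def tally(rows: Iterable[Dict[str, Optional[str]]]) -> Counter:
    """Count decision/outcome pairs; rows without references stay unscored."""
    counts: Counter = Counter()
    for r in rows:
        if r["decision"] == "abstained":
            counts["abstained"] += 1
        elif r.get("reference") is None:
            counts[f"{r['decision']}_unscored"] += 1
        else:
            kind = outcome(r["hypothesis"], r["candidate"], r["reference"])
            counts[f"{r['decision']}_{kind}"] += 1
    return counts
```

In this layout, `accepted_regression` is the primary safety count and `rejected_improvement` is the conservatism cost; grouping the same tally by partition state gives the per-state view.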
Smoke Tests
The current AGP tests are preliminary but useful for defining expected behavior.
They should not be presented as a final benchmark over the archived 20.57\% CER
set.
Caption: AGP smoke-test outcomes. These tests evaluate conservative correction behavior, not ASR model quality.
| Test | CER movement | Correction behavior |
|---|---|---|
| Hand smoke test | 14.29\% … | … accepted edits. |
| Synthetic stress test | 13.33\% … | … accepted cases. |
| Archived real slice | 76.04\% … | … accepted cases. |
The important pattern is conservatism. AGP is allowed to abstain. In fact, it should
abstain often in uncertain or novelty regions. Its purpose is not to maximize the
number of edited rows; its purpose is to admit only defensible changes.
Correction Benchmark Design
A full benchmark should contain at least four partitions. The first is a stable
clean-read partition where corrections should be rare. The second is a boundary
partition containing likely local segmentation, elision, or character-boundary
errors. The third is an uncertain partition with low confidence, overlapping
speech, or noisy acoustics. The fourth is a novelty partition containing unseen
names, domain-specific words, dialectal variants, or speaker-specific forms. Each
partition should be scored separately because the safe action differs by state.
Caption: Target benchmark design for AGP correction.
| Partition | Expected gate behavior | Required report |
|---|---|---|
| Stable | Preserve most rows; accept only normalization repairs. | False rewrite rate and accepted-regression rate. |
| Boundary | Permit small local fixes. | Local edit precision and CER delta. |
| Uncertain | Abstain unless evidence converges. | Abstention rate and unsafe-accept rate. |
| Novelty | Route to review or data discovery. | Novelty preservation and human-review yield. |
Compositional Generalization
The deployment story also includes historical robustness experiments. In a
compositional generalization setup, models trained on seen vocabulary were evaluated
on utterances containing unseen words. The hypothesis was that N'Ko should degrade
less because unseen words can be composed from known phoneme-character units.
Caption: Historical compositional-generalization evidence.
| Script | Seen-word CER | Unseen-word CER | Gap |
|---|---|---|---|
| N'Ko | 16.09\% | … | … |
| Latin | 15.05\% | … | … |
[figure: figures/fig5_expF_compositional.pdf]
Caption: Experiment F compositional-generalization figure. The result is used as deployment motivation, not as the canonical 20.57\% CER anchor.
N'Ko has a 3.65 percentage-point smaller generalization gap in this historical
experiment. The result is consistent with the script-advantage hypothesis: a
transparent script makes unseen words more compositional. It is not a substitute
for a future matched evaluation.
Vocabulary Expansion
The vocabulary-expansion experiment asks whether full-data training recovers
unseen-word performance and whether any script advantage remains.
Caption: Historical vocabulary-expansion evidence.
| Condition | N'Ko CER | Latin CER | Difference |
|---|---|---|---|
| Seen-only model on seen words | 16.09\% | … | … |
| Seen-only model on unseen words | 53.90\% | … | … |
| Full-data model on unseen words | 40.15\% | … | … |
| Recovery from full data | 13.75pp | 13.78pp | Nearly identical |
[figure: figures/fig6_expH_vocab_expansion.pdf]
Caption: Experiment H vocabulary-expansion figure. Full data helps both scripts, but the residual unseen-word gap remains smaller for N'Ko in the recorded experiment.
This result matters for deployment because vocabulary expansion is a corpus-building
problem. A system should improve as new rows become trustworthy. AGP provides the
filtering mechanism for deciding which rows are trustworthy enough to enter future
training or review loops.
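As a quick consistency check on the vocabulary-expansion table, the 13.75pp recovery equals the difference between the two unseen-word values recorded in the same column, assuming both are CER percentages for the same script:

\[
53.90 - 40.15 = 13.75 \ \text{percentage points}
\]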
Djoko Domain Transfer
The Djoko deployment effort is the harshest test environment in the project. The
source is a Bambara-language soap-opera domain rather than clean read speech. The
data include music, overlapping speech, varying speakers, likely dialect variation,
episode structure, and no reliable subtitle track. It is not the same distribution
as the 20.57\% CER anchor.
The extraction record includes 1,124 downloaded videos from a 2,001-video channel
and 32,826 thirty-second audio segments. A first batch included 8,985 segments and
8,985 ASR transcriptions. Consensus filtering at a strict threshold produced 269
high-confidence rows; speaker and episode processing produced 6,625 speaker-labeled
rows, 258 episodes, seven speakers, and five eligible speakers. The first batch
had only about 8.8\% meaningful output, which indicates a
severe domain gap.
Caption: Djoko deployment substrate.
| Artifact | Count |
|---|---|
| Channel videos available | 2,001 |
| Videos downloaded | 1,124 |
| Audio segments extracted | 32,826 |
| First-batch segments/transcriptions | 8,985 |
| Strict high-confidence consensus rows | 269 |
| Speaker-labeled rows | 6,625 |
| Episodes represented | 258 |
| Speakers detected in speaker table | 7 |
| Eligible speakers under gate | 5 |
| Approximate meaningful-output rate in first batch | 8.8\% |
[figure: figures/fig7_djoko_quality.pdf]
Caption: Djoko quality-control figure. The figure illustrates why row-level filtering is necessary before using out-of-domain ASR output as training data or subtitle material.
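The counts in the substrate table imply a steep funnel. The short calculation below uses only the counts reported above; the variable names are illustrative, and the estimate of meaningful rows simply applies the approximate 8.8\% rate to the first batch, so it is a rough figure rather than a recorded count.

```python
# Counts copied from the Djoko deployment substrate table.
channel_videos = 2001
downloaded_videos = 1124
audio_segments = 32826
first_batch_segments = 8985
strict_consensus_rows = 269
meaningful_rate = 0.088  # approximate, as reported

download_rate = downloaded_videos / channel_videos
segments_per_video = audio_segments / downloaded_videos
consensus_rate = strict_consensus_rows / first_batch_segments
# Rough estimate only: the source reports the rate, not this count.
estimated_meaningful_rows = round(meaningful_rate * first_batch_segments)

print(f"downloaded share of channel: {download_rate:.1%}")
print(f"segments per downloaded video: {segments_per_video:.1f}")
print(f"strict-consensus share of first batch: {consensus_rate:.1%}")
print(f"estimated meaningful rows in first batch: {estimated_meaningful_rows}")
```

Roughly 3\% of first-batch segments survive the strict consensus threshold, which is the quantitative reason the text treats filtering as a requirement and not an optimization.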
The practical conclusion is that deployment needs gates. A row can be useful for
search, candidate review, correction training, TTS training, or none of these. The
decision depends on evidence, not on the existence of an ASR string.
Failure Mode Taxonomy
The deployment data expose failure modes that are invisible in clean benchmark
summaries. Music can create confident nonsense. Overlapping speakers can produce a
hybrid transcript that should not be assigned to either speaker. Dialect drift can
look like an error to a model trained on a narrower reference distribution.
Code-switching can make a purely script-native gate overconfident. Episode metadata
can leak duplicates into training and evaluation if not handled carefully.
Caption: Deployment failure modes and AGP response.
| Failure mode | Risk | AGP response |
|---|---|---|
| Background music | ASR may hallucinate speech-like text. | Music-risk flag and TTS exclusion. |
| Overlapping speakers | Transcript cannot be assigned cleanly. | Speaker-overlap flag and review-only gate. |
| Dialect or register shift | Valid speech may be misclassified as error. | Novelty state and community review. |
| Named entities | Language model may normalize away new lexical items. | Preserve raw hypothesis and require provenance. |
| Duplicate episodes | Train/test leakage or inflated coverage. | Episode-level provenance and deduplication. |
| Low-confidence consensus | False agreement across weak models. | Threshold review and abstention by default. |
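The responses in the failure-mode table can be sketched as a gate that starts from all candidate uses and removes those a risk flag rules out. This is a minimal sketch under assumptions: the `RiskFlags` fields, the use names, and the exact mapping from flag to removed uses are illustrative readings of the table, not a specification.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class RiskFlags:
    music: bool = False
    speaker_overlap: bool = False
    novelty: bool = False
    duplicate_episode: bool = False
    low_confidence_consensus: bool = False


ALL_USES = {"search", "review", "correction_training", "tts"}


def deployment_gate(f: RiskFlags) -> Dict[str, List[str]]:
    """Return the permitted uses and the reasons any use was removed."""
    uses = set(ALL_USES)
    reasons: List[str] = []
    if f.duplicate_episode:
        # Deduplication: the duplicate is excluded from every use.
        return {"uses": [], "reasons": ["duplicate_episode"]}
    if f.music:
        uses.discard("tts")
        reasons.append("music_risk")
    if f.speaker_overlap:
        uses &= {"review"}                      # review-only gate
        reasons.append("speaker_overlap")
    if f.novelty:
        uses -= {"correction_training", "tts"}  # keep for review and discovery
        reasons.append("novelty_review")
    if f.low_confidence_consensus:
        uses -= {"correction_training", "tts"}  # abstain by default
        reasons.append("low_confidence_consensus")
    return {"uses": sorted(uses), "reasons": reasons}
```

For example, `deployment_gate(RiskFlags(music=True))` keeps the row available for search, review, and correction training while excluding it from TTS.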
AGP and TTS Eligibility
Human Review and Community Authority
AGP is a machine gate, not an orthographic authority. Human review remains necessary
for tone policy, dialectal variants, register, names, religious or formulaic
phrases, and acceptable spelling variation. The row contract is designed so a
reviewer can see the raw audio source, raw ASR hypothesis, candidate correction,
partition state, and reason code. That makes review cheaper and more accountable:
the reviewer is not asked to trust the model, only to adjudicate a visible
evidence bundle.
Data Lifecycle
Rows should move through a staged lifecycle. A raw row may enter search if it is
useful for retrieval and clearly marked as machine-generated. A reviewed row may
enter the correction set if its evidence and human decision are preserved. A cleaner
single-speaker row may enter TTS only after stricter audio and speaker gates. Rows
with music, overlap, weak provenance, or novelty uncertainty should remain in
review or exclusion states until additional evidence is available. This staged
lifecycle lets the project stop spending compute while still preserving the value
of extracted data for later review, correction, and deployment work.
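The staged lifecycle can be expressed as a check on which stages a row has earned. The sketch below is illustrative: the `RowStatus` fields and stage names are hypothetical, and treating any open risk as "review only" is one assumed reading of the rule that risky rows remain in review or exclusion states.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RowStatus:
    marked_machine_generated: bool = False
    useful_for_retrieval: bool = False
    human_reviewed: bool = False
    evidence_preserved: bool = False
    single_speaker: bool = False
    passes_audio_gate: bool = False
    passes_speaker_gate: bool = False
    # Music, overlap, weak provenance, or unresolved novelty.
    has_open_risk: bool = False


def allowed_stages(s: RowStatus) -> List[str]:
    """List the lifecycle stages this row may currently occupy."""
    stages = ["review"]  # every retained row stays reviewable
    if s.has_open_risk:
        return stages    # hold until additional evidence is available
    if s.useful_for_retrieval and s.marked_machine_generated:
        stages.append("search")
    if s.human_reviewed and s.evidence_preserved:
        stages.append("correction_set")
    if s.single_speaker and s.passes_audio_gate and s.passes_speaker_gate:
        stages.append("tts")
    return stages
```

Keeping each stage as an independent condition means a row can be searchable without being trainable, which is the separation between search eligibility and corpus or TTS eligibility that the research questions ask for.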
Limitations
AGP is a downstream governance architecture with smoke tests and deployment
substrates. It should be evaluated separately from the 20.57\% CER anchor. The
ASR number measures decoding quality, while AGP measures whether transcript rows
can be safely searched, reviewed, corrected, retained, or excluded. A complete AGP
evaluation would run on the prediction/reference row set and report accepted
improvements, accepted regressions, rejected improvements, and abstentions by
partition.
The historical ExpF and ExpH results are useful but not a substitute for a new
matched evaluation. They should be presented as deployment motivation. The Djoko data are
also not ground-truth CER evidence; they are out-of-domain corpus-building evidence.
The severe 8.8\% meaningful-output rate
defines the real deployment difficulty.
Finally, AGP needs community review. A conservative correction gate can prevent
many machine-learning errors, but orthographic authority, tone policy, punctuation,
register, and acceptable spelling variation require Manding and N'Ko expertise,
not only model confidence. This is especially important because N'Ko is a living
scholarly and community script, not merely a model output alphabet
[citation: donaldson2017clear,unicode2006nko].
Conclusion
AGP is the bridge from benchmark ASR to governed deployment. It does not produce
the archived 20.57\% CER anchor. Its role
is downstream: preserve row-level evidence, partition uncertainty, constrain
correction, and decide which outputs can safely become corpus material.
The broader conclusion of the final paper series is that script-native ASR is an
infrastructure problem. Representation, metrics, decoding, correction, and
deployment gates all interact. Without AGP, a project can obtain a promising CER
number and still build an unsafe corpus. With AGP, the project has a path to stop
spending compute now while preserving the exact evidence needed for later
correction, review, and deployment.