Status: working paper; preprint render candidate; score 100

Anticipation Geometry Partition: Row-Level Governance for Script-Native N'Ko ASR Deployment

Extracted abstract or opening context

This paper defines the deployment layer of the N'Ko ASR project: Anticipation Geometry Partition (AGP). AGP is not the acoustic model that produced the archived 20.57% CER anchor; it operates downstream of ASR. Its role is to convert trajectory and uncertainty signals into row-level decisions about correction, provenance, corpus admission, and deployment eligibility. The motivation is simple: a single scalar CER is not enough to build a trustworthy transcript corpus or a production speech system. The acoustic model can make local mistakes, a language model can propose fluent but false corrections, and out-of-domain data can contain music, overlapping speakers, dialect drift, visual context, and uncertain references.

AGP partitions transcript spans into four states: stable, boundary, uncertain, and novelty. Stable spans should usually remain unchanged. Boundary spans may admit local repairs when acoustic and textual evidence agree. Uncertain spans require stronger evidence or abstention. Novelty spans should be treated as data-discovery events rather than as opportunities for language-model rewriting. The formal unit is a row containing the ASR hypothesis, the reference (when available), edit counts, trajectory summaries, partition labels, a candidate correction, an admissibility decision, provenance metadata, and deployment gates.

The paper also consolidates the deployment evidence that motivated AGP. Earlier compositional-generalization and vocabulary-expansion experiments suggest that N'Ko transcription degrades less than Latin transcription on utterances containing unseen words, and that it retains a smaller residual gap after full-data training. The Djoko soap-opera extraction created a much harder deployment substrate: 1,124 videos downloaded from a 2,001-video channel, 32,826 audio segments, an initial batch of 8,985 segments, 269 high-confidence consensus rows at a strict threshold, 6,625 speaker-labeled rows, 258 episodes, seven speakers, and five eligible speakers. The domain gap was severe: on the first batch, a model trained on clean read speech produced only about 8.8% meaningful output. The conclusion is that AGP is not optional polish. It is the governance structure needed before script-native ASR output can safely become search data, correction data, subtitle candidates, or TTS training material.
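The span-level rules above can be made concrete with a small Python sketch. It makes three assumptions:

- Each span carries one partition label.
- Acoustic and textual agreement are reduced to boolean flags.
- Uncertain spans need an extra "strong evidence" flag before a repair is accepted.

All class names, field names, and the `admit` gate are hypothetical illustrations, not the AGP implementation. CER uses the standard definition, (S + D + I) / N over reference characters.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Partition(Enum):
    STABLE = "stable"
    BOUNDARY = "boundary"
    UNCERTAIN = "uncertain"
    NOVELTY = "novelty"


class Decision(Enum):
    KEEP = "keep"            # leave the ASR span unchanged
    REPAIR = "repair"        # accept the candidate correction
    ABSTAIN = "abstain"      # no correction; hold the span back
    DISCOVERY = "discovery"  # route to data discovery, never an LM rewrite


@dataclass
class Span:
    # Hypothetical per-span fields: a partition label plus evidence flags.
    text: str
    partition: Partition
    candidate: Optional[str] = None
    acoustic_agrees: bool = False
    textual_agrees: bool = False
    strong_evidence: bool = False


@dataclass
class Row:
    # Hypothetical subset of the row fields listed in the abstract.
    hypothesis: str
    reference: Optional[str]
    substitutions: int
    deletions: int
    insertions: int
    spans: List[Span] = field(default_factory=list)
    provenance: dict = field(default_factory=dict)

    def cer(self) -> Optional[float]:
        """(S + D + I) / N over reference characters; None without a reference."""
        if not self.reference:
            return None
        edits = self.substitutions + self.deletions + self.insertions
        return edits / len(self.reference)


def decide(span: Span) -> Decision:
    """Illustrative policy mapping a partition label to a decision."""
    if span.partition is Partition.STABLE:
        return Decision.KEEP
    if span.partition is Partition.NOVELTY:
        return Decision.DISCOVERY
    agree = (span.candidate is not None
             and span.acoustic_agrees and span.textual_agrees)
    if span.partition is Partition.BOUNDARY:
        # Local repair only when acoustic and textual evidence agree.
        return Decision.REPAIR if agree else Decision.KEEP
    # UNCERTAIN: agreement alone is not enough; require a stronger signal too.
    return Decision.REPAIR if agree and span.strong_evidence else Decision.ABSTAIN


def admit(row: Row) -> bool:
    """Hypothetical corpus gate: admit only rows whose spans are all kept or repaired."""
    return all(decide(s) in (Decision.KEEP, Decision.REPAIR) for s in row.spans)


# Usage: a 10-character reference with one substitution and one deletion.
row = Row(
    hypothesis="abcdefgXi",
    reference="abcdefghij",
    substitutions=1,
    deletions=1,
    insertions=0,
    spans=[
        Span("abcdefg", Partition.STABLE),
        Span("Xi", Partition.BOUNDARY, candidate="hij",
             acoustic_agrees=True, textual_agrees=True),
    ],
)
print(row.cer())   # 0.2
print(admit(row))  # True
```

Keeping novelty as its own outcome, rather than folding it into abstention, preserves the distinction the abstract draws. A novelty span is flagged for data discovery, not silently dropped or rewritten by a language model.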

Promotion decision

What has to happen next

Compile/render the source, verify references and figures, then add to the curated atlas.

Why this is not always a full paper yet

Corpus pages are public-safe readers for discovered workspace artifacts. They are not automatically final papers. A corpus item becomes a polished paper only after the editable source, evidence checkpoints, references, figures, render path, and release status are attached through the paper schema.
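The promotion rule can be read as a completeness check over the paper schema. The sketch below assumes a hypothetical record whose fields mirror the six attachments named above, with `None` meaning "not yet attached". It is an illustration only, not the corpus site's actual schema.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class PaperRecord:
    # Hypothetical schema; None means the attachment is missing.
    # An empty list counts as attached (e.g. a paper with no figures).
    editable_source: Optional[str] = None
    evidence_checkpoints: Optional[List[str]] = None
    references: Optional[List[str]] = None
    figures: Optional[List[str]] = None
    render_path: Optional[str] = None
    release_status: Optional[str] = None


def missing_attachments(record: PaperRecord) -> List[str]:
    """Names of schema fields that are still unset."""
    return [f.name for f in fields(record) if getattr(record, f.name) is None]


def is_polished(record: PaperRecord) -> bool:
    """A corpus item is promotable only when nothing is missing."""
    return not missing_attachments(record)
```

Returning the list of missing fields, rather than a bare boolean, tells the curator which attachment still blocks promotion.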