How Seven Numbers Changed Everything We Know About Speech Recognition
We had a working Bambara ASR system: a 46.9M-parameter Transformer CTC decoder sitting on top of frozen Whisper features. It took raw audio, ran it through Whisper's encoder to get acoustic features, and then decoded those features into N'Ko characters.
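The last step, turning per-frame decoder scores into N'Ko text, uses the CTC output convention. Here is a minimal sketch of greedy (best-path) CTC decoding. It assumes the decoder emits one score row per encoder frame over a character vocabulary whose index 0 is the CTC blank. The four-entry `VOCAB` is a hypothetical toy, not the system's real character inventory. The note does not say whether the real system uses greedy decoding or beam search, and the Whisper encoder itself is not shown.

```python
from itertools import groupby

# Hypothetical toy vocabulary. Index 0 is the CTC blank; the other entries are
# code points from the N'Ko Unicode block (U+07C0 to U+07FF). The real
# system's character set is not given in this note.
BLANK = 0
VOCAB = ["<blank>", "\u07ca", "\u07d3", "\u07e1"]


def ctc_collapse(frame_ids: list[int]) -> str:
    """Best-path CTC rule: merge runs of identical labels, then drop blanks."""
    merged = [label for label, _run in groupby(frame_ids)]
    return "".join(VOCAB[i] for i in merged if i != BLANK)


def ctc_greedy_decode(logits: list[list[float]]) -> str:
    """Pick the highest-scoring label per frame, then apply the collapse rule.

    `logits` has one row per encoder frame and one column per VOCAB entry.
    """
    best = [max(range(len(row)), key=row.__getitem__) for row in logits]
    return ctc_collapse(best)
```

For example, `ctc_collapse([1, 1, 0, 1])` yields the first letter twice. The blank between the two runs is how CTC spells a doubled character, while `[1, 1, 1]` collapses to a single one.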
But there was a pattern we kept noticing in the errors. The model would get the middle of words right and the beginnings wrong. It would handle sustained vowels perfectly but stumble on consonant clusters at phrase boundaries. It would decode long stretches of speech with no errors, then suddenly produce a burst of garbage at exactly the moments where a human listener would say "oh, something changed there."
The model had no sense of where it was in the utterance. It processed each frame using only local attention over the neighboring frames, with no view of the utterance as a whole. It had no concept of "we are at the beginning of a new phrase" or "the speaker is about to transition to a different topic." It was transcribing the audio like a very fast typewriter, one frame at a time, with no awareness of the larger structure.
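To make "only local attention" concrete, here is a sketch of banded self-attention, in which frame `i` may attend only to frames `j` with `|i - j| <= window`. Several details are simplifying assumptions: a single head, no learned projections, and an arbitrary `window`. The note does not give the real decoder's attention configuration.

```python
import numpy as np


def local_attention_mask(num_frames: int, window: int) -> np.ndarray:
    """Boolean (T, T) mask: True where frame i may attend to frame j."""
    idx = np.arange(num_frames)
    return np.abs(idx[:, None] - idx[None, :]) <= window


def local_self_attention(x: np.ndarray, window: int) -> np.ndarray:
    """Single-head scaled dot-product self-attention restricted to a band.

    x has shape (T, d). Queries, keys and values are x itself (no learned
    projections), which keeps the sketch minimal.
    """
    num_frames, dim = x.shape
    scores = x @ x.T / np.sqrt(dim)
    # Frames outside the window get -inf, i.e. zero weight after softmax.
    scores = np.where(local_attention_mask(num_frames, window), scores, -np.inf)
    scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x
```

With `window = w`, one layer lets each frame see `2w + 1` frames. Stacking `L` such layers widens this to `2Lw + 1`, which is still a bounded neighborhood. Nothing in the mask itself tells a frame where in the utterance it sits.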
In our motion capture work (a separate project involving dance, music, and real-time audio generation), we had developed something called anticipation geometry. The idea: you can predict what a dancer is about to do by tracking seven scalar values derived from their movement trajectory.
Those seven values are:

1. **Commitment** (0-1): how locked-in the current motion path is
2. **Uncertainty** (0-1): how many possible next-states exist
3. **Transition pressure** (0-1): how close we are to a regime change
4. **Rhythmic phase** (0-2π): where we are in the current periodic cycle
5. **Energy** (0-1): overall intensity of the signal
6. **Curvature** (0-1): how rapidly the trajectory is bending
7. **Jerk** (0-1): rate of change of curvature, the "snap" in the motion
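The list gives names and ranges but no formulas. The sketch below therefore does two modest things. First, it bundles the seven scalars into a validated record that uses the ranges above. Second, it shows one hypothetical way to get curvature and jerk from a sampled 2-D trajectory. The curvature step uses the standard planar formula with finite differences. The `x / (1 + x)` squashing into [0, 1) is an invented normalization. "Jerk" follows this note's definition (rate of change of curvature), which is not the physics sense of the third derivative of position. Commitment, uncertainty, transition pressure, and energy are left to the caller because the note does not say how they are derived.

```python
import math
from dataclasses import dataclass

# Names and ranges come from the list above; every formula here is an
# illustrative assumption, not the anticipation-geometry implementation.
UNIT_FIELDS = ("commitment", "uncertainty", "transition_pressure",
               "energy", "curvature", "jerk")


@dataclass(frozen=True)
class AnticipationState:
    commitment: float           # 0-1, how locked-in the motion path is
    uncertainty: float          # 0-1, how many next-states are possible
    transition_pressure: float  # 0-1, closeness to a regime change
    rhythmic_phase: float       # radians in [0, 2*pi), position in the cycle
    energy: float               # 0-1, overall intensity
    curvature: float            # 0-1, how sharply the trajectory bends
    jerk: float                 # 0-1, rate of change of curvature

    def __post_init__(self) -> None:
        for name in UNIT_FIELDS:
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name}={value} is outside [0, 1]")
        if not 0.0 <= self.rhythmic_phase < 2 * math.pi:
            raise ValueError(f"rhythmic_phase={self.rhythmic_phase} is outside [0, 2*pi)")


def squash(x: float) -> float:
    """Hypothetical map from a non-negative magnitude into [0, 1)."""
    return x / (1.0 + x)


Point = tuple[float, float]


def _curvature(p0: Point, p1: Point, p2: Point, dt: float) -> float:
    """Planar curvature |x'y'' - y'x''| / |v|^3 at p1, via central differences."""
    vx = (p2[0] - p0[0]) / (2 * dt)
    vy = (p2[1] - p0[1]) / (2 * dt)
    ax = (p2[0] - 2 * p1[0] + p0[0]) / dt**2
    ay = (p2[1] - 2 * p1[1] + p0[1]) / dt**2
    speed = math.hypot(vx, vy)
    if speed == 0.0:
        return 0.0  # curvature is undefined when stationary; report 0
    return abs(vx * ay - vy * ax) / speed**3


def curvature_and_jerk(points: list[Point], dt: float = 1.0) -> tuple[float, float]:
    """Squashed curvature at the latest sample and its squashed rate of change.

    Needs at least four 2-D samples spaced dt apart.
    """
    if len(points) < 4:
        raise ValueError("need at least four samples")
    k_prev = _curvature(*points[-4:-1], dt)
    k_now = _curvature(*points[-3:], dt)
    return squash(k_now), squash(abs(k_now - k_prev) / dt)
```

A straight line gives curvature and jerk of 0. Points on a unit circle give a raw curvature near 1 (about 0.5 after squashing) and a jerk near 0, because the bend is constant.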