Improving Wav2Vec2 ASR Accuracy for DJ Commands

Extracted abstract or opening context

Wav2Vec2 misrecognizes short DJ commands:
- "play left" → "hey laughed", "they left", "lay left"
- Short, specific phrases are hard for general-purpose ASR models

Candidate fixes and their trade-offs:
- **Pros**: Much better accuracy, still fast. **Cons**: Requires an additional library.
- **Pros**: Best accuracy, learns your voice. **Cons**: Takes time to record and train.
- **Pros**: Best balance of accuracy and speed. **Cons**: Requires phonetic mapping for each language.
- **Pros**: Much better accuracy out of the box. **Cons**: Slower (100-300 ms vs. 60 ms), larger model.
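
One way to read the phonetic-mapping option is to snap every raw transcript onto the small, closed set of DJ commands instead of trusting free-form ASR output. The sketch below is an illustration under stated assumptions, not the method from the original artifact: the command list, the alias table of observed mishearings, and the helper names `normalize` and `snap_to_command` are all hypothetical, and the fuzzy fallback uses Python's standard `difflib` character similarity rather than a true phonetic encoder.

```python
import difflib
from typing import Mapping, Optional, Sequence

# Hypothetical command vocabulary; substitute the DJ app's real command set.
COMMANDS: Sequence[str] = ("play left", "play right", "pause left", "pause right", "sync")

# Hypothetical alias table: observed Wav2Vec2 mishearings mapped to the intended
# command. This is the crudest form of "phonetic mapping" and is language-specific.
ALIASES: Mapping[str, str] = {
    "hey laughed": "play left",
    "they left": "play left",
    "lay left": "play left",
}


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace (English CTC checkpoints often emit uppercase)."""
    return " ".join(text.lower().split())


def snap_to_command(
    transcript: str,
    commands: Sequence[str] = COMMANDS,
    aliases: Mapping[str, str] = ALIASES,
    cutoff: float = 0.6,
) -> Optional[str]:
    """Map a raw transcript onto a known command, or return None if nothing is close."""
    text = normalize(transcript)
    # 1. Exact hit on the vocabulary.
    if text in commands:
        return text
    # 2. Known mishearing from the alias table.
    if text in aliases:
        return aliases[text]
    # 3. Character-level fuzzy match: get_close_matches returns at most n candidates
    #    whose SequenceMatcher ratio is >= cutoff, best first.
    matches = difflib.get_close_matches(text, list(commands), n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    for raw in ["HEY LAUGHED", "play lift", "turn it up"]:
        print(f"{raw!r} -> {snap_to_command(raw)!r}")
```

With these hypothetical tables, "HEY LAUGHED" and "play lift" both resolve to "play left", while "turn it up" returns None. The alias lookup runs before fuzzy matching because plain character similarity does not bring "hey laughed" within difflib's default cutoff of 0.6 of "play left"; a real phonetic key (for example, a Metaphone-style encoding) would generalize beyond a hand-written table, at the cost of the per-language work the trade-off list mentions.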

Promotion decision

What has to happen next

Attach run IDs, datasets, metrics, and reproduction commands.
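
For the metrics item, word error rate (WER) is one standard way to compare the candidate fixes on the same recorded commands. The following is a minimal pure-Python sketch, assuming case-insensitive comparison and whitespace tokenization; the function names are illustrative, and a maintained package such as jiwer could be used instead.

```python
from typing import Iterable, List, Tuple


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Levenshtein distance over words: substitutions + deletions + insertions."""
    # prev[j] is the distance between the reference prefix processed so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER for one utterance, case-insensitive, whitespace-tokenized."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(ref, hyp) / len(ref)


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Pooled WER: total word edits divided by total reference words."""
    edits = words = 0
    for reference, hypothesis in pairs:
        ref, hyp = reference.lower().split(), hypothesis.lower().split()
        edits += word_edit_distance(ref, hyp)
        words += len(ref)
    if words == 0:
        raise ValueError("no reference words")
    return edits / words
```

Because every example command is only two words long, a single wrong word already gives a per-utterance WER of 0.5 ("play left" vs. "they left"), so reporting exact-command accuracy alongside pooled WER may be more informative for this use case.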

Why this page is not yet a full paper

Corpus pages are public-safe readers for discovered workspace artifacts. They are not automatically final papers. A corpus item becomes a polished paper only after the editable source, evidence checkpoints, references, figures, render path, and release status are attached through the paper schema.