Mohamed Diomande

Extracted abstract or opening context

Train Wav2Vec2 to recognize **your voice** saying DJ commands with >95% accuracy while maintaining <100ms latency.

**Why this approach?**

- ✅ **Speed**: <100ms total latency (acceptable for DJing)
- ✅ **Accuracy**: Fine-tuned ASR + semantic retrieval = best of both worlds
- ✅ **Flexibility**: Can add new commands without retraining audio model
- ✅ **Debugging**: Can see transcribed text
- ✅ **Practical**: Uses pre-trained models with fine-tuning

**vs Pure S2O (Audio→Command)**:

- Faster (~20ms) but:
  - Requires large audio-command dataset
  - Less flexible (fixed command vocabulary)
  - Harder to debug
  - More complex training pipeline

**What you'll do**:

1. UI shows a command (e.g., "play left")
2. Click "RECORD" button
3. Speak the command clearly after countdown
4. Repeat 3 times per command (for robustness)
5. ~40 commands × 3 variations = 120 recordings

**Tips**:

- Speak naturally (as you would while DJing)
- Use consistent pronunciation
- Record in similar environment to where you'll DJ
- Take breaks every 20 commands
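
The "ASR + retrieval" split described above can be illustrated with a minimal sketch of the second stage only: mapping a transcript (as the fine-tuned ASR model might produce it) onto a known command. This is not the author's implementation. It swaps in plain string similarity from Python's standard `difflib` as a stand-in for semantic retrieval, and the command list, the `match_command` name, and the `0.6` threshold are all hypothetical choices for illustration; only "play left" comes from the text.

```python
from difflib import SequenceMatcher

# Hypothetical command vocabulary; only "play left" appears in the source.
COMMANDS = ["play left", "play right", "pause left", "pause right"]


def normalize(text):
    """Lowercase and collapse whitespace so ASR casing/spacing does not matter."""
    return " ".join(text.lower().split())


def match_command(transcript, commands=COMMANDS, threshold=0.6):
    """Return (command, score) for the closest command.

    Returns (None, score) when the best score is below the threshold.
    String similarity is a stand-in for semantic retrieval (an assumption).
    """
    query = normalize(transcript)
    best, best_score = None, 0.0
    for cmd in commands:
        score = SequenceMatcher(None, query, cmd).ratio()
        if score > best_score:
            best, best_score = cmd, score
    if best_score < threshold:
        return None, best_score
    return best, best_score


if __name__ == "__main__":
    # A slightly clipped transcript still resolves to the intended command.
    print(match_command("play lef"))
    # Adding a command is a list change; the audio model is untouched.
    print(match_command("loop left", COMMANDS + ["loop left"]))
```

Because the transcript is ordinary text, a mismatch can be inspected directly, and extending the vocabulary only changes the command list, which is the flexibility and debuggability argument made above.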

Promotion decision

What has to happen next

Attach run IDs, datasets, metrics, and reproduction commands.

Why this is not always a full paper yet

Corpus pages are public-safe readers for discovered workspace artifacts. They are not automatically final papers. A corpus item becomes a polished paper only after the editable source, evidence checkpoints, references, figures, render path, and release status are attached through the paper schema.