Grand Diomande Research · Full HTML Reader

Quick Test Guide - Wav2Vec2 + Rekordbox

1. Prerequisites Check

System Requirements

```bash
# Go to the studio/ directory
cd [home]/Desktop/Computational\ Choreography/computational-studio/studio

# Activate the virtual environment
source venv/bin/activate

# Check Python packages
python -c "import torch; print('✅ PyTorch:', torch.__version__)"
python -c "import transformers; print('✅ Transformers:', transformers.__version__)"
python -c "import pynput; print('✅ pynput installed')"
python -c "from huggingface_hub import InferenceClient; print('✅ HF Hub installed')"
```
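To run all four checks at once, a small script can look up each module without importing it. This is a sketch and is not part of the repo. Note that `importlib.util.find_spec` only confirms that a package is installed in the active venv, not that it imports cleanly, so the one-liners above remain the stricter check.

```python
import importlib.util

# Modules checked by the one-liners above (module name -> label to print).
REQUIRED = {
    "torch": "PyTorch",
    "transformers": "Transformers",
    "pynput": "pynput",
    "huggingface_hub": "HF Hub",
}


def missing_packages(required: dict[str, str]) -> list[str]:
    """Return the labels of modules that cannot be found on sys.path."""
    return [
        label
        for module, label in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("❌ Missing:", ", ".join(missing))
    else:
        print("✅ All required packages found")
```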

Environment Variables

```bash
# Check that the .env file exists
ls -la ../.env

# It should contain:
# HF_TOKEN=your_huggingface_token_here
```
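`ls` only shows that the file exists. To confirm the token is actually set, you can parse `.env` yourself. The sketch below is a hypothetical stdlib-only helper. It handles plain `KEY=VALUE` lines and `#` comments, but not `export` prefixes or multi-line values, so treat it as a sanity check rather than a full `.env` parser.

```python
from pathlib import Path


def read_env_file(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    env: dict[str, str] = {}
    for raw in path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Strip optional surrounding quotes, e.g. HF_TOKEN="hf_..."
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def has_hf_token(path: Path) -> bool:
    """True if the file exists and defines a non-placeholder HF_TOKEN."""
    if not path.is_file():
        return False
    token = read_env_file(path).get("HF_TOKEN", "")
    return bool(token) and token != "your_huggingface_token_here"


if __name__ == "__main__":
    env_path = Path("../.env")  # same relative path as `ls -la ../.env`
    print("✅ HF_TOKEN set" if has_hf_token(env_path) else "❌ HF_TOKEN missing")
```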

Rekordbox Setup

1. Launch Rekordbox in Performance mode (not Export mode)
2. Load tracks onto Deck 1 and Deck 2
3. Verify keyboard shortcuts:
   - Manually press `Z` → Deck 1 should play/pause
   - Manually press `N` → Deck 2 should play/pause
   - Manually press `7` → 4-beat loop on Deck 1

2. Quick Test - Ghost Mode

Test the pipeline without actually controlling Rekordbox:

```bash
# From the studio/ directory
python dj_agent/scripts/run_rekordbox_voice_wav2vec.py
```

What to expect:

```
Loading Wav2Vec2 ASR model: facebook/wav2vec2-base-960h
✅ Rekordbox orbiter initialized
🎤 Wav2Vec2 listener started. Speak now...

[Speak: "play left"]
📝 ASR (text): "play left"
   🔎 Rekordbox: execute 3006 (Z) reason=approved
👻 [GHOST] Would execute 3006 (Z) score=1.000
   ⏱ Rekordbox latency: 45.2 ms

[Speak: "loop right"]
📝 ASR (text): "loop right"
   🔎 Rekordbox: execute 3114 (7 + Shift) reason=approved
👻 [GHOST] Would execute 3114 (7 + Shift) score=1.000
   ⏱ Rekordbox latency: 52.1 ms
```

Verify:
- ✅ ASR transcribes your speech correctly
- ✅ Commands map to the correct IDs (3006 = Play Left, 3114 = Loop Right)
- ✅ Shortcuts match Rekordbox (Z, 7 + Shift, etc.)
- ✅ Mapping latency is under 100 ms
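If you paste the ghost-mode console output into a string or file, you can automate these checks. This sketch assumes the log lines look exactly like the example output (`Would execute <id> (<shortcut>) score=...` and `Rekordbox latency: <n> ms`). The regexes and the `EXPECTED` table are illustrative and hypothetical. You would need to update them if the script's print format changes or if you test more phrases.

```python
import re

# Patterns matching the ghost-mode log lines shown above.
EXECUTE_RE = re.compile(r"Would execute (\d+) \(([^)]*)\) score=([\d.]+)")
LATENCY_RE = re.compile(r"Rekordbox latency: ([\d.]+) ms")

# Expected IDs for the two phrases in the example run.
EXPECTED = {"play left": "3006", "loop right": "3114"}


def check_ghost_run(log: str, phrase: str, max_latency_ms: float = 100.0) -> list[str]:
    """Return a list of problems found in one phrase's ghost-mode log block."""
    hit = EXECUTE_RE.search(log)
    if hit is None:
        return ["no GHOST execute line found"]
    problems = []
    command_id, shortcut, _score = hit.groups()
    if command_id != EXPECTED.get(phrase):
        problems.append(f"{phrase!r} mapped to {command_id} ({shortcut})")
    latency = LATENCY_RE.search(log)
    if latency is None or float(latency.group(1)) >= max_latency_ms:
        problems.append("latency missing or >= limit")
    return problems
```

An empty list means the phrase passed all checks.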

3. Live Test - Real Keyboard Control

⚠️ IMPORTANT: This will send real keyboard commands to Rekordbox!

Step 1: Enable Live Mode

Edit `dj_agent/scripts/run_rekordbox_voice_wav2vec.py`:

Find the `BridgeConfig` line and set `ghost=False`:

```python
bridge_cfg = BridgeConfig(ghost=False)  # Changed from True
```

Alternatively (recommended), add a command-line flag so you can switch modes without editing the script.
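Here is a minimal sketch of the flag approach, assuming a hypothetical `--live` option. Ghost mode stays the default, so forgetting the flag can never send real keystrokes. `BridgeConfig` below is a stand-in dataclass that only models the `ghost` field. In the real script you would import the existing class and build it with `BridgeConfig(ghost=not args.live)`.

```python
import argparse
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class BridgeConfig:
    """Stand-in for the script's BridgeConfig; only the ghost field is modeled."""
    ghost: bool = True


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Wav2Vec2 voice control for Rekordbox")
    parser.add_argument(
        "--live",
        action="store_true",
        help="send real keystrokes to Rekordbox (default: ghost mode)",
    )
    return parser.parse_args(argv)  # argv=None falls back to sys.argv[1:]


def make_bridge_config(args: argparse.Namespace) -> BridgeConfig:
    # Live mode must be requested explicitly; ghost mode is the default.
    return BridgeConfig(ghost=not args.live)
```

With this in place, `python dj_agent/scripts/run_rekordbox_voice_wav2vec.py --live` would enable live mode. For the flag to reach the script through `START_REKORDBOX_VOICE_WAV2VEC.sh`, that wrapper would also need to forward its arguments (for example with `"$@"`).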

Step 2: Test Basic Commands

```bash
./START_REKORDBOX_VOICE_WAV2VEC.sh
```

Test sequence:

1. Play Left
   - Say: "play left"
   - Expected: Deck 1 plays/pauses
   - Verify: Audio output changes, waveform moves

2. Play Right
   - Say: "play right"
   - Expected: Deck 2 plays/pauses
   - Verify: Deck 2 starts playing

3. Sync Left
   - Say: "sync left" or "beat sync left"
   - Expected: Deck 1 syncs to Deck 2
   - Verify: BPMs match, beatgrids align

4. Loop Left
   - Say: "loop left"
   - Expected: 4-beat loop on Deck 1
   - Verify: Loop indicator appears, playback loops

5. Loop Right
   - Say: "loop right"
   - Expected: 4-beat loop on Deck 2
   - Verify: Deck 2 enters loop mode

4. Common Issues & Solutions

Issue: ASR Not Recognizing Speech

Symptoms:

```
📝 ASR (text): ""
   ⚠️ No Rekordbox command match
```

Solutions:
- ✅ Check the microphone input (`python -m sounddevice` lists the available audio devices)
- ✅ Speak louder and more clearly
- ✅ Reduce the background music volume
- ✅ Use a noise-cancelling microphone
- ✅ Check the sample rate (Wav2Vec2 expects 16 kHz mono audio)
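If your microphone records at 44.1 or 48 kHz, the audio has to be converted to 16 kHz mono before it reaches Wav2Vec2. The sketch below shows that conversion with NumPy linear interpolation. The function name is hypothetical. Linear interpolation applies no anti-aliasing filter, so for real use a polyphase resampler (for example `scipy.signal.resample_poly`) is the better choice.

```python
import numpy as np

TARGET_SR = 16_000  # facebook/wav2vec2-base-960h expects 16 kHz mono input


def to_wav2vec_input(audio: np.ndarray, sr: int) -> np.ndarray:
    """Downmix to mono and linearly resample to 16 kHz float32."""
    audio = np.asarray(audio, dtype=np.float32)
    if audio.ndim == 2:  # (frames, channels), as returned by sounddevice.rec
        audio = audio.mean(axis=1)
    if sr == TARGET_SR:
        return audio
    n_out = int(round(len(audio) * TARGET_SR / sr))
    t_in = np.arange(len(audio)) / sr        # input sample times in seconds
    t_out = np.arange(n_out) / TARGET_SR     # output sample times in seconds
    return np.interp(t_out, t_in, audio).astype(np.float32)
```

`sounddevice.rec` returns a `(frames, channels)` array even when `channels=1`, which is why the function downmixes 2-D input.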

Issue: Wrong Command Mapped

Symptoms:

```
📝 ASR (text): "play left"
   🔎 Rekordbox: execute 3106 (N) reason=approved  # Wrong! Should be 3006 (Z)
```

Solutions:
- ✅ Check hard-coded overrides in `run_rekordbox_voice_wav2vec.py:102-126`
- ✅ Verify `Mapping/commands.yaml` has correct IDs
- ✅ Clear any cached embeddings
- ✅ Say "left" or "right" explicitly
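The override block in the script is not reproduced in this guide, so the following is only a hypothetical sketch of the idea. When the transcript names exactly one known action and one side, the function maps directly to the command ID instead of trusting the embedding search. The IDs come from the mapping table in section 5. Ambiguous transcripts fall through to the normal search.

```python
from typing import Optional

# Command IDs taken from the id_to_action table in section 5 of this guide.
OVERRIDES = {
    ("play", "left"): "3006",
    ("play", "right"): "3106",
    ("sync", "left"): "3009",
    ("sync", "right"): "3109",
    ("loop", "left"): "3014",
    ("loop", "right"): "3114",
}


def override_command(text: str) -> Optional[str]:
    """Return a command ID if the transcript names exactly one action and one side."""
    words = set(text.lower().split())
    actions = {action for action, _side in OVERRIDES if action in words}
    sides = {side for side in ("left", "right") if side in words}
    if len(actions) == 1 and len(sides) == 1:
        return OVERRIDES[(actions.pop(), sides.pop())]
    return None  # ambiguous or unknown: fall back to embedding search
```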

Issue: Keyboard Shortcuts Not Working

Symptoms:

```
   🔎 Rekordbox: execute 3006 (Z) reason=approved
[Keyboard sent: Z]
# But Rekordbox doesn't respond
```

Solutions:
- ✅ Ensure the Rekordbox window is focused (active)
- ✅ Check macOS Accessibility permissions:
  - System Preferences → Security & Privacy → Accessibility (System Settings → Privacy & Security → Accessibility on macOS 13 and later)
  - Add Terminal (or whichever app runs Python) to the allowed apps
- ✅ Test manually: press `Z` while Rekordbox is focused
- ✅ Verify shortcuts in Rekordbox → Preferences → Keyboard
- ✅ Make sure Rekordbox is in Performance mode, not Export mode

Issue: High Latency

Symptoms:

```
   ⏱ Rekordbox latency: 450.2 ms  # Too slow!
```

Solutions:
- ✅ Use a GPU for Wav2Vec2 (if available)
- ✅ Profile each stage to find the bottleneck:

  ```python
  # In run_rekordbox_voice_wav2vec.py
  import time

  # perf_counter is monotonic and high-resolution, so it suits interval timing
  t_asr = time.perf_counter()
  # ... ASR code ...
  print(f"ASR: {(time.perf_counter() - t_asr) * 1000:.1f} ms")

  t_embed = time.perf_counter()
  # ... Embedding code ...
  print(f"Embed: {(time.perf_counter() - t_embed) * 1000:.1f} ms")
  ```

- ✅ Cache embeddings for common commands
- ✅ Reduce top_k in search (currently 5)
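Caching embeddings can be as simple as wrapping the embedding call in `functools.lru_cache`. The following is a sketch, assuming `embed_text` returns a NumPy array (as `EmbeddingGemmaProvider().embed_text` does in the troubleshooting script). The wrapper name is hypothetical. Transcripts are normalized to lowercase with collapsed whitespace, so repeated commands like "Play Left" and "play left" share one cache entry.

```python
from functools import lru_cache
from typing import Callable

import numpy as np


def make_cached_embedder(embed_text: Callable[[str], np.ndarray], maxsize: int = 256):
    """Wrap an embedding function so repeated phrases skip the model call."""

    @lru_cache(maxsize=maxsize)
    def _cached(normalized: str) -> np.ndarray:
        vec = np.asarray(embed_text(normalized))
        vec.setflags(write=False)  # cached arrays are shared, so make them read-only
        return vec

    def embed(text: str) -> np.ndarray:
        # Normalize case and whitespace before the cache lookup.
        return _cached(" ".join(text.lower().split()))

    embed.cache_info = _cached.cache_info  # expose hit/miss statistics
    return embed
```

Usage would look like `embed = make_cached_embedder(EmbeddingGemmaProvider().embed_text)`. Only exact (normalized) repeats hit the cache, which suits a small, fixed command vocabulary.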

5. Integration with My New Rekordbox Bridge

You can optionally use the new `RekordboxBridge` I created for better control:

```python
# In run_rekordbox_voice_wav2vec.py, replace the orbiter bridge with:

from dj_agent.core.rekordbox_bridge import RekordboxBridge
import yaml

# Load config
with open('configs/rekordbox.yaml') as f:
    config = yaml.safe_load(f)

# Initialize my bridge
rekordbox_bridge = RekordboxBridge(config['dj']['rekordbox'])

# In your execute logic:
def execute_command(command_id: str):
    # Map command_id to action name
    id_to_action = {
        "3006": "PLAY_A",
        "3106": "PLAY_B",
        "3014": "LOOP_4_A",
        "3114": "LOOP_4_B",
        "3009": "SYNC_A",
        "3109": "SYNC_B",
        # ... add more mappings from commands.yaml
    }

    action_name = id_to_action.get(command_id)
    if not action_name:
        print(f"Unknown command ID: {command_id}")
        return

    # Get keyboard mapping from config
    mapping = config['dj']['rekordbox']['map']
    if action_name not in mapping:
        print(f"No mapping for action: {action_name}")
        return

    # Build message
    action_config = mapping[action_name]
    message = {
        'type': 'keyboard',
        'key': action_config['key'],
        'modifiers': action_config.get('modifiers', [])
    }

    # Send!
    rekordbox_bridge.send(message)
```
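`execute_command` only finds out that an action is missing from `configs/rekordbox.yaml` when a live command fails. A hypothetical startup check can catch gaps earlier. It assumes the same schema the code above reads: `map` entries keyed by action name, each with a `key` and optional `modifiers`. Call it with your ID-to-action table and `config['dj']['rekordbox']`, and refuse to start if it returns anything.

```python
def missing_actions(id_to_action: dict[str, str], rekordbox_cfg: dict) -> list[str]:
    """Return action names with no entry in rekordbox_cfg['map'] or no 'key' field."""
    mapping = rekordbox_cfg.get("map", {})
    return sorted(
        action
        for action in set(id_to_action.values())
        if "key" not in mapping.get(action, {})
    )
```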

6. Next Steps

Once basic commands work:

1. Test more commands:
   - Hot cues: "set hot cue A left deck"
   - Effects: "effects left", "echo left"
   - Recording: "start recording"

2. Run formal evaluation:

   ```bash
   python dj_agent/scripts/eval_rekordbox_voice_wav2vec.py tests/rekordbox_manifest.jsonl
   ```

3. Compare with Gemini:

   ```bash
   # Run Gemini path for comparison
   python dj_agent/scripts/run_rekordbox_voice_gemini.py
   ```

4. Tune for your setup:
   - Adjust confidence thresholds
   - Add custom command aliases
   - Fine-tune Wav2Vec2 on your voice (optional)
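The log shows a `score` and `reason=approved`, but this guide does not show how approval is decided. The sketch below is therefore a hypothetical gate to tune against. It accepts the top hit only if that hit clears a minimum score and beats the runner-up by a margin. Hits are modeled as `(command_id, score)` tuples sorted best-first. From the orbiter search used in the troubleshooting script, that would be `[(h.command_id, h.score) for h in hits]`. The default thresholds are placeholders, not measured values.

```python
from typing import Optional, Sequence, Tuple

Hit = Tuple[str, float]  # (command_id, similarity score), best first


def approve(hits: Sequence[Hit], min_score: float = 0.80,
            min_margin: float = 0.05) -> Optional[str]:
    """Return the top command ID if it is confident and unambiguous, else None."""
    if not hits:
        return None
    best_id, best_score = hits[0]
    runner_up = hits[1][1] if len(hits) > 1 else float("-inf")
    if best_score < min_score or best_score - runner_up < min_margin:
        return None  # too weak, or too close to the second-best command
    return best_id
```

Raising `min_score` trades missed commands for fewer wrong ones. The margin check mainly guards against left/right confusions, where two commands embed almost identically.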

7. Performance Targets

Good Performance:
- ✅ ASR accuracy: > 85%
- ✅ Command accuracy: > 90%
- ✅ ASR latency: < 300 ms
- ✅ Total latency: < 500 ms

Excellent Performance:
- ✅ ASR accuracy: > 95%
- ✅ Command accuracy: > 97%
- ✅ ASR latency: < 200 ms
- ✅ Total latency: < 300 ms
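The guide does not define how these percentages are computed. This sketch assumes one common convention. ASR accuracy = 100 × (1 − WER), where the word error rate pools word-level edits over all reference words. Command accuracy is the percentage of utterances mapped to the expected command ID. The function names and the `GOOD` dict are hypothetical.

```python
from typing import Sequence

# "Good Performance" targets from the lists above, in percent.
GOOD = {"asr_accuracy": 85.0, "command_accuracy": 90.0}


def word_edits(ref: str, hyp: str) -> int:
    """Word-level Levenshtein distance (substitutions + insertions + deletions)."""
    r, h = ref.lower().split(), hyp.lower().split()
    prev = list(range(len(h) + 1))  # row for an empty reference prefix
    for i, rw in enumerate(r, start=1):
        cur = [i] + [0] * len(h)
        for j, hw in enumerate(h, start=1):
            cur[j] = min(
                prev[j] + 1,               # deletion
                cur[j - 1] + 1,            # insertion
                prev[j - 1] + (rw != hw),  # substitution (0 if words match)
            )
        prev = cur
    return prev[-1]


def summarize(refs: Sequence[str], hyps: Sequence[str],
              expected_ids: Sequence[str], predicted_ids: Sequence[str]) -> dict:
    """Return ASR and command accuracy in percent for one evaluation run."""
    total_words = max(sum(len(r.split()) for r in refs), 1)
    wer = sum(word_edits(r, h) for r, h in zip(refs, hyps)) / total_words
    hits = sum(e == p for e, p in zip(expected_ids, predicted_ids))
    return {
        "asr_accuracy": 100.0 * (1.0 - wer),
        "command_accuracy": 100.0 * hits / max(len(expected_ids), 1),
    }


def meets(summary: dict, targets: dict = GOOD) -> bool:
    """True if every metric is strictly above its target."""
    return all(summary[name] > target for name, target in targets.items())
```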

8. Troubleshooting Script

Run this to diagnose issues:

```bash
# Test audio input
python -c "
import sounddevice as sd
import numpy as np

print('🎤 Recording 3 seconds...')
audio = sd.rec(int(3 * 16000), samplerate=16000, channels=1, dtype='float32')
sd.wait()
print(f'✅ Recorded {len(audio)} samples')
print(f'   Max amplitude: {np.abs(audio).max():.3f}')
print('   If max < 0.01, microphone may be too quiet')
"

# Test Wav2Vec2 ASR
python -c "
from dj_agent.voice_control.wav2vec_asr import transcribe
import numpy as np

print('📝 Testing Wav2Vec2 ASR...')
# Test with silence (should return empty or noise)
silence = np.zeros(16000, dtype=np.float32)
result = transcribe(silence, 16000)
print(f'Silence transcribed as: \"{result}\"')
"

# Test Embedding Gemma
python -c "
from dj_agent.voice_control.orbiter.embedding import EmbeddingGemmaProvider

print('🧠 Testing Embedding Gemma...')
embedder = EmbeddingGemmaProvider()
emb = embedder.embed_text('play left')
print(f'✅ Embedding shape: {emb.shape}')
print(f'   Values: [{emb[0]:.3f}, {emb[1]:.3f}, ..., {emb[-1]:.3f}]')
"

# Test Rekordbox Index
python -c "
from dj_agent.voice_control.orbiter import RekordboxOrbiter, OrbiterConfig
from dj_agent.voice_control.orbiter.embedding import EmbeddingGemmaProvider
from pathlib import Path

print('🔍 Testing Rekordbox Index...')
commands_path = Path('Mapping/commands.yaml')
embedder = EmbeddingGemmaProvider()
orbiter = RekordboxOrbiter(OrbiterConfig(commands_path=commands_path), embedder)

emb = embedder.embed_text('play left')
hits = orbiter.index.search(emb, top_k=5)
print(f'✅ Found {len(hits)} matches:')
for hit in hits:
    print(f'   - {hit.command_id}: {hit.metadata.get(\"shortcut\")} (score={hit.score:.3f})')
"
```

---

Ready to test? Run:

```bash
./START_REKORDBOX_VOICE_WAV2VEC.sh
```

Good luck! 🎤🎛️

Source Anchor

projects/Documentation/02-projects/dj-agent/studio/docs/TEST_WAV2VEC_QUICK.md
