Grand Diomande Research · Full HTML Reader

Modular Voice Control System

An extensible, modular voice control system for DJing in Serato DJ, built on the Gemini Live API, with on-demand track analysis and AI-generated transition recommendations.

Agents That Account for Themselves proposal experiment writeup candidate score 24 .md

Architecture

The system is organized into modular components:

dj_agent/voice_control/
├── __init__.py              # Package exports
├── gemini_listener.py       # Gemini Live API voice recognition
├── command_processor.py     # Command parsing, matching, buffering
├── command_map.py           # Voice command to keyboard shortcut mapping
├── deck_controller.py       # Keyboard execution and deck state
├── chain_executor.py        # Higher-order command execution
├── track_analyzer.py        # On-demand audio analysis (BPM, key, drops)
├── transition_advisor.py    # AI-powered transition recommendations
├── voice_controller.py      # Main orchestrator
└── test_system.py           # Test suite

Features

### Core Functionality
- Real-time Voice Recognition: Streams microphone audio to the Gemini Live API for speech recognition
- Command Matching: Fuzzy matching, with command buffering for multi-word commands
- Keyboard Execution: Sends keyboard shortcuts to control Serato DJ
- Deck Management: Tracks the active deck and its state
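The README does not say how `command_processor.py` implements fuzzy matching and buffering. As one possible approach, here is a minimal sketch built on the standard library's `difflib.get_close_matches` (a swapped-in technique, not necessarily what the project uses). `CommandBuffer`, `cutoff`, and `max_words` are hypothetical names: transcript fragments accumulate until some trailing phrase is close enough to a known command.

```python
import difflib
from typing import List, Optional


class CommandBuffer:
    """Hypothetical sketch: buffer transcript words until they fuzzy-match a command."""

    def __init__(self, commands: List[str], cutoff: float = 0.8, max_words: int = 4):
        self.commands = commands
        self.cutoff = cutoff        # minimum difflib similarity ratio (0..1)
        self.max_words = max_words  # keep at most this many recent words
        self.words: List[str] = []

    def feed(self, fragment: str) -> Optional[str]:
        """Add a transcript fragment; return the matched command, or None."""
        self.words.extend(fragment.lower().split())
        self.words = self.words[-self.max_words:]
        # Try the longest trailing phrase first so "play left" wins over "left".
        for start in range(len(self.words)):
            phrase = " ".join(self.words[start:])
            match = difflib.get_close_matches(phrase, self.commands, n=1, cutoff=self.cutoff)
            if match:
                self.words.clear()  # reset the buffer after a successful match
                return match[0]
        return None
```

With the commands `["play left", "play right", "load left", "next track"]`, feeding `"play"` returns `None` and a following `"left"` completes `"play left"`; a slightly misheard `"next truck"` still resolves to `"next track"` at the default cutoff.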

### Advanced Features
- Track Analysis: On-demand audio analysis using librosa
  - BPM detection
  - Beat grid extraction
  - Drop detection (energy spikes)
  - Build-up detection
  - Section detection (breakdowns, builds)
- Transition Recommendations: AI-powered suggestions using Gemini
  - Harmonic mixing (key compatibility)
  - Energy matching
  - Beat alignment
  - Optimal timing suggestions
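How key compatibility is scored is not specified here, and key detection is still listed under Future Enhancements. A common DJ convention is the Camelot wheel: two keys mix harmonically if they share a code, sit one step apart with the same letter (12 wraps around to 1), or share a number across A/B (relative minor/major). The sketch below implements that rule; `parse_camelot` and `keys_compatible` are hypothetical helpers and assume keys are already expressed as Camelot codes.

```python
import re
from typing import Tuple

# Camelot codes run 1A-12A (minor) and 1B-12B (major).
_CAMELOT = re.compile(r"^(1[0-2]|[1-9])([AB])$")


def parse_camelot(code: str) -> Tuple[int, str]:
    """Hypothetical helper: split a Camelot code such as '8A' into (8, 'A')."""
    m = _CAMELOT.match(code.strip().upper())
    if not m:
        raise ValueError(f"not a Camelot code: {code!r}")
    return int(m.group(1)), m.group(2)


def keys_compatible(a: str, b: str) -> bool:
    """Hypothetical helper: same key, one step on the wheel, or relative major/minor."""
    num_a, letter_a = parse_camelot(a)
    num_b, letter_b = parse_camelot(b)
    if letter_a == letter_b:
        # Distance around a 12-position wheel, so 12 and 1 are neighbours.
        diff = abs(num_a - num_b) % 12
        return min(diff, 12 - diff) <= 1
    # Switching A <-> B at the same number moves between relative minor and major.
    return num_a == num_b
```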

### Higher-Order Commands
- Play Next: Loads and plays the next track
- Continuous Mode: Plays continuously, loading the next track automatically
- Transitions: Smooth transitions between tracks
- Sync & Play: Beat-matched transitions

Usage

Basic Usage

```python
import asyncio

from dj_agent.voice_control import VoiceController

config = {
    'analysis_cache_dir': './.track_analysis_cache',
    'transition_announcements': False,
}

controller = VoiceController(
    config=config,
    [sensitive field redacted],  # Or use GEMINI_API_KEY env var
    enable_track_analysis=True,
    enable_transitions=True
)

# Start listening
asyncio.run(controller.start())
```

Command Line

```bash
# Run voice control
python3 dj_agent/run_voice_control_gemini.py

# List all commands
python3 dj_agent/run_voice_control_gemini.py --commands

# Disable track analysis
python3 dj_agent/run_voice_control_gemini.py --no-track-analysis

# Disable transitions
python3 dj_agent/run_voice_control_gemini.py --no-transitions
```
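A sketch of how the flags above could map onto the `VoiceController` constructor arguments with `argparse`. The `dest` names are chosen to match `enable_track_analysis` and `enable_transitions` from the Basic Usage example; `parse_cli` is hypothetical, and the real `run_voice_control_gemini.py` may parse its arguments differently.

```python
import argparse
from typing import List, Optional


def parse_cli(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Hypothetical parser for the documented flags (not the real script's code)."""
    parser = argparse.ArgumentParser(description="Gemini voice control for Serato DJ")
    parser.add_argument("--commands", action="store_true",
                        help="list all voice commands and exit")
    # store_false: the feature is on by default and the flag switches it off.
    parser.add_argument("--no-track-analysis", dest="enable_track_analysis",
                        action="store_false", help="disable librosa track analysis")
    parser.add_argument("--no-transitions", dest="enable_transitions",
                        action="store_false", help="disable transition recommendations")
    return parser.parse_args(argv)
```

With this layout, `VoiceController(config=config, enable_track_analysis=args.enable_track_analysis, enable_transitions=args.enable_transitions)` picks the flags up directly, and `--commands` can short-circuit to printing the command map.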

Testing

```bash
# Run test suite
python3 dj_agent/voice_control/test_system.py
```

Track Analysis Integration

Track analysis is automatically triggered when:
- The user navigates the library ("next track", "move down")
- The user loads a track ("load left", "load right")

To manually analyze a track:

```python
analysis = controller.analyze_track("/path/to/track.mp3")
if analysis:
    print(f"BPM: {analysis.bpm}")
    print(f"Drops: {len(analysis.drops)}")
    print(f"Next drop: {analysis.get_next_drop(0.0)}")
```
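The analysis object exposes `drops` and `get_next_drop(...)`, and the feature list describes drops as energy spikes. One way to implement that idea is sketched below: flag frames whose energy reaches `ratio` times the mean of the preceding window, then find the next drop with a binary search. This is not the `track_analyzer.py` implementation; `detect_drops`, its thresholds, and the free-function form of `get_next_drop` are assumptions, and in practice the per-frame energies would come from something like an RMS envelope.

```python
from bisect import bisect_right
from typing import List, Optional


def detect_drops(energy: List[float], frame_seconds: float, window: int = 8,
                 ratio: float = 2.0, min_gap_seconds: float = 8.0) -> List[float]:
    """Hypothetical sketch: times (s) where frame energy jumps to `ratio` x the recent mean."""
    drops: List[float] = []
    for i in range(window, len(energy)):
        baseline = sum(energy[i - window:i]) / window  # mean of the preceding frames
        t = i * frame_seconds
        if baseline > 0 and energy[i] >= ratio * baseline:
            # Suppress repeat triggers from the same drop.
            if not drops or t - drops[-1] >= min_gap_seconds:
                drops.append(t)
    return drops


def get_next_drop(drops: List[float], current_time: float) -> Optional[float]:
    """Hypothetical helper: first drop strictly after current_time (drops sorted), or None."""
    idx = bisect_right(drops, current_time)
    return drops[idx] if idx < len(drops) else None
```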

Transition Recommendations

Get AI-powered transition recommendations:

```python
recommendation = controller.get_transition_recommendation(
    current_time=120.0,  # playback position in seconds (2 minutes into the current track)
    current_track_path="/path/to/current.mp3",
    next_track_path="/path/to/next.mp3"
)

if recommendation:
    print(f"Strategy: {recommendation.strategy}")
    print(f"Transition in {recommendation.beats_until_transition} beats")
    print(f"Reason: {recommendation.reason}")
```
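`beats_until_transition` is expressed in beats, so acting on it requires the tempo: at `bpm` beats per minute, one beat lasts `60 / bpm` seconds. The sketch below converts between playback positions and beats and finds the next phrase boundary on a constant-tempo grid. The 16-beat phrase length and the `first_beat` offset are assumptions (16- and 32-beat phrases are common in dance music), and both helper names are hypothetical.

```python
import math


def beats_until(current_time: float, target_time: float, bpm: float) -> float:
    """Hypothetical helper: beats between two playback positions at a constant tempo."""
    return (target_time - current_time) * bpm / 60.0


def next_phrase_start(current_time: float, bpm: float,
                      first_beat: float = 0.0, phrase_beats: int = 16) -> float:
    """Hypothetical helper: time (s) of the next phrase boundary after current_time."""
    seconds_per_phrase = phrase_beats * 60.0 / bpm
    phrases_elapsed = (current_time - first_beat) / seconds_per_phrase
    # floor(x + 1) always moves forward, even when already on a boundary.
    return first_beat + math.floor(phrases_elapsed + 1) * seconds_per_phrase
```

At 120 BPM with the grid starting at 0 s, a position of 120.0 s sits exactly on a phrase boundary, so the next boundary is 128.0 s, which is 16 beats away.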

Configuration

```python
config = {
    # Track analysis cache directory
    'analysis_cache_dir': './.track_analysis_cache',

    # Enable voice announcements for transitions
    'transition_announcements': False,
}
```
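Only two configuration keys are documented. If you wrap them in your own launcher, a small merge step keeps the documented defaults in one place. This is a hypothetical helper (`load_config` and `DEFAULT_CONFIG` are not part of the package), and rejecting unknown keys is a design choice of the sketch to catch typos such as `transition_anouncements`, not documented behavior.

```python
from typing import Any, Dict, Optional

# Hypothetical defaults mirroring the two documented keys.
DEFAULT_CONFIG: Dict[str, Any] = {
    "analysis_cache_dir": "./.track_analysis_cache",
    "transition_announcements": False,
}


def load_config(overrides: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Hypothetical helper: merge overrides onto the defaults, rejecting unknown keys."""
    overrides = overrides or {}
    unknown = sorted(set(overrides) - set(DEFAULT_CONFIG))
    if unknown:
        raise KeyError(f"unknown config keys: {unknown}")
    # A new dict, so DEFAULT_CONFIG itself is never mutated.
    return {**DEFAULT_CONFIG, **overrides}
```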

Dependencies

Core:
- `google-genai` - Gemini Live API
- `pyaudio` - Audio input
- `pynput` - Keyboard control

Optional (for track analysis):
- `librosa` - Audio analysis
- `scipy` - Signal processing
- `numpy` - Numerical operations

Command Map

The system includes 328+ voice commands covering:
- Left/Right deck playback
- Cue points (1-5)
- Loops and autoloops
- Samples (1-8)
- Library navigation
- Effects (censor, filter, etc.)
- Higher-order commands (play next, transitions, etc.)

See `command_map.py` for the complete list.

Extending the System

Adding New Commands

Edit `command_map.py`:

```python
from typing import Dict


def build_command_map() -> Dict[str, str]:
    return {
        # ... existing commands ...
        "my new command": "keyboard+shortcut",
    }
```
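The value format behind `"keyboard+shortcut"` is not documented beyond that placeholder. Assuming entries are `+`-joined key names with modifiers first (for example `cmd+shift+l`), a validation pass like the one below can catch a malformed entry at startup rather than mid-set. `MODIFIERS`, `parse_shortcut`, and `validate_command_map` are hypothetical; check `command_map.py` for the key names it actually uses. Note that this format cannot express the `+` key itself.

```python
from typing import Dict, List, Tuple

# Hypothetical modifier names; command_map.py may spell them differently.
MODIFIERS = {"ctrl", "shift", "alt", "cmd"}


def parse_shortcut(spec: str) -> Tuple[List[str], str]:
    """Hypothetical helper: split 'cmd+shift+l' into (['cmd', 'shift'], 'l')."""
    parts = [p.strip().lower() for p in spec.split("+")]
    if not all(parts):
        raise ValueError(f"malformed shortcut: {spec!r}")
    *mods, key = parts
    bad = [m for m in mods if m not in MODIFIERS]
    if bad:
        raise ValueError(f"unknown modifiers {bad} in {spec!r}")
    return mods, key


def validate_command_map(cmap: Dict[str, str]) -> None:
    """Hypothetical helper: parse every entry so a typo fails at startup."""
    for phrase, spec in cmap.items():
        try:
            parse_shortcut(spec)
        except ValueError as exc:
            raise ValueError(f"command {phrase!r}: {exc}") from exc
```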

Adding Chain Commands

Edit `chain_executor.py`:

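```python
from typing import List, Tuple


# Method on the chain executor class in chain_executor.py
def get_chain_commands(self, command: str) -> List[Tuple[str, float]]:
    if command == "my chain command":
        # (shortcut, delay) pairs, executed in order
        return [
            ("key1", 0.1),
            ("key2", 0.3),
        ]
    # ... existing chains ...
```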
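Each chain step pairs a shortcut with a number; the sketch below assumes the number is a delay in seconds to wait after pressing that shortcut. `run_chain` is a hypothetical helper, and `press` stands in for whatever `deck_controller.py` uses to send keys (for example via `pynput`). Injecting `sleep` keeps the timing logic testable without real delays.

```python
import time
from typing import Callable, List, Tuple


def run_chain(steps: List[Tuple[str, float]],
              press: Callable[[str], None],
              sleep: Callable[[float], None] = time.sleep) -> None:
    """Hypothetical helper: press each shortcut, then wait its delay (assumed seconds)."""
    for shortcut, delay in steps:
        press(shortcut)
        if delay > 0:
            sleep(delay)
```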

Custom Track Analysis

Extend `track_analyzer.py` to add new analysis features:

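```python
class TrackAnalyzer:
    def analyze_track(self, file_path: str):
        # ... existing analysis (assumed to load the audio as y, sr and build `analysis`) ...

        # Add your custom analysis; _analyze_custom_feature is a method you write
        custom_feature = self._analyze_custom_feature(y, sr)
        analysis.custom_feature = custom_feature
        # ... rest of existing analysis ...
```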

Testing

Run the test suite:

```bash
python3 dj_agent/voice_control/test_system.py
```

Tests validate:
- Module imports
- Command map building
- Command processing
- Deck controller
- Chain executor
- Track analyzer
- Voice controller structure

Troubleshooting

### Missing Dependencies
- Install with: `pip install -r requirements.txt`
- On macOS, pyaudio may need: `brew install portaudio`

### Logging Module Conflict
- The project has a `logging/` directory that can shadow Python's logging module
- The test script handles this automatically
- If issues persist, ensure Python's built-in logging is imported first
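Python caches a module in `sys.modules` the first time it is imported, so importing the standard `logging` before the project root goes on `sys.path` means later `import logging` statements keep getting the standard library version. The sketch below shows that ordering plus a check; `logging_is_stdlib` and `add_project_root` are hypothetical helpers, not part of the test script. Appending rather than prepending the project root also keeps the standard library directories ahead of it in the search order.

```python
import logging  # import the stdlib module first; sys.modules caches it from here on
import os
import sys
import sysconfig


def logging_is_stdlib() -> bool:
    """Hypothetical check: did `logging` resolve to the standard library directory?"""
    stdlib_dir = os.path.realpath(sysconfig.get_paths()["stdlib"])
    module_file = os.path.realpath(logging.__file__)
    return module_file.startswith(stdlib_dir + os.sep)


def add_project_root(project_root: str) -> None:
    """Hypothetical helper: append (not prepend) so stdlib modules stay ahead in the search order."""
    if project_root not in sys.path:
        sys.path.append(project_root)
```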

### API Key Issues
- Create a `.env` file containing: `GEMINI_API_KEY=your-key-here`
- Or set the environment variable: `export GEMINI_API_KEY=your-key-here`
- Get an API key from: https://ai.google.dev/
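One reasonable lookup order for the two options above is the environment variable first, then a `GEMINI_API_KEY=...` line in `.env`. The project may well load `.env` with a library such as `python-dotenv`; this stdlib-only sketch (`load_gemini_api_key` is hypothetical) handles only simple `KEY=value` lines, optional `export ` prefixes, comments, and surrounding quotes.

```python
import os
from pathlib import Path
from typing import Optional


def load_gemini_api_key(env_file: str = ".env") -> Optional[str]:
    """Hypothetical helper: env var first, then a GEMINI_API_KEY line in a .env file."""
    key = os.environ.get("GEMINI_API_KEY")
    if key:
        return key
    path = Path(env_file)
    if not path.is_file():
        return None
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if line.startswith("export "):
            line = line[len("export "):].strip()
        # Skip blanks, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == "GEMINI_API_KEY":
            return value.strip().strip("'\"") or None
    return None
```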

Future Enhancements

- [ ] Key detection integration (keyfinder library)
- [ ] Disk-based analysis caching
- [ ] Serato state integration for smarter recommendations
- [ ] Voice announcements for transition recommendations
- [ ] Real-time track path detection from Serato
- [ ] Advanced transition strategies (scratch transitions, etc.)

Promotion Decision

Attach run IDs, datasets, metrics, and reproduction commands.

Source Anchor

Comp-Core/apps/web/cc-studio/docs/dj_agent/voice_control/README.md
