# Modular Voice Control System
A comprehensive, extensible voice control system for DJing, built on the Gemini Live API, with track analysis and intelligent transition recommendations.
## Architecture

The system is organized into modular components:

```
dj_agent/voice_control/
├── __init__.py             # Package exports
├── gemini_listener.py      # Gemini Live API voice recognition
├── command_processor.py    # Command parsing, matching, buffering
├── command_map.py          # Voice command to keyboard shortcut mapping
├── deck_controller.py      # Keyboard execution and deck state
├── chain_executor.py       # Higher-order command execution
├── track_analyzer.py       # On-demand audio analysis (BPM, key, drops)
├── transition_advisor.py   # AI-powered transition recommendations
├── voice_controller.py     # Main orchestrator
└── test_system.py          # Test suite
```

## Features
### Core Functionality
- Real-time Voice Recognition: Uses Gemini Live API for high-accuracy speech recognition
- Command Matching: Fuzzy matching with command buffering for multi-word commands
- Keyboard Execution: Sends keyboard shortcuts to control Serato DJ
- Deck Management: Tracks current deck and manages state
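To make the idea of fuzzy matching with command buffering concrete, here is a minimal sketch using only the standard library's `difflib`. It is not the code in `command_processor.py`; the class name `CommandBuffer`, the `cutoff` and `max_words` parameters, and the sample shortcuts are assumptions made for illustration.

```python
import difflib
from typing import Dict, List, Optional


class CommandBuffer:
    """Hypothetical helper: buffers recognized words and fuzzy-matches phrases."""

    def __init__(self, command_map: Dict[str, str], cutoff: float = 0.8,
                 max_words: int = 4) -> None:
        self.command_map = command_map
        self.cutoff = cutoff          # minimum similarity ratio (0..1) to accept
        self.max_words = max_words    # how many recent words to keep
        self.words: List[str] = []

    def feed(self, word: str) -> Optional[str]:
        """Add one word; return the matched command phrase, or None to keep buffering."""
        self.words.append(word.lower().strip())
        self.words = self.words[-self.max_words:]
        # Try the longest phrase first so "load left" is preferred over "left".
        for start in range(len(self.words)):
            phrase = " ".join(self.words[start:])
            matches = difflib.get_close_matches(
                phrase, list(self.command_map), n=1, cutoff=self.cutoff
            )
            if matches:
                self.words.clear()
                return matches[0]
        return None


# Sample phrases and shortcuts are placeholders, not entries from command_map.py.
buffer = CommandBuffer({"load left": "shift+left", "play": "w"})
buffer.feed("load")            # no confident match yet, keeps buffering
print(buffer.feed("lefft"))    # a slightly misheard word can still match
```

A higher `cutoff` reduces false triggers at the cost of rejecting more misheard phrases, which matters when a wrong match presses a key during a live set.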
### Advanced Features
- Track Analysis: On-demand audio analysis using librosa
  - BPM detection
  - Beat grid extraction
  - Drop detection (energy spikes)
  - Build-up detection
  - Section detection (breakdowns, builds)
- Transition Recommendations: AI-powered suggestions using Gemini
  - Harmonic mixing (key compatibility)
  - Energy matching
  - Beat alignment
  - Optimal timing suggestions
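The "energy spikes" idea behind drop detection can be illustrated with a small, dependency-free sketch. The project itself uses librosa; the version below is plain Python so it runs anywhere, and the function names, frame size, window, and ratio are all assumptions rather than values taken from `track_analyzer.py`.

```python
import math
from typing import List, Sequence


def frame_rms(samples: Sequence[float], frame_size: int) -> List[float]:
    """Root-mean-square energy of consecutive, non-overlapping frames."""
    energies = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        energies.append(math.sqrt(sum(s * s for s in frame) / frame_size))
    return energies


def find_energy_spikes(energies: Sequence[float], window: int = 4,
                       ratio: float = 2.0) -> List[int]:
    """Indices where energy first jumps to `ratio` times the recent average.

    Only the first frame of each loud run is reported, so one drop yields
    one index rather than one per frame.
    """
    spikes = []
    previous_was_spike = False
    for i in range(window, len(energies)):
        baseline = sum(energies[i - window:i]) / window
        is_spike = baseline > 0 and energies[i] >= ratio * baseline
        if is_spike and not previous_was_spike:
            spikes.append(i)
        previous_was_spike = is_spike
    return spikes


# Synthetic signal: four quiet frames followed by two loud ones.
samples = [0.1] * 400 + [0.9] * 200
print(find_energy_spikes(frame_rms(samples, frame_size=100)))
```

A frame index converts to seconds as `index * frame_size / sample_rate`, which is how a spike could be reported as a timestamp within the track.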
### Higher-Order Commands
- Play Next: Loads and plays next track
- Continuous Mode: Auto-play with automatic track loading
- Transitions: Smooth transitions between tracks
- Sync & Play: Beat-matched transitions
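A higher-order command is essentially a timed sequence of shortcuts. The sketch below shows one way such a sequence could be executed; it is not the implementation in `chain_executor.py`. The function `run_chain` is hypothetical, and treating the float in each `(shortcut, delay)` pair as a pause after the key press is an assumption.

```python
import time
from typing import Callable, List, Sequence, Tuple

ChainStep = Tuple[str, float]  # (shortcut, seconds to pause afterwards; assumed meaning)


def run_chain(steps: Sequence[ChainStep],
              send_key: Callable[[str], None],
              sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Send each shortcut in order, pausing after each one. Returns what was sent."""
    sent = []
    for shortcut, delay in steps:
        send_key(shortcut)
        sent.append(shortcut)
        if delay > 0:
            sleep(delay)
    return sent


# Dry run: print instead of pressing keys, and skip the real pauses.
run_chain([("key1", 0.1), ("key2", 0.3)], send_key=print, sleep=lambda _: None)
```

Passing `send_key` and `sleep` as parameters keeps the sequencing logic testable without a keyboard backend or real delays.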
## Usage

### Basic Usage

```python
import asyncio

from dj_agent.voice_control import VoiceController

config = {
    'analysis_cache_dir': './.track_analysis_cache',
    'transition_announcements': False,
}

controller = VoiceController(
    config=config,
    [sensitive field redacted],  # Or use GEMINI_API_KEY env var
    enable_track_analysis=True,
    enable_transitions=True
)

# Start listening
asyncio.run(controller.start())
```

### Command Line
```bash
# Run voice control
python3 dj_agent/run_voice_control_gemini.py

# List all commands
python3 dj_agent/run_voice_control_gemini.py --commands

# Disable track analysis
python3 dj_agent/run_voice_control_gemini.py --no-track-analysis

# Disable transitions
python3 dj_agent/run_voice_control_gemini.py --no-transitions
```

### Testing

```bash
# Run test suite
python3 dj_agent/voice_control/test_system.py
```

## Track Analysis Integration
Track analysis is automatically triggered when:
- User navigates library ("next track", "move down")
- User loads tracks ("load left", "load right")

To manually analyze a track:

```python
analysis = controller.analyze_track("/path/to/track.mp3")
if analysis:
    print(f"BPM: {analysis.bpm}")
    print(f"Drops: {len(analysis.drops)}")
    print(f"Next drop: {analysis.get_next_drop(0.0)}")
```

## Transition Recommendations
Get AI-powered transition recommendations:

```python
recommendation = controller.get_transition_recommendation(
    current_time=120.0,  # 2 minutes into current track
    current_track_path="/path/to/current.mp3",
    next_track_path="/path/to/next.mp3"
)
if recommendation:
    print(f"Strategy: {recommendation.strategy}")
    print(f"Transition in {recommendation.beats_until_transition} beats")
    print(f"Reason: {recommendation.reason}")
```

## Configuration
```python
config = {
    # Track analysis cache directory
    'analysis_cache_dir': './.track_analysis_cache',
    # Enable voice announcements for transitions
    'transition_announcements': False,
}
```

## Dependencies
Core:
- `google-genai` - Gemini Live API
- `pyaudio` - Audio input
- `pynput` - Keyboard control
Optional (for track analysis):
- `librosa` - Audio analysis
- `scipy` - Signal processing
- `numpy` - Numerical operations
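Because the analysis libraries are optional, it can help to check which ones are importable before enabling track analysis. The following is a minimal sketch using the standard library; `check_modules` is a hypothetical helper and is not part of the package.

```python
import importlib.util
from typing import Dict, Iterable


def check_modules(names: Iterable[str]) -> Dict[str, bool]:
    """Map each top-level module name to whether it can be found for import."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


available = check_modules(["librosa", "scipy", "numpy"])
missing = sorted(name for name, found in available.items() if not found)
if missing:
    print("Track analysis dependencies not installed:", ", ".join(missing))
```

`find_spec` locates a module without importing it, so the check stays cheap even for heavy packages such as librosa.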
## Command Map
The system includes 328+ voice commands covering:
- Left/Right deck playback
- Cue points (1-5)
- Loops and autoloops
- Samples (1-8)
- Library navigation
- Effects (censor, filter, etc.)
- Higher-order commands (play next, transitions, etc.)
See `command_map.py` for the complete list.
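With several hundred phrases in one dictionary, it is easy to lose track of which phrases share a shortcut. The sketch below groups phrases by shortcut so that aliases and accidental collisions are easy to review. It assumes only the `Dict[str, str]` shape returned by `build_command_map()`; the helper name and the sample entries are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_shortcut(command_map: Dict[str, str]) -> Dict[str, List[str]]:
    """Return shortcuts that are bound to more than one phrase."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for phrase, shortcut in command_map.items():
        groups[shortcut].append(phrase)
    return {s: sorted(p) for s, p in groups.items() if len(p) > 1}


# Placeholder entries for illustration; real entries live in command_map.py.
sample = {"play left": "w", "left play": "w", "play right": "o"}
for shortcut, phrases in group_by_shortcut(sample).items():
    print(f"{shortcut}: {', '.join(phrases)}")
```

Shared shortcuts are not necessarily errors, since several spoken phrases may intentionally trigger the same key.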
## Extending the System

### Adding New Commands

Edit `command_map.py`:

```python
from typing import Dict

def build_command_map() -> Dict[str, str]:
    return {
        # ... existing commands ...
        "my new command": "keyboard+shortcut",
    }
```

### Adding Chain Commands
Edit `chain_executor.py`:

```python
from typing import List, Tuple

# Method on the executor class in chain_executor.py
def get_chain_commands(self, command: str) -> List[Tuple[str, float]]:
    if command == "my chain command":
        return [
            ("key1", 0.1),
            ("key2", 0.3),
        ]
    # ... existing chains ...
```

### Custom Track Analysis
Extend `track_analyzer.py` to add new analysis features:

```python
class TrackAnalyzer:
    def analyze_track(self, file_path: str):
        # ... existing analysis ...
        # y and sr are assumed to be the audio samples and sample rate
        # already loaded by the existing analysis.
        # Add your custom analysis
        custom_feature = self._analyze_custom_feature(y, sr)
        analysis.custom_feature = custom_feature
```

## Testing
Run the test suite:

```bash
python3 dj_agent/voice_control/test_system.py
```

Tests validate:
- Module imports
- Command map building
- Command processing
- Deck controller
- Chain executor
- Track analyzer
- Voice controller structure
## Troubleshooting
### Missing Dependencies
- Install with: `pip install -r requirements.txt`
- On macOS, pyaudio may need: `brew install portaudio`
### Logging Module Conflict
- The project has a `logging/` directory that can shadow Python's logging module
- The test script handles this automatically
- If issues persist, ensure Python's built-in logging is imported first
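To confirm whether `logging` resolves to the standard library or to the project's own `logging/` directory, a quick diagnostic along these lines may help. This is a sketch, not what the test script does; `resolves_to_stdlib` is a hypothetical helper.

```python
import importlib.util
import sysconfig
from pathlib import Path


def resolves_to_stdlib(name: str) -> bool:
    """Return True if `name` would be imported from the standard library."""
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.origin:
        return False  # not importable, or a namespace package (bare directory)
    if spec.origin in ("built-in", "frozen"):
        return True
    origin = Path(spec.origin).resolve()
    # Third-party packages often live below the stdlib directory too.
    if "site-packages" in origin.parts or "dist-packages" in origin.parts:
        return False
    stdlib_dir = Path(sysconfig.get_paths()["stdlib"]).resolve()
    return stdlib_dir in origin.parents


if not resolves_to_stdlib("logging"):
    spec = importlib.util.find_spec("logging")
    print("logging is shadowed by:", spec.origin if spec else "nothing found")
```

Run it from the same working directory and with the same interpreter as the voice control script, since the result depends on `sys.path`.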
### API Key Issues
- Create `.env` file with: `GEMINI_API_KEY=your-key-here`
- Or set environment variable: `export GEMINI_API_KEY=your-key-here`
- Get API key from: https://ai.google.dev/
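The lookup order described above (environment variable first, then a `.env` file) can be sketched with the standard library alone. How the package actually loads the key is not shown in this document, so `resolve_api_key` and its parameters are assumptions for illustration.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_api_key(env: Mapping[str, str] = os.environ,
                    dotenv_path: str = ".env",
                    name: str = "GEMINI_API_KEY") -> Optional[str]:
    """Return the key from the environment, else from a simple KEY=value file."""
    value = env.get(name)
    if value:
        return value
    path = Path(dotenv_path)
    if not path.is_file():
        return None
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blanks, comments, and lines that are not KEY=value pairs.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, val = line.partition("=")
        if key.strip() == name:
            return val.strip().strip("'\"") or None
    return None


if resolve_api_key() is None:
    print("GEMINI_API_KEY is not set; see the steps above.")
```

Checking for the key at startup gives a clear message early instead of a failure once the listener tries to connect.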
## Future Enhancements
- [ ] Key detection integration (keyfinder library)
- [ ] Disk-based analysis caching
- [ ] Serato state integration for smarter recommendations
- [ ] Voice announcements for transition recommendations
- [ ] Real-time track path detection from Serato
- [ ] Advanced transition strategies (scratch transitions, etc.)
Source Anchor
Comp-Core/apps/web/cc-studio/docs/dj_agent/voice_control/README.md