# Gemini Live Voice Control - High Accuracy DJ Commands
This DJ voice control script uses **Google's Gemini Live API** for speech recognition, with the goal of higher accuracy than standard speech recognition libraries.
## Why Gemini Live API?
- Much Better Accuracy: Uses Google's advanced AI models
- Real-time Streaming: Low-latency voice recognition
- Context Understanding: Understands natural speech patterns
- Voice Activity Detection: Built-in VAD to filter out background noise
## Requirements
1. Gemini API Key: Get one from [https://ai.google.dev/](https://ai.google.dev/)
2. Python Dependencies: Install with `pip install -r requirements.txt`
## Quick Start
### 1. Get Your API Key
Visit [https://ai.google.dev/](https://ai.google.dev/) and create an API key.
### 2. Create .env File
Create a `.env` file in the project root:
```bash
# Copy the example file
cp .env.example .env
# Or create it manually
echo "GEMINI_API_KEY=your-api-key-here" > .env
```
Then edit `.env` and add your actual API key:
```bash
GEMINI_API_KEY=your-actual-api-key-here
```
### 3. Run Voice Control
```bash
./START_VOICE_CONTROL_GEMINI.sh
```
Or directly:
```bash
python3 dj_agent/run_voice_control_gemini.py
```
Note: The script automatically loads the API key from the `.env` file!
## Available Commands
List all commands:
```bash
python3 dj_agent/run_voice_control_gemini.py --commands
```
Example Commands:
Left Deck:
- "play left" / "pause left" / "stop left"
- "cue 1 left" / "cue 2 left" / ... / "cue 8 left"
- "censor left" / "filter left" / "echo left"
- "tempo up left" / "faster left"
Right Deck:
- "play right" / "pause right" / "stop right"
- "cue 1 right" / "cue 2 right" / ... / "cue 4 right"
- "censor right" / "filter right" / "echo right"
- "tempo up right" / "faster right"
Quick Actions:
- "drop it" / "kill it" / "bring it back"
- "go" / "stop" / "next" / "back"
- "zoom in" / "zoom out"
- "record" / "search"
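The phrases above follow a simple shape: an action, optionally followed by `left` or `right`. As a rough illustration of how such a phrase could be split into its parts, here is a minimal sketch. The `DeckCommand` type and `parse_command` helper are hypothetical names invented for this example and are not taken from the script.

```python
import re
from typing import NamedTuple, Optional


class DeckCommand(NamedTuple):
    """Hypothetical container for a parsed phrase (illustration only)."""
    action: str
    deck: Optional[str]  # "left", "right", or None for quick actions


# An action followed by a trailing deck name, e.g. "cue 1 left".
_DECK_PATTERN = re.compile(r"^(?P<action>.+?)\s+(?P<deck>left|right)$")


def parse_command(phrase: str) -> DeckCommand:
    """Lowercase the phrase, collapse whitespace, and split off the deck name."""
    text = " ".join(phrase.lower().split())
    match = _DECK_PATTERN.match(text)
    if match:
        return DeckCommand(match.group("action"), match.group("deck"))
    # No trailing deck name: treat the whole phrase as a quick action.
    return DeckCommand(text, None)


print(parse_command("Cue 1  Left"))
print(parse_command("drop it"))
```

Quick actions such as "drop it" carry no deck, so the sketch returns `deck=None` for them rather than guessing one.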
## Features
### High Accuracy Recognition
- Uses Gemini 2.5 Flash Native Audio model
- Understands natural speech patterns
- Handles variations in pronunciation
### Fuzzy Matching
- If exact match not found, tries fuzzy matching (60
- Example: "play lef" → "play left"
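The fallback described above could be sketched with Python's standard `difflib` module. This is an illustration only, not the script's actual matcher: the `COMMANDS` list and `match_command` helper are hypothetical, and reading the truncated "60" as a 0.6 similarity cutoff is an assumption.

```python
import difflib
from typing import Optional

# Hypothetical subset of the command list, for illustration only.
COMMANDS = ["play left", "pause left", "stop left", "play right", "pause right"]


def match_command(heard: str, cutoff: float = 0.6) -> Optional[str]:
    """Return an exact match if present, else the closest fuzzy match, else None.

    The 0.6 default is an assumed reading of the "60" mentioned above.
    """
    text = " ".join(heard.lower().split())
    if text in COMMANDS:
        return text
    close = difflib.get_close_matches(text, COMMANDS, n=1, cutoff=cutoff)
    return close[0] if close else None


print(match_command("play lef"))  # → play left
```

A lower cutoff tolerates more transcription errors but raises the chance of firing the wrong command, so the threshold is worth tuning against real transcripts.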
### Command Cooldown
- Prevents duplicate commands from firing rapidly
- 1.5 second cooldown between same command
- Shows countdown: "⏸️ Cooldown (0.8s remaining)"
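A per-command cooldown like the one described above can be sketched as follows. The `CommandCooldown` class and its method names are hypothetical, and the injectable `clock` parameter is added here only to make the behavior easy to test. The script's real implementation may differ.

```python
import time
from typing import Callable, Dict


class CommandCooldown:
    """Reject a command repeated within `seconds` of its last accepted use."""

    def __init__(self, seconds: float = 1.5,
                 clock: Callable[[], float] = time.monotonic):
        self.seconds = seconds
        self.clock = clock
        self._last_fired: Dict[str, float] = {}

    def remaining(self, command: str) -> float:
        """Seconds left before `command` may fire again (0.0 if ready)."""
        last = self._last_fired.get(command)
        if last is None:
            return 0.0
        return max(0.0, self.seconds - (self.clock() - last))

    def try_fire(self, command: str) -> bool:
        """Allow and record the command if it is off cooldown."""
        if self.remaining(command) > 0:
            return False
        self._last_fired[command] = self.clock()
        return True


cooldown = CommandCooldown()
for heard in ["play left", "play left", "play right"]:
    if cooldown.try_fire(heard):
        print(f"Executing: {heard}")
    else:
        print(f"Cooldown ({cooldown.remaining(heard):.1f}s remaining)")
```

Tracking the timestamp per command means a different command ("play right") is not blocked by a recent "play left", which matches the "between same command" wording above.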
### Real-time Streaming
- Continuous audio streaming to Gemini Live API
- Low latency response
- Automatic voice activity detection
## Configuration
### API Key Options
1. `.env` File (Recommended):
   ```bash
   # Create .env in project root
   GEMINI_API_KEY=your-api-key-here
   ```
2. Environment Variable:
   ```bash
   export GEMINI_API_KEY='your-key'
   ```
3. Command Line:
   ```bash
   python3 dj_agent/run_voice_control_gemini.py --api-key 'your-key'
   ```

If more than one is set, the script checks them in this order: command line → environment variable → `.env` file.
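The lookup order stated above (command line, then environment variable, then `.env` file) can be sketched like this. The helper names `read_env_file` and `resolve_api_key` are hypothetical, and the hand-written `.env` parser is an assumption made to keep the example dependency-free. The actual script may load the file differently.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def read_env_file(path: Path, name: str) -> Optional[str]:
    """Return the value of `name` from a simple KEY=VALUE file, or None."""
    if not path.is_file():
        return None
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == name:
            return value.strip().strip("'\"") or None
    return None


def resolve_api_key(cli_key: Optional[str] = None,
                    environ: Mapping[str, str] = os.environ,
                    env_path: Path = Path(".env")) -> Optional[str]:
    """Check the command line first, then the environment, then the .env file."""
    return (cli_key
            or environ.get("GEMINI_API_KEY")
            or read_env_file(env_path, "GEMINI_API_KEY"))


if resolve_api_key() is None:
    print("Gemini API key required")
```

Because `or` stops at the first non-empty value, an explicit `--api-key` always wins over whatever is stored in the environment or in `.env`.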
### Audio Settings
Default settings (optimized for Gemini Live API):
- Sample Rate: 16 kHz (required by Gemini)
- Channels: Mono
- Format: 16-bit PCM
- Chunk Size: 1024 samples
These are automatically configured - no changes needed!
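To see what these defaults imply, the arithmetic can be worked through in a few lines. The constant names and the `iter_chunks` helper are hypothetical and exist only for this example. The values come from the list above.

```python
from typing import Iterator

SAMPLE_RATE_HZ = 16_000   # 16 kHz
CHANNELS = 1              # mono
SAMPLE_WIDTH_BYTES = 2    # 16-bit PCM
CHUNK_SAMPLES = 1024      # samples per chunk

CHUNK_BYTES = CHUNK_SAMPLES * CHANNELS * SAMPLE_WIDTH_BYTES        # 2048 bytes
CHUNK_DURATION_MS = 1000 * CHUNK_SAMPLES / SAMPLE_RATE_HZ          # 64.0 ms
BYTES_PER_SECOND = SAMPLE_RATE_HZ * CHANNELS * SAMPLE_WIDTH_BYTES  # 32000 bytes


def iter_chunks(pcm: bytes) -> Iterator[bytes]:
    """Split raw PCM bytes into chunk-sized pieces; the last may be shorter."""
    for start in range(0, len(pcm), CHUNK_BYTES):
        yield pcm[start:start + CHUNK_BYTES]


one_second_of_silence = bytes(BYTES_PER_SECOND)
chunks = list(iter_chunks(one_second_of_silence))
print(len(chunks), CHUNK_BYTES, CHUNK_DURATION_MS)
```

Each chunk therefore carries about 64 ms of audio, so chunking itself adds only a small, fixed amount of buffering before audio can be sent.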
## Troubleshooting
### ".env file not found" or "Gemini API key required"
- Create a `.env` file in the project root with: `GEMINI_API_KEY=your-key`
- Or set the environment variable: `export GEMINI_API_KEY='your-key'`
- Or pass via command line: `--api-key 'your-key'`
### "Failed to connect to Gemini Live API"
- Check your API key is valid
- Check internet connection
- Verify API key has Live API access enabled
### "Failed to open audio stream"
- macOS: Grant microphone permission to Terminal/Python
  - System Preferences → Security & Privacy → Microphone
- Linux: Install `portaudio`: `sudo apt-get install portaudio19-dev`
- Windows: Install PyAudio: `pip install pyaudio`
### "No command matched"
- Speak clearly and naturally
- Use commands from the `--commands` list
- Try variations: "play left" instead of "play the left deck"
### Low Recognition Accuracy
- Speak in a quiet environment
- Position microphone closer
- Speak at normal volume (not too quiet/loud)
- Gemini handles natural speech well - don't over-enunciate
## Pricing
Gemini Live API has usage-based pricing. Check current rates at:
[https://ai.google.dev/pricing](https://ai.google.dev/pricing)
Note: Real-time voice recognition uses API credits. Monitor your usage!
## Comparison: Standard vs Gemini
| Feature | Standard (`run_voice_control.py`) | Gemini (`run_voice_control_gemini.py`) |
|---|---|---|
| Accuracy | Good | Excellent |
| Cost | Free | Pay-per-use |
| Setup | Simple | Requires API key |
| Latency | ~1-2 s | ~0.5-1 s |
| Context | Basic | Advanced |
| Background Noise | Sensitive | Better filtering |
## References
- [Gemini Live API Docs](https://ai.google.dev/gemini-api/docs/live)
- [Get API Key](https://ai.google.dev/)
- [Pricing Information](https://ai.google.dev/pricing)
## Tips for Best Results
1. Speak Naturally: Gemini understands context - don't over-enunciate
2. Use Short Phrases: "play left" is better than "please play the left deck"
3. Wait for Cooldown: Same command needs 1.5s gap
4. Quiet Environment: Reduces false positives
5. Clear Microphone: Position mic properly for best pickup
---
Enjoy high-accuracy voice control! 🎧