Quick Start Guide - Voice Control System

Auto DJ Feature

The voice control system now includes an Auto DJ feature that automatically mixes tracks with intelligent transitions and effects!

Quick Start with Auto DJ

1. Start Voice Control:

bash

   python3 dj_agent/run_voice_control_gemini.py

2. Add Tracks (programmatically or via Serato):
- Tracks are automatically analyzed when added
- Analysis includes BPM, key, energy levels, drops, and breakdowns

3. Start Auto DJ:
Say: "start auto dj"

4. Control Auto DJ:
- "stop auto dj" - Stop automatic mixing
- "pause auto dj" / "resume auto dj" - Pause/resume
- "skip track" - Skip to next track
- "set auto dj mode harmonic" - Change mixing strategy
- "show queue" - View current queue
- "clear queue" - Clear all tracks
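The Auto DJ phrases above follow a small, regular grammar: a handful of fixed phrases plus one phrase that carries a mode name. A minimal sketch of how such phrases could be mapped to actions follows. The helper `parse_auto_dj_command` and the action names are hypothetical and are not taken from `dj_agent`.

```python
from typing import Optional, Tuple

# Strategy names as listed in this guide.
MODES = {"harmonic", "bpm", "energy", "composite", "random"}

# Fixed phrases mapped to illustrative action names (hypothetical, not dj_agent's API).
SIMPLE_COMMANDS = {
    "start auto dj": "start",
    "stop auto dj": "stop",
    "pause auto dj": "pause",
    "resume auto dj": "resume",
    "skip track": "skip",
    "show queue": "show_queue",
    "clear queue": "clear_queue",
}

MODE_PREFIX = "set auto dj mode "


def parse_auto_dj_command(phrase: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (action, argument) for a recognized phrase, else None."""
    text = " ".join(phrase.lower().split())  # normalize case and whitespace
    if text in SIMPLE_COMMANDS:
        return SIMPLE_COMMANDS[text], None
    if text.startswith(MODE_PREFIX):
        mode = text[len(MODE_PREFIX):]
        if mode in MODES:
            return "set_mode", mode
    return None  # unknown phrase or unknown mode
```

Returning `None` for an unknown mode, rather than falling back to a default, keeps a misheard phrase from silently changing the mixing strategy.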

Mixing Strategies

- `harmonic`: Key-based mixing (Camelot wheel)
- `bpm`: BPM matching
- `energy`: Energy level matching
- `composite`: Balanced combination (default)
- `random`: Random selection
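To make the strategies concrete, here is a rough sketch of how each one could score a candidate track against the one currently playing. The Camelot rule used (same number, or one step around the wheel with the same letter) is the standard harmonic-mixing rule. Everything else is an assumption made for illustration: the `Track` fields, the 8 BPM tolerance, the 0.0 to 1.0 energy scale, and the composite weights are not taken from `dj_agent`.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Track:
    """Minimal stand-in for an analyzed track (hypothetical fields)."""
    key: str       # Camelot code, e.g. "8A"
    bpm: float
    energy: float  # assumed scale: 0.0 to 1.0


def parse_camelot(code: str) -> Tuple[int, str]:
    """Split a Camelot code such as '8A' into (8, 'A')."""
    number, letter = int(code[:-1]), code[-1].upper()
    if not 1 <= number <= 12 or letter not in ("A", "B"):
        raise ValueError(f"invalid Camelot code: {code!r}")
    return number, letter


def harmonic_score(a: str, b: str) -> float:
    """1.0 for compatible keys on the Camelot wheel, else 0.0."""
    (na, la), (nb, lb) = parse_camelot(a), parse_camelot(b)
    if na == nb:
        return 1.0  # same key, or relative major/minor (8A <-> 8B)
    if la == lb and (na - nb) % 12 in (1, 11):
        return 1.0  # one step around the wheel, wrapping 12 -> 1
    return 0.0


def bpm_score(a: float, b: float, tolerance: float = 8.0) -> float:
    """Fall linearly from 1.0 (equal tempo) to 0.0 at `tolerance` BPM apart."""
    return max(0.0, 1.0 - abs(a - b) / tolerance)


def energy_score(a: float, b: float) -> float:
    """Closer energy levels score higher."""
    return max(0.0, 1.0 - abs(a - b))


def composite_score(a: Track, b: Track) -> float:
    # Illustrative weights only; the weights Auto DJ uses are not documented here.
    return (0.4 * harmonic_score(a.key, b.key)
            + 0.3 * bpm_score(a.bpm, b.bpm)
            + 0.3 * energy_score(a.energy, b.energy))


def pick_next(current: Track, candidates: List[Track],
              strategy: str = "composite") -> Track:
    """Choose the next track from a non-empty candidate list."""
    if strategy == "random":
        return random.choice(candidates)
    scorers = {
        "harmonic": lambda t: harmonic_score(current.key, t.key),
        "bpm": lambda t: bpm_score(current.bpm, t.bpm),
        "energy": lambda t: energy_score(current.energy, t.energy),
        "composite": lambda t: composite_score(current, t),
    }
    return max(candidates, key=scorers[strategy])
```

For example, with an `8A` track at 128 BPM playing, a `9A` candidate at 127 BPM outranks a `3B` candidate at 100 BPM under every non-random strategy in this sketch.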

See `dj_agent/AUTO_DJ_README.md` for complete documentation.

---

Quick Start Guide - Voice Control System

Prerequisites

1. Install Dependencies

bash

   pip install -r requirements.txt

2. Get Gemini API Key
- Visit: https://ai.google.dev/
- Create an API key
- Add it to `.env` file:

bash

     echo "GEMINI_API_KEY=your-api-key-here" >> .env

Starting the Program

Method 1: Using the Shell Script (Easiest)

bash

# From project root
./START_VOICE_CONTROL_GEMINI.sh

This script:
- Checks for `.env` file
- Starts voice control automatically
- Handles API key loading

Method 2: Direct Python Command

bash

# Basic start
python3 dj_agent/run_voice_control_gemini.py

# With API key as argument
python3 dj_agent/run_voice_control_gemini.py --api-key your-api-key-here

# Disable track analysis (faster startup)
python3 dj_agent/run_voice_control_gemini.py --no-track-analysis

# Disable transition recommendations
python3 dj_agent/run_voice_control_gemini.py --no-transitions

# Both disabled (minimal mode)
python3 dj_agent/run_voice_control_gemini.py --no-track-analysis --no-transitions

Method 3: List Available Commands

bash

# See all 328+ voice commands
python3 dj_agent/run_voice_control_gemini.py --commands

Usage Examples

Basic Voice Control

1. Start the program:

bash

   ./START_VOICE_CONTROL_GEMINI.sh

2. Speak commands:
- "play left" - Plays left deck
- "play right" - Plays right deck
- "cue 1 left" - Jump to cue 1 on left deck
- "play next" - Load and play next track
- "continuous mode" - Enable auto-play mode

3. Stop: Press `Ctrl+C`

With Track Analysis

Track analysis runs automatically when enabled. It analyzes tracks when you:
- Navigate library ("next track", "move down")
- Load tracks ("load left", "load right")

Note: Detecting a track's full path requires Serato integration. The hooks are in place, but they only take effect once Serato provides the actual track paths.
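Because both navigation and loading can trigger analysis, the same track may be seen several times. One way to avoid repeated work is to cache results by track path, as in the sketch below. `TrackAnalysis` and `AnalysisCache` are hypothetical names; the fields mirror the analysis outputs this guide lists (BPM, key, energy, drops, breakdowns), and the analyzer itself is passed in rather than assumed.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class TrackAnalysis:
    """Hypothetical record of the analysis fields this guide mentions."""
    path: str
    bpm: float
    key: str                      # e.g. a Camelot code such as "8A"
    energy: float                 # assumed scale: 0.0 (calm) to 1.0 (intense)
    drops: List[float] = field(default_factory=list)       # seconds from start
    breakdowns: List[float] = field(default_factory=list)  # seconds from start


class AnalysisCache:
    """Analyze each track path at most once, whichever event triggered it."""

    def __init__(self, analyze: Callable[[str], TrackAnalysis]) -> None:
        self._analyze = analyze
        self._results: Dict[str, TrackAnalysis] = {}

    def on_track_event(self, path: str) -> TrackAnalysis:
        """Call from both the navigation hook and the load hook."""
        if path not in self._results:
            self._results[path] = self._analyze(path)
        return self._results[path]
```

Injecting the analyzer as a callable also makes the cache easy to exercise without audio files or Serato running.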

With Transition Recommendations

When enabled, the system can suggest optimal transition points based on:
- Harmonic mixing (key compatibility)
- Energy matching
- Beat alignment

Note: Requires both tracks to be analyzed first.
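As an illustration of the beat-alignment part, the sketch below snaps a proposed transition time to the next bar line of the outgoing track and returns nothing when either track lacks analysis. It assumes 4/4 time and a constant tempo. The function names and the dictionary keys `bpm` and `first_beat_s` are hypothetical.

```python
import math
from typing import Optional


def next_bar_start(position_s: float, bpm: float,
                   first_beat_s: float = 0.0, beats_per_bar: int = 4) -> float:
    """Return the time of the first bar line at or after position_s."""
    if bpm <= 0:
        raise ValueError("bpm must be positive")
    bar_s = beats_per_bar * 60.0 / bpm  # duration of one bar in seconds
    bars_elapsed = math.ceil((position_s - first_beat_s) / bar_s)
    return first_beat_s + max(bars_elapsed, 0) * bar_s


def recommend_transition(outgoing: Optional[dict], incoming: Optional[dict],
                         position_s: float) -> Optional[float]:
    """Suggest a bar-aligned transition time, or None if a track is unanalyzed."""
    if outgoing is None or incoming is None:
        return None  # both tracks must be analyzed first
    return next_bar_start(position_s, outgoing["bpm"],
                          outgoing.get("first_beat_s", 0.0))
```

At 120 BPM a bar lasts 2 seconds, so a transition requested at 9.0 s would be moved to the bar line at 10.0 s.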

Command Line Options

--api-key KEY          Gemini API key (or use .env file)
--commands             List all available voice commands
--no-track-analysis    Disable track analysis features
--no-transitions       Disable transition recommendations
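For readers who want to wrap or extend the launcher, the options above map naturally onto Python's standard `argparse` module. This is a sketch of an equivalent parser, not the parser in `run_voice_control_gemini.py`, whose implementation is not shown in this guide.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that accepts the four documented options."""
    parser = argparse.ArgumentParser(prog="run_voice_control_gemini.py")
    parser.add_argument("--api-key", metavar="KEY",
                        help="Gemini API key (or use .env file)")
    parser.add_argument("--commands", action="store_true",
                        help="List all available voice commands")
    parser.add_argument("--no-track-analysis", action="store_true",
                        help="Disable track analysis features")
    parser.add_argument("--no-transitions", action="store_true",
                        help="Disable transition recommendations")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    # argparse converts dashes to underscores: --no-track-analysis -> no_track_analysis
    print("track analysis:", not args.no_track_analysis)
    print("transitions:", not args.no_transitions)
```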

Example Session

bash

# 1. Start voice control
$ ./START_VOICE_CONTROL_GEMINI.sh

# Output:
# 🚀 Starting Gemini Live Voice Control...
# ✓ Gemini Live API initialized
# ✓ Voice controller initialized with 328 commands
# ✓ Track analysis enabled
# ✓ Transition advisor enabled
#
# 🎤 GEMINI LIVE VOICE CONTROL - HIGH ACCURACY MODE
# ======================================================================
#
# ⚙️  Settings:
#    Model: gemini-2.0-flash-exp
#    Sample rate: 16000Hz
#    Cooldown: 1.5s
#
# ✓ Connecting to Gemini Live API...
# ✓ Connected to Gemini Live API
# ✓ Audio streaming started
#
# 🎤 Listening for voice commands...

# 2. Speak commands:
# You: "play left"
# System: ✓ "play left" → Pressed: w

# You: "play next right"
# System: ✓ "play next right" → 🔗 Executing chain: play next right
#         → Pressed: down
#         → Pressed: shift+right
#         → Pressed: s

# 3. Stop:
# Press Ctrl+C
# System: VOICE CONTROL STOPPED
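The session above shows two kinds of command: a single key press ("play left") and a chain of presses executed in order ("play next right"). A minimal sketch of that expansion follows. The two key sequences are copied from the session output; the `KEYMAP` table, the function names, and the `press` callback are hypothetical. In practice `press` would wrap whatever sends keystrokes (for example a `pynput` controller).

```python
from typing import Callable, Dict, List

# Key sequences copied from the example session; the table itself is illustrative.
KEYMAP: Dict[str, List[str]] = {
    "play left": ["w"],
    "play next right": ["down", "shift+right", "s"],
}


def expand_command(phrase: str, keymap: Dict[str, List[str]] = KEYMAP) -> List[str]:
    """Return the ordered key presses for a phrase, or [] if it is unknown."""
    return list(keymap.get(phrase.strip().lower(), []))


def run_command(phrase: str, press: Callable[[str], None],
                keymap: Dict[str, List[str]] = KEYMAP) -> bool:
    """Send each key in order through `press`; return False for unknown phrases."""
    keys = expand_command(phrase, keymap)
    for key in keys:
        press(key)
    return bool(keys)
```

Passing `press` in as a callback keeps the mapping testable without sending real keystrokes to Serato.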

Troubleshooting

"API key required"
- Create `.env` file with: `GEMINI_API_KEY=your-key`
- Or pass `--api-key your-key`
- Or set: `export GEMINI_API_KEY=your-key`
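The three options above can be thought of as a lookup chain. The sketch below checks the command-line value first, then the environment variable, then a simple `KEY=VALUE` `.env` file. That precedence order is an assumption, since this guide does not state which source wins, and `resolve_api_key` and `read_dotenv_value` are hypothetical helpers rather than functions from `dj_agent`.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def read_dotenv_value(path: Path, name: str) -> Optional[str]:
    """Return NAME's value from a simple KEY=VALUE file, or None if absent."""
    if not path.is_file():
        return None
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        if key.strip() == name:
            return value.strip().strip("\"'") or None
    return None


def resolve_api_key(cli_key: Optional[str] = None,
                    environ: Mapping[str, str] = os.environ,
                    dotenv_path: Path = Path(".env")) -> str:
    """Assumed precedence: --api-key, then environment, then .env file."""
    key = (cli_key
           or environ.get("GEMINI_API_KEY")
           or read_dotenv_value(dotenv_path, "GEMINI_API_KEY"))
    if not key:
        raise SystemExit("API key required: set GEMINI_API_KEY or pass --api-key")
    return key
```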

"No module named 'pynput'"

bash

pip install pynput

"No module named 'pyaudio'"

bash

# macOS
brew install portaudio
pip install pyaudio

# Linux
sudo apt-get install portaudio19-dev
pip install pyaudio

"No module named 'librosa'"

bash

pip install librosa scipy

Microphone not working
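- Check macOS permissions: System Settings → Privacy & Security → Microphone (on macOS 12 and earlier: System Preferences → Security & Privacy → Privacy → Microphone)
- Grant Terminal (or whichever app launches Python) access to the microphone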

Quick Reference

Start Commands:

bash

# Easiest
./START_VOICE_CONTROL_GEMINI.sh

# Direct
python3 dj_agent/run_voice_control_gemini.py

# List commands
python3 dj_agent/run_voice_control_gemini.py --commands

Stop:
- Press `Ctrl+C`

Test System:

bash

python3 dj_agent/voice_control/test_system.py

Source Anchor

Comp-Core/apps/web/cc-studio/docs/dj_agent/voice_control/QUICKSTART.md
