Grand Diomande Research

Tier 3: Context-Aware Embeddings - Semantic Command Disambiguation Guide

**Context-Aware Embeddings** enables the voice control system to understand ambiguous commands by considering the current DJ system state. When you say "play" or "sync" without specifying a deck, the system intelligently infers which deck you mean based on what's currently happening.

Overview

Benefits:
- ✅ Natural, conversational commands ("play" instead of "play left")
- ✅ Context-aware disambiguation (knows which deck you mean)
- ✅ Intelligent action suggestions (predicts next likely commands)
- ✅ Fast heuristic matching (<5ms overhead)
- ✅ No ML training required (rule-based)

---

Quick Start

Default Behavior (Enabled)

```bash
python run_rekordbox_voice_gemini_enhanced.py
```

Startup Output:

⚙️  Tier 3 Enhancements:
   ↩️  State tracking & undo: True
      (history size: 20)
   💻 Whisper fallback: True
      (model: base.en, offline capable)
   🎯 Context embeddings: True
      (semantic command disambiguation)

✓ Connecting to Gemini Live API...

What Happens:
1. The system tracks command history (last deck, last action).
2. When you give an ambiguous command, the system analyzes the current context.
3. It infers which deck you mean using the priority rules below.
4. It resolves the command automatically and reports a confidence score and its reasoning.
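
Step 1 amounts to a small record that is updated whenever a command names a deck. The sketch below is a minimal illustration under stated assumptions. `CommandHistory` and `record` are hypothetical names, and the first word of a command is assumed to be the action. It is not the project's actual tracker.

```python
from dataclasses import dataclass
from typing import Optional

DECKS = ("left", "right")


@dataclass
class CommandHistory:
    """Hypothetical tracker for the last deck and last action (step 1)."""

    last_action: Optional[str] = None
    last_deck: Optional[str] = None

    def record(self, command: str) -> None:
        """Update the history from a command that names a deck explicitly."""
        words = command.lower().split()
        deck = next((w for w in words if w in DECKS), None)
        if deck is None:
            # Unresolved commands leave the history untouched; the resolved
            # form (e.g. "sync left") would be recorded instead.
            return
        self.last_deck = deck
        # Assumption: the first word is the action ("play", "cue", "sync", ...).
        self.last_action = words[0]


history = CommandHistory()
for spoken in ["play left", "cue right", "sync"]:
    history.record(spoken)
print(history.last_action, history.last_deck)  # cue right
```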

---

How It Works

Architecture

┌─────────────────────────────────────────┐
│   Voice Command: "sync"                 │
│   (no deck specified - AMBIGUOUS)       │
└──────────────┬──────────────────────────┘
               │
               ▼
┌──────────────────────────────────────────┐
│      Context Encoder                     │
│  Last action: play                       │
│  Last deck: left                         │
│  Left deck: playing                      │
│  Right deck: stopped                     │
└──────────────┬───────────────────────────┘
               │
               ▼
┌──────────────────────────────────────────┐
│      Semantic Matcher                    │
│  Priority: last deck > cued > playing    │
│  Inference: "sync" → "sync left"         │
│  Confidence: 90%                         │
└──────────────┬───────────────────────────┘
               │
               ▼
┌──────────────────────────────────────────┐
│   Resolved Command: "sync left"          │
│   Reasoning: last action was on left     │
└──────────────────────────────────────────┘
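
The three boxes can be read as a small pipeline. This is a simplified, hypothetical sketch. `DJContext`, `encode`, and `match` are illustrative names, not the `dj_agent` API. The matcher implements only the last-deck rule from the diagram plus the no-context fallback.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class DJContext:
    """Fields mirror the Context Encoder box in the diagram."""

    last_action: Optional[str]
    last_deck: Optional[str]
    left_state: str   # e.g. "playing", "stopped"
    right_state: str


def encode(ctx: DJContext) -> str:
    """Context Encoder: flatten the state into one line (handy for logging)."""
    return (f"last action: {ctx.last_action}; last deck: {ctx.last_deck}; "
            f"left: {ctx.left_state}; right: {ctx.right_state}")


def match(command: str, ctx: DJContext) -> Tuple[str, float, str]:
    """Semantic Matcher, reduced to the top rule (last deck) plus the default."""
    if ctx.last_deck is not None:
        return (f"{command} {ctx.last_deck}", 0.90,
                f"last action was on {ctx.last_deck}")
    return (f"{command} left", 0.40,
            "No context available, defaulting to left deck")


ctx = DJContext(last_action="play", last_deck="left",
                left_state="playing", right_state="stopped")
print(encode(ctx))
resolved, confidence, reasoning = match("sync", ctx)
print(f"Resolved: {resolved} ({confidence:.0%}), reasoning: {reasoning}")
```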

Disambiguation Priority

When resolving ambiguous commands, the system uses this priority order:

1. Last Deck (90% confidence)
- Most recent action was on this deck
- Example: After "play left", "sync" → "sync left"

2. Cued Deck (75% confidence)
- Deck is cued and ready to play
- Example: Right cued → "play" → "play right"

3. Playing Deck (75% confidence)
- Deck is currently playing
- Example: Left playing → "loop" → "loop left"

4. Loop Active (70% confidence)
- Deck has an active loop
- Example: Right looping → "exit loop" → "exit loop right"

5. Crossfader Position (65% confidence)
- Crossfader favors one deck
- Example: Crossfader on left → "sync" → "sync left"

6. Default to Left (40% confidence)
- No context available
- Fallback behavior
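
The priority order works as a first-match cascade. The sketch below is minimal and rests on a few assumptions. Deck flags are booleans, and the crossfader value is `"left"`, `"right"`, or `None`. `DeckState`, `Context`, and `infer_deck` are hypothetical names. Breaking ties toward the left deck is also an assumption; the guide does not specify it.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class DeckState:
    playing: bool = False
    cued: bool = False
    looping: bool = False


@dataclass
class Context:
    left: DeckState
    right: DeckState
    last_deck: Optional[str] = None
    crossfader: Optional[str] = None  # "left", "right", or None if centered


def infer_deck(ctx: Context) -> Tuple[str, float, str]:
    """Walk the priority list top to bottom; the first matching rule wins.

    Ties (both decks cued, say) go to left because it is checked first.
    """
    decks = {"left": ctx.left, "right": ctx.right}
    if ctx.last_deck in decks:
        return ctx.last_deck, 0.90, f"last action was on {ctx.last_deck}"
    for name, state in decks.items():
        if state.cued:
            return name, 0.75, f"{name} deck is cued"
    for name, state in decks.items():
        if state.playing:
            return name, 0.75, f"{name} deck is playing"
    for name, state in decks.items():
        if state.looping:
            return name, 0.70, f"{name} deck has an active loop"
    if ctx.crossfader in decks:
        return ctx.crossfader, 0.65, f"crossfader favors {ctx.crossfader}"
    return "left", 0.40, "No context available, defaulting to left deck"


ctx = Context(left=DeckState(playing=True), right=DeckState(cued=True))
deck, confidence, reasoning = infer_deck(ctx)
print(f"play -> play {deck} ({confidence:.0%}): {reasoning}")
# play -> play right (75%): right deck is cued
```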

---

Usage Examples

Example 1: Sequential Commands

Scenario: You're working with the left deck

You: "play left"
→ ✓ play left

You: "sync"  (ambiguous - no deck specified)
→ 🎯 Semantic match: "sync" → "sync left" (90%)
   Reasoning: last action was on left
→ ✓ sync left

You: "loop 4 beats"  (ambiguous)
→ 🎯 Semantic match: "loop 4 beats" → "loop 4 beats left" (90%)
   Reasoning: last action was on left
→ ✓ loop 4 beats left

Impact: Natural flow - you don't need to repeat "left" every time

Example 2: Deck Switching

Scenario: You cue the right deck while left is playing

You: "play left"
→ ✓ play left

You: "cue right"
→ ✓ cue right

You: "play"  (ambiguous)
→ 🎯 Semantic match: "play" → "play right" (90%)
   Reasoning: last action was on right
→ ✓ play right

Impact: System remembers your last action, even when switching decks

Example 3: Track References

Scenario: You want to play a track

You: "load track ABC left"
→ ✓ load track ABC left

You: "play that"  (ambiguous reference)
→ 🎯 Semantic match: "play that" → "play left" (80%)
   Reasoning: Reference resolves to left deck based on context
→ ✓ play left

Impact: Conversational references like "that" work naturally

Example 4: No Context

Scenario: First command after startup

You: "sync"  (ambiguous, no context)
→ 🎯 Semantic match: "sync" → "sync left" (40%)
   Reasoning: No context available, defaulting to left deck
→ ✓ sync left

Impact: Even without context, system provides reasonable default

---

Ambiguous Command Patterns

The system detects these ambiguous patterns:

Deck Actions (No Deck Specified)

| Command | Resolves To | Example |
|---------|-------------|---------|
| `play` | `play {inferred_deck}` | "play" → "play left" |
| `stop` | `stop {inferred_deck}` | "stop" → "stop right" |
| `pause` | `pause {inferred_deck}` | "pause" → "pause left" |
| `sync` | `sync {inferred_deck}` | "sync" → "sync right" |
| `loop` | `loop {inferred_deck}` | "loop" → "loop left" |
| `cue` | `cue {inferred_deck}` | "cue" → "cue right" |
| `halve loop` | `halve loop {inferred_deck}` | "halve loop" → "halve loop left" |
| `double loop` | `double loop {inferred_deck}` | "double loop" → "double loop right" |
| `exit loop` | `exit loop {inferred_deck}` | "exit loop" → "exit loop left" |

Track References

| Command | Resolves To | Example |
|---------|-------------|---------|
| `play that` | `play {inferred_deck}` | "play that" → "play left" |
| `load this` | `load {inferred_deck}` | "load this" → "load right" |
| `eject it` | `eject {inferred_deck}` | "eject it" → "eject left" |

Relative Commands

| Command | Resolves To | Example |
|---------|-------------|---------|
| `next track` | `next track {inferred_deck}` | "next track" → "next track left" |
| `previous track` | `previous track {inferred_deck}` | "previous track" → "previous track right" |
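
Plain string checks are enough to detect these patterns and fill in the deck. The sketch below treats a command as ambiguous when it starts with one of the listed verbs and names no deck. `fill_deck` and the word lists are hypothetical, and the function lower-cases commands along the way.

```python
from typing import Optional

DECK_WORDS = {"left", "right"}
REFERENCE_WORDS = {"that", "this", "it"}
# Command prefixes taken from the three tables above.
AMBIGUOUS_PREFIXES = (
    "play", "stop", "pause", "sync", "loop", "cue", "halve loop",
    "double loop", "exit loop", "load", "eject", "next track", "previous track",
)


def fill_deck(command: str, deck: str) -> Optional[str]:
    """Append the inferred deck to an ambiguous command.

    Returns None when the command already names a deck or matches no
    ambiguous pattern, so the caller can pass it through unchanged.
    """
    words = command.lower().split()
    if not words or DECK_WORDS.intersection(words):
        return None
    text = " ".join(words)
    if not any(text == p or text.startswith(p + " ") for p in AMBIGUOUS_PREFIXES):
        return None
    if words[-1] in REFERENCE_WORDS:  # "play that" -> "play"
        words = words[:-1]
    return " ".join(words + [deck])


for spoken in ["sync", "loop 4 beats", "play that", "sync left"]:
    print(spoken, "->", fill_deck(spoken, "left"))
```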

---

Context Encoding

The system encodes DJ state as natural language:

Example Context Encoding

State:
- Left deck: playing, track loaded
- Right deck: cued, track loaded
- Crossfader: on left
- Last action: "cue" on right

Encoded Description:

"left deck is playing. right deck is cued and ready. crossfader is on left. last action was 'cue' on right."

This description is used for semantic matching and debugging.
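
The sketch below shows one way to assemble such a description, assuming simple boolean deck flags. `DeckState`, `describe_deck`, and `encode_context` are hypothetical names. For the example state it reproduces the encoded description verbatim.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DeckState:
    playing: bool = False
    cued: bool = False


def describe_deck(name: str, deck: DeckState) -> str:
    """One sentence per deck; "playing" takes precedence over "cued"."""
    if deck.playing:
        return f"{name} deck is playing."
    if deck.cued:
        return f"{name} deck is cued and ready."
    return f"{name} deck is stopped."


def encode_context(left: DeckState, right: DeckState, crossfader: Optional[str],
                   last_action: Optional[str], last_deck: Optional[str]) -> str:
    parts = [describe_deck("left", left), describe_deck("right", right)]
    if crossfader:
        parts.append(f"crossfader is on {crossfader}.")
    if last_action and last_deck:
        parts.append(f"last action was '{last_action}' on {last_deck}.")
    return " ".join(parts)


text = encode_context(DeckState(playing=True), DeckState(cued=True),
                      "left", "cue", "right")
print(text)
```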

---

Command Suggestions

The system can suggest likely next actions based on context:

Example Suggestions

Scenario 1: Left deck cued

Context: Left deck is cued and ready
Suggested actions:
  - play left deck
  - sync left deck

Scenario 2: Left deck playing, right deck cued

Context: Both decks ready, left playing
Suggested actions:
  - play right deck
  - crossfade between decks
  - sync right deck

Scenario 3: Left deck has active loop

Context: Left deck looping (4 beats)
Suggested actions:
  - halve left loop
  - double left loop
  - exit left loop
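
A few per-deck rules are enough to reproduce the three scenarios. This is a hypothetical sketch, and `suggest` and `DeckState` are illustrative names. The real suggestion logic may weigh more signals.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DeckState:
    playing: bool = False
    cued: bool = False
    looping: bool = False


def suggest(left: DeckState, right: DeckState) -> List[str]:
    """Suggest next actions per deck: loop edits first, then cued-deck actions."""
    suggestions: List[str] = []
    for name, deck, other in (("left", left, right), ("right", right, left)):
        if deck.looping:
            suggestions += [f"halve {name} loop", f"double {name} loop",
                            f"exit {name} loop"]
        elif deck.cued:
            suggestions.append(f"play {name} deck")
            if other.playing:  # a transition only makes sense if one deck is live
                suggestions.append("crossfade between decks")
            suggestions.append(f"sync {name} deck")
    return suggestions


print(suggest(DeckState(playing=True), DeckState(cued=True)))
```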

---

Performance

Latency

| Operation | Time | Notes |
|-----------|------|-------|
| Context encoding | <1ms | Very fast |
| Semantic matching | <5ms | Heuristic-based |
| Command resolution | <2ms | Simple pattern matching |
| Total overhead | <10ms | Negligible impact |

Accuracy

| Confidence | Accuracy | Decision |
|------------|----------|----------|
| 90% | | |
| 75% | | |
| 65% | | |
| 40% | | |

Threshold: The system only applies a resolution if confidence ≥ 70%.
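
The threshold acts as a gate between the matcher and execution. This minimal sketch assumes that a below-threshold match leaves the command unresolved, as in the troubleshooting case where `sync` stays `sync`. `apply_if_confident` is a hypothetical name.

```python
CONFIDENCE_THRESHOLD = 0.70  # the >= 70% threshold stated above


def apply_if_confident(original: str, resolved: str, confidence: float) -> str:
    """Use the resolved command only when the match clears the threshold."""
    return resolved if confidence >= CONFIDENCE_THRESHOLD else original


for conf in (0.90, 0.75, 0.65, 0.40):
    print(f"{conf:.0%}: {apply_if_confident('sync', 'sync left', conf)}")
```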

---

Configuration

Enable/Disable

Enable: (default)

```bash
python run_rekordbox_voice_gemini_enhanced.py
```

Disable:

```bash
python run_rekordbox_voice_gemini_enhanced.py --no-embeddings
```

When Disabled:
- Ambiguous commands are NOT resolved
- User must specify deck explicitly
- Falls back to Tier 1 intelligent defaults

---

Integration with Other Features

Works With Tier 1 Intelligent Defaults

Context embeddings runs BEFORE intelligent defaults:

1. Tier 2 Contextual Disambiguation - Resolve pronouns ("that" → "left")
2. Tier 3 Context Embeddings - Resolve ambiguous actions ("sync" → "sync left")
3. Tier 1 Intelligent Defaults - Apply final defaults if still ambiguous

Each stage only acts on what the previous stage left ambiguous, so the Tier 1 defaults apply only as a last resort.
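
The ordering can be modelled as a chain of stages that stops once the command names a deck. In this sketch the three tier functions (`tier2_pronouns`, `make_tier3`, `tier1_defaults`) are hypothetical stand-ins with deliberately trivial logic. Only the ordering and the stop-when-resolved rule come from the list above.

```python
from typing import Callable, List, Optional

Stage = Callable[[str], Optional[str]]  # returns a rewritten command, or None
DECKS = {"left", "right"}


def names_deck(cmd: str) -> bool:
    return any(word in DECKS for word in cmd.split())


def run_pipeline(cmd: str, stages: List[Stage]) -> str:
    """Apply stages in order, stopping as soon as the command names a deck."""
    for stage in stages:
        if names_deck(cmd):
            break
        rewritten = stage(cmd)
        if rewritten is not None:
            cmd = rewritten
    return cmd


def tier2_pronouns(cmd: str) -> Optional[str]:
    """Stand-in for Tier 2: map a trailing "that"/"this"/"it" to a deck."""
    words = cmd.split()
    if words and words[-1] in {"that", "this", "it"}:
        return " ".join(words[:-1] + ["left"])  # assume the reference means left
    return None


def make_tier3(last_deck: Optional[str]) -> Stage:
    """Stand-in for Tier 3: append the last deck used, if there is one."""
    def stage(cmd: str) -> Optional[str]:
        return f"{cmd} {last_deck}" if last_deck else None
    return stage


def tier1_defaults(cmd: str) -> Optional[str]:
    """Stand-in for Tier 1: final fallback to the left deck."""
    return f"{cmd} left"


pipeline = [tier2_pronouns, make_tier3("right"), tier1_defaults]
print(run_pipeline("play that", pipeline))  # play left (Tier 2)
print(run_pipeline("sync", pipeline))       # sync right (Tier 3)
print(run_pipeline("sync", [tier2_pronouns, make_tier3(None), tier1_defaults]))  # sync left (Tier 1)
```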

Works With Tier 3 State Tracking

When state tracking is enabled, context is richer:

  • Command history is more accurate
  • Undo/redo commands preserve context
  • Better predictions for next actions

Works With Tier 2 Macros

Macros benefit from context:

```yaml
# Macro: transition
transition:
  description: "Transition from current deck to other deck"
  commands:
    - "sync {other_deck}"  # Context resolves {other_deck}
    - "play {other_deck}"
    - "crossfade to {other_deck}"
```
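
Expanding `{other_deck}` only requires knowing the current deck. This minimal sketch assumes the current deck comes from context, such as the last deck used, and relies on Python's `str.format`. `expand_macro` and the extra `{current_deck}` placeholder are hypothetical.

```python
from typing import Dict, List

TRANSITION = ["sync {other_deck}", "play {other_deck}", "crossfade to {other_deck}"]


def expand_macro(commands: List[str], current_deck: str) -> List[str]:
    """Substitute deck placeholders; unused keyword values are ignored by format."""
    other = "right" if current_deck == "left" else "left"
    values: Dict[str, str] = {"current_deck": current_deck, "other_deck": other}
    return [cmd.format(**values) for cmd in commands]


print(expand_macro(TRANSITION, "left"))
```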

---

Troubleshooting

Issue: Wrong Deck Inferred

Symptoms:

You: "sync"
→ 🎯 Semantic match: "sync" → "sync right" (75%)
→ (Expected: sync left)

Causes:
1. Recent action was on right deck
2. Right deck has stronger context signal (cued, looping, etc.)

Solutions:
1. Be more explicit: "sync left"
2. Check last action (might have been on different deck)
3. Verify crossfader position (might favor other deck)

Issue: Low Confidence Warning

Symptoms:

You: "play"
→ 🎯 Semantic match: "play" → "play left" (40%)
   Reasoning: No context available, defaulting to left deck

Cause: No recent context (first command after startup)

Solution:
1. This is expected behavior
2. System falls back to left deck default
3. Subsequent commands will have higher confidence

Issue: Commands Not Being Disambiguated

Symptoms:

You: "sync"
→ ✓ sync  (not resolved)

Causes:
1. Context embeddings disabled (`--no-embeddings`)
2. Command is not ambiguous (already specifies deck)
3. Confidence below the 70% threshold

Solutions:
1. Check startup output: `🎯 Context embeddings: True`
2. Verify command pattern is in ambiguous list
3. Provide more context (execute deck-specific commands first)

---

API Reference

CLI Arguments

```bash
--no-embeddings     # Disable context-aware embeddings
```
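
One plausible way to wire the flag to the listener option is an `argparse` `store_false` action. This sketch rests on assumptions. The flag name and the `enable_context_embeddings` keyword come from this guide, but the guide does not show how the launcher script actually parses its arguments. `parse_args` is a hypothetical helper.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Voice control launcher (sketch)")
    # store_false: the option defaults to True and the flag switches it off.
    parser.add_argument(
        "--no-embeddings",
        dest="enable_context_embeddings",
        action="store_false",
        help="Disable context-aware embeddings",
    )
    return parser.parse_args(argv)


args = parse_args([])
print(args.enable_context_embeddings)  # True
# The value could then be forwarded, e.g.
# EnhancedGeminiVoiceListener(enable_context_embeddings=args.enable_context_embeddings)
```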

Python API

```python
from dj_agent.voice_control.core.gemini_listener_enhanced import EnhancedGeminiVoiceListener

listener = EnhancedGeminiVoiceListener(
    enable_context_embeddings=True,  # Enable feature (default)
)

# Check status
print(listener.enable_context_embeddings)  # True/False

# Get stats
print(f"Commands disambiguated: {listener.commands_disambiguated}")
```

Programmatic Access

```python
from dj_agent.voice_control.embeddings import (
    SemanticCommandMatcher,
    SystemContext,
)

# Create matcher
matcher = SemanticCommandMatcher()

# Build context
context = SystemContext(
    left_playing=True,
    last_action="play",
    last_deck="left",
)

# Match ambiguous command
result = matcher.match("sync", context)
print(f"Resolved: {result.command}")  # "sync left"
print(f"Confidence: {result.confidence}")  # 0.9
print(f"Reasoning: {result.reasoning}")  # "last action was on left"
```

---

Best Practices

1. Build Context Gradually

Good:

You: "load track ABC left"
You: "play"  # Infers left
You: "sync"  # Infers left

Why: Each command builds context for the next

2. Be Explicit When Switching Decks

Good:

You: "play left"
You: "cue right"  # Explicit deck switch
You: "play"  # Infers right (recent action)

Why: Explicit deck specification creates clear context

3. Check Reasoning When Learning

Good:

You: "sync"
→ 🎯 Semantic match: "sync" → "sync left" (90%)
   Reasoning: last action was on left  ← Read this!

Why: Understanding reasoning helps you predict behavior

4. Use With State Tracking

Good:

```bash
python run_...enhanced.py
# Both embeddings and state tracking enabled
```

Why: State tracking enriches context for better inference

---

Examples by Use Case

Use Case 1: Quick Deck Operations

Goal: Rapid-fire commands without repeating deck name

You: "load track XYZ left"
You: "play"  → "play left"
You: "loop 4 beats"  → "loop 4 beats left"
You: "sync"  → "sync left"
You: "halve loop"  → "halve loop left"

Benefit: 5 commands, only 1 explicit deck specification

Use Case 2: Transitioning Between Decks

Goal: Smooth transition with minimal verbosity

You: "play left"
You: "cue right"
You: "sync"  → "sync right" (last action)
You: "play"  → "play right"
You: "crossfade to right"

Benefit: Natural flow, system follows your intent

Use Case 3: Loop Manipulation

Goal: Adjust loop on active deck

You: "loop 8 beats left"
You: "halve loop"  → "halve loop left" (active loop, now 4 beats)
You: "halve loop"  → "halve loop left" (now 2 beats)
You: "double loop"  → "double loop left" (back to 4 beats)
You: "exit loop"  → "exit loop left"

Benefit: Natural loop workflow, no deck repetition
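
The loop lengths in this workflow follow simple halving and doubling. A hypothetical per-deck sketch makes the arithmetic explicit: 8 → 4 → 2 → 4 beats, then no loop. `apply_loop_command` is an illustrative name, and it assumes loop commands have already been resolved to include a deck.

```python
from typing import List, Optional


def apply_loop_command(beats: Optional[float], command: str) -> Optional[float]:
    """Return the loop length (in beats) after a command; None means no loop."""
    if command.startswith("loop "):
        return float(command.split()[1])  # "loop 8 beats left" -> 8.0
    if beats is None:
        return None  # halve/double/exit need an active loop
    if command.startswith("halve loop"):
        return beats / 2
    if command.startswith("double loop"):
        return beats * 2
    if command.startswith("exit loop"):
        return None
    return beats


lengths: List[Optional[float]] = []
beats: Optional[float] = None
for cmd in ["loop 8 beats left", "halve loop left", "halve loop left",
            "double loop left", "exit loop left"]:
    beats = apply_loop_command(beats, cmd)
    lengths.append(beats)
print(lengths)  # [8.0, 4.0, 2.0, 4.0, None]
```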

---

Summary

Context-Aware Embeddings = Natural Voice Control

  • ✅ Say "play" instead of "play left" (when context is clear)
  • ✅ System infers deck from recent actions
  • ✅ 6 priority levels for disambiguation (90% down to 40% confidence)
  • ✅ <10ms overhead (negligible)
  • ✅ Works with Tier 1, 2, 3 features
  • ✅ No ML training required

Enable: (default)

```bash
python run_rekordbox_voice_gemini_enhanced.py
```

Disable:

```bash
python run_rekordbox_voice_gemini_enhanced.py --no-embeddings
```

Check Status:

⚙️  Tier 3 Enhancements:
   🎯 Context embeddings: True
      (semantic command disambiguation)

---

Say less, do more! 🎯🎧

Generated: 2025-11-22
System: Computational Choreography - Tier 3 Context Embeddings
Version: 3.0 - Feature #11

Source Anchor

projects/Documentation/02-projects/dj-agent/studio/TIER3_CONTEXT_EMBEDDINGS_GUIDE.md
