Grand Diomande Research · Full HTML Reader

Voice Ordering System - Technical Documentation

Overview

The BrewsWithBeats voice ordering system uses a hybrid architecture combining:
- iOS 26 SpeechAnalyzer for on-device transcription
- Semantic embeddings for accurate menu matching
- Confidence-based clarification for reliable order capture
- YAML constraints for menu rule validation

---

Architecture

┌──────────────────────────────────────────────────────┐
│              iOS 26 SpeechAnalyzer                   │
│         (On-device speech-to-text)                   │
└─────────────────────┬────────────────────────────────┘
                      │ transcript
                      ▼
┌──────────────────────────────────────────────────────┐
│            EnhancedVoiceNLUEngine                    │
│  ┌─────────────────────────────────────────────┐     │
│  │ 1. MenuEmbeddingService (semantic search)   │     │
│  │ 2. SlotClassifier (extract size, milk, etc) │     │
│  │ 3. ClarificationPolicy (confidence checks)  │     │
│  │ 4. YAMLConstraintEngine (validate rules)    │     │
│  └─────────────────────────────────────────────┘     │
└─────────────────────┬────────────────────────────────┘
                      │ NLUResult
                      ▼
┌──────────────────────────────────────────────────────┐
│            VoiceDialogueManager                      │
│    (State machine: listen → process → confirm)       │
└──────────────────────────────────────────────────────┘
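
The listen → process → confirm loop can be pictured as a small state machine. The sketch below is illustrative only: the three phase names come from the diagram, while the event names and transitions are assumptions, not the actual `VoiceDialogueManager` API.

```swift
// Hypothetical sketch of the listen → process → confirm loop.
// State and event names are illustrative, not taken from VoiceDialogueManager.
enum DialogueState: Equatable {
    case listening
    case processing
    case confirming
}

enum DialogueEvent {
    case transcriptReceived   // speech-to-text produced a transcript
    case orderParsed          // NLU produced an order with no open questions
    case clarificationNeeded  // NLU needs more input from the customer
    case confirmed            // customer accepted the read-back
    case rejected             // customer rejected the read-back
}

/// Returns the next state, or the current state if the event does not apply.
func nextState(from state: DialogueState, on event: DialogueEvent) -> DialogueState {
    switch (state, event) {
    case (.listening, .transcriptReceived):   return .processing
    case (.processing, .orderParsed):         return .confirming
    case (.processing, .clarificationNeeded): return .listening
    case (.confirming, .confirmed):           return .listening
    case (.confirming, .rejected):            return .listening
    default:                                  return state
    }
}
```

Keeping the transition table in one pure function makes it easy to unit-test the dialogue flow without a microphone or a running NLU engine.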

---

Components

1. MenuEmbeddingService

Location: `BWBCore/Sources/BWBCore/Voice/Embeddings/MenuEmbeddingService.swift`

Purpose: Find menu items using semantic similarity instead of exact text matching.

How it works:
1. Converts menu items to vector embeddings (300-dimensional)
2. Converts user's speech to a vector
3. Finds closest matches using cosine similarity
4. Combines with fuzzy matching for robustness
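
Steps 1 to 3 reduce to a nearest-neighbour lookup under cosine similarity. A minimal sketch, assuming plain `[Double]` vectors and a dictionary as the index; the function names and the toy 3-dimensional vectors are hypothetical, and the real service uses 300-dimensional embeddings and its own `VectorIndex`.

```swift
/// Cosine similarity between two equal-length vectors; 0 if either has zero norm.
func cosineSimilarity(_ a: [Double], _ b: [Double]) -> Double {
    precondition(a.count == b.count, "vectors must have the same dimension")
    var dot = 0.0, normA = 0.0, normB = 0.0
    for i in 0..<a.count {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    guard normA > 0, normB > 0 else { return 0 }
    return dot / (normA.squareRoot() * normB.squareRoot())
}

/// Ranks named vectors by similarity to the query and keeps the best `topK`.
func nearest(to query: [Double], in index: [String: [Double]], topK: Int) -> [(name: String, score: Double)] {
    let ranked = index
        .map { (name: $0.key, score: cosineSimilarity(query, $0.value)) }
        .sorted { $0.score > $1.score }
    return Array(ranked.prefix(topK))
}

// Toy 3-dimensional vectors stand in for the 300-dimensional embeddings.
let toyIndex: [String: [Double]] = [
    "Vanilla Latte": [0.9, 0.1, 0.0],
    "Cold Brew":     [0.0, 0.2, 0.9],
]
let toyMatches = nearest(to: [1.0, 0.0, 0.0], in: toyIndex, topK: 1)
```

A linear scan like this is adequate for a cafe-sized menu; an approximate index only becomes interesting at much larger item counts.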

Key Methods:

swift
// Initialize and load embeddings
await embeddingService.loadAndEmbed(menuItems: items)

// Search for menu items
let results = await embeddingService.hybridSearch(query: "large iced vanilla latte", topK: 5)
// Returns an array of MenuSearchResult values, each carrying item, score, and matchType

Configuration:
- Hybrid weighting: 0.6 semantic similarity, 0.4 fuzzy text matching (see Search Parameters)
- Uses Apple's NLEmbedding (built-in, no external dependencies)
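
How the two signals might be blended is sketched below. The weights and minimum score are the values from the Search Parameters table; the fuzzy matcher itself is not described in this document, so normalized Levenshtein distance is used here purely as a stand-in, and all function names are hypothetical.

```swift
/// Levenshtein edit distance between two strings (two-row dynamic programming).
func editDistance(_ s: String, _ t: String) -> Int {
    let a = Array(s), b = Array(t)
    if a.isEmpty { return b.count }
    if b.isEmpty { return a.count }
    var previous = Array(0...b.count)
    for i in 1...a.count {
        var current = [i] + Array(repeating: 0, count: b.count)
        for j in 1...b.count {
            let substitution = previous[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1)
            current[j] = min(previous[j] + 1, current[j - 1] + 1, substitution)
        }
        previous = current
    }
    return previous[b.count]
}

/// Fuzzy text score in 0...1: 1 for identical strings, lower as more edits are needed.
func fuzzyScore(_ query: String, _ candidate: String) -> Double {
    let longest = max(query.count, candidate.count)
    guard longest > 0 else { return 1 }
    return 1 - Double(editDistance(query, candidate)) / Double(longest)
}

/// Blends the two signals; returns nil when the blend falls below the minimum score.
func hybridScore(semantic: Double, fuzzy: Double,
                 semanticWeight: Double = 0.6, fuzzyWeight: Double = 0.4,
                 minimumScore: Double = 0.3) -> Double? {
    let score = semanticWeight * semantic + fuzzyWeight * fuzzy
    return score >= minimumScore ? score : nil
}
```

For example, a semantic score of 0.9 and a fuzzy score of 0.5 blend to 0.6 × 0.9 + 0.4 × 0.5 = 0.74, which clears the 0.3 cutoff.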

---

2. SlotClassifier

Location: `BWBCore/Sources/BWBCore/Voice/Slots/SlotClassifier.swift`

Purpose: Extract order details (size, temperature, milk, etc.) from speech.

Slots Extracted:

| Slot | Values | Example Phrases |
|------|--------|-----------------|
| size | small, medium, large, xl | "large", "grande", "big" |
| temperature | hot, iced, blended | "iced", "cold", "frozen" |
| milk | whole, skim, oat, almond, soy, coconut | "with oat milk", "almond" |
| caffeine | regular, decaf, half-caf | "decaf", "half caf" |
| shots | 1-6 | "extra shot", "triple shot" |
| syrup | vanilla, caramel, hazelnut, mocha | "vanilla", "add caramel" |
| quantity | 1-10 | "two", "3 of them" |
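
The simplest way to map phrases to slot values is a keyword lookup. The sketch below is an assumption about one possible approach, not the `SlotClassifier` implementation: the tables and `extractSlot` are hypothetical, only the "large" synonyms and milk names come from the table above, and no confidence score is produced.

```swift
import Foundation

// Hypothetical keyword tables; only the example phrases for "large" and the
// milk names come from the slot table.
let sizeKeywords: [String: String] = [
    "small": "small", "medium": "medium",
    "large": "large", "grande": "large", "big": "large",
]
let milkKeywords: [String: String] = [
    "whole": "whole", "skim": "skim", "oat": "oat",
    "almond": "almond", "soy": "soy", "coconut": "coconut",
]

/// Returns the slot value for the first token found in the keyword table, if any.
func extractSlot(from transcript: String, using keywords: [String: String]) -> String? {
    let tokens = transcript.lowercased()
        .components(separatedBy: CharacterSet.alphanumerics.inverted)
        .filter { !$0.isEmpty }
    for token in tokens {
        if let value = keywords[token] { return value }
    }
    return nil
}
```

Single-token lookup cannot handle multi-word phrases such as "half caf" or "3 of them"; those need phrase-level patterns.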

Key Methods:

swift
let result = slotClassifier.extractSlots(from: "large iced oat milk latte", context: menuMatches)
// Returns: SlotExtractionResult with predictions and confidence scores

Confidence Scoring:
- Each slot has a confidence score (0.0 - 1.0)
- Scores are calibrated using temperature scaling (T=1.5)
- Low confidence triggers clarification
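
Temperature scaling divides raw scores by a constant T before the softmax, which softens overconfident predictions when T > 1. A minimal sketch, assuming the classifier exposes per-candidate logits (this document does not say what the raw scores are); the function name is hypothetical.

```swift
import Foundation

/// Softmax over logits divided by a temperature; T > 1 flattens the distribution.
func calibratedProbabilities(logits: [Double], temperature: Double = 1.5) -> [Double] {
    precondition(temperature > 0, "temperature must be positive")
    guard let maxLogit = logits.max() else { return [] }
    // Subtracting the maximum keeps exp() numerically stable.
    let exps = logits.map { exp(($0 - maxLogit) / temperature) }
    let total = exps.reduce(0, +)
    return exps.map { $0 / total }
}
```

With logits `[2.0, 0.0]`, the top probability is about 0.88 at T = 1 and about 0.79 at T = 1.5, so borderline predictions are more likely to fall under a clarification threshold.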

---

3. ClarificationPolicy

Location: `BWBCore/Sources/BWBCore/Voice/Dialogue/ClarificationPolicy.swift`

Purpose: Decide when to ask the customer for clarification.

Rules:

| Condition | Action |
|-----------|--------|
| Confidence < 0.6 | Ask for clarification |
| Gap to 2nd choice < 0.1 | Ask for clarification |
| Milk/caffeine confidence < 0.75 | Always verify (health/allergy) |
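
The three rules can be expressed as one predicate. This is a sketch under assumptions: `SlotCandidate` and `needsClarification` are hypothetical names, the thresholds are the ones listed above, and handling of slots with no candidates at all is left out.

```swift
/// Hypothetical candidate type: a slot value with its calibrated confidence.
struct SlotCandidate {
    let value: String
    let confidence: Double
}

/// Applies the three rules from the table; `candidates` need not be sorted.
func needsClarification(slot: String, candidates: [SlotCandidate],
                        baseThreshold: Double = 0.6,
                        gapThreshold: Double = 0.1,
                        highImportanceThreshold: Double = 0.75) -> Bool {
    let ranked = candidates.sorted { $0.confidence > $1.confidence }
    // No candidates: nothing to disambiguate (missing slots are out of scope here).
    guard let best = ranked.first else { return false }
    if best.confidence < baseThreshold { return true }
    if ranked.count > 1, best.confidence - ranked[1].confidence < gapThreshold { return true }
    let highImportance: Set<String> = ["milk", "caffeine"]
    if highImportance.contains(slot), best.confidence < highImportanceThreshold { return true }
    return false
}
```

Note that a confidence of 0.70 passes for `size` but triggers a question for `milk`, because the stricter 0.75 threshold applies to health-related slots.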

Example:

swift
// User says: "I want a latte with... um... milk"
// System detects low confidence on milk type
// Response: "What kind of milk would you like? Whole, oat, almond, or soy?"

Priority Order:
1. High importance (milk, caffeine) - asked first
2. Medium importance (menu item, size)
3. Low importance (syrups, extras)
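
When several slots need clarification at once, the priority order decides which question comes first. A minimal sketch of that ordering follows; the types are local stand-ins that mirror the `ClarificationRequest` shape under Key Types, and `prioritized` is a hypothetical helper.

```swift
/// Local stand-ins mirroring the ClarificationRequest shape under Key Types.
enum ClarificationImportance: Int {
    case high = 0, medium = 1, low = 2   // lower raw value is asked first
}

struct ClarificationRequest {
    let slotName: String
    let question: String
    let options: [String]
    let importance: ClarificationImportance
}

/// Orders requests so high-importance slots are asked first.
/// Ties keep their original relative order.
func prioritized(_ requests: [ClarificationRequest]) -> [ClarificationRequest] {
    return requests.enumerated()
        .sorted { lhs, rhs in
            if lhs.element.importance != rhs.element.importance {
                return lhs.element.importance.rawValue < rhs.element.importance.rawValue
            }
            return lhs.offset < rhs.offset
        }
        .map { $0.element }
}
```

Sorting on the original offset as a tie-breaker keeps the result deterministic, since Swift's `sorted` does not document stability guarantees for equal elements.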

---

4. YAMLConstraintEngine

Location: `BWBCore/Sources/BWBCore/Voice/Constraints/YAMLConstraintEngine.swift`

Purpose: Validate orders against menu rules defined in YAML.

Configuration File: `BWBCore/Resources/cafe_constraints.yaml`

Constraint Types:

yaml
# Size constraints - what sizes are available
size_constraints:
  cappuccino:
    allowed: [small, medium]
    default: small
  latte:
    allowed: [small, medium, large, xl]
    default: medium

# Temperature constraints
temperature_constraints:
  cold_brew:
    allowed: [iced]
    default: iced

# Combination rules - invalid combinations
combination_rules:
  - name: no_iced_cappuccino
    item: cappuccino
    disallow:
      temperature: iced
    message: "Cappuccino is traditionally served hot"

# Seasonal items
seasonal_constraints:
  pumpkin_spice_latte:
    available: true
    start_month: 9
    end_month: 11

# Price modifiers
price_modifiers:
  alternative_milk:
    oat: 0.80
    almond: 0.70
  extra_espresso: 0.75
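
How the seasonal window and price modifiers might be evaluated is sketched below. The field values come from the YAML above; the type and function names are hypothetical, and two behaviours are assumptions: `end_month` is treated as inclusive, and `extra_espresso` is charged per extra shot.

```swift
/// Mirrors one entry under `seasonal_constraints`.
struct SeasonalConstraint {
    let available: Bool
    let startMonth: Int   // 1...12
    let endMonth: Int     // 1...12, assumed inclusive
}

/// True when the item is enabled and `month` falls inside the window.
/// A window such as 11...2 is treated as wrapping over the new year.
func isInSeason(_ c: SeasonalConstraint, month: Int) -> Bool {
    guard c.available else { return false }
    if c.startMonth <= c.endMonth {
        return (c.startMonth...c.endMonth).contains(month)
    }
    return month >= c.startMonth || month <= c.endMonth
}

/// Sums the surcharges from `price_modifiers` for one drink.
/// Assumes `extra_espresso` applies per extra shot.
func surcharge(milk: String?, extraShots: Int) -> Double {
    let alternativeMilk: [String: Double] = ["oat": 0.80, "almond": 0.70]
    let extraEspresso = 0.75
    let milkCharge = milk.flatMap { alternativeMilk[$0] } ?? 0
    return milkCharge + Double(extraShots) * extraEspresso
}
```

Under these assumptions, a pumpkin spice latte is orderable in October but not in December, and an oat-milk drink with one extra shot adds 0.80 + 0.75 = 1.55 to the base price.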

Validation Process:

swift
var order = VoiceParsedOrder(itemName: "cappuccino", size: .large)
let (processedOrder, result) = constraintEngine.processOrder(&order)

// If large cappuccino not allowed:
// - order.size automatically changed to .medium
// - result.autoFixes["size"] = "medium"
// - result.violations contains warning message
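
The auto-fix in this example moves a large cappuccino to medium rather than to the default (small), which suggests the engine picks the closest allowed size. The sketch below implements that reading as an assumption; `SizeConstraint` and `resolveSize` are hypothetical, and `DrinkSize` is a local stand-in for the real type.

```swift
/// Local stand-in for the real DrinkSize; raw values give an ordering.
enum DrinkSize: Int {
    case small = 0, medium, large, xl
}

/// Mirrors one entry under `size_constraints`.
struct SizeConstraint {
    let allowed: [DrinkSize]
    let defaultSize: DrinkSize
}

/// Returns the requested size if allowed, otherwise the closest allowed size
/// (falling back to the default when `allowed` is empty), plus whether a fix was applied.
func resolveSize(_ requested: DrinkSize, under constraint: SizeConstraint) -> (size: DrinkSize, autoFixed: Bool) {
    if constraint.allowed.contains(requested) { return (requested, false) }
    let closest = constraint.allowed.min {
        abs($0.rawValue - requested.rawValue) < abs($1.rawValue - requested.rawValue)
    }
    return (closest ?? constraint.defaultSize, true)
}
```

Returning the `autoFixed` flag alongside the size lets the caller tell the customer what changed instead of silently altering the order.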

---

5. EnhancedVoiceNLUEngine

Location: `BWBCore/Sources/BWBCore/Voice/EnhancedVoiceNLUEngine.swift`

Purpose: Main entry point that combines all components.

Usage:

swift
let engine = EnhancedVoiceNLUEngine()

// Initialize (loads embeddings - do this once at startup)
await engine.initialize()

// Process speech
let result = await engine.process(transcript: "large iced vanilla latte with oat milk")

// Result contains:
// - intent: .order
// - parsedOrders: [VoiceParsedOrder]
// - clarificationsNeeded: [ClarificationRequest]
// - overallConfidence: 0.85

---

Data Flow Example

Input: "Can I get a large iced vanilla latte with oat milk?"

Step 1: Normalize
├── Strip punctuation: "Can I get a large iced vanilla latte with oat milk"
├── Lowercase: "can i get a large iced vanilla latte with oat milk"
├── Remove fillers: "large iced vanilla latte oat milk"
└── Output: "large iced vanilla latte oat milk"

Step 2: Intent Classification
├── Pattern match: "can i get" → order intent
└── Output: intent = .order, confidence = 0.95

Step 3: Menu Search (Semantic)
├── Embed query: [0.12, -0.34, 0.56, ...]
├── Search index: cosine similarity
└── Output: "Vanilla Latte" (score: 0.92)

Step 4: Slot Extraction
├── Size: "large" → large (confidence: 0.98)
├── Temperature: "iced" → iced (confidence: 0.97)
├── Milk: "oat milk" → oat (confidence: 0.95)
└── Syrup: "vanilla" → vanilla (confidence: 0.90)

Step 5: Clarification Check
├── All confidences > 0.6 ✓
├── No close alternatives ✓
└── Output: No clarification needed

Step 6: Constraint Validation
├── Vanilla Latte allows large ✓
├── Vanilla Latte allows iced ✓
├── Oat milk available ✓
└── Output: Valid order

Step 7: Build Result
└── VoiceParsedOrder(
      itemName: "Vanilla Latte",
      size: .large,
      temperature: .iced,
      milk: .oat,
      syrups: ["vanilla"]
    )
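
Step 1 of the flow above could be sketched as follows. The filler list is hypothetical (this document does not enumerate the words the engine drops), and `normalize` is an illustrative name rather than the engine's API.

```swift
import Foundation

/// Hypothetical filler list; the source does not enumerate the words it drops.
let fillerWords: Set<String> = ["can", "i", "get", "a", "an", "with", "um", "uh", "please"]

/// Lowercases, strips punctuation, and drops filler words.
func normalize(_ transcript: String) -> String {
    return transcript.lowercased()
        .components(separatedBy: CharacterSet.alphanumerics.inverted)
        .filter { !$0.isEmpty && !fillerWords.contains($0) }
        .joined(separator: " ")
}
```

Splitting on every non-alphanumeric character also breaks hyphenated terms such as "half-caf" into two tokens, so a production normalizer would need to protect those before tokenizing.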

---

File Structure

BWBCore/Sources/BWBCore/Voice/
├── Embeddings/
│   ├── MenuDocument.swift         # Document types for search
│   ├── TextEmbedder.swift         # NLEmbedding wrapper
│   ├── VectorIndex.swift          # In-memory vector search
│   └── MenuEmbeddingService.swift # Main search service
├── Slots/
│   ├── SlotDefinitions.swift      # Slot types and patterns
│   ├── SlotPrediction.swift       # Prediction types
│   └── SlotClassifier.swift       # Extraction logic
├── Dialogue/
│   ├── ClarificationPolicy.swift  # When to clarify
│   └── ConfirmationGenerator.swift # Generate confirmations
├── Constraints/
│   ├── ConstraintTypes.swift      # YAML data structures
│   ├── ConstraintParser.swift     # YAML loading
│   └── YAMLConstraintEngine.swift # Validation engine
├── EnhancedVoiceNLUEngine.swift   # Main NLU engine
├── VoiceNLUEngine.swift           # Legacy engine (still works)
├── VoiceDialogueManager.swift     # State machine
├── SpeechAnalyzerService.swift    # iOS 26 speech
└── VoiceTypes.swift               # Shared types

BWBCore/Resources/
└── cafe_constraints.yaml          # Menu constraints

---

Key Types

NLUResult

swift
struct NLUResult {
    let intent: VoiceIntent           // .order, .modify, .cancel, etc.
    let parsedOrders: [VoiceParsedOrder]
    let clarificationsNeeded: [ClarificationRequest]
    let overallConfidence: Double
    let rawTranscript: String
}

VoiceParsedOrder

swift
struct VoiceParsedOrder {
    var itemId: String?
    var itemName: String
    var size: DrinkSize?
    var temperature: DrinkTemperature?
    var milk: MilkType?
    var caffeine: CaffeineOption?
    var shots: Int?
    var syrups: [String]
    var quantity: Int
    var confidence: Double
}

ClarificationRequest

swift
struct ClarificationRequest {
    let slotName: String
    let question: String
    let options: [String]
    let importance: ClarificationImportance  // .high, .medium, .low
}

---

Configuration

### Confidence Thresholds
| Parameter | Value | Description |
|-----------|-------|-------------|
| Base threshold | 0.6 | Below this, always clarify |
| Gap threshold | 0.1 | If 2nd choice is within 0.1, clarify |
| High-importance threshold | 0.75 | For milk/caffeine slots |

### Search Parameters
| Parameter | Value | Description |
|-----------|-------|-------------|
| Top-K results | 5 | Number of candidates to consider |
| Semantic weight | 0.6 | Weight for embedding similarity |
| Fuzzy weight | 0.4 | Weight for text matching |
| Minimum score | 0.3 | Below this, no match |

---

Usage in App

POS Kiosk Integration

swift
// In KioskOrderingView.swift
@StateObject private var nluEngine = EnhancedVoiceNLUEngine()

func handleTranscript(_ transcript: String) async {
    let result = await nluEngine.process(transcript: transcript)

    if let clarification = result.clarificationsNeeded.first {
        // Ask for clarification before adding anything to the cart
        showClarification(clarification)
    } else if let order = result.parsedOrders.first {
        // Add to cart
        addToCart(order)
    }
}

Switching from Legacy Engine

swift
// Old way (still works)
let legacyEngine = VoiceNLUEngine.shared
let legacyResult = await legacyEngine.process(transcript: text)

// New way (recommended)
let enhancedEngine = EnhancedVoiceNLUEngine()
await enhancedEngine.initialize()
let enhancedResult = await enhancedEngine.process(transcript: text)

---

Performance

| Operation | Time | Notes |
|-----------|------|-------|
| Initialize embeddings | ~500ms | Do once at startup |
| Process transcript | <200ms | Full NLU pipeline |
| Semantic search | <50ms | SIMD optimized |
| Constraint validation | <10ms | In-memory |

---

Troubleshooting

### Low Match Accuracy
- Check if menu items have aliases in `MenuDocument.defaultMenuItems`
- Verify embeddings loaded: `embeddingService.isLoaded`
- Try lowering minimum score threshold

### Slow Performance
- Ensure embeddings are cached (not recomputed each time)
- Check VectorIndex size isn't too large (>10K items)

### Incorrect Constraints
- Verify `cafe_constraints.yaml` is in bundle
- Check item ID normalization (lowercase, no spaces)
- Review constraint priority order

---

Learning System

The system learns from barista corrections to improve accuracy over time.

Location: `BWBCore/Sources/BWBCore/Voice/Learning/`

How It Works

Customer: "large oatmeal latte"
              ↓
System: "large ??? latte" (milk unknown)
              ↓
Barista corrects → "oat milk"
              ↓
FeedbackCollector logs: "oatmeal" → "oat"
              ↓
After 3 corrections → Suggestion generated
              ↓
Admin approves → PatternLearner applies
              ↓
Next time: "oatmeal" recognized as "oat"
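
The "after 3 corrections" step amounts to counting identical heard/meant pairs. A minimal in-memory sketch, assuming a simple counter; `CorrectionTally` is hypothetical, and the real `FeedbackCollector` persists its data to JSON and may detect patterns differently.

```swift
/// Hypothetical in-memory tally; the real FeedbackCollector persists to JSON.
struct CorrectionTally {
    private var counts: [String: Int] = [:]
    let suggestionThreshold: Int

    init(suggestionThreshold: Int = 3) {
        self.suggestionThreshold = suggestionThreshold
    }

    /// Records one "heard -> meant" correction and returns true exactly when
    /// the pair reaches the threshold, i.e. when a suggestion should be created.
    mutating func record(heard: String, meant: String) -> Bool {
        let key = heard.lowercased() + "->" + meant.lowercased()
        counts[key, default: 0] += 1
        return counts[key] == suggestionThreshold
    }
}
```

Returning true only on the exact threshold crossing avoids generating a duplicate suggestion for the fourth and later corrections of the same pair.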

Components

| File | Purpose |
|------|---------|
| `FeedbackCollector.swift` | Records barista corrections |
| `PatternLearner.swift` | Applies learned aliases |
| `LearningTypes.swift` | Dashboard and config types |

Usage

swift
// Record a correction
FeedbackCollector.shared.recordCorrection(
    originalTranscript: "large oatmeal latte",
    originalOrder: wrongOrder,
    correctedOrder: fixedOrder
)

// Review suggestions
let pending = FeedbackCollector.shared.getPendingSuggestions()

// Approve and apply
FeedbackCollector.shared.approveSuggestion(suggestion.id)
PatternLearner.shared.applyApprovedSuggestions()

// Export learned aliases
let yaml = PatternLearner.shared.exportAsYAML()

Data Storage

Documents/voice_feedback/
├── corrections.json      # All corrections
├── patterns.json         # Detected patterns
├── suggestions.json      # Pending suggestions
└── learned_aliases.json  # Applied aliases

---

Future Improvements

1. Swift-embeddings package - Better accuracy than NLEmbedding
2. Embedding persistence - SQLite/CoreData for faster startup
3. USearch integration - For larger menus (10K+ items)

Source Anchor

BWB/docs/VOICE_SYSTEM_TECHNICAL_DOCS.md
