Grand Diomande Research · Full HTML Reader

Voice Ordering System - Technical Documentation

The BrewsWithBeats voice ordering system uses a hybrid architecture combining: - **iOS 26 SpeechAnalyzer** for on-device transcription - **Semantic embeddings** for accurate menu matching - **Confidence-based clarification** for reliable order capture - **YAML constraints** for menu rule validation

Business Systems research note experiment writeup candidate score 32 .md

Full Public Reader

Voice Ordering System - Technical Documentation

Overview

The BrewsWithBeats voice ordering system uses a hybrid architecture combining:
- iOS 26 SpeechAnalyzer for on-device transcription
- Semantic embeddings for accurate menu matching
- Confidence-based clarification for reliable order capture
- YAML constraints for menu rule validation

---

Architecture

┌──────────────────────────────────────────────────────┐
│              iOS 26 SpeechAnalyzer                   │
│         (On-device speech-to-text)                   │
└─────────────────────┬────────────────────────────────┘
                      │ transcript
                      ▼
┌──────────────────────────────────────────────────────┐
│            EnhancedVoiceNLUEngine                    │
│  ┌─────────────────────────────────────────────┐     │
│  │ 1. MenuEmbeddingService (semantic search)   │     │
│  │ 2. SlotClassifier (extract size, milk, etc) │     │
│  │ 3. ClarificationPolicy (confidence checks)  │     │
│  │ 4. YAMLConstraintEngine (validate rules)    │     │
│  └─────────────────────────────────────────────┘     │
└─────────────────────┬────────────────────────────────┘
                      │ NLUResult
                      ▼
┌──────────────────────────────────────────────────────┐
│            VoiceDialogueManager                      │
│    (State machine: listen → process → confirm)       │
└──────────────────────────────────────────────────────┘

---

Components

1. MenuEmbeddingService

Location: `BWBCore/Sources/BWBCore/Voice/Embeddings/MenuEmbeddingService.swift`

Purpose: Find menu items using semantic similarity instead of exact text matching.

How it works:
1. Converts menu items to vector embeddings (300-dimensional)
2. Converts user's speech to a vector
3. Finds closest matches using cosine similarity
4. Combines with fuzzy matching for robustness

Key Methods:

swift

// Initialize and load embeddings
await embeddingService.loadAndEmbed(menuItems: items)

// Search for menu items
let results = await embeddingService.hybridSearch(query: "large iced vanilla latte", topK: 5)
// Returns: [(MenuSearchResult with item, score, matchType)]

Configuration:
- Hybrid weight: 60
- Uses Apple's NLEmbedding (built-in, no external dependencies)

---

2. SlotClassifier

Location: `BWBCore/Sources/BWBCore/Voice/Slots/SlotClassifier.swift`

Purpose: Extract order details (size, temperature, milk, etc.) from speech.

Slots Extracted:

Slot	Values	Example Phrases
size	small, medium, large, xl	"large", "grande", "big"
temperature	hot, iced, blended	"iced", "cold", "frozen"
milk	whole, skim, oat, almond, soy, coconut	"with oat milk", "almond"
caffeine	regular, decaf, half-caf	"decaf", "half caf"
shots	1-6	"extra shot", "triple shot"
syrup	vanilla, caramel, hazelnut, mocha	"vanilla", "add caramel"
quantity	1-10	"two", "3 of them"

Key Methods:

swift

let result = slotClassifier.extractSlots(from: "large iced oat milk latte", context: menuMatches)
// Returns: SlotExtractionResult with predictions and confidence scores

Confidence Scoring:
- Each slot has a confidence score (0.0 - 1.0)
- Scores are calibrated using temperature scaling (T=1.5)
- Low confidence triggers clarification

---

3. ClarificationPolicy

Location: `BWBCore/Sources/BWBCore/Voice/Dialogue/ClarificationPolicy.swift`

Purpose: Decide when to ask the customer for clarification.

Rules:

Condition	Action
Confidence < 0.6	Ask for clarification
Gap to 2nd choice < 0.1	Ask for clarification
Milk/caffeine confidence < 0.75	Always verify (health/allergy)

Example:

swift

// User says: "I want a latte with... um... milk"
// System detects low confidence on milk type
// Response: "What kind of milk would you like? Whole, oat, almond, or soy?"

Priority Order:
1. High importance (milk, caffeine) - asked first
2. Medium importance (menu item, size)
3. Low importance (syrups, extras)

---

4. YAMLConstraintEngine

Location: `BWBCore/Sources/BWBCore/Voice/Constraints/YAMLConstraintEngine.swift`

Purpose: Validate orders against menu rules defined in YAML.

Configuration File: `BWBCore/Resources/cafe_constraints.yaml`

Constraint Types:

yaml

# Size constraints - what sizes are available
size_constraints:
  cappuccino:
    allowed: [small, medium]
    default: small
  latte:
    allowed: [small, medium, large, xl]
    default: medium

# Temperature constraints
temperature_constraints:
  cold_brew:
    allowed: [iced]
    default: iced

# Combination rules - invalid combinations
combination_rules:
  - name: no_iced_cappuccino
    item: cappuccino
    disallow:
      temperature: iced
    message: "Cappuccino is traditionally served hot"

# Seasonal items
seasonal_constraints:
  pumpkin_spice_latte:
    available: true
    start_month: 9
    end_month: 11

# Price modifiers
price_modifiers:
  alternative_milk:
    oat: 0.80
    almond: 0.70
  extra_espresso: 0.75

Validation Process:

swift

var order = VoiceParsedOrder(itemName: "cappuccino", size: .large)
let (processedOrder, result) = constraintEngine.processOrder(&order)

// If large cappuccino not allowed:
// - order.size automatically changed to .medium
// - result.autoFixes["size"] = "medium"
// - result.violations contains warning message

---

5. EnhancedVoiceNLUEngine

Location: `BWBCore/Sources/BWBCore/Voice/EnhancedVoiceNLUEngine.swift`

Purpose: Main entry point that combines all components.

Usage:

swift

let engine = EnhancedVoiceNLUEngine()

// Initialize (loads embeddings - do this once at startup)
await engine.initialize()

// Process speech
let result = await engine.process(transcript: "large iced vanilla latte with oat milk")

// Result contains:
// - intent: .order
// - parsedOrders: [VoiceParsedOrder]
// - clarificationsNeeded: [ClarificationRequest]
// - overallConfidence: 0.85

---

Data Flow Example

Input: "Can I get a large iced vanilla latte with oat milk?"

Step 1: Normalize
├── Remove fillers: "can I get a large iced vanilla latte with oat milk"
├── Lowercase: "can i get a large iced vanilla latte with oat milk"
└── Output: "large iced vanilla latte oat milk"

Step 2: Intent Classification
├── Pattern match: "can i get" → order intent
└── Output: intent = .order, confidence = 0.95

Step 3: Menu Search (Semantic)
├── Embed query: [0.12, -0.34, 0.56, ...]
├── Search index: cosine similarity
└── Output: "Vanilla Latte" (score: 0.92)

Step 4: Slot Extraction
├── Size: "large" → large (confidence: 0.98)
├── Temperature: "iced" → iced (confidence: 0.97)
├── Milk: "oat milk" → oat (confidence: 0.95)
└── Syrup: "vanilla" → vanilla (confidence: 0.90)

Step 5: Clarification Check
├── All confidences > 0.6 ✓
├── No close alternatives ✓
└── Output: No clarification needed

Step 6: Constraint Validation
├── Vanilla Latte allows large ✓
├── Vanilla Latte allows iced ✓
├── Oat milk available ✓
└── Output: Valid order

Step 7: Build Result
└── VoiceParsedOrder(
      itemName: "Vanilla Latte",
      size: .large,
      temperature: .iced,
      milk: .oat,
      syrups: ["vanilla"]
    )

---

File Structure

BWBCore/Sources/BWBCore/Voice/
├── Embeddings/
│   ├── MenuDocument.swift         # Document types for search
│   ├── TextEmbedder.swift         # NLEmbedding wrapper
│   ├── VectorIndex.swift          # In-memory vector search
│   └── MenuEmbeddingService.swift # Main search service
├── Slots/
│   ├── SlotDefinitions.swift      # Slot types and patterns
│   ├── SlotPrediction.swift       # Prediction types
│   └── SlotClassifier.swift       # Extraction logic
├── Dialogue/
│   ├── ClarificationPolicy.swift  # When to clarify
│   └── ConfirmationGenerator.swift # Generate confirmations
├── Constraints/
│   ├── ConstraintTypes.swift      # YAML data structures
│   ├── ConstraintParser.swift     # YAML loading
│   └── YAMLConstraintEngine.swift # Validation engine
├── EnhancedVoiceNLUEngine.swift   # Main NLU engine
├── VoiceNLUEngine.swift           # Legacy engine (still works)
├── VoiceDialogueManager.swift     # State machine
├── SpeechAnalyzerService.swift    # iOS 26 speech
└── VoiceTypes.swift               # Shared types

BWBCore/Resources/
└── cafe_constraints.yaml          # Menu constraints

---

Key Types

NLUResult

swift

struct NLUResult {
    let intent: VoiceIntent           // .order, .modify, .cancel, etc.
    let parsedOrders: [VoiceParsedOrder]
    let clarificationsNeeded: [ClarificationRequest]
    let overallConfidence: Double
    let rawTranscript: String
}

VoiceParsedOrder

swift

struct VoiceParsedOrder {
    var itemId: String?
    var itemName: String
    var size: DrinkSize?
    var temperature: DrinkTemperature?
    var milk: MilkType?
    var caffeine: CaffeineOption?
    var shots: Int?
    var syrups: [String]
    var quantity: Int
    var confidence: Double
}

ClarificationRequest

swift

struct ClarificationRequest {
    let slotName: String
    let question: String
    let options: [String]
    let importance: ClarificationImportance  // .high, .medium, .low
}

---

Configuration

### Confidence Thresholds
| Parameter | Value | Description |
|-----------|-------|-------------|
| Base threshold | 0.6 | Below this, always clarify |
| Gap threshold | 0.1 | If 2nd choice is within 0.1, clarify |
| High-importance threshold | 0.75 | For milk/caffeine slots |

### Search Parameters
| Parameter | Value | Description |
|-----------|-------|-------------|
| Top-K results | 5 | Number of candidates to consider |
| Semantic weight | 0.6 | Weight for embedding similarity |
| Fuzzy weight | 0.4 | Weight for text matching |
| Minimum score | 0.3 | Below this, no match |

---

Usage in App

POS Kiosk Integration

swift

// In KioskOrderingView.swift
@StateObject private var nluEngine = EnhancedVoiceNLUEngine()

func handleTranscript(_ transcript: String) async {
    let result = await nluEngine.process(transcript: transcript)

    if !result.clarificationsNeeded.isEmpty {
        // Ask for clarification
        showClarification(result.clarificationsNeeded.first!)
    } else if let order = result.parsedOrders.first {
        // Add to cart
        addToCart(order)
    }
}

Switching from Legacy Engine

swift

// Old way (still works)
let legacyEngine = VoiceNLUEngine.shared
let result = await legacyEngine.process(transcript: text)

// New way (recommended)
let enhancedEngine = EnhancedVoiceNLUEngine()
await enhancedEngine.initialize()
let result = await enhancedEngine.process(transcript: text)

---

Performance

Operation	Time	Notes
Initialize embeddings	~500ms	Do once at startup
Process transcript	<200ms	Full NLU pipeline
Semantic search	<50ms	SIMD optimized
Constraint validation	<10ms	In-memory

---

Troubleshooting

### Low Match Accuracy
- Check if menu items have aliases in `MenuDocument.defaultMenuItems`
- Verify embeddings loaded: `embeddingService.isLoaded`
- Try lowering minimum score threshold

### Slow Performance
- Ensure embeddings are cached (not recomputed each time)
- Check VectorIndex size isn't too large (>10K items)

### Incorrect Constraints
- Verify `cafe_constraints.yaml` is in bundle
- Check item ID normalization (lowercase, no spaces)
- Review constraint priority order

---

Learning System

The system learns from barista corrections to improve accuracy over time.

Location: `BWBCore/Sources/BWBCore/Voice/Learning/`

How It Works

Customer: "large oatmeal latte"
              ↓
System: "large ??? latte" (milk unknown)
              ↓
Barista corrects → "oat milk"
              ↓
FeedbackCollector logs: "oatmeal" → "oat"
              ↓
After 3 corrections → Suggestion generated
              ↓
Admin approves → PatternLearner applies
              ↓
Next time: "oatmeal" recognized as "oat"

Components

File	Purpose
`FeedbackCollector.swift`	Records barista corrections
`PatternLearner.swift`	Applies learned aliases
`LearningTypes.swift`	Dashboard and config types

Usage

swift

// Record a correction
FeedbackCollector.shared.recordCorrection(
    originalTranscript: "large oatmeal latte",
    originalOrder: wrongOrder,
    correctedOrder: fixedOrder
)

// Review suggestions
let pending = FeedbackCollector.shared.getPendingSuggestions()

// Approve and apply
FeedbackCollector.shared.approveSuggestion(suggestion.id)
PatternLearner.shared.applyApprovedSuggestions()

// Export learned aliases
let yaml = PatternLearner.shared.exportAsYAML()

Data Storage

Documents/voice_feedback/
├── corrections.json      # All corrections
├── patterns.json         # Detected patterns
├── suggestions.json      # Pending suggestions
└── learned_aliases.json  # Applied aliases

---

Future Improvements

1. Swift-embeddings package - Better accuracy than NLEmbedding
2. Embedding persistence - SQLite/CoreData for faster startup
3. USearch integration - For larger menus (10K+ items)

Promotion Decision

Attach run IDs, datasets, metrics, and reproduction commands.

Source Anchor

BWB/docs/VOICE_SYSTEM_TECHNICAL_DOCS.md

Detected Structure

Method · Evaluation · Code Anchors · Architecture