# Voice Ordering System - Technical Documentation
## Overview
The BrewsWithBeats voice ordering system uses a hybrid architecture combining:
- iOS 26 SpeechAnalyzer for on-device transcription
- Semantic embeddings for accurate menu matching
- Confidence-based clarification for reliable order capture
- YAML constraints for menu rule validation
---
## Architecture

```
┌──────────────────────────────────────────────────────┐
│  iOS 26 SpeechAnalyzer                               │
│  (On-device speech-to-text)                          │
└──────────────────────────┬───────────────────────────┘
                           │ transcript
                           ▼
┌──────────────────────────────────────────────────────┐
│  EnhancedVoiceNLUEngine                              │
│  ┌────────────────────────────────────────────────┐  │
│  │ 1. MenuEmbeddingService (semantic search)      │  │
│  │ 2. SlotClassifier (extract size, milk, etc.)   │  │
│  │ 3. ClarificationPolicy (confidence checks)     │  │
│  │ 4. YAMLConstraintEngine (validate rules)       │  │
│  └────────────────────────────────────────────────┘  │
└──────────────────────────┬───────────────────────────┘
                           │ NLUResult
                           ▼
┌──────────────────────────────────────────────────────┐
│  VoiceDialogueManager                                │
│  (State machine: listen → process → confirm)         │
└──────────────────────────────────────────────────────┘
```

---
## Components

### 1. MenuEmbeddingService
Location: `BWBCore/Sources/BWBCore/Voice/Embeddings/MenuEmbeddingService.swift`
Purpose: Find menu items using semantic similarity instead of exact text matching.
How it works:
1. Converts menu items to vector embeddings (300-dimensional)
2. Converts the user's transcribed speech to a vector
3. Finds closest matches using cosine similarity
4. Combines with fuzzy matching for robustness
Key Methods:

```swift
// Initialize and load embeddings
await embeddingService.loadAndEmbed(menuItems: items)

// Search for menu items
let results = await embeddingService.hybridSearch(query: "large iced vanilla latte", topK: 5)
// Returns: [MenuSearchResult], each with the matched item, its score, and its matchType
```

Configuration:
- Hybrid weight: 60% semantic, 40% fuzzy
- Uses Apple's NLEmbedding (built-in, no external dependencies)
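The hybrid search described above (cosine similarity over embeddings, blended with fuzzy text matching) can be sketched in plain Swift. This is a minimal sketch, not the MenuEmbeddingService implementation: `EmbeddedItem`, `tokenOverlap`, and `hybridRank` are hypothetical names, the fuzzy score is approximated with word-level Jaccard overlap because the real fuzzy matcher isn't described here, and the defaults (0.6/0.4 weights, 0.3 minimum score, top 5) are taken from the Search Parameters table.

```swift
/// Cosine similarity between two equal-length vectors; returns 0 if either vector is all zeros.
func cosineSimilarity(_ a: [Double], _ b: [Double]) -> Double {
    precondition(a.count == b.count, "vectors must have the same dimension")
    var dot = 0.0, normA = 0.0, normB = 0.0
    for i in a.indices {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    guard normA > 0, normB > 0 else { return 0 }
    return dot / (normA.squareRoot() * normB.squareRoot())
}

/// Hypothetical stand-in for the fuzzy matcher: Jaccard overlap of lowercase word tokens.
func tokenOverlap(_ a: String, _ b: String) -> Double {
    let ta = Set(a.lowercased().split(separator: " "))
    let tb = Set(b.lowercased().split(separator: " "))
    let union = ta.union(tb)
    guard !union.isEmpty else { return 0 }
    return Double(ta.intersection(tb).count) / Double(union.count)
}

/// Hypothetical menu entry with a precomputed embedding (not the project's MenuDocument).
struct EmbeddedItem {
    let name: String
    let vector: [Double]
}

/// Blends semantic and fuzzy scores, drops weak matches, and keeps the best `topK`.
func hybridRank(query: String,
                queryVector: [Double],
                items: [EmbeddedItem],
                semanticWeight: Double = 0.6,
                fuzzyWeight: Double = 0.4,
                minimumScore: Double = 0.3,
                topK: Int = 5) -> [(name: String, score: Double)] {
    let scored = items.map { item -> (name: String, score: Double) in
        let semantic = cosineSimilarity(queryVector, item.vector)
        let fuzzy = tokenOverlap(query, item.name)
        return (name: item.name, score: semanticWeight * semantic + fuzzyWeight * fuzzy)
    }
    return Array(scored
        .filter { $0.score >= minimumScore }
        .sorted { $0.score > $1.score }
        .prefix(topK))
}
```

For the query "iced vanilla latte" against a "Vanilla Latte" item with an identical vector, the blended score is 0.6 × 1.0 + 0.4 × 2/3 ≈ 0.87, while an item with an orthogonal vector and no shared words scores 0 and is dropped by the 0.3 cutoff.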
---
### 2. SlotClassifier
Location: `BWBCore/Sources/BWBCore/Voice/Slots/SlotClassifier.swift`
Purpose: Extract order details (size, temperature, milk, etc.) from speech.
Slots Extracted:
| Slot | Values | Example Phrases |
|---|---|---|
| size | small, medium, large, xl | "large", "grande", "big" |
| temperature | hot, iced, blended | "iced", "cold", "frozen" |
| milk | whole, skim, oat, almond, soy, coconut | "with oat milk", "almond" |
| caffeine | regular, decaf, half-caf | "decaf", "half caf" |
| shots | 1-6 | "extra shot", "triple shot" |
| syrup | vanilla, caramel, hazelnut, mocha | "vanilla", "add caramel" |
| quantity | 1-10 | "two", "3 of them" |
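Extracting a slot value from phrases like those in the table can be as simple as a whole-word phrase lookup that prefers longer phrases. A minimal sketch for the `size` slot follows; `SizeValue`, `sizePhrases`, and `matchSize` are hypothetical, and mappings such as "big" → large are illustrative assumptions rather than the contents of `SlotDefinitions.swift`.

```swift
import Foundation

/// Canonical values for the `size` slot.
enum SizeValue: String {
    case small, medium, large, xl
}

/// Illustrative phrase table (assumed mappings, not SlotDefinitions.swift).
let sizePhrases: [(phrase: String, value: SizeValue)] = [
    ("extra large", .xl), ("xl", .xl),
    ("large", .large), ("big", .large),
    ("medium", .medium), ("small", .small),
]

/// Returns the value of the longest phrase that appears as whole words in the utterance.
/// Assumes punctuation has already been stripped from the transcript.
func matchSize(in utterance: String) -> SizeValue? {
    // Pad with spaces so " large " cannot match inside "larger".
    let padded = " " + utterance.lowercased() + " "
    // Try longer phrases first so "extra large" wins over "large".
    let byLength = sizePhrases.sorted { $0.phrase.count > $1.phrase.count }
    for entry in byLength where padded.contains(" \(entry.phrase) ") {
        return entry.value
    }
    return nil
}
```

Checking longer phrases first is what keeps "an extra large mocha" from being read as plain `large`.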
Key Methods:

```swift
let result = slotClassifier.extractSlots(from: "large iced oat milk latte", context: menuMatches)
// Returns: SlotExtractionResult with predictions and confidence scores
```

Confidence Scoring:
- Each slot has a confidence score (0.0 - 1.0)
- Scores are calibrated using temperature scaling (T=1.5)
- Low confidence triggers clarification
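Temperature scaling divides a classifier's raw scores (logits) by T before the softmax; with T = 1.5 the resulting probabilities are less peaked. The sketch below assumes the SlotClassifier produces one logit per candidate value, which this document doesn't state; `calibratedProbabilities` is a hypothetical name.

```swift
import Foundation

/// Temperature-scaled softmax: logits are divided by T before normalizing.
/// T > 1 flattens the distribution, pulling over-confident scores down.
func calibratedProbabilities(logits: [Double], temperature: Double = 1.5) -> [Double] {
    precondition(temperature > 0, "temperature must be positive")
    let scaled = logits.map { $0 / temperature }
    guard let maxLogit = scaled.max() else { return [] }
    // Subtracting the max before exp() avoids overflow without changing the result.
    let exps = scaled.map { exp($0 - maxLogit) }
    let total = exps.reduce(0, +)
    return exps.map { $0 / total }
}
```

With logits [2.0, 0.0], the top probability drops from about 0.88 at T = 1 to about 0.79 at T = 1.5.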
---
### 3. ClarificationPolicy
Location: `BWBCore/Sources/BWBCore/Voice/Dialogue/ClarificationPolicy.swift`
Purpose: Decide when to ask the customer for clarification.
Rules:
| Condition | Action |
|---|---|
| Confidence < 0.6 | Ask for clarification |
| Gap to 2nd choice < 0.1 | Ask for clarification |
| Milk/caffeine confidence < 0.75 | Always verify (health/allergy) |
Example:

```swift
// User says: "I want a latte with... um... milk"
// System detects low confidence on milk type
// Response: "What kind of milk would you like? Whole, oat, almond, or soy?"
```

Priority Order:
1. High importance (milk, caffeine) - asked first
2. Medium importance (menu item, size)
3. Low importance (syrups, extras)
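The three rules and the priority order above can be combined into one decision function plus a sort. This is a minimal sketch, assuming per-slot top and runner-up confidences are available; `SlotScore`, `needsClarification`, and `clarificationQueue` are hypothetical, and `ClarificationImportance` here is a local stand-in for the project's type of the same name. For high-importance slots the sketch uses 0.75 in place of 0.6, which covers both rules since 0.75 is the stricter bound.

```swift
/// Local stand-in for the project's ClarificationImportance (.high, .medium, .low).
enum ClarificationImportance: Int, Comparable {
    case low = 0, medium = 1, high = 2

    static func < (lhs: Self, rhs: Self) -> Bool { lhs.rawValue < rhs.rawValue }
}

/// Hypothetical per-slot summary: best and runner-up confidence for one slot.
struct SlotScore {
    let slot: String
    let topConfidence: Double
    let secondConfidence: Double?  // nil when there is no runner-up
    let importance: ClarificationImportance
}

/// Confidence floor, runner-up gap, and a stricter floor for milk/caffeine.
func needsClarification(_ score: SlotScore,
                        baseThreshold: Double = 0.6,
                        gapThreshold: Double = 0.1,
                        highImportanceThreshold: Double = 0.75) -> Bool {
    let minimum = score.importance == .high ? highImportanceThreshold : baseThreshold
    if score.topConfidence < minimum { return true }
    if let second = score.secondConfidence, score.topConfidence - second < gapThreshold {
        return true
    }
    return false
}

/// Slots that need a question, most important first.
func clarificationQueue(_ scores: [SlotScore]) -> [String] {
    scores
        .filter { needsClarification($0) }
        .sorted { $0.importance > $1.importance }
        .map { $0.slot }
}
```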
---
### 4. YAMLConstraintEngine
Location: `BWBCore/Sources/BWBCore/Voice/Constraints/YAMLConstraintEngine.swift`
Purpose: Validate orders against menu rules defined in YAML.
Configuration File: `BWBCore/Resources/cafe_constraints.yaml`
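Two of the rule types in `cafe_constraints.yaml` reduce to small pure functions. The sketch below is an assumption, not the YAMLConstraintEngine code: `SizeRule`, `resolveSize`, and `SeasonalWindow` are hypothetical, and `DrinkSize` is a local stand-in. Because the validation example later in this section turns a large cappuccino into a medium one rather than the YAML default (small), the sketch snaps a disallowed size down to the nearest smaller allowed size and falls back to the default only when none exists. The seasonal check treats the month range as inclusive and also handles windows that wrap past December.

```swift
/// Local stand-in for the project's DrinkSize, ordered smallest to largest.
enum DrinkSize: Int, Comparable {
    case small, medium, large, xl

    static func < (lhs: Self, rhs: Self) -> Bool { lhs.rawValue < rhs.rawValue }
}

/// Hypothetical mirror of one `size_constraints` entry.
struct SizeRule {
    let allowed: Set<DrinkSize>
    let defaultSize: DrinkSize
}

/// Returns the size to use and whether it had to be auto-fixed.
/// Assumption: a disallowed size snaps down to the nearest smaller allowed size,
/// falling back to the default only when no smaller size is allowed.
func resolveSize(requested: DrinkSize?, rule: SizeRule) -> (size: DrinkSize, autoFixed: Bool) {
    guard let requested = requested else { return (rule.defaultSize, false) }
    if rule.allowed.contains(requested) { return (requested, false) }
    let nearestSmaller = rule.allowed.filter { $0 < requested }.max()
    return (nearestSmaller ?? rule.defaultSize, true)
}

/// Hypothetical mirror of a `seasonal_constraints` entry; months are 1-12 and inclusive.
struct SeasonalWindow {
    let startMonth: Int
    let endMonth: Int

    func isAvailable(inMonth month: Int) -> Bool {
        if startMonth <= endMonth {
            return (startMonth...endMonth).contains(month)
        }
        // The window wraps past December (e.g. November through February).
        return month >= startMonth || month <= endMonth
    }
}
```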
Constraint Types:
```yaml
# Size constraints - what sizes are available
size_constraints:
  cappuccino:
    allowed: [small, medium]
    default: small
  latte:
    allowed: [small, medium, large, xl]
    default: medium

# Temperature constraints
temperature_constraints:
  cold_brew:
    allowed: [iced]
    default: iced

# Combination rules - invalid combinations
combination_rules:
  - name: no_iced_cappuccino
    item: cappuccino
    disallow:
      temperature: iced
    message: "Cappuccino is traditionally served hot"

# Seasonal items
seasonal_constraints:
  pumpkin_spice_latte:
    available: true
    start_month: 9
    end_month: 11

# Price modifiers
price_modifiers:
  alternative_milk:
    oat: 0.80
    almond: 0.70
  extra_espresso: 0.75
```

Validation Process:
```swift
var order = VoiceParsedOrder(itemName: "cappuccino", size: .large)
let (processedOrder, result) = constraintEngine.processOrder(&order)

// If a large cappuccino is not allowed:
// - order.size is automatically changed to .medium
// - result.autoFixes["size"] == "medium"
// - result.violations contains a warning message
```

---
### 5. EnhancedVoiceNLUEngine
Location: `BWBCore/Sources/BWBCore/Voice/EnhancedVoiceNLUEngine.swift`
Purpose: Main entry point that combines all components.
Usage:
```swift
let engine = EnhancedVoiceNLUEngine()

// Initialize (loads embeddings; do this once at startup)
await engine.initialize()

// Process speech
let result = await engine.process(transcript: "large iced vanilla latte with oat milk")

// result contains:
// - intent: .order
// - parsedOrders: [VoiceParsedOrder]
// - clarificationsNeeded: [ClarificationRequest]
// - overallConfidence: e.g. 0.85
```

---
## Data Flow Example
Input: "Can I get a large iced vanilla latte with oat milk?"
```
Step 1: Normalize
├── Remove fillers: "can I get a large iced vanilla latte with oat milk"
├── Lowercase: "can i get a large iced vanilla latte with oat milk"
└── Output: "large iced vanilla latte oat milk"

Step 2: Intent Classification
├── Pattern match: "can i get" → order intent
└── Output: intent = .order, confidence = 0.95

Step 3: Menu Search (Semantic)
├── Embed query: [0.12, -0.34, 0.56, ...]
├── Search index: cosine similarity
└── Output: "Vanilla Latte" (score: 0.92)

Step 4: Slot Extraction
├── Size: "large" → large (confidence: 0.98)
├── Temperature: "iced" → iced (confidence: 0.97)
├── Milk: "oat milk" → oat (confidence: 0.95)
└── Syrup: "vanilla" → vanilla (confidence: 0.90)

Step 5: Clarification Check
├── All confidences > 0.6 ✓
├── No close alternatives ✓
└── Output: No clarification needed

Step 6: Constraint Validation
├── Vanilla Latte allows large ✓
├── Vanilla Latte allows iced ✓
├── Oat milk available ✓
└── Output: Valid order

Step 7: Build Result
└── VoiceParsedOrder(
        itemName: "Vanilla Latte",
        size: .large,
        temperature: .iced,
        milk: .oat,
        syrups: ["vanilla"]
    )
```

---
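Steps 1 and 2 of the data flow walkthrough can be approximated with plain string processing. One plausible reading of the trace is that intent matching sees the cleaned-up text (it still matches "can i get"), while menu search receives the text with the request phrase and stop words removed; the sketch below follows that reading. All names and word lists (`fillerWords`, `requestPhrases`, `stopWords`, `SketchIntent`) are hypothetical.

```swift
/// Illustrative word lists; the engine's real filler and request-phrase lists are not documented here.
let fillerWords: Set<String> = ["um", "uh"]
let requestPhrases = ["can i get", "could i have", "i want", "i'd like"]
let stopWords: Set<String> = ["a", "an", "with"]

enum SketchIntent { case order }

/// Step 1a: lowercase, drop basic punctuation, and remove filler words.
func normalize(_ transcript: String) -> String {
    let noPunctuation = transcript.lowercased().filter { !",.?!".contains($0) }
    return noPunctuation.split(separator: " ")
        .map(String.init)
        .filter { !fillerWords.contains($0) }
        .joined(separator: " ")
}

/// Step 2: a request phrase at the start of the normalized text signals an order.
func classifyIntent(_ normalized: String) -> SketchIntent? {
    requestPhrases.contains { normalized.hasPrefix($0) } ? .order : nil
}

/// Step 1b: strip the request phrase and stop words, leaving content words for menu search.
func searchQuery(from normalized: String) -> String {
    var text = normalized
    if let phrase = requestPhrases.first(where: { text.hasPrefix($0) }) {
        text.removeFirst(phrase.count)
    }
    return text.split(separator: " ")
        .map(String.init)
        .filter { !stopWords.contains($0) }
        .joined(separator: " ")
}
```

Run on the example input, `normalize` yields "can i get a large iced vanilla latte with oat milk", `classifyIntent` returns `.order`, and `searchQuery` yields "large iced vanilla latte oat milk", matching the Step 1 output.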
## File Structure
```
BWBCore/Sources/BWBCore/Voice/
├── Embeddings/
│   ├── MenuDocument.swift              # Document types for search
│   ├── TextEmbedder.swift              # NLEmbedding wrapper
│   ├── VectorIndex.swift               # In-memory vector search
│   └── MenuEmbeddingService.swift      # Main search service
├── Slots/
│   ├── SlotDefinitions.swift           # Slot types and patterns
│   ├── SlotPrediction.swift            # Prediction types
│   └── SlotClassifier.swift            # Extraction logic
├── Dialogue/
│   ├── ClarificationPolicy.swift       # When to clarify
│   └── ConfirmationGenerator.swift     # Generate confirmations
├── Constraints/
│   ├── ConstraintTypes.swift           # YAML data structures
│   ├── ConstraintParser.swift          # YAML loading
│   └── YAMLConstraintEngine.swift      # Validation engine
├── EnhancedVoiceNLUEngine.swift        # Main NLU engine
├── VoiceNLUEngine.swift                # Legacy engine (still works)
├── VoiceDialogueManager.swift          # State machine
├── SpeechAnalyzerService.swift         # iOS 26 speech
└── VoiceTypes.swift                    # Shared types

BWBCore/Resources/
└── cafe_constraints.yaml               # Menu constraints
```

---
## Key Types

### NLUResult

```swift
struct NLUResult {
    let intent: VoiceIntent  // .order, .modify, .cancel, etc.
    let parsedOrders: [VoiceParsedOrder]
    let clarificationsNeeded: [ClarificationRequest]
    let overallConfidence: Double
    let rawTranscript: String
}
```

### VoiceParsedOrder

```swift
struct VoiceParsedOrder {
    var itemId: String?
    var itemName: String
    var size: DrinkSize?
    var temperature: DrinkTemperature?
    var milk: MilkType?
    var caffeine: CaffeineOption?
    var shots: Int?
    var syrups: [String]
    var quantity: Int
    var confidence: Double
}
```

### ClarificationRequest

```swift
struct ClarificationRequest {
    let slotName: String
    let question: String
    let options: [String]
    let importance: ClarificationImportance  // .high, .medium, .low
}
```

---
## Configuration
### Confidence Thresholds
| Parameter | Value | Description |
|-----------|-------|-------------|
| Base threshold | 0.6 | Below this, always clarify |
| Gap threshold | 0.1 | If 2nd choice is within 0.1, clarify |
| High-importance threshold | 0.75 | For milk/caffeine slots |
### Search Parameters
| Parameter | Value | Description |
|-----------|-------|-------------|
| Top-K results | 5 | Number of candidates to consider |
| Semantic weight | 0.6 | Weight for embedding similarity |
| Fuzzy weight | 0.4 | Weight for text matching |
| Minimum score | 0.3 | Below this, no match |
---
## Usage in App

### POS Kiosk Integration

```swift
// In KioskOrderingView.swift (inside the view)
@StateObject private var nluEngine = EnhancedVoiceNLUEngine()

// Assumes nluEngine.initialize() has already been awaited once at startup
func handleTranscript(_ transcript: String) async {
    let result = await nluEngine.process(transcript: transcript)

    if let clarification = result.clarificationsNeeded.first {
        // Ask for clarification before adding anything to the cart
        showClarification(clarification)
    } else {
        // parsedOrders can hold several items; add each one
        for order in result.parsedOrders {
            addToCart(order)
        }
    }
}
```

### Switching from Legacy Engine

```swift
// Old way (still works)
let legacyEngine = VoiceNLUEngine.shared
let legacyResult = await legacyEngine.process(transcript: text)

// New way (recommended)
let enhancedEngine = EnhancedVoiceNLUEngine()
await enhancedEngine.initialize()  // once, at startup
let result = await enhancedEngine.process(transcript: text)
```

---
## Performance
| Operation | Time | Notes |
|---|---|---|
| Initialize embeddings | ~500ms | Do once at startup |
| Process transcript | <200ms | Full NLU pipeline |
| Semantic search | <50ms | SIMD optimized |
| Constraint validation | <10ms | In-memory |
---
## Troubleshooting
### Low Match Accuracy
- Check if menu items have aliases in `MenuDocument.defaultMenuItems`
- Verify that embeddings have loaded: `embeddingService.isLoaded`
- Try lowering the minimum score threshold (currently 0.3)
### Slow Performance
- Ensure embeddings are cached rather than recomputed on every request
- Check that the VectorIndex isn't too large (more than ~10K items)
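Caching embeddings means each menu string passes through the embedder once, and later lookups hit a dictionary. This is a minimal sketch, assuming a synchronous embedding function; `EmbeddingCache` is hypothetical, and the closure stands in for whatever `TextEmbedder` actually exposes.

```swift
/// Memoizes embeddings by lowercased text so each string is embedded at most once.
/// Not thread-safe; wrap it in an actor if several tasks share it.
final class EmbeddingCache {
    private var storage: [String: [Double]] = [:]
    private let embed: (String) -> [Double]
    /// How many times the underlying embedder actually ran (useful when profiling).
    private(set) var computeCount = 0

    init(embed: @escaping (String) -> [Double]) {
        self.embed = embed
    }

    func vector(for text: String) -> [Double] {
        let key = text.lowercased()
        if let cached = storage[key] { return cached }
        computeCount += 1
        let vector = embed(key)
        storage[key] = vector
        return vector
    }
}
```

If `computeCount` keeps growing across requests for the same menu, embeddings are being recomputed rather than reused.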
### Incorrect Constraints
- Verify `cafe_constraints.yaml` is in bundle
- Check item ID normalization (lowercase, no spaces)
- Review constraint priority order
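The constraint keys in `cafe_constraints.yaml` are lowercase with underscores (`cold_brew`, `pumpkin_spice_latte`), so a display name has to be mapped to that form before lookup. Here is a sketch of such a mapping, assuming spaces and hyphens both become underscores; `constraintKey(for:)` is hypothetical and the engine's exact rule may differ.

```swift
/// Maps a display name to a YAML-style constraint key, e.g. "Cold Brew" -> "cold_brew".
/// Assumption: spaces and hyphens become underscores; runs of separators collapse to one.
func constraintKey(for itemName: String) -> String {
    itemName.lowercased()
        .split(whereSeparator: { $0 == " " || $0 == "-" })
        .joined(separator: "_")
}
```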
---
## Learning System
The system learns from barista corrections to improve accuracy over time.
Location: `BWBCore/Sources/BWBCore/Voice/Learning/`
### How It Works

```
Customer: "large oatmeal latte"
    ↓
System: "large ??? latte" (milk unknown)
    ↓
Barista corrects → "oat milk"
    ↓
FeedbackCollector logs: "oatmeal" → "oat"
    ↓
After 3 corrections → Suggestion generated
    ↓
Admin approves → PatternLearner applies
    ↓
Next time: "oatmeal" recognized as "oat"
```

### Components
| File | Purpose |
|---|---|
| `FeedbackCollector.swift` | Records barista corrections |
| `PatternLearner.swift` | Applies learned aliases |
| `LearningTypes.swift` | Dashboard and config types |
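The "after 3 corrections" rule in the flow above amounts to counting identical heard → corrected pairs. This is a minimal sketch; `Correction`, `AliasSuggestion`, and `suggestAliases` are hypothetical and not the FeedbackCollector API, and, as in the flow above, the output is only a suggestion that still needs admin approval.

```swift
/// One barista correction: what the system heard vs. what the barista chose.
struct Correction: Hashable {
    let heard: String      // e.g. "oatmeal"
    let corrected: String  // e.g. "oat"
}

struct AliasSuggestion: Equatable {
    let alias: String
    let canonical: String
    let occurrences: Int
}

/// Groups identical corrections and proposes an alias once a pair repeats `threshold` times.
/// Nothing is applied here; suggestions still go to an admin for approval.
func suggestAliases(from corrections: [Correction], threshold: Int = 3) -> [AliasSuggestion] {
    var counts: [Correction: Int] = [:]
    for correction in corrections {
        counts[correction, default: 0] += 1
    }
    return counts
        .filter { $0.value >= threshold }
        .map { AliasSuggestion(alias: $0.key.heard, canonical: $0.key.corrected, occurrences: $0.value) }
        .sorted { $0.occurrences > $1.occurrences }
}
```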
### Usage

```swift
// Record a correction
FeedbackCollector.shared.recordCorrection(
    originalTranscript: "large oatmeal latte",
    originalOrder: wrongOrder,
    correctedOrder: fixedOrder
)

// Review pending suggestions
let pending = FeedbackCollector.shared.getPendingSuggestions()

// Approve a reviewed suggestion, then apply everything approved so far
if let suggestion = pending.first {
    FeedbackCollector.shared.approveSuggestion(suggestion.id)
}
PatternLearner.shared.applyApprovedSuggestions()

// Export learned aliases
let yaml = PatternLearner.shared.exportAsYAML()
```

### Data Storage
```
Documents/voice_feedback/
├── corrections.json       # All corrections
├── patterns.json          # Detected patterns
├── suggestions.json       # Pending suggestions
└── learned_aliases.json   # Applied aliases
```

---
## Future Improvements
1. Swift-embeddings package - Better accuracy than NLEmbedding
2. Embedding persistence - SQLite/CoreData for faster startup
3. USearch integration - For larger menus (10K+ items)
Source: `BWB/docs/VOICE_SYSTEM_TECHNICAL_DOCS.md`