BWB — Voice Ordering System
BWB features a sophisticated voice ordering system that uses on-device speech recognition and semantic NLU to process natural language coffee orders. The system runs entirely on-device for privacy and speed.
Document ID: BWB-ARCH-003
Version: 1.1.0
Last Updated: 2026-01-16
---
Architecture Stack
┌─────────────────────────────────────────────┐
│  VOICE ORDERING PIPELINE                    │
└─────────────────────────────────────────────┘
┌─────────────────────────────────────────────┐
│  iOS SpeechAnalyzer                         │
│  On-device transcription (iOS 26+)          │
└──────────────────────┬──────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────┐
│  EnhancedVoiceNLUEngine                     │
│  • Normalize transcript                     │
│  • Detect intent (order/modify/cancel)      │
│  • Semantic menu matching                   │
│  • Slot extraction                          │
└──────────────────────┬──────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────┐
│  VoiceDialogueManager                       │
│  State machine: LISTEN → PROCESS → CONFIRM  │
│  Handles clarifications and corrections     │
└──────────────────────┬──────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────┐
│  Order Creation & Routing                   │
│  Cart addition, checkout flow               │
└─────────────────────────────────────────────┘
---
Core Components
1. MenuEmbeddingService
Purpose: Semantic menu matching using vector embeddings
How it works:
- Converts menu items to 300-dimensional vectors using Apple's NLEmbedding
- Stores embeddings in VectorIndex for fast similarity search
- Hybrid search: 60
Key features:
| Feature | Description |
|---------|-------------|
| On-device | No external API calls |
| Alias support | "americano" matches "Iced Americano" |
| Abbreviations | "latte" finds all latte variants |
| Typo tolerance | Fuzzy matching with Levenshtein distance |
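The typo-tolerance row can be illustrated with the textbook dynamic-programming Levenshtein distance. This is a minimal sketch. The helper name `levenshtein` and the choice to compare lowercased strings are assumptions, not BWB's actual code. The table says only that fuzzy matching uses Levenshtein distance.

```swift
/// Two-row dynamic-programming edit distance. Insertions, deletions and
/// substitutions each cost 1. Hypothetical helper, not BWB's API.
func levenshtein(_ a: String, _ b: String) -> Int {
    let s = Array(a.lowercased())
    let t = Array(b.lowercased())
    if s.isEmpty { return t.count }
    if t.isEmpty { return s.count }

    // At the start of iteration i, prev[j] is the distance between the
    // first i-1 characters of s and the first j characters of t.
    var prev = Array(0...t.count)
    var curr = Array(repeating: 0, count: t.count + 1)

    for i in 1...s.count {
        curr[0] = i
        for j in 1...t.count {
            let cost = s[i - 1] == t[j - 1] ? 0 : 1
            curr[j] = min(prev[j] + 1,         // deletion
                          curr[j - 1] + 1,     // insertion
                          prev[j - 1] + cost)  // substitution
        }
        swap(&prev, &curr)
    }
    return prev[t.count]
}

// A misheard "capuccino" is one insertion away from "Cappuccino".
print(levenshtein("capuccino", "Cappuccino"))  // 1
```

A small distance threshold relative to word length would decide whether a token counts as a typo of a menu term. The actual threshold BWB uses is not stated here.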
Performance:
- Embedding initialization: ~500ms (once at startup)
- Semantic search: <50ms (SIMD optimized)
- Returns top-5 matches with confidence scores
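Semantic search comes down to ranking stored vectors by cosine similarity to the query vector and keeping the top k. The sketch below is a brute-force version that uses plain loops and toy 3-dimensional vectors in place of 300-dimensional NLEmbedding output. `MenuVector`, `cosine`, and `topMatches` are hypothetical names. The real VectorIndex is described as SIMD optimized and may score differently.

```swift
struct MenuVector {
    let name: String
    let vector: [Double]
}

/// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 for zero vectors.
func cosine(_ a: [Double], _ b: [Double]) -> Double {
    precondition(a.count == b.count, "vectors must have the same dimension")
    var dot = 0.0, normA = 0.0, normB = 0.0
    for i in 0..<a.count {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    guard normA > 0, normB > 0 else { return 0 }
    return dot / (normA.squareRoot() * normB.squareRoot())
}

/// Brute-force top-k: score every item, sort descending, keep k.
func topMatches(for query: [Double], in index: [MenuVector], k: Int = 5)
    -> [(name: String, score: Double)] {
    let scored = index.map { (name: $0.name, score: cosine(query, $0.vector)) }
    return Array(scored.sorted { $0.score > $1.score }.prefix(k))
}

// Toy 3-D vectors for illustration only; real embeddings are 300-D.
let index = [
    MenuVector(name: "Vanilla Latte", vector: [0.9, 0.1, 0.0]),
    MenuVector(name: "Hazelnut Latte", vector: [0.6, 0.5, 0.0]),
    MenuVector(name: "Cold Brew", vector: [0.0, 0.2, 0.9]),
]
let results = topMatches(for: [1.0, 0.0, 0.0], in: index, k: 2)
for r in results { print(r.name, r.score) }
```

Brute force is linear in menu size. For a menu of a few hundred items, that is plausibly enough to stay under the <50ms figure above, but that is an assumption rather than a measured result.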
2. SlotClassifier
Purpose: Extract order details from natural language
Slots extracted:
| Slot | Values | Example |
|---|---|---|
| size | small, medium, large, xl | "large" → size: large (0.98) |
| temperature | hot, iced, blended | "iced" → temp: iced (0.97) |
| milk | whole, skim, oat, almond, soy, coconut | "with oat milk" → milk: oat (0.95) |
| caffeine | regular, decaf, half-caf | "decaf" → caffeine: decaf (0.92) |
| shots | 1-6 | "extra shot" → shots: 2 (0.90) |
| syrup | vanilla, caramel, hazelnut, etc. | "vanilla" → syrup: vanilla (0.88) |
| quantity | 1-10 | "two lattes" → quantity: 2 (0.95) |
Output: Each slot carries a value and a confidence score (0.0-1.0)
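Below is a minimal keyword-lookup sketch of slot extraction over a normalized transcript. It is for illustration only. The lexicon, the fixed 0.95 confidence, and the names `SlotValue` and `extractSlots` are hypothetical, and the real SlotClassifier may score each slot differently (the table above shows varied confidences).

```swift
struct SlotValue: Equatable {
    let value: String
    let confidence: Double   // 0.0-1.0
}

// Hypothetical lexicon: trigger word -> (slot name, canonical value).
let lexicon: [String: (slot: String, value: String)] = [
    "small": ("size", "small"), "medium": ("size", "medium"),
    "large": ("size", "large"),
    "hot": ("temperature", "hot"), "iced": ("temperature", "iced"),
    "oat": ("milk", "oat"), "almond": ("milk", "almond"), "soy": ("milk", "soy"),
    "decaf": ("caffeine", "decaf"),
    "vanilla": ("syrup", "vanilla"), "caramel": ("syrup", "caramel"),
]

/// Scans tokens left to right and fills the first value seen for each slot.
/// Exact keyword hits get a fixed, assumed confidence.
func extractSlots(from transcript: String) -> [String: SlotValue] {
    var slots: [String: SlotValue] = [:]
    for token in transcript.lowercased().split(separator: " ") {
        if let hit = lexicon[String(token)], slots[hit.slot] == nil {
            slots[hit.slot] = SlotValue(value: hit.value, confidence: 0.95)
        }
    }
    return slots
}

let slots = extractSlots(from: "large iced vanilla latte oat milk")
// slots["size"]?.value == "large", slots["milk"]?.value == "oat"
```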
3. ClarificationPolicy
Purpose: Decide when to ask for confirmation vs. proceed
Thresholds:
| Condition | Action |
|-----------|--------|
| Overall confidence < 0.6 | Ask for full clarification |
| Gap to 2nd choice < 0.1 | Ask to disambiguate |
| Milk/caffeine < 0.75 | Always verify (health/allergy) |
Priority order:
1. High importance: milk, caffeine (allergies)
2. Medium importance: menu item, size
3. Low importance: syrups, extras
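The three threshold rules can be expressed as a small decision function. The thresholds 0.6, 0.1 and 0.75 come from the table above. Two choices are assumptions: allergy checks are emitted before menu disambiguation, following the priority list, and a low overall score short-circuits everything else. The type names `ClarificationAction` and `decide` are also hypothetical.

```swift
enum ClarificationAction: Equatable {
    case fullClarification                      // overall confidence too low
    case verify(slot: String)                   // milk/caffeine (allergy/health)
    case disambiguate(top: String, runnerUp: String)
}

/// Applies the documented thresholds. An empty result means the order can
/// go straight to confirmation.
func decide(overallConfidence: Double,
            topMatch: (name: String, score: Double),
            runnerUp: (name: String, score: Double)?,
            slotConfidences: [String: Double]) -> [ClarificationAction] {
    // Overall confidence < 0.6: ask for a full clarification.
    if overallConfidence < 0.6 { return [.fullClarification] }

    var actions: [ClarificationAction] = []
    // High importance first: milk/caffeine < 0.75 are always verified.
    for slot in ["milk", "caffeine"] {
        if let c = slotConfidences[slot], c < 0.75 {
            actions.append(.verify(slot: slot))
        }
    }
    // Medium importance: gap to the 2nd menu choice < 0.1.
    if let second = runnerUp, topMatch.score - second.score < 0.1 {
        actions.append(.disambiguate(top: topMatch.name, runnerUp: second.name))
    }
    return actions
}

// Values from the processing example: gap 0.92 - 0.71 = 0.21.
let result = decide(overallConfidence: 0.93,
                    topMatch: (name: "Vanilla Latte", score: 0.92),
                    runnerUp: (name: "Hazelnut Latte", score: 0.71),
                    slotConfidences: ["milk": 0.95])
// result is empty: every threshold is cleared.
```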
4. YAMLConstraintEngine
Purpose: Validate orders against menu rules
Constraint types:
| Type | Example |
|------|---------|
| Size constraints | Espresso: only small/double |
| Temperature | Cappuccino: hot only (no iced) |
| Combinations | No decaf cold brew |
| Seasonal | Pumpkin spice: Sept-Nov only |
| Price modifiers | Oat milk: +$0.75 |
Auto-correction: Invalid orders are fixed with user notification
- "Iced cappuccino" → "Hot cappuccino (iced not available)"
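The auto-correction behaviour can be sketched with in-memory rules. The sketch skips YAML parsing entirely, which the file tree assigns to ConstraintParser. `Drink`, `TemperatureRule`, `ValidationResult`, and `validate` are hypothetical names. Only the cappuccino example and the oat-milk surcharge come from the table above. Prices are kept in integer cents to avoid floating-point rounding.

```swift
enum Temperature: String { case hot, iced, blended }

struct Drink: Equatable {
    var item: String
    var temperature: Temperature
    var milk: String?
}

/// Hypothetical rule: an item may only be served at the listed temperatures.
struct TemperatureRule {
    let item: String
    let allowed: [Temperature]
}

struct ValidationResult {
    let drink: Drink
    let notices: [String]     // user-facing explanations of corrections
    let surchargeCents: Int   // price modifiers, in cents
}

func validate(_ input: Drink, rules: [TemperatureRule],
              milkSurchargeCents: [String: Int]) -> ValidationResult {
    var drink = input
    var notices: [String] = []

    // Auto-correct a disallowed temperature to the first allowed one.
    if let rule = rules.first(where: { $0.item == drink.item }),
       !rule.allowed.contains(drink.temperature),
       let fallback = rule.allowed.first {
        notices.append("\(drink.item): \(drink.temperature.rawValue) not available, changed to \(fallback.rawValue)")
        drink.temperature = fallback
    }

    // Price modifiers, e.g. oat milk +$0.75 -> 75 cents.
    let surcharge = drink.milk.flatMap { milkSurchargeCents[$0] } ?? 0
    return ValidationResult(drink: drink, notices: notices, surchargeCents: surcharge)
}

let rules = [TemperatureRule(item: "Cappuccino", allowed: [.hot])]
let checked = validate(Drink(item: "Cappuccino", temperature: .iced, milk: "oat"),
                       rules: rules, milkSurchargeCents: ["oat": 75])
print(checked.drink.temperature, checked.notices, checked.surchargeCents)
```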
5. EnhancedVoiceNLUEngine
Purpose: Main orchestrator for NLU pipeline
Pipeline steps:
1. Normalize — Clean transcript, expand abbreviations
2. Intent — Detect order/modify/cancel intent
3. Menu Search — Find matching menu items
4. Slot Extraction — Extract customization details
5. Clarification Check — Determine if confirmation needed
6. Constraint Validation — Apply business rules
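Output: NLUResult
struct NLUResult {
    let intent: OrderIntent                    // detected intent: order, modify, or cancel
    let parsedOrders: [VoiceParsedOrder]       // matched menu items with their slots
    let clarificationsNeeded: [Clarification]  // empty when the order can proceed
    let overallConfidence: Double              // 0.0-1.0
}
---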
Processing Example
Input: "Can I get a large iced vanilla latte with oat milk?"
Step 1: Normalize
Input: "Can I get a large iced vanilla latte with oat milk?"
Output: "large iced vanilla latte oat milk"
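Step 1 can be approximated by lowercasing, replacing punctuation with spaces, and dropping filler words. The filler-word list and the name `normalize` are assumptions; the list was chosen so this example reproduces the output shown. The real normalizer also expands abbreviations, which this sketch omits.

```swift
// Hypothetical filler list; a production normalizer would use a larger one.
let fillerWords: Set<String> = ["can", "could", "i", "get", "a", "an", "please",
                                "with", "have", "me", "some"]

/// Lowercases, turns punctuation into spaces, and removes filler words.
func normalize(_ transcript: String) -> String {
    let cleaned = String(transcript.lowercased().map { ch -> Character in
        ch.isLetter || ch.isNumber || ch == "-" ? ch : " "
    })
    return cleaned
        .split(separator: " ")
        .map(String.init)
        .filter { !fillerWords.contains($0) }
        .joined(separator: " ")
}

print(normalize("Can I get a large iced vanilla latte with oat milk?"))
// large iced vanilla latte oat milk
```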
Step 2: Intent Detection
Intent: .order (confidence: 0.95)
Step 3: Menu Search (Semantic)
Query: "vanilla latte"
Match: "Vanilla Latte" (score: 0.92)
Runner-up: "Hazelnut Latte" (score: 0.71)
Gap: 0.21 (no clarification needed)
Step 4: Slot Extraction
- Size: large (confidence: 0.98)
- Temperature: iced (confidence: 0.97)
- Milk: oat (confidence: 0.95)
- Syrup: vanilla (confidence: 0.90)
Step 5: Clarification Check
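- All slots and overall confidence (0.93) above the 0.6 threshold
- Menu gap 0.21 ≥ 0.1, so no disambiguation
- Milk confidence 0.95 ≥ 0.75, so no allergy verification
No clarifications needed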
Step 6: Constraint Validation
- Large iced latte: VALID
- Oat milk: VALID (+$0.75)
- Vanilla syrup: VALID
Final Output:
VoiceParsedOrder {
    menuItem: "Vanilla Latte"
    customizations: [large, iced, oat milk, vanilla]
    price: $5.75 (base) + $0.75 (oat milk) = $6.50
    confidence: 0.93
}
---
Dialogue States
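The state machine in the diagram below can be sketched as a Swift enum with an explicit transition function. The event names are hypothetical, and so is the rule that only a confirmed CONFIRM response reaches COMPLETE while every other response loops back to LISTEN. Both are one reading of the diagram, not VoiceDialogueManager's actual API.

```swift
enum DialogueState: Equatable {
    case idle, listen, process, clarify, confirm, error, complete
}

// Hypothetical events driving the transitions in the diagram.
enum DialogueEvent {
    case userSpeaks
    case transcriptReady
    case nluResult(needsClarification: Bool)
    case processingFailed
    case userResponds(confirmed: Bool)
}

/// Returns the next state, or nil if the event is not valid in `state`.
func transition(from state: DialogueState, on event: DialogueEvent) -> DialogueState? {
    switch (state, event) {
    case (.idle, .userSpeaks):
        return .listen
    case (.listen, .transcriptReady):
        return .process
    case (.process, .nluResult(let needsClarification)):
        return needsClarification ? .clarify : .confirm
    case (.process, .processingFailed):
        return .error
    case (.confirm, .userResponds(let confirmed)):
        // A confirmation completes the order; anything else loops back.
        return confirmed ? .complete : .listen
    case (.clarify, .userResponds), (.error, .userResponds):
        return .listen
    default:
        return nil
    }
}

// Happy path: IDLE -> LISTEN -> PROCESS -> CONFIRM -> COMPLETE.
var state = DialogueState.idle
let events: [DialogueEvent] = [.userSpeaks, .transcriptReady,
                               .nluResult(needsClarification: false),
                               .userResponds(confirmed: true)]
for event in events {
    if let next = transition(from: state, on: event) { state = next }
}
print(state)  // complete
```

Returning nil for unexpected events lets the caller decide whether to ignore them or surface an ERROR state.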
┌─────────────────────────────────────────┐
│         DIALOGUE STATE MACHINE          │
└─────────────────────────────────────────┘
             ┌─────────┐
             │  IDLE   │
             └────┬────┘
                  │ User speaks
                  ▼
             ┌─────────┐
             │ LISTEN  │◀─────────────────┐
             └────┬────┘                  │
                  │ Transcript ready      │
                  ▼                       │
             ┌─────────┐                  │
             │ PROCESS │                  │
             └────┬────┘                  │
                  │                       │
     ┌────────────┼────────────┐          │
     ▼            ▼            ▼          │
┌──────────┐ ┌──────────┐ ┌──────────┐    │
│ CLARIFY  │ │ CONFIRM  │ │  ERROR   │    │
└────┬─────┘ └────┬─────┘ └────┬─────┘    │
     │            │            │          │
     └────────────┴────────────┘          │
                  │                       │
                  │ User responds         │
                  ├───────────────────────┘
                  │
                  ▼
             ┌─────────┐
             │ COMPLETE│──▶ Add to Cart
             └─────────┘
---
Four-Corner Voice Station Architecture
Flagship Design: Four voice stations positioned at cardinal points, eliminating traditional counter queuing.
                    VOICE STATION ALPHA (North)
                        ┌─────────────────┐
                        │  Premium Study  │
                        │      Zone       │
                        └────────┬────────┘
                                 │
        ┌────────────────────────┼────────────────────────┐
        │                        │                        │
  VOICE STATION                  │                  VOICE STATION
      DELTA ◄────────────────────┼────────────────────► BETA
     (West)                      │                     (East)
 ┌─────────────┐       ┌─────────┴─────────┐       ┌─────────────┐
 │    Quiet    │       │                   │       │   Social    │
 │    Focus    │       │    DANCE FLOOR    │       │ Interaction │
 │    Zone     │       │                   │       │    Zone     │
 └─────────────┘       └─────────┬─────────┘       └─────────────┘
                                 │
                         ┌───────┴───────┐
                         │   Community   │
                         │   Hub Zone    │
                         └───────────────┘
                   VOICE STATION CHARLIE (South)
Station Specifications
| Station | Location | Primary Zone | Backup Routing | Zone Capacity & Rate |
|---|---|---|---|---|
| Alpha | North Wall | Premium Study | Quiet Focus | 24-seat zone, $15/hr |
| Beta | East Wall | Social | Community | 18-seat zone, $12/hr |
| Charlie | South Wall | Community | Social | 20-seat zone, $10/hr |
| Delta | West Wall | Quiet Focus | Premium Study | 18-seat zone, $12/hr |
Station Technology Specs
| Metric | Specification |
|---|---|
| Recognition Accuracy | 95-97% |
| Conversation Style | Natural language |
| Noise Cancellation | Adaptive, zone-aware |
| Languages | English (Spanish, French planned) |
| Order Confirmation | Voice + visual display |
| Device | iPad Pro with external mic array |
Spatial Intelligence Routing
The voice system uses spatial awareness to optimize customer flow:
1. Position Detection: Identify the customer's location in the space
2. Nearest Station: Route to closest available voice station
3. Volume Adaptation: Adjust based on ambient noise level
4. Zone Awareness: Consider seating zone acoustics
5. POS Coordination: Route order to appropriate cart
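The routing factors can be combined into one weighted score per station. The routing diagram that follows gives weight ranges rather than fixed values. The weights used here (0.4, 0.3, 0.2, 0.1) are therefore assumed values picked from inside those ranges, chosen to sum to 1. The `StationStatus` fields and their normalization to 0...1 are also hypothetical.

```swift
/// Hypothetical per-station snapshot. Each factor is pre-normalized to
/// 0...1, where higher is better (shortQueue = 1 means an empty queue).
struct StationStatus {
    let name: String
    let shortQueue: Double
    let complexityFit: Double      // how well this cart handles the order
    let proximity: Double          // 1 = customer is at the station
    let baristaEfficiency: Double
}

// Assumed weights, each inside the range shown in the routing diagram.
let weights = (queue: 0.4, complexity: 0.3, proximity: 0.2, efficiency: 0.1)

func score(_ s: StationStatus) -> Double {
    return weights.queue * s.shortQueue
        + weights.complexity * s.complexityFit
        + weights.proximity * s.proximity
        + weights.efficiency * s.baristaEfficiency
}

func bestStation(_ stations: [StationStatus]) -> StationStatus? {
    return stations.max { score($0) < score($1) }
}

let stations = [
    StationStatus(name: "Alpha", shortQueue: 0.2, complexityFit: 0.8,
                  proximity: 1.0, baristaEfficiency: 0.9),
    StationStatus(name: "Beta", shortQueue: 0.9, complexityFit: 0.7,
                  proximity: 0.5, baristaEfficiency: 0.8),
]
print(bestStation(stations)?.name ?? "none")  // Beta
```

In this example Alpha scores 0.61 and Beta 0.75. Because queue length carries the largest weight, a short queue at a farther station can outweigh proximity.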
ROUTING ALGORITHM
─────────────────
Customer enters → Position detected → Nearest station identified
               │
               ▼
┌────────────────────────────┐
│      ROUTING FACTORS       │
├────────────────────────────┤
│ Queue length: 30-50%       │
│ Order complexity: 20-40%   │
│ Customer proximity: 10-30% │
│ Barista efficiency: 10-20% │
└────────────────────────────┘
               │
               ▼
Order routed to optimal cart
Integration with 4-Wall Seating
Each voice station is positioned to serve its adjacent seating zone:
| Station | Serves Zone | Acoustic Considerations |
|---|---|---|
| Alpha | Premium Study (North) | Low noise, private ordering |
| Beta | Social (East) | Group orders, louder ambient |
| Charlie | Community (South) | High traffic, quick ordering |
| Delta | Quiet Focus (West) | Very low noise, discrete confirmation |
See [FLAGSHIP_DESIGN](FLAGSHIP_DESIGN.md) for complete zone specifications.
---
Learning System
FeedbackCollector: Records barista corrections
PatternLearner: Learns aliases from corrections
Approval flow: Admin reviews → applies learned patterns
Example:
- Customer says "skinny latte"
- Barista corrects to "Skim Milk Latte"
- System learns: "skinny" → skim milk alias
- Next time: automatic mapping
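The learn-then-approve flow can be sketched as a counter of barista corrections. Once a correction has been seen a few times, it becomes a pending alias that only takes effect after admin approval. The promotion threshold of 3 and the names `AliasLearner`, `record`, `approve`, and `resolve` are hypothetical; the document does not say how many corrections trigger a proposal.

```swift
/// Hypothetical alias learner: corrections are counted, frequent ones become
/// pending proposals, and only admin-approved aliases are used for matching.
struct AliasLearner {
    let promotionThreshold: Int
    private var counts: [String: [String: Int]] = [:]  // phrase -> target -> count
    private(set) var pending: [String: String] = [:]   // awaiting admin review
    private(set) var approved: [String: String] = [:]  // live aliases

    init(promotionThreshold: Int = 3) {
        self.promotionThreshold = promotionThreshold
    }

    /// Records one barista correction of a customer phrase.
    mutating func record(heard phrase: String, correctedTo target: String) {
        let key = phrase.lowercased()
        counts[key, default: [:]][target, default: 0] += 1
        let seen = counts[key]?[target] ?? 0
        if seen >= promotionThreshold, approved[key] == nil {
            pending[key] = target
        }
    }

    /// Admin approval moves a proposal into the live alias table.
    mutating func approve(_ phrase: String) {
        let key = phrase.lowercased()
        if let target = pending.removeValue(forKey: key) {
            approved[key] = target
        }
    }

    func resolve(_ phrase: String) -> String? {
        return approved[phrase.lowercased()]
    }
}

var learner = AliasLearner()
for _ in 0..<3 { learner.record(heard: "skinny", correctedTo: "skim milk") }
learner.approve("skinny")
print(learner.resolve("Skinny") ?? "unknown")  // skim milk
```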
---
Performance Metrics
| Operation | Target | Actual |
|---|---|---|
| Initialize embeddings | <1s | ~500ms |
| Full NLU pipeline | <500ms | <200ms |
| Semantic search | <100ms | <50ms |
| Constraint validation | <50ms | <10ms |
| Speech-to-text | Real-time | iOS native |
---
File Structure
BWBCore/Sources/BWBCore/Voice/
├── Embeddings/
│   ├── MenuDocument.swift          # Document representation
│   ├── TextEmbedder.swift          # Apple NLEmbedding wrapper
│   ├── VectorIndex.swift           # Similarity search
│   └── MenuEmbeddingService.swift
├── Slots/
│   ├── SlotDefinitions.swift       # Slot types and values
│   ├── SlotPrediction.swift        # Prediction output
│   └── SlotClassifier.swift        # Extraction logic
├── Dialogue/
│   ├── ClarificationPolicy.swift
│   └── ConfirmationGenerator.swift
├── Constraints/
│   ├── ConstraintTypes.swift
│   ├── ConstraintParser.swift
│   └── YAMLConstraintEngine.swift
├── EnhancedVoiceNLUEngine.swift
├── VoiceDialogueManager.swift
├── SpeechAnalyzerService.swift
└── VoiceTypes.swift
---
Change Log
| Version | Date | Changes |
|---|---|---|
| 1.1.0 | 2026-01-17 | Added 4-corner station architecture with Alpha/Beta/Charlie/Delta stations. Added spatial intelligence routing. Added 4-wall zone integration. |
| 1.0.0 | 2026-01-16 | Initial creation |
Source Anchor
spine/BWB/architecture/VOICE_SYSTEM.md