Grand Diomande Research · Full HTML Reader

BWB — Voice Ordering System

Document ID: BWB-ARCH-003
Version: 1.1.0
Last Updated: 2026-01-17

---

Overview

BWB's voice ordering system uses on-device speech recognition and semantic natural-language understanding (NLU) to process spoken coffee orders. All processing runs on the device, for privacy and low latency.

---

Architecture Stack

┌─────────────────────────────────────────────────────────────────────┐
│                    VOICE ORDERING PIPELINE                           │
└─────────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────────┐
│  iOS SpeechAnalyzer                                                  │
│  On-device transcription (iOS 26+)                                   │
└──────────────────────────────┬──────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────────────┐
│  EnhancedVoiceNLUEngine                                              │
│  • Normalize transcript                                              │
│  • Detect intent (order/modify/cancel)                               │
│  • Semantic menu matching                                            │
│  • Slot extraction                                                   │
└──────────────────────────────┬──────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────────────┐
│  VoiceDialogueManager                                                │
│  State machine: LISTEN → PROCESS → CONFIRM                          │
│  Handles clarifications and corrections                              │
└──────────────────────────────┬──────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────────────┐
│  Order Creation & Routing                                            │
│  Cart addition, checkout flow                                        │
└─────────────────────────────────────────────────────────────────────┘

---

Core Components

1. MenuEmbeddingService

Purpose: Semantic menu matching using vector embeddings

How it works:
- Converts menu items to 300-dimensional vectors using Apple's NLEmbedding
- Stores embeddings in VectorIndex for fast similarity search
- Hybrid search: 60

Key features:
| Feature | Description |
|---------|-------------|
| On-device | No external API calls |
| Alias support | "americano" matches "Iced Americano" |
| Abbreviations | "latte" finds all latte variants |
| Typo tolerance | Fuzzy matching with Levenshtein distance |
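The typo-tolerance row can be illustrated with a plain Levenshtein distance. This is a minimal sketch, not the service's actual matcher: `closestMatch`, its `maxDistance` cutoff of 2, and the sample vocabulary are assumptions made for illustration.

```swift
/// Levenshtein edit distance, computed with two rolling rows of the DP table.
func levenshtein(_ a: String, _ b: String) -> Int {
    let s = Array(a), t = Array(b)
    if s.isEmpty { return t.count }
    if t.isEmpty { return s.count }

    var previous = Array(0...t.count)  // distances from "" to each prefix of t
    for i in 1...s.count {
        var current = [i] + Array(repeating: 0, count: t.count)
        for j in 1...t.count {
            let substitution = previous[j - 1] + (s[i - 1] == t[j - 1] ? 0 : 1)
            current[j] = min(previous[j] + 1, current[j - 1] + 1, substitution)
        }
        previous = current
    }
    return previous[t.count]
}

/// Hypothetical helper: the vocabulary entry closest to `word`,
/// or nil when nothing is within `maxDistance` edits (cutoff is an assumption).
func closestMatch(for word: String, in vocabulary: [String], maxDistance: Int = 2) -> String? {
    let scored = vocabulary.map { ($0, levenshtein(word.lowercased(), $0.lowercased())) }
    guard let best = scored.min(by: { $0.1 < $1.1 }), best.1 <= maxDistance else { return nil }
    return best.0
}

let suggestion = closestMatch(for: "americno", in: ["Latte", "Americano", "Mocha"])
```

A small cutoff keeps short words from matching unrelated menu items; a production matcher would probably scale the cutoff with word length.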

Performance:
- Embedding initialization: ~500ms (once at startup)
- Semantic search: <50ms (SIMD optimized)
- Returns top-5 matches with confidence scores
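As a rough illustration of the similarity search, the sketch below ranks items by cosine similarity and returns the top matches. It assumes toy 3-dimensional vectors and plain loops instead of the 300-dimensional NLEmbedding vectors and SIMD path described above. `ToyVectorIndex` is a hypothetical stand-in, not the project's `VectorIndex`.

```swift
/// Cosine similarity between two equal-length vectors; 0 if either is all zeros.
func cosineSimilarity(_ a: [Double], _ b: [Double]) -> Double {
    precondition(a.count == b.count, "vectors must have the same dimension")
    var dot = 0.0, normA = 0.0, normB = 0.0
    for i in a.indices {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    let denominator = normA.squareRoot() * normB.squareRoot()
    return denominator == 0 ? 0 : dot / denominator
}

/// Hypothetical brute-force index: stores (name, vector) pairs and scans them all.
struct ToyVectorIndex {
    var entries: [(name: String, vector: [Double])] = []

    mutating func add(_ name: String, vector: [Double]) {
        entries.append((name: name, vector: vector))
    }

    /// Returns up to `topK` entries, best score first.
    func search(_ query: [Double], topK: Int = 5) -> [(name: String, score: Double)] {
        let scored = entries.map { (name: $0.name, score: cosineSimilarity(query, $0.vector)) }
        return Array(scored.sorted { $0.score > $1.score }.prefix(topK))
    }
}

// Made-up vectors; real embeddings would come from the embedding model.
var menuIndex = ToyVectorIndex()
menuIndex.add("Vanilla Latte", vector: [0.9, 0.1, 0.0])
menuIndex.add("Hazelnut Latte", vector: [0.7, 0.6, 0.0])
menuIndex.add("Cold Brew", vector: [0.0, 0.2, 0.9])
let matches = menuIndex.search([1.0, 0.0, 0.0], topK: 2)
```

A linear scan like this is usually adequate for a menu of a few hundred items; the score gap between the first and second result is what a clarification policy would inspect.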

2. SlotClassifier

Purpose: Extract order details from natural language

Slots extracted:

| Slot | Values | Example |
|------|--------|---------|
| size | small, medium, large, xl | "large" → size: large (0.98) |
| temperature | hot, iced, blended | "iced" → temp: iced (0.97) |
| milk | whole, skim, oat, almond, soy, coconut | "with oat milk" → milk: oat (0.95) |
| caffeine | regular, decaf, half-caf | "decaf" → caffeine: decaf (0.92) |
| shots | 1-6 | "extra shot" → shots: 2 (0.90) |
| syrup | vanilla, caramel, hazelnut, etc. | "vanilla" → syrup: vanilla (0.88) |
| quantity | 1-10 | "two lattes" → quantity: 2 (0.95) |

Output: Each slot has value + confidence score (0.0-1.0)
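To make the value-plus-confidence output concrete, here is a minimal keyword-lookup sketch. It is not the actual `SlotClassifier`, whose model is not described in this document. The `SlotGuess` type, the vocabulary table, and the fixed 0.9 confidence for an exact keyword hit are all assumptions.

```swift
/// Hypothetical prediction type: a slot value with a confidence in 0.0...1.0.
struct SlotGuess: Equatable {
    let value: String
    let confidence: Double
}

/// Assumed vocabulary, taken from a subset of the slot values listed above.
let slotVocabulary: [(slot: String, values: [String])] = [
    (slot: "size", values: ["small", "medium", "large", "xl"]),
    (slot: "temperature", values: ["hot", "iced", "blended"]),
    (slot: "milk", values: ["whole", "skim", "oat", "almond", "soy", "coconut"]),
    (slot: "caffeine", values: ["regular", "decaf", "half-caf"]),
    (slot: "syrup", values: ["vanilla", "caramel", "hazelnut"]),
]

/// Exact-keyword slot extraction. Every hit gets the same assumed confidence;
/// a real classifier would score each prediction individually.
func extractSlots(from transcript: String) -> [String: SlotGuess] {
    // Split on anything that is not a letter or hyphen, so "half-caf" stays one token.
    let pieces = transcript.lowercased().split(whereSeparator: { !($0.isLetter || $0 == "-") })
    let tokens = Set(pieces.map { String($0) })

    var result: [String: SlotGuess] = [:]
    for entry in slotVocabulary {
        // If several values of one slot appear, the first in vocabulary order wins.
        if let hit = entry.values.first(where: { tokens.contains($0) }) {
            result[entry.slot] = SlotGuess(value: hit, confidence: 0.9)
        }
    }
    return result
}

let slots = extractSlots(from: "Can I get a large iced vanilla latte with oat milk?")
```

Slots that are never mentioned are simply absent from the result, which lets a downstream policy tell a missing slot apart from a low-confidence one.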

3. ClarificationPolicy

Purpose: Decide when to ask for confirmation vs. proceed

Thresholds:
| Condition | Action |
|-----------|--------|
| Overall confidence < 0.6 | Ask for full clarification |
| Gap to 2nd choice < 0.1 | Ask to disambiguate |
| Milk/caffeine < 0.75 | Always verify (health/allergy) |

Priority order:
1. High importance: milk, caffeine (allergies)
2. Medium importance: menu item, size
3. Low importance: syrups, extras
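The thresholds and priority order above could be combined roughly as follows. This is a sketch under assumptions: the function name, the `ClarificationAction` cases, and the order in which the checks run (overall confidence first, then allergy-sensitive slots, then menu-item disambiguation) are illustrative choices, not the documented behavior of `ClarificationPolicy`.

```swift
/// Hypothetical outcome of the policy check.
enum ClarificationAction: Equatable {
    case proceed
    case askFullClarification
    case disambiguate
    case verifySlot(String)
}

/// Applies the three thresholds from the table. The evaluation order is an assumption.
func decideClarification(
    overallConfidence: Double,
    topScore: Double,
    runnerUpScore: Double,
    slotConfidences: [String: Double]
) -> ClarificationAction {
    // Overall confidence < 0.6: ask for the whole order again.
    if overallConfidence < 0.6 { return .askFullClarification }

    // Milk and caffeine are allergy-sensitive: verify anything below 0.75.
    for slot in ["milk", "caffeine"] {
        if let confidence = slotConfidences[slot], confidence < 0.75 {
            return .verifySlot(slot)
        }
    }

    // Gap to the second-best menu match < 0.1: ask the customer to choose.
    if topScore - runnerUpScore < 0.1 { return .disambiguate }

    return .proceed
}

let action = decideClarification(
    overallConfidence: 0.93, topScore: 0.92, runnerUpScore: 0.71,
    slotConfidences: ["milk": 0.95]
)
```

Checking the allergy-sensitive slots before the menu-item gap mirrors the priority list, where milk and caffeine rank above the menu item.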

4. YAMLConstraintEngine

Purpose: Validate orders against menu rules

Constraint types:
| Type | Example |
|------|---------|
| Size constraints | Espresso: only small/double |
| Temperature | Cappuccino: hot only (no iced) |
| Combinations | No decaf cold brew |
| Seasonal | Pumpkin spice: Sept-Nov only |
| Price modifiers | Oat milk: +$0.75 |

Auto-correction: Invalid orders are fixed with user notification
- "Iced cappuccino" → "Hot cappuccino (iced not available)"
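A minimal sketch of validation with auto-correction, assuming the rules have already been loaded into memory. YAML parsing is omitted, and the `DrinkOrder` and `ValidationResult` types and the rule tables are hypothetical; only the hot-only cappuccino rule and the oat milk surcharge come from the table above.

```swift
/// Hypothetical order and result types for illustration.
struct DrinkOrder: Equatable {
    var item: String
    var temperature: String
    var milk: String?
}

struct ValidationResult {
    var order: DrinkOrder   // the order after any auto-correction
    var notices: [String]   // messages to show or speak to the customer
    var surcharge: Double   // sum of price modifiers, in dollars
}

// Rules hard-coded for the sketch; in the described system they would come from YAML.
let hotOnlyItems: Set<String> = ["Cappuccino"]
let milkSurcharges: [String: Double] = ["oat": 0.75]

func validate(_ order: DrinkOrder) -> ValidationResult {
    var corrected = order
    var notices: [String] = []

    // Temperature constraint: fix the order and tell the customer why.
    if hotOnlyItems.contains(order.item) && order.temperature != "hot" {
        corrected.temperature = "hot"
        notices.append("\(order.item) is hot only (\(order.temperature) not available)")
    }

    // Price modifier: look up the surcharge for the chosen milk, if any.
    let surcharge = order.milk.flatMap { milkSurcharges[$0] } ?? 0
    return ValidationResult(order: corrected, notices: notices, surcharge: surcharge)
}

let checked = validate(DrinkOrder(item: "Cappuccino", temperature: "iced", milk: "oat"))
```

Returning the notices alongside the corrected order keeps the correction visible, so the confirmation step can read it back instead of changing the drink silently.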

5. EnhancedVoiceNLUEngine

Purpose: Main orchestrator for NLU pipeline

Pipeline steps:
1. Normalize — Clean transcript, expand abbreviations
2. Intent — Detect order/modify/cancel intent
3. Menu Search — Find matching menu items
4. Slot Extraction — Extract customization details
5. Clarification Check — Determine if confirmation needed
6. Constraint Validation — Apply business rules
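The Normalize step might look something like the sketch below. The filler-word list and abbreviation table are assumptions chosen for illustration; the engine's real vocabulary is not given in this document.

```swift
// Assumed filler words to drop and abbreviations to expand.
let fillerWords: Set<String> = ["can", "i", "get", "have", "a", "an", "the", "with", "please"]
let abbreviations: [String: String] = ["lg": "large", "med": "medium", "sm": "small"]

/// Lowercases, strips punctuation, expands abbreviations, and removes filler words.
func normalize(_ transcript: String) -> String {
    let pieces = transcript.lowercased().split(whereSeparator: { !($0.isLetter || $0.isNumber) })
    let words = pieces.map { String($0) }
    let expanded = words.map { abbreviations[$0] ?? $0 }
    let kept = expanded.filter { !fillerWords.contains($0) }
    return kept.joined(separator: " ")
}

let normalized = normalize("Can I get a large iced vanilla latte with oat milk?")
// normalized == "large iced vanilla latte oat milk"
```

Expanding abbreviations before filtering means an abbreviation can never be mistaken for a filler word.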

Output: NLUResult

```swift
struct NLUResult {
    let intent: OrderIntent
    let parsedOrders: [VoiceParsedOrder]
    let clarificationsNeeded: [Clarification]
    let overallConfidence: Double
}
```

---

Processing Example

Input: "Can I get a large iced vanilla latte with oat milk?"

Step 1: Normalize
  Input: "Can I get a large iced vanilla latte with oat milk?"
  Output: "large iced vanilla latte oat milk"

Step 2: Intent Detection
  Intent: .order (confidence: 0.95)

Step 3: Menu Search (Semantic)
  Query: "vanilla latte"
  Match: "Vanilla Latte" (score: 0.92)
  Runner-up: "Hazelnut Latte" (score: 0.71)
  Gap: 0.21 (no clarification needed)

Step 4: Slot Extraction
  - Size: large (confidence: 0.98)
  - Temperature: iced (confidence: 0.97)
  - Milk: oat (confidence: 0.95)
  - Syrup: vanilla (confidence: 0.90)

Step 5: Clarification Check
  All slots > 0.6 threshold; milk (0.95) above the 0.75 allergy threshold
  No clarifications needed

Step 6: Constraint Validation
  - Large iced latte: VALID
  - Oat milk: VALID (+$0.75)
  - Vanilla syrup: VALID

Final Output:
  VoiceParsedOrder {
    menuItem: "Vanilla Latte"
    customizations: [large, iced, oat milk, vanilla]
    price: $5.75 + $0.75 = $6.50
    confidence: 0.93
  }

---

Dialogue States

┌─────────────────────────────────────────────────────────────────────┐
│                    DIALOGUE STATE MACHINE                            │
└─────────────────────────────────────────────────────────────────────┘

                    ┌─────────┐
                    │  IDLE   │
                    └────┬────┘
                         │ User speaks
                         ▼
                    ┌─────────┐
                    │ LISTEN  │◀──────────────────┐
                    └────┬────┘                   │
                         │ Transcript ready        │
                         ▼                        │
                    ┌─────────┐                   │
                    │ PROCESS │                   │
                    └────┬────┘                   │
                         │                        │
            ┌────────────┼────────────┐           │
            ▼            ▼            ▼           │
     ┌──────────┐ ┌──────────┐ ┌──────────┐      │
     │ CLARIFY  │ │ CONFIRM  │ │  ERROR   │      │
     └────┬─────┘ └────┬─────┘ └────┬─────┘      │
          │            │            │            │
          └────────────┴────────────┘            │
                       │                         │
                       │ User responds           │
                       └─────────────────────────┘
                       │
                       ▼
                  ┌─────────┐
                  │ COMPLETE│──▶ Add to Cart
                  └─────────┘

---

Four-Corner Voice Station Architecture

Flagship Design: Four voice stations positioned at cardinal points, eliminating traditional counter queuing.

                    VOICE STATION ALPHA (North)
                    ┌─────────────────┐
                    │  Premium Study  │
                    │     Zone        │
                    └────────┬────────┘
                             │
    ┌────────────────────────┼────────────────────────┐
    │                        │                        │
VOICE STATION               │                   VOICE STATION
   DELTA ◄──────────────────┼─────────────────────► BETA
  (West)                    │                       (East)
┌──────────┐      ┌─────────┴─────────┐      ┌──────────┐
│  Quiet   │      │                   │      │  Social  │
│  Focus   │      │   DANCE FLOOR     │      │Interaction│
│  Zone    │      │                   │      │  Zone    │
└──────────┘      └─────────┬─────────┘      └──────────┘
                            │
                    ┌───────┴───────┐
                    │  Community    │
                    │   Hub Zone    │
                    └───────────────┘
                    VOICE STATION CHARLIE (South)

Station Specifications

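| Station | Location | Primary Zone | Backup Routing | Wall Integration |
|---------|----------|--------------|----------------|------------------|
| Alpha | North Wall | Premium Study | Quiet Focus | 24-seat zone, $15/hr |
| Beta | East Wall | Social | Community | 18-seat zone, $12/hr |
| Charlie | South Wall | Community | Social | 20-seat zone, $10/hr |
| Delta | West Wall | Quiet Focus | Premium Study | 18-seat zone, $12/hr |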
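The backup routing column can be read as a fallback map between stations. The sketch below assumes that a station's backup zone is served by the station whose primary zone it is (for example, Alpha falls back to Delta, which serves Quiet Focus). That reading, and the `resolveStation` helper, are interpretations rather than documented behavior.

```swift
enum Station: Hashable {
    case alpha, beta, charlie, delta
}

/// Fallback station for each station, inferred from the table's zone names.
let backupStation: [Station: Station] = [
    .alpha: .delta,    // Premium Study falls back to Quiet Focus
    .beta: .charlie,   // Social falls back to Community
    .charlie: .beta,   // Community falls back to Social
    .delta: .alpha,    // Quiet Focus falls back to Premium Study
]

/// Hypothetical helper: the preferred station if available, else its backup, else nil.
func resolveStation(preferred: Station, unavailable: Set<Station>) -> Station? {
    if !unavailable.contains(preferred) { return preferred }
    if let backup = backupStation[preferred], !unavailable.contains(backup) { return backup }
    return nil
}

let routed = resolveStation(preferred: .alpha, unavailable: [.alpha])
```

The table pairs the stations (Alpha with Delta, Beta with Charlie), so a single hop is the only fallback it defines; the sketch returns nil when both stations in a pair are unavailable.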

Station Technology Specs

| Metric | Specification |
|--------|---------------|
| Recognition Accuracy | 95-97% |
| Conversation Style | Natural language |
| Noise Cancellation | Adaptive, zone-aware |
| Languages | English (Spanish, French planned) |
| Order Confirmation | Voice + visual display |
| Device | iPad Pro with external mic array |

Spatial Intelligence Routing

The voice system uses spatial awareness to optimize customer flow:

1. Position Detection: Identify customer's location in space
2. Nearest Station: Route to closest available voice station
3. Volume Adaptation: Adjust based on ambient noise level
4. Zone Awareness: Consider seating zone acoustics
5. POS Coordination: Route order to appropriate cart

ROUTING ALGORITHM
─────────────────
Customer enters → Position detected → Nearest station identified
                                            │
                                            ▼
                              ┌────────────────────────────┐
                              │   ROUTING FACTORS          │
                              ├────────────────────────────┤
                              │ Queue length: 30-50%       │
                              │ Order complexity: 20-40%   │
                              │ Customer proximity: 10-30% │
                              │ Barista efficiency: 10-20% │
                              └────────────────────────────┘
                                            │
                                            ▼
                              Order routed to optimal cart
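The routing factors can be read as weights in a weighted sum. The sketch below picks one weight from inside each stated range so that they total 1.0 (0.4, 0.3, 0.2, 0.1). Those specific values, the normalized 0-1 inputs, and the `CartCandidate` type are assumptions made for illustration.

```swift
/// Hypothetical candidate cart; every factor is normalized to 0...1.
struct CartCandidate {
    let name: String
    let queueLoad: Double          // 0 = empty queue, 1 = full
    let complexityLoad: Double     // 0 = simple pending orders, 1 = very complex
    let distance: Double           // 0 = next to the customer, 1 = farthest
    let baristaEfficiency: Double  // 0 = slowest, 1 = fastest
}

/// One assumed weight per factor, each inside the range given in the diagram.
struct RoutingWeights {
    var queue = 0.4
    var complexity = 0.3
    var proximity = 0.2
    var efficiency = 0.1
}

/// Higher is better: loads and distance are inverted so lower values score higher.
func routingScore(_ c: CartCandidate, weights w: RoutingWeights = RoutingWeights()) -> Double {
    return w.queue * (1 - c.queueLoad)
        + w.complexity * (1 - c.complexityLoad)
        + w.proximity * (1 - c.distance)
        + w.efficiency * c.baristaEfficiency
}

func bestCart(_ candidates: [CartCandidate]) -> CartCandidate? {
    return candidates.max(by: { routingScore($0) < routingScore($1) })
}

let nearButBusy = CartCandidate(name: "North cart", queueLoad: 0.8, complexityLoad: 0.5,
                                distance: 0.1, baristaEfficiency: 0.7)
let farButIdle = CartCandidate(name: "South cart", queueLoad: 0.1, complexityLoad: 0.2,
                               distance: 0.6, baristaEfficiency: 0.7)
let chosen = bestCart([nearButBusy, farButIdle])
```

With these weights the idle cart scores 0.75 against 0.48 for the nearby busy one, so queue length outweighs proximity, which matches the ordering of the ranges in the diagram.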

Integration with 4-Wall Seating

Each voice station is positioned to serve its adjacent seating zone:

| Station | Serves Zone | Acoustic Considerations |
|---------|-------------|-------------------------|
| Alpha | Premium Study (North) | Low noise, private ordering |
| Beta | Social (East) | Group orders, louder ambient |
| Charlie | Community (South) | High traffic, quick ordering |
| Delta | Quiet Focus (West) | Very low noise, discreet confirmation |

See [FLAGSHIP_DESIGN](FLAGSHIP_DESIGN.md) for complete zone specifications.

---

Learning System

- FeedbackCollector: Records barista corrections
- PatternLearner: Learns aliases from corrections
- Approval flow: Admin reviews → applies learned patterns

Example:
- Customer says "skinny latte"
- Barista corrects to "Skim Milk Latte"
- System learns: "skinny" → skim milk alias
- Next time: automatic mapping
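The correction-to-alias flow, including the admin approval gate, can be sketched as below. `AliasProposal` and `AliasLearnerSketch` are hypothetical types; how the real `FeedbackCollector` and `PatternLearner` store and rank corrections is not described here.

```swift
/// A candidate alias derived from a barista correction.
struct AliasProposal: Hashable {
    let heard: String      // what the customer said, e.g. "skinny"
    let corrected: String  // what the barista changed it to, e.g. "skim milk"
}

/// Hypothetical learner: corrections queue up as proposals and take effect
/// only after an admin approves them.
struct AliasLearnerSketch {
    var pending: [AliasProposal: Int] = [:]  // proposal -> times observed
    var approved: [String: String] = [:]     // heard -> corrected

    mutating func recordCorrection(heard: String, corrected: String) {
        let proposal = AliasProposal(heard: heard.lowercased(), corrected: corrected.lowercased())
        pending[proposal, default: 0] += 1
    }

    /// Admin approval: moves a pending proposal into the active alias table.
    mutating func approve(_ proposal: AliasProposal) {
        guard pending.removeValue(forKey: proposal) != nil else { return }
        approved[proposal.heard] = proposal.corrected
    }

    /// Maps a word through approved aliases; unknown words pass through unchanged.
    func resolve(_ word: String) -> String {
        return approved[word.lowercased()] ?? word
    }
}

var learner = AliasLearnerSketch()
learner.recordCorrection(heard: "skinny", corrected: "skim milk")
let beforeApproval = learner.resolve("skinny")
learner.approve(AliasProposal(heard: "skinny", corrected: "skim milk"))
let afterApproval = learner.resolve("Skinny")
```

Keeping an observation count per proposal gives the reviewer a simple signal for which aliases are worth approving first.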

---

Performance Metrics

| Operation | Target | Actual |
|-----------|--------|--------|
| Initialize embeddings | <1s | ~500ms |
| Full NLU pipeline | <500ms | <200ms |
| Semantic search | <100ms | <50ms |
| Constraint validation | <50ms | <10ms |
| Speech-to-text | Real-time | iOS native |

---

File Structure

BWBCore/Sources/BWBCore/Voice/
├── Embeddings/
│   ├── MenuDocument.swift      # Document representation
│   ├── TextEmbedder.swift      # Apple NLEmbedding wrapper
│   ├── VectorIndex.swift       # Similarity search
│   └── MenuEmbeddingService.swift
├── Slots/
│   ├── SlotDefinitions.swift   # Slot types and values
│   ├── SlotPrediction.swift    # Prediction output
│   └── SlotClassifier.swift    # Extraction logic
├── Dialogue/
│   ├── ClarificationPolicy.swift
│   └── ConfirmationGenerator.swift
├── Constraints/
│   ├── ConstraintTypes.swift
│   ├── ConstraintParser.swift
│   └── YAMLConstraintEngine.swift
├── EnhancedVoiceNLUEngine.swift
├── VoiceDialogueManager.swift
├── SpeechAnalyzerService.swift
└── VoiceTypes.swift

---

Change Log

| Version | Date | Changes |
|---------|------|---------|
| 1.1.0 | 2026-01-17 | Added 4-corner station architecture with Alpha/Beta/Charlie/Delta stations. Added spatial intelligence routing. Added 4-wall zone integration. |
| 1.0.0 | 2026-01-16 | Initial creation |

Source Anchor

spine/BWB/architecture/VOICE_SYSTEM.md
