Grand Diomande Research · Full HTML Reader

# BWB Kiosk — Voice Ordering Architecture
### Deep Recursive Decomposition & Evolutionary Design
v2.0 — February 10, 2026 — Verified against codebase

---

> "Break every component down to its grills... define a subsection and a sub-subsection that further builds upon the previous section, then expands it in a recursive manner."

---

1. [Vision & Philosophy](#1-vision--philosophy)
2. [System Topology](#2-system-topology)
3. [Layer 1: Audio Foundation](#3-layer-1-audio-foundation)
4. [Layer 2: Speech-to-Text Pipeline](#4-layer-2-speech-to-text-pipeline)
5. [Layer 3: Natural Language Understanding](#5-layer-3-natural-language-understanding)
6. [Layer 4: Dialogue Engine](#6-layer-4-dialogue-engine)
7. [Layer 5: Order State Machine](#7-layer-5-order-state-machine)
8. [Layer 6: Synthesis & Feedback](#8-layer-6-synthesis--feedback)
9. [Layer 7: Interaction Surface](#9-layer-7-interaction-surface)
10. [Layer 8: Learning & Telemetry](#10-layer-8-learning--telemetry)
11. [Cross-Cutting: Error Taxonomy & Recovery](#11-cross-cutting-error-taxonomy--recovery)
12. [Cross-Cutting: Performance & Latency Budget](#12-cross-cutting-performance--latency-budget)
13. [Cross-Cutting: Privacy, Security, Offline](#13-cross-cutting-privacy-security-offline)
14. [State Machine Formal Specification](#14-state-machine-formal-specification)
15. [End-to-End Flow Traces](#15-end-to-end-flow-traces)
16. [Gap Analysis: Current vs Next-Gen](#16-gap-analysis-current-vs-next-gen)
17. [Evolution Roadmap](#17-evolution-roadmap)
18. [Codebase Map (Verified)](#18-codebase-map-verified)

---

## 1. Vision & Philosophy

### 1.1 Core Vision
A voice ordering system that feels like the best barista conversation — one that understands messy human speech, self-corrections, contextual references, and emotional undertones — delivered at kiosk speed with zero training required.

### 1.2 Design Principles

#### 1.2.1 Conversational Intelligence
##### [ip] Natural Speech Tolerance
- Disfluency handling: "um", "uh", "like", "so" stripped by `TranscriptNormalizer` (228 LOC)
- Self-correction: "large, no wait, medium" — correction markers detected: "actually", "scratch that", "wait no", "I mean", "not that"
- Partial utterances: Streaming partial transcripts handled by `TranscriptPipeline` (188 LOC) — deduplicate "med" → "medi" → "medium"
- Run-on orders: "a latte and two cappuccinos and oh also a muffin" — multi-item extraction via `EntityExtractor`

##### [ip] Contextual Inference
- Pronoun resolution (future): "Make it bigger" = upsize current item
- Ellipsis completion (future): "Same thing" = repeat last order
- Implicit slots: "And a cappuccino" after "large oat milk latte" → infer large? oat milk? (currently: no inference, defaults used)
- Session memory: Last 10 utterances tracked in `sessionHistory[]` for context-aware parsing

##### [ip] Proactive Disambiguation
- Confidence-gated clarification: `ClarificationPolicy` (336 LOC) with three tiers:
  - `strict` config: threshold 0.7, gap 0.15, max 3 clarifications/turn
  - `default` config: threshold 0.6, gap 0.1, max 2/turn
  - `lenient` config: threshold 0.45, gap 0.05, max 1/turn
- High-importance slots: `milk` and `caffeine` get elevated thresholds (0.75 vs 0.6) due to allergen/health implications
- Menu item confirmation: If confidence < threshold, ask "Did you mean a [item]?"
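
The tiers above can be read as a small configuration table. The following is a minimal sketch, not the actual `ClarificationPolicy` implementation: the type and function names are hypothetical, `gap` is assumed to mean the confidence lead the top candidate needs over the runner-up, and raising the threshold for `milk` and `caffeine` with `max` is likewise an assumption. Only the numeric values come from the list.

```swift
// Hypothetical sketch; only the numbers are taken from the tier list.
struct ClarificationConfig {
    let threshold: Double              // minimum confidence to accept without asking
    let gap: Double                    // assumed: required lead over the runner-up candidate
    let maxClarificationsPerTurn: Int

    static let strict   = ClarificationConfig(threshold: 0.7,  gap: 0.15, maxClarificationsPerTurn: 3)
    // `default` is a Swift keyword, so the default tier is named `standard` here.
    static let standard = ClarificationConfig(threshold: 0.6,  gap: 0.1,  maxClarificationsPerTurn: 2)
    static let lenient  = ClarificationConfig(threshold: 0.45, gap: 0.05, maxClarificationsPerTurn: 1)
}

let highImportanceSlots: Set<String> = ["milk", "caffeine"]
let highImportanceThreshold = 0.75

func needsClarification(slot: String,
                        topConfidence: Double,
                        runnerUpConfidence: Double,
                        config: ClarificationConfig) -> Bool {
    // Allergen/health slots never use a threshold below 0.75 (assumed combination rule).
    let threshold = highImportanceSlots.contains(slot)
        ? max(config.threshold, highImportanceThreshold)
        : config.threshold
    let belowThreshold = topConfidence < threshold
    let tooCloseToRunnerUp = (topConfidence - runnerUpConfidence) < config.gap
    return belowThreshold || tooCloseToRunnerUp
}

// Same confidence, different slots: "size" passes, "milk" triggers a question.
print(needsClarification(slot: "size", topConfidence: 0.65, runnerUpConfidence: 0.30, config: .standard))
print(needsClarification(slot: "milk", topConfidence: 0.65, runnerUpConfidence: 0.30, config: .standard))
```

Keeping the elevated threshold separate from the tier lets all three tiers share the same allergen rule.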

##### [ip] Multi-Turn Memory
- `sessionHistory: [String]` — last 10 utterances preserved
- `sessionPreferences: [String: String]` — learned preferences (e.g., "milk" → "oat")
- `lastClarification: (question: String, answer: String)?` — recent Q&A context injected into next AI parse
- `consecutiveFailures: Int` — drives recovery escalation level
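
As a rough illustration of how these four fields could fit together, here is a minimal sketch. The `SessionMemory` type, the `recordParse(succeeded:)` method, and the reset-on-success rule are assumptions made for the example. The property names, the 10-utterance cap, and the 3-failure touch fallback come from this section.

```swift
// Hypothetical container; the real orchestrator holds these as separate properties.
struct SessionMemory {
    static let maxHistory = 10

    private(set) var sessionHistory: [String] = []
    var sessionPreferences: [String: String] = [:]
    var lastClarification: (question: String, answer: String)? = nil
    private(set) var consecutiveFailures = 0

    // Append an utterance and drop the oldest entries beyond the cap.
    mutating func addToSessionHistory(_ utterance: String) {
        sessionHistory.append(utterance)
        let overflow = sessionHistory.count - Self.maxHistory
        if overflow > 0 {
            sessionHistory.removeFirst(overflow)
        }
    }

    // Assumed rule: any successful parse resets the failure streak.
    mutating func recordParse(succeeded: Bool) {
        consecutiveFailures = succeeded ? 0 : consecutiveFailures + 1
    }

    // Voice → Touch fallback is offered after 3 failures.
    var shouldOfferTouchFallback: Bool { consecutiveFailures >= 3 }
}

var memory = SessionMemory()
for index in 1...12 {
    memory.addToSessionHistory("utterance \(index)")
}
memory.sessionPreferences["milk"] = "oat"
print(memory.sessionHistory.count)        // 10
print(memory.sessionHistory.first ?? "")  // utterance 3
```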

#### 1.2.2 Zero-Friction Interaction
##### [ip] First-Time Success
- No onboarding, no tutorial
- Welcome prompt: "What would you like to order? Say 'help' for available commands."
- Example-based help: "To order, just say what you'd like. For example: 'I'll have a medium latte'"

##### [ip] Graceful Modality Switching
- Voice → Touch: "Let me show you the menu. You can also tap items to order." (after 3 failures)
- `KioskTouchOrderingView` always accessible via button
- `showTouchOrdering` state triggers `.fullScreenCover`

##### [ip] Speed Budget
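- Target: End-of-speech → confirmation response < 2.0 seconds
- Breakdown (per-stage upper bounds): Utterance detection (≤1.5s) + NLU (≤100ms) + AI (≤1.5s, parallel with NLU) + TTS start (≤500ms). Read as worst cases, these bounds add up to 3.5s, so the 2.0s target holds only when stages finish well inside their limits.
- Current: Hybrid parse runs both AI and NLU and merges the results; the `fastest` strategy races them and takes the first non-empty result (see 5.1.1)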
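
The budget arithmetic can be checked with a few lines. This is only a sketch under the assumption that the listed figures are additive worst-case bounds and that AI and NLU overlap fully when run in parallel. The `LatencyBudget` type is hypothetical.

```swift
// Hypothetical helper; stage bounds are the ones listed in the breakdown (seconds).
struct LatencyBudget {
    var utteranceDetection = 1.5
    var nlu = 0.1
    var ai = 1.5
    var ttsStart = 0.5
    var target = 2.0

    // AI and NLU in parallel: the slower of the two dominates.
    var worstCaseParallel: Double { utteranceDetection + max(nlu, ai) + ttsStart }

    // AI and NLU one after the other: both costs are paid.
    var worstCaseSequential: Double { utteranceDetection + nlu + ai + ttsStart }

    var meetsTargetInWorstCase: Bool { worstCaseParallel <= target }
}

let budget = LatencyBudget()
print(budget.worstCaseParallel)       // 3.5
print(budget.meetsTargetInWorstCase)  // false
```

Under these assumptions the utterance-detection wait is the largest single term, so shortening the silence and hard timeouts would do more for the 2.0s target than speeding up NLU.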

#### 1.2.3 Acoustic Resilience
##### [ip] Noise Handling
- Audio mode: `.voiceChat` for noise suppression + `.measurement` for speech recognition
- Buffer: 16kHz sample rate, 5ms IO buffer, mono channel
- VAD: Energy-based with adaptive threshold (currently basic)

##### [ip] Echo Prevention (CRITICAL)
- Current: `speakWithIsolation()` — pause mic → play TTS → resume mic
- Actual delay chain: 100ms cleanup + TTS duration + 300ms echo dissipation + 100ms stabilization
- Total overhead per TTS: ~500ms + TTS duration
- Gap: No AEC — simultaneous talk/listen impossible

---

## 2. System Topology

### 2.1 Architecture Layers (with verified file counts)

┌─────────────────────────────────────────────────────────────────────┐
│  LAYER 7: INTERACTION SURFACE                                        │
│  BWB_Kiosk/Views/ (7 files, ~1200 LOC kiosk-specific)               │
├─────────────────────────────────────────────────────────────────────┤
│  LAYER 6: SYNTHESIS & FEEDBACK                                       │
│  AudioSessionManager (373 LOC) + FeedbackCoordinator (358 LOC)      │
├─────────────────────────────────────────────────────────────────────┤
│  LAYER 5: ORDER STATE MACHINE                                        │
│  CartCoordinator (452) + ConfirmationCoord (626) + SessionMgr (270) │
├─────────────────────────────────────────────────────────────────────┤
│  LAYER 4: DIALOGUE ENGINE                                            │
│  VoiceDialogueManager (572) + ClarificationPolicy (336)             │
│  + ConfirmationGenerator (334) + ContextAwareRecovery (476)          │
├─────────────────────────────────────────────────────────────────────┤
│  LAYER 3: NATURAL LANGUAGE UNDERSTANDING                             │
│  OrderParsingPipeline (531) + AITranscriptParser (1343)              │
│  + VoiceNLUEngine (1321) + EnhancedNLU (745) + ConstraintEngine(776)│
│  + IntentClassifier (186) + EntityExtractor (314)                    │
│  + SlotClassifier (362) + Embeddings/ (4 files, ~1400 LOC)          │
├─────────────────────────────────────────────────────────────────────┤
│  LAYER 2: SPEECH-TO-TEXT PIPELINE                                    │
│  TranscriptPipeline (188) + TranscriptNormalizer (228)               │
│  + TranscriptState (136) + UtteranceCompletionDetector (344)         │
│  + TranscriptStabilityTracker (125) + TranscriptPreprocessor (226)   │
├─────────────────────────────────────────────────────────────────────┤
│  LAYER 1: AUDIO FOUNDATION                                          │
│  SpeechAnalyzerService (575) + LegacyVoiceService (373)             │
│  + VoiceServiceProtocol (300) + WakeWordDetector (540)               │
│  + AudioSessionManager (373)                                         │
├─────────────────────────────────────────────────────────────────────┤
│  LAYER 8: LEARNING & TELEMETRY (cross-cutting)                      │
│  PatternLearner (318) + FeedbackCollector (512)                      │
│  + LearningTypes (179)                                               │
└─────────────────────────────────────────────────────────────────────┘

TOTAL: ~19,700 LOC across 50 Swift files in BWBCore/Voice/
     + ~1,400 LOC kiosk-specific in BWB_Kiosk/

### 2.2 The Orchestrator (Verified)
File: `BWB_Kiosk/Services/VoiceOrderingOrchestrator.swift` (~750 LOC)

The orchestrator is a `@MainActor` singleton that:
1. Owns 8 component instances (injected in `init`):
   - `TranscriptPipeline`
   - `LiveOrderPreviewGenerator`
   - `UtteranceCompletionDetector`
   - `OrderParsingPipeline(strategy: .hybrid)`
   - `CartCoordinator`
   - `ConfirmationCoordinator`
   - `FeedbackCoordinator`
   - `SessionManager`
2. Publishes 25+ `@Published` properties for UI binding
3. Coordinates the voice service (`VoiceServiceProtocol`) via async streams
4. Manages wake word detection via `WakeWordDetector`

### 2.3 Data Flow (Verified End-to-End)

🎤 Hardware Microphone
  │
  ▼
AVAudioSession (.playAndRecord, .voiceChat, 16kHz)
  │
  ├──── VoiceServiceProtocol.startListening() ──────┐
  │     ├── SpeechAnalyzerService (iOS 26+)          │
  │     └── LegacyVoiceService (SFSpeechRecognizer)  │
  │                                                   │
  │     Two AsyncStreams produced:                     │
  │     ├── transcriptionResults → VoiceTranscriptionResult
  │     │     { transcript: String, isFinal: Bool, confidence: Float? }
  │     │
  │     └── voiceActivityResults → VoiceActivityResult
  │           { isSpeechDetected: Bool, audioLevel: Float? }
  │
  ▼
handleTranscriptionResult(_:)
  │
  ├── 1. transcriptPipeline.processIncoming(text, isFinal)
  │      → Updates displayTranscript (@Published)
  │
  ├── 2. livePreviewGenerator.generatePreview(from: text)
  │      → Updates livePreviews: [BWBCore.OrderPreview] (@Published)
  │
  ├── 3. utteranceDetector.updateTranscript(text)
  │      → Resets stability count if changed
  │
  └── 4. utteranceDetector.analyze(transcript:, isSpeechDetected:, isListening:, ...)
         → Returns UtteranceAnalysis { isComplete, confidence, reason, countdown }
         │
         ├── NOT complete → update processingCountdown (@Published)
         │
         └── COMPLETE → processUtterance(transcript)
                │
                ├── Deduplicate (skip if == lastProcessedTranscript)
                ├── addToSessionHistory(transcript)
                │
                ├── [IF confirming] → handleConfirmationResponse(transcript)
                │   └── confirmationCoordinator.processResponse(text)
                │       → .confirmed / .rejected / .modified / .additionalOrder / .unclear
                │
                └── [ELSE] → parsingPipeline.parseWithAutoContext(
                              transcript,
                              cartItems: cartItems,
                              sessionHistory: sessionHistory,
                              userPreferences: sessionPreferences,
                              lastClarification: lastClarification,
                              menuItems: nil
                            )
                    │
                    ▼
                  OrderParseResult { transcript, intent, items, confidence, source }
                    │
                    ├── isMetaCommand? → handleMetaCommand(result)
                    │   └── .clearOrder / .readCart / .help / .confirm / .decline / .checkout
                    │       → cartCoordinator or speakWithIsolation
                    │
                    ├── has items? → cartCoordinator.addToPending(validatedItems)
                    │   → confirmationCoordinator.startConfirmation(items, transcript, confidence)
                    │   → speakWithIsolation(confirmationMessage, thenListen: true)
                    │
                    └── no items? → ContextAwareRecoveryService.generateRecovery(
                                      type: .unknownItem,
                                      context: RecoveryContext { cart, history, failed, prefs, failureCount }
                                    )
                        → speakWithIsolation(recoveryMessage, thenListen: shouldContinue)

---

## 3. Layer 1: Audio Foundation

### 3.1 Audio Session Management
File: `BWBCore/Voice/AudioSessionManager.swift` (373 LOC)
Class: `AudioSessionManager` — `@MainActor`, singleton, `ObservableObject`

#### 3.1.1 Audio Session States (Verified Enum)

```swift
public enum AudioSessionState: Sendable {
    case idle           // No audio activity
    case listening      // Microphone active for speech recognition
    case speaking       // TTS playback active
    case transitioning  // Switching between modes
}
```

#### 3.1.2 AVAudioSession Configurations (Verified)

STATE: idle
  Category: .playAndRecord
  Mode: .voiceChat
  Options: [.defaultToSpeaker, .allowBluetooth]

STATE: listening
  Category: .playAndRecord
  Mode: .measurement          ← Optimized for recognition accuracy
  Options: [.defaultToSpeaker, .allowBluetooth, .duckOthers]
  Active: true (.notifyOthersOnDeactivation)

STATE: speaking
  Category: .playback          ← Output only, no mic
  Mode: .spokenAudio           ← Optimized for voice TTS
  Options: [.duckOthers]
  Active: true (.notifyOthersOnDeactivation)

STATE: transitioning
  No changes (keep current config)

#### 3.1.3 Voice Service Callback Registration (Verified)

```swift
public func registerVoiceServiceCallbacks(
    pause: @escaping () async -> Void,      // voiceService.suspend()
    resume: @escaping () async throws -> Void  // voiceService.resume()
)
```

Registered by `VoiceOrderingOrchestrator.registerAudioSessionCallbacks()` at init.

#### 3.1.4 Pause/Resume Lifecycle (Verified Timing)

pauseListening():
  guard !isListeningPaused && !isTransitioning
  → state = .transitioning
  → await pauseCallback()              // voiceService.suspend()
  → sleep(100ms)                       // Audio engine cleanup
  → configureAudioSession(for: .speaking)  // Switch to playback
  → state = .speaking

resumeListening():
  guard isListeningPaused && !isTransitioning
  → state = .transitioning
  → sleep(postTTSListeningDelay: 300ms)    // Echo dissipation
  → configureAudioSession(for: .listening)  // Switch to recording
  → sleep(100ms)                            // Audio session stabilize
  → try await resumeCallback()              // voiceService.resume()
  → state = .listening

Total isolation overhead: 100ms + TTS duration + 300ms + 100ms = 500ms + TTS duration

#### 3.1.5 TTS Configuration (Verified)

```swift
voiceIdentifier: String?           // nil = system default
speechRate: Float = 0.5            // 0.0-1.0
pitchMultiplier: Float = 1.0
volume: Float = 1.0
preUtteranceDelay: TimeInterval = 0.15
postUtteranceDelay: TimeInterval = 0.2
postTTSListeningDelay: TimeInterval = 0.3  // After TTS, before mic resume
```

Voice selection priority (verified):
1. Custom `voiceIdentifier` if set
2. Enhanced quality US English voice: `quality == .enhanced && language == "en-US"`
3. Standard US English: `AVSpeechSynthesisVoice(language: "en-US")`

### 3.2 Voice Service Protocol
File: `BWBCore/Voice/VoiceServiceProtocol.swift` (300 LOC)

#### 3.2.1 Protocol (Verified)

```swift
public protocol VoiceServiceProtocol: Sendable {
    func startListening() async throws
    func stopListening() async
    func suspend() async
    func resume() async throws

    var transcriptionResults: AsyncStream<VoiceTranscriptionResult> { get }
    var voiceActivityResults: AsyncStream<VoiceActivityResult> { get }
}
```

#### 3.2.2 Result Types (Verified)

```swift
public struct VoiceTranscriptionResult: Sendable {
    public let transcript: String
    public let isFinal: Bool
    public let confidence: Float?
    public let timestamp: Date
    // + segments, language, alternatives
}

public struct VoiceActivityResult: Sendable {
    public let isSpeechDetected: Bool
    public let audioLevel: Float?
    public let timestamp: Date
}
```

#### 3.2.3 Implementation Selection (Verified)

```swift
// In VoiceOrderingOrchestrator.startListening():
if #available(iOS 26.0, *) {
    voiceService = SpeechAnalyzerService()   // 575 LOC
    isUsingEnhancedService = true
} else {
    voiceService = LegacyVoiceService()      // 373 LOC
    isUsingEnhancedService = false
}
```

##### [ip] SpeechAnalyzerService (iOS 26+, 575 LOC)
- Uses new `SpeechAnalyzer` API
- On-device ML inference
- Lower latency, better accuracy
- Supports asset download for offline use
- `downloadAssetsIfNeeded()` exposed to UI

##### [ip] LegacyVoiceService (iOS 17+, 373 LOC)
- Uses `SFSpeechRecognizer` + `SFSpeechAudioBufferRecognitionRequest`
- Network-dependent for best accuracy
- Has on-device fallback (lower accuracy)
- Well-tested, stable

### 3.3 Wake Word Detection
File: `BWBCore/Voice/WakeWordDetector.swift` (540 LOC)

#### 3.3.1 State Machine (Verified)

DISABLED ──enable()──▶ ENABLED ──startListening()──▶ LISTENING
                                                        │
                                                   onWakeWordDetected
                                                        │
                                                        ▼
                                                    PAUSED (during session)
                                                        │
                                                   resume()
                                                        │
                                                        ▼
                                                    LISTENING
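
The diagram above translates directly into a transition function. This is a minimal sketch with hypothetical type and function names, covering only the four edges drawn in the diagram. How `WakeWordDetector` represents its state internally is not shown in this document.

```swift
// Hypothetical names; states and edges mirror the diagram.
enum WakeWordState { case disabled, enabled, listening, paused }
enum WakeWordEvent { case enable, startListening, wakeWordDetected, resume }

/// Returns the next state, or nil when the diagram has no such edge.
func nextState(from state: WakeWordState, on event: WakeWordEvent) -> WakeWordState? {
    switch (state, event) {
    case (.disabled, .enable):            return .enabled
    case (.enabled, .startListening):     return .listening
    case (.listening, .wakeWordDetected): return .paused      // ordering session begins
    case (.paused, .resume):              return .listening   // ordering session ended
    default:                              return nil
    }
}

var state = WakeWordState.disabled
for event in [WakeWordEvent.enable, .startListening, .wakeWordDetected, .resume] {
    if let next = nextState(from: state, on: event) {
        state = next
    }
}
print(state) // listening
```

Returning `nil` for undefined edges makes invalid calls, such as `resume` while disabled, easy to detect and log.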

#### 3.3.2 Detection Callback (Verified)

```swift
wakeWordDetector.onWakeWordDetected = { [weak self] word in
    Task { @MainActor in
        Logger.voice.info("Wake word '\(word)' detected")
        VoiceAudioFeedback.wakeWordDetected.play()  // System sound 1057 (Tink)
        VoiceHapticFeedback.wakeWord.play()          // Haptics.success()
        await self?.startSession()
    }
}
```

#### 3.3.3 Session Integration
- `startSession()` calls `wakeWordDetector.pause()` → `isWakeWordListening = false`
- `endSession()` calls `wakeWordDetector.resume()` → `isWakeWordListening = true`
- Always-on when enabled, paused only during active ordering

### 3.4 Silence Detection & Prompting
Implemented in: `VoiceOrderingOrchestrator` (not a separate component)

#### 3.4.1 Timer Logic (Verified)

```swift
silencePromptDuration: TimeInterval = 5.0
hasPromptedForSilence: Bool = false

// Trigger conditions (all must be true):
//   phase == .listening
//   displayTranscript.isEmpty
//   !isSpeechDetected
//   !hasPromptedForSilence
```

#### 3.4.2 Prompt Messages (Verified)

```swift
// Cart empty:
"What would you like to order? Say 'help' for available commands."

// Cart has items:
"You have {N} {item/items} in your cart. Would you like anything else,
 or say 'checkout' when ready."
```

#### 3.4.3 Cancellation
- Any speech detected → `cancelSilencePromptTimer()` + `hasPromptedForSilence = false`
- Session end → cancel + reset
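
The trigger conditions and prompt texts of 3.4 can be expressed as two pure functions. The sketch below is an assumption about shape only: `SilenceState`, `shouldPromptForSilence`, and `silencePrompt` are hypothetical names, and the real logic lives inside `VoiceOrderingOrchestrator` behind a timer. The 5.0s duration, the four conditions, and the prompt wording are taken from this section.

```swift
import Foundation

// Hypothetical snapshot of the orchestrator fields the timer inspects.
struct SilenceState {
    var isListeningPhase: Bool        // phase == .listening
    var displayTranscript: String
    var isSpeechDetected: Bool
    var hasPromptedForSilence: Bool
}

let silencePromptDuration: TimeInterval = 5.0

// All four conditions must hold once the silence duration has elapsed.
func shouldPromptForSilence(_ state: SilenceState, silentFor elapsed: TimeInterval) -> Bool {
    return elapsed >= silencePromptDuration
        && state.isListeningPhase
        && state.displayTranscript.isEmpty
        && !state.isSpeechDetected
        && !state.hasPromptedForSilence
}

// Picks the prompt based on whether the cart already has items.
func silencePrompt(cartItemCount: Int) -> String {
    guard cartItemCount > 0 else {
        return "What would you like to order? Say 'help' for available commands."
    }
    let noun = cartItemCount == 1 ? "item" : "items"
    return "You have \(cartItemCount) \(noun) in your cart. "
        + "Would you like anything else, or say 'checkout' when ready."
}

let idle = SilenceState(isListeningPhase: true, displayTranscript: "",
                        isSpeechDetected: false, hasPromptedForSilence: false)
if shouldPromptForSilence(idle, silentFor: 5.2) {
    print(silencePrompt(cartItemCount: 2))
}
```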

---

## 4. Layer 2: Speech-to-Text Pipeline

### 4.1 Transcript Pipeline
File: `BWBCore/Voice/Pipeline/TranscriptPipeline.swift` (188 LOC)

#### 4.1.1 Pipeline Responsibilities
1. Receive raw transcript from voice service
2. Normalize (delegated to `TranscriptNormalizer`)
3. Deduplicate repeated partials
4. Track display transcript for UI
5. Record speech activity timestamps

#### 4.1.2 Key Method (Verified)

```swift
func processIncoming(_ transcript: String, isFinal: Bool)
// → Updates @Published displayTranscript
// → Calls recordSpeechActivity() on speech input
```

### 4.2 Transcript Normalization
File: `BWBCore/Voice/Pipeline/TranscriptNormalizer.swift` (228 LOC)

#### 4.2.1 Normalization Steps

1. Lowercase
2. Whitespace normalization (collapse multiple spaces)
3. Filler word removal: um, uh, like, so, you know, basically
4. Correction marker detection (returns flag, doesn't strip)
5. Punctuation cleanup

#### 4.2.2 Correction Markers (Used by Orchestrator)

Correction: "actually", "scratch that", "wait no", "I mean", "not that"
Uncertainty: "maybe", "I think", "not sure"
Priority: "most important", "must have", "critical"
Anti-requirement: "don't want", "avoid", "skip"
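
The five normalization steps and the correction-marker flag can be sketched as one function. This is an illustration, not the `TranscriptNormalizer` source: the names are hypothetical, matching is done on whole words with a padded-string trick, and punctuation is cleaned before marker detection so that markers next to commas still match. That ordering is an assumption and differs from the step order listed above.

```swift
import Foundation

// Hypothetical result type: cleaned text plus the correction flag (step 4).
struct NormalizedTranscript {
    let text: String
    let hasCorrectionMarker: Bool
}

let fillerPhrases = ["you know", "basically", "um", "uh", "like", "so"]
let correctionMarkers = ["actually", "scratch that", "wait no", "i mean", "not that"]

func normalize(_ raw: String) -> NormalizedTranscript {
    // Steps 1 and 5: lowercase, then turn punctuation into spaces
    // (apostrophes and "%" are kept so "i'll" and "2%" survive).
    let cleaned = String(raw.lowercased().map { ch -> Character in
        (ch.isLetter || ch.isNumber || ch == "'" || ch == "%") ? ch : " "
    })
    // Step 2: splitting on spaces and re-joining collapses repeated whitespace.
    var padded = " " + cleaned.split(separator: " ").joined(separator: " ") + " "
    // Step 4: detect correction markers but leave them in the text.
    let hasCorrection = correctionMarkers.contains { padded.contains(" \($0) ") }
    // Step 3: remove fillers as whole words or phrases.
    for filler in fillerPhrases {
        while padded.contains(" \(filler) ") {
            padded = padded.replacingOccurrences(of: " \(filler) ", with: " ")
        }
    }
    return NormalizedTranscript(text: padded.trimmingCharacters(in: .whitespaces),
                                hasCorrectionMarker: hasCorrection)
}

let result = normalize("Um, I'll have a  LARGE latte, no wait, actually medium!")
print(result.text)                 // i'll have a large latte no wait actually medium
print(result.hasCorrectionMarker)  // true
```

One caveat the sketch makes visible: removing "like" as a whole word also strips it from "I'd like", which is one of the order patterns in 5.3.1. How the real normalizer avoids this is not stated here.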

### 4.3 Transcript State
File: `BWBCore/Voice/Pipeline/TranscriptState.swift` (136 LOC)

#### 4.3.1 State Types

```swift
enum TranscriptState {
    case empty       // No transcript received
    case partial     // Receiving streaming partials
    case stable      // Transcript hasn't changed recently
    case final_      // ASR marked as final
}
```

### 4.4 Transcript Preprocessing
File: `BWBCore/Voice/Parsing/TranscriptPreprocessor.swift` (226 LOC)

#### 4.4.1 Purpose
Pre-processes transcript BEFORE sending to AI parser:
- Strip filler words
- Expand abbreviations
- Normalize quantities ("a couple" → "2")
- Mark correction segments
- Extract and tag modifiers

### 4.5 Utterance Completion Detection
File: `BWBCore/Voice/Detection/UtteranceCompletionDetector.swift` (344 LOC)

#### 4.5.1 Analysis Result (Verified)

```swift
public struct UtteranceAnalysis: Equatable {
    public let isComplete: Bool
    public let confidence: Double
    public let reason: CompletionReason
    public let processingCountdown: TimeInterval?
    public let isWaitingToProcess: Bool
}
```

#### 4.5.2 Completion Reasons (Verified)

```swift
public enum CompletionReason: String, Sendable {
    case notReady               // Not yet complete
    case silenceTimeout         // Silence threshold exceeded
    case stableTranscript       // Transcript stable for required duration
    case orderEndingPhrase      // "please", "thanks", etc.
    case endOfItemKeyword       // Item-ending keyword detected
    case hardTimeout            // Maximum silence timeout reached
    case explicitStop           // User explicitly stopped
    case empty                  // No transcript content
    case alreadyProcessed       // Same as last processed transcript
}
```

#### 4.5.3 Multi-Signal Analysis (Verified Logic)

analyze(transcript:, isSpeechDetected:, isListening:, isSpeaking:, isProcessing:)

Guard checks (return NOT_COMPLETE if):
  - transcript is empty → .empty
  - transcript == lastProcessedTranscript → .alreadyProcessed
  - isSpeaking (TTS playing) → .notReady
  - isProcessing → .notReady
  - !isListening → .notReady

Signal 1: Transcript stability
  → stabilityTracker.isStable(requiredCount: 1, interval: 0.3s)

Signal 2: Silence detection
  → !isSpeechDetected && timeSinceLastSpeech > silenceTimeout (1.0s)

Signal 3: Hard timeout
  → timeSinceLastChange > hardTimeout (1.5s)

Signal 4: Order-ending phrases
  → "please", "thanks", "that's all", "that's it"

Processing countdown:
  → If has content and silence started: show countdown (hardTimeout - elapsed)
  → Updates processingCountdown for UI display

#### 4.5.4 Configuration (Verified Defaults)

```swift
silenceTimeout: 1.0s          // Silence → process
requiredStabilityCount: 1     // Checks needed
transcriptStabilityInterval: 0.3s  // Between checks
hardTimeout: 1.5s             // Force-process regardless
```
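
Section 4.5 lists the guard checks and signals but not how they combine into a verdict. The sketch below fills that in with explicit assumptions: the hard timeout wins unconditionally, an order-ending phrase or a silence timeout only counts once the transcript is stable, and the countdown is the time remaining until the hard timeout. All type and function names are hypothetical, and only a subset of `CompletionReason` is modeled.

```swift
import Foundation

enum Reason { case notReady, silenceTimeout, orderEndingPhrase, hardTimeout, empty, alreadyProcessed }

// Hypothetical snapshot of the detector inputs.
struct UtteranceSignals {
    var transcript: String
    var lastProcessedTranscript: String? = nil
    var isSpeechDetected = false
    var isListening = true
    var isSpeaking = false
    var isProcessing = false
    var isTranscriptStable = false           // Signal 1, supplied by the stability tracker
    var timeSinceLastSpeech: TimeInterval = 0
    var timeSinceLastChange: TimeInterval = 0
}

struct Verdict {
    let isComplete: Bool
    let reason: Reason
    var countdown: TimeInterval? = nil
}

let silenceTimeout: TimeInterval = 1.0
let hardTimeout: TimeInterval = 1.5
let orderEndingPhrases = ["please", "thanks", "that's all", "that's it"]

func analyze(_ s: UtteranceSignals) -> Verdict {
    let text = s.transcript.lowercased().trimmingCharacters(in: .whitespaces)
    // Guard checks: never complete in these situations.
    if text.isEmpty { return Verdict(isComplete: false, reason: .empty) }
    if s.transcript == s.lastProcessedTranscript { return Verdict(isComplete: false, reason: .alreadyProcessed) }
    if s.isSpeaking || s.isProcessing || !s.isListening { return Verdict(isComplete: false, reason: .notReady) }
    // Signal 3: force-process once the transcript has been unchanged for too long.
    if s.timeSinceLastChange > hardTimeout { return Verdict(isComplete: true, reason: .hardTimeout) }
    // Signal 4 (assumed to require a stable transcript).
    if s.isTranscriptStable && orderEndingPhrases.contains(where: { text.hasSuffix($0) }) {
        return Verdict(isComplete: true, reason: .orderEndingPhrase)
    }
    // Signals 1 and 2 combined.
    if s.isTranscriptStable && !s.isSpeechDetected && s.timeSinceLastSpeech > silenceTimeout {
        return Verdict(isComplete: true, reason: .silenceTimeout)
    }
    // Not complete yet: expose a countdown while the user is silent.
    let countdown = s.isSpeechDetected ? nil : max(0, hardTimeout - s.timeSinceLastChange)
    return Verdict(isComplete: false, reason: .notReady, countdown: countdown)
}

let waiting = analyze(UtteranceSignals(transcript: "a latte", timeSinceLastSpeech: 0.5, timeSinceLastChange: 0.5))
print(waiting.isComplete, waiting.countdown ?? -1)
```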

### 4.6 Transcript Stability Tracker
File: `BWBCore/Voice/Detection/TranscriptStabilityTracker.swift` (125 LOC)

#### 4.6.1 Algorithm (Verified)

updateTranscript(text):
  if text != currentTranscript:
    currentTranscript = text
    stabilityCount = 0
    lastChangeTime = now
    return true  // changed

checkStability():
  if now - lastCheckTime > interval:
    if currentTranscript == lastCheckedTranscript:
      stabilityCount += 1
    lastCheckedTranscript = currentTranscript
    lastCheckTime = now

isStable:
  stabilityCount >= requiredCount
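
The algorithm above maps onto a small value type. In this sketch the current time is passed in as a plain number of seconds so the behavior is deterministic. That injection, the `StabilityTracker` name, and the `timeSinceLastChange(now:)` helper are assumptions, and the real tracker presumably reads a clock itself.

```swift
import Foundation

struct StabilityTracker {
    let requiredCount: Int
    let interval: TimeInterval
    private(set) var currentTranscript = ""
    private(set) var stabilityCount = 0
    private var lastCheckedTranscript = ""
    private var lastChangeTime: TimeInterval = 0
    private var lastCheckTime: TimeInterval = 0

    init(requiredCount: Int = 1, interval: TimeInterval = 0.3) {
        self.requiredCount = requiredCount
        self.interval = interval
    }

    // Returns true when the transcript changed; a change resets the stability count.
    @discardableResult
    mutating func updateTranscript(_ text: String, now: TimeInterval) -> Bool {
        guard text != currentTranscript else { return false }
        currentTranscript = text
        stabilityCount = 0
        lastChangeTime = now
        return true
    }

    // Counts one stable check if the transcript is unchanged since the previous check.
    mutating func checkStability(now: TimeInterval) {
        guard now - lastCheckTime > interval else { return }
        if currentTranscript == lastCheckedTranscript {
            stabilityCount += 1
        }
        lastCheckedTranscript = currentTranscript
        lastCheckTime = now
    }

    var isStable: Bool { stabilityCount >= requiredCount }

    func timeSinceLastChange(now: TimeInterval) -> TimeInterval { now - lastChangeTime }
}

var tracker = StabilityTracker()
tracker.updateTranscript("med", now: 0.0)
tracker.checkStability(now: 0.4)
tracker.updateTranscript("medium", now: 0.5)   // streaming partial grew: count resets
tracker.checkStability(now: 0.8)
print(tracker.isStable)  // false
tracker.checkStability(now: 1.2)
print(tracker.isStable)  // true
```

With `requiredCount: 1` and a 0.3s interval, a transcript needs two checks after its last change before it counts as stable: one to record it and one to confirm it.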

### 4.7 Live Order Preview Generation
File: `BWBCore/Voice/Detection/LiveOrderPreviewGenerator.swift` (310 LOC)

#### 4.7.1 Streaming Preview Pipeline

Partial transcript → lightweight NLU (no AI, local only)
  → Extract: item name guess, size, temperature, modifiers
  → Compute: confidence score based on match quality
  → Return: BWBCore.OrderPreview or nil

Debounce: 150ms (VoiceOrderingConfig.shared.liveItemsDebounceInterval)

#### 4.7.2 OrderPreview Mapping (Verified)

```swift
// BWBCore.OrderPreview (internal) → displayed as VoiceParsedOrder via computed property:
public var liveItems: [VoiceParsedOrder] {
    livePreviews.map { preview in
        var order = VoiceParsedOrder(itemName: preview.itemName)
        order.quantity = preview.quantity
        order.confidence = preview.confidence
        if let size = preview.size {
            order.size = DrinkSize(rawValue: size.lowercased())
        }
        if let temp = preview.temperature {
            order.temperature = DrinkTemperature(rawValue: temp.lowercased())
        }
        order.syrups = preview.modifiers
        return order
    }
}
```

---

## 5. Layer 3: Natural Language Understanding

### 5.1 Order Parsing Pipeline
File: `BWBCore/Voice/Pipeline/OrderParsingPipeline.swift` (531 LOC)
Class: `OrderParsingPipeline` — `@MainActor`, `ObservableObject`

#### 5.1.1 Parsing Strategies (Verified Enum + Implementations)

```swift
public enum ParsingStrategy: String, Sendable {
    case aiFirst     // AI → NLU fallback (if AI confidence < 0.85)
    case nluFirst    // NLU → AI enhancement (if NLU confidence < 0.8)
    case hybrid      // Both parallel → OrderResultMerger (DEFAULT)
    case aiOnly      // Only AI parser
    case nluOnly     // Only NLU parser (offline capable)
    case fastest     // TaskGroup race, first non-empty wins
}
```

##### [ip] AI-First Strategy (`parseAIFirst`)

1. If aiParser.isAvailable:
   a. Run AI parser (with context if available)
   b. If result.confidence >= aiConfidenceThreshold (0.85): return AI result
   c. If AI found items but low confidence: run NLU, merge
2. Fallback: run NLU parser

##### [ip] NLU-First Strategy (`parseNLUFirst`)

1. Run NLU parser
2. If confidence >= 0.8: return NLU result
3. If uncertain and AI available: run AI, merge
4. Return NLU result

##### [ip] Hybrid Strategy (`parseHybrid`) — DEFAULT

1. Run AI parser (with context) — async
2. Run NLU parser — async
   (Currently sequential, not truly parallel — see gap)
3. Merge via OrderResultMerger

##### [ip] Fastest Strategy (`parseFastest`)

1. TaskGroup with both parsers
2. First non-empty result wins → cancelAll()
3. If both empty → return empty

#### 5.1.2 Context Building (Verified)

```swift
static func buildContext(
    cartItems: [VoiceParsedOrder] = [],
    sessionHistory: [String] = [],
    userPreferences: [String: String] = [:],
    lastClarification: (question: String, answer: String)? = nil,
    includeConstraints: Bool = true
) -> AITranscriptParser.OrderParsingContext

// constraintsSummary: String? from YAMLConstraintEngine.shared.generateConstraintSummary()
```

#### 5.1.3 Parse Result (Verified)

```swift
public struct OrderParseResult: Sendable {
    let transcript: String
    let intent: OrderParseIntent        // 11 intent types
    let items: [VoiceParsedOrder]       // Parsed orders
    let confidence: Double              // 0-1
    let source: OrderParseSource        // .ai / .nlu / .hybrid / .fallback
    let clarificationsNeeded: [ClarificationRequest]
    let warnings: [String]
    let processingTimeMs: Int
    let pickupName: String?
    let metadata: [String: String]
}
```

#### 5.1.4 Parse Intent Taxonomy (Verified — 11 Intents)

```swift
public enum OrderParseIntent: String, Sendable {
    case order          // Normal order
    case clearOrder     // "Start over", "clear everything"
    case readCart       // "What's in my order?"
    case help           // "What can I order?"
    case confirm        // "Yes", "correct"
    case decline        // "No", "cancel"
    case modify         // "Change the size"
    case remove         // "Remove the croissant"
    case checkout       // "That's it", "check out"
    case repeat_        // "Say again"
    case unknown        // Unclassified
}
```

### 5.2 Result Merging
File: `BWBCore/Voice/Pipeline/OrderResultMerger.swift` (390 LOC)

#### 5.2.1 Merge Strategies (Verified)

```swift
public enum MergeStrategy: Sendable {
    case preferAI          // DEFAULT — use AI when confidence ≥ 0.85
    case preferNLU         // Use NLU when confidence ≥ 0.80
    case consensusRequired // Both must agree
    case itemUnion         // Union of items from both
    case itemIntersection  // Only items both detected
}
```

#### 5.2.2 Merge Logic (Verified Config)

```swift
aiConfidenceThreshold: 0.85    // Prefer AI above this
nluConfidenceThreshold: 0.80   // Prefer NLU above this
consensusBoost: 0.1            // Boost when both agree
defaultStrategy: .preferAI
```

#### 5.2.3 Consensus Detection
When both AI and NLU detect the same item:
- Match by item name similarity
- Merge slot values (AI takes precedence for ambiguous slots)
- Boost confidence by `consensusBoost` (0.1)
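
To make the `preferAI` path and the consensus rule concrete, here is a minimal sketch. It is not the `OrderResultMerger` code: the types are hypothetical, "name similarity" is reduced to a case-insensitive exact match, the boost is applied to the higher of the two confidences and capped at 1.0, and the final fallback to the more confident result is an assumption. The thresholds and the boost value come from 5.2.2.

```swift
// Hypothetical, simplified stand-ins for the real parse types.
struct ParsedItem: Equatable {
    var name: String
    var size: String? = nil
    var milk: String? = nil
}

struct ParseResult {
    var items: [ParsedItem]
    var confidence: Double
}

struct MergeConfig {
    var aiConfidenceThreshold = 0.85
    var nluConfidenceThreshold = 0.80
    var consensusBoost = 0.1
}

func mergePreferAI(ai: ParseResult, nlu: ParseResult, config: MergeConfig = MergeConfig()) -> ParseResult {
    let aiNames = Set(ai.items.map { $0.name.lowercased() })
    let nluNames = Set(nlu.items.map { $0.name.lowercased() })

    // Consensus: both parsers found the same items.
    if !aiNames.isEmpty && aiNames == nluNames {
        let merged = ai.items.map { aiItem -> ParsedItem in
            guard let match = nlu.items.first(where: { $0.name.lowercased() == aiItem.name.lowercased() }) else {
                return aiItem
            }
            var item = aiItem
            item.size = aiItem.size ?? match.size   // AI wins, NLU fills gaps
            item.milk = aiItem.milk ?? match.milk
            return item
        }
        let boosted = min(1.0, max(ai.confidence, nlu.confidence) + config.consensusBoost)
        return ParseResult(items: merged, confidence: boosted)
    }
    // No consensus: prefer AI above its threshold, then NLU above its threshold.
    if ai.confidence >= config.aiConfidenceThreshold { return ai }
    if nlu.confidence >= config.nluConfidenceThreshold { return nlu }
    return ai.confidence >= nlu.confidence ? ai : nlu
}

let aiResult = ParseResult(items: [ParsedItem(name: "latte", milk: "oat")], confidence: 0.70)
let nluResult = ParseResult(items: [ParsedItem(name: "Latte", size: "large", milk: "whole")], confidence: 0.75)
let merged = mergePreferAI(ai: aiResult, nlu: nluResult)
print(merged.items, merged.confidence)
```

In the example the merged latte keeps the AI's oat milk, takes the size from NLU, and ends up with a confidence of about 0.85.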

### 5.3 Intent Classification
File: `BWBCore/Voice/Parsing/IntentClassifier.swift` (186 LOC)
Struct: `IntentClassifier` — singleton

#### 5.3.1 Pattern Matching (Verified — All Patterns)

##### [ip] Order Patterns

"i'll have", "i want", "can i get", "give me", "i'd like",
"let me get", "i'll take", "order", "make me", "please get me"

##### [ip] Confirm Patterns

"yes", "yeah", "correct", "that's right", "sounds good",
"perfect", "confirmed", "confirm", "yep", "right"

##### [ip] Cancel Patterns

"cancel", "never mind", "forget it", "stop", "no thanks",
"don't want", "changed my mind"

##### [ip] Modify Patterns

"change", "modify", "instead", "actually", "make it",
"switch", "different"

##### [ip] Remove Patterns (18 patterns)

"remove the", "remove my", "remove a", "remove",
"take off the", "take off", "take away",
"delete the", "delete", "get rid of", "drop the",
"i don't want the", "don't want the", "don't want",
"scratch the", "scratch that",
"cancel the", "never mind the"

##### [ip] Checkout Patterns

"checkout", "check out", "pay", "that's all", "done ordering",
"finished", "ready to pay", "complete", "i'm done", "im done", "done"

##### [ip] Help Patterns (28 patterns, most extensive)

"help", "help me", "what do you have", "menu", "options", "recommend",
"what can i order", "what can i get", "what's available", "whats available",
"what drinks do you have", "what drinks", "show me", "tell me what",
"what's on the menu", "what do you sell", "what do you serve",
"popular", "best seller", "top seller", "most popular",
"suggestions", "suggest", "any recommendations", "i don't know what",
"not sure what", "what should i get", "what should i order"

##### [ip] Repeat Patterns

"repeat", "say again", "what was that", "pardon", "sorry"

##### [ip] Clear Order Patterns

"clear my order", "clear the order", "clear everything",
"start over", "start fresh", "reset",
"cancel everything", "cancel my order", "never mind", "forget it",
"empty the cart", "remove everything"

##### [ip] Cart Inquiry Patterns

"what's in my cart", "what did i order", "read my order back",
"what do i have", "show my order", "repeat my order", "tell me my order",
"what have i got", "read back my order", "what's my order", "whats my order"

#### 5.3.2 Classification Priority (Verified Order)

1. Clear order (meta)    → (.cancel, 1.0)
2. Cart inquiry (meta)   → (.help, config.helpIntentConfidence)
3. Cancel                → (.cancel, config.cancelIntentConfidence)
4. Confirm               → (.confirm, config.confirmIntentConfidence)
5. Checkout              → (.checkout, config.checkoutIntentConfidence)
6. Remove                → (.remove, config.removeIntentConfidence)
7. Modify                → (.modify, config.modifyIntentConfidence)
8. Help                  → (.help, config.helpIntentConfidence)
9. Repeat                → (.repeat_, config.repeatIntentConfidence)
10. Order (explicit)     → (.order, config.orderIntentExplicitConfidence)
11. Order (implicit)     → (.order, config.orderIntentImplicitConfidence) [if menu matches]
12. Unknown              → (.unknown, config.unknownIntentConfidence)
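
The priority order above behaves like a first-match-wins rule list. The sketch below illustrates that with a handful of patterns per group. It assumes plain substring matching on the lowercased utterance, which this document does not confirm. It also omits the configured confidence values and the menu-dependent implicit order step. `Rule`, `rules`, and `classify` are hypothetical names.

```swift
import Foundation

enum Intent { case cancel, help, confirm, checkout, remove, modify, repeat_, order, unknown }

struct Rule {
    let name: String
    let intent: Intent
    let patterns: [String]
}

// Ordered as in the priority list; each group holds only a sample of its patterns.
let rules: [Rule] = [
    Rule(name: "clearOrder",  intent: .cancel,   patterns: ["clear my order", "start over", "never mind", "forget it"]),
    Rule(name: "cartInquiry", intent: .help,     patterns: ["what's in my cart", "what did i order"]),
    Rule(name: "cancel",      intent: .cancel,   patterns: ["cancel", "stop", "no thanks", "don't want"]),
    Rule(name: "confirm",     intent: .confirm,  patterns: ["yes", "yeah", "correct", "sounds good"]),
    Rule(name: "checkout",    intent: .checkout, patterns: ["checkout", "check out", "that's all", "i'm done"]),
    Rule(name: "remove",      intent: .remove,   patterns: ["remove", "take off", "don't want the", "scratch that"]),
    Rule(name: "modify",      intent: .modify,   patterns: ["change", "instead", "actually", "make it"]),
    Rule(name: "help",        intent: .help,     patterns: ["help", "menu", "recommend"]),
    Rule(name: "repeat",      intent: .repeat_,  patterns: ["repeat", "say again", "pardon"]),
    Rule(name: "order",       intent: .order,    patterns: ["i'll have", "i want", "can i get", "i'd like"]),
]

// Returns the first rule whose pattern occurs in the utterance.
func classify(_ utterance: String) -> (rule: String, intent: Intent) {
    let text = utterance.lowercased()
    for rule in rules where rule.patterns.contains(where: { text.contains($0) }) {
        return (rule.name, rule.intent)
    }
    return ("unknown", .unknown)
}

print(classify("Can I get a latte").rule)        // order
print(classify("I don't want the muffin").rule)  // cancel
```

Under the substring assumption, the second call shows an ordering effect worth checking against the real classifier: the Cancel pattern "don't want" is tested before the Remove pattern "don't want the", so the longer pattern is never reached.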

### 5.4 Entity Extraction
File: `BWBCore/Voice/Parsing/EntityExtractor.swift` (314 LOC)
Struct: `EntityExtractor` — singleton

#### 5.4.1 Slot Extraction Methods (All Verified)

##### [ip] Size Extraction

```swift
let sizeAliases: [String: DrinkSize] = [
    // Small aliases (8oz)
    "small", "short", "8oz", "8 oz", "tall", "little" → .small
    // Medium aliases (12oz)
    "medium", "regular", "12oz", "12 oz", "grande", "normal" → .medium
    // Large aliases (16oz)
    "large", "big", "16oz", "16 oz", "venti", "20oz", "20 oz",
    "extra large" → .large
]
```

##### [ip] Temperature Extraction

swift

let temperatureAliases: [String: DrinkTemperature] = [
    "hot": .hot, "warm": .hot, "heated": .hot, "steaming": .hot,
    "iced": .iced, "ice": .iced, "cold": .iced, "chilled": .iced,
    "frozen": .iced, "on ice": .iced,
]

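Note: `.blended` is defined in the `DrinkTemperature` enum (with aliases "blended", "frozen", "frappe", "frappuccino", "smoothie") but has no entry in EntityExtractor's aliases. Gap: "blended", "frappe", "frappuccino", and "smoothie" are never extracted here, and "frozen" is extracted as `.iced` rather than `.blended`.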
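
To make the gap concrete, here is a minimal sketch of whole-word alias lookup over the temperature table listed above. The function name, the `SketchTemperature` type, and the longest-alias-first strategy are assumptions for illustration; the real `EntityExtractor` matching logic is not shown in this document.

```swift
enum SketchTemperature: Equatable { case hot, iced }

// Alias table copied from the EntityExtractor listing (no .blended entries).
let sketchTemperatureAliases: [String: SketchTemperature] = [
    "hot": .hot, "warm": .hot, "heated": .hot, "steaming": .hot,
    "iced": .iced, "ice": .iced, "cold": .iced, "chilled": .iced,
    "frozen": .iced, "on ice": .iced,
]

/// Returns the value of the longest alias that occurs as a whole word or phrase.
/// Punctuation is not stripped in this sketch.
func extractTemperatureSketch(from text: String) -> SketchTemperature? {
    let padded = " " + text.lowercased() + " "
    // Longest aliases first so "on ice" is tried before "ice".
    let ordered = sketchTemperatureAliases.keys.sorted { ($0.count, $0) > ($1.count, $1) }
    for alias in ordered where padded.contains(" " + alias + " ") {
        return sketchTemperatureAliases[alias]
    }
    return nil
}
```

With this table, "a blended mocha" yields no temperature at all, while "frozen lemonade" yields `.iced`.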

##### [ip] Milk Extraction

swift

let milkAliases: [String: MilkType] = [
    "whole": .whole, "regular milk": .whole, "full fat": .whole,
    "skim": .skim, "nonfat": .skim, "non fat": .skim, "skinny": .skim, "fat free": .skim,
    "oat": .oat, "oat milk": .oat, "oatmilk": .oat, "oatly": .oat,
    "almond": .almond, "almond milk": .almond, "almondmilk": .almond,
    "soy": .soy, "soy milk": .soy, "soymilk": .soy,
    "coconut": .coconut, "coconut milk": .coconut, "coco": .coconut,
    "2%": .twoPercent, "2 percent": .twoPercent, "two percent": .twoPercent,
    "lactose free": .lactoseFree, "lactose-free": .lactoseFree,
]

Enhancement: Also supports "with X milk" regex pattern.
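
The "with X milk" pattern can be sketched with `NSRegularExpression`. The exact regex used in the codebase is not shown here, so the pattern and the function name below are assumptions; the captured phrase would then be looked up in `milkAliases`.

```swift
import Foundation

/// Returns the word(s) captured between "with" and "milk", lowercased.
func milkPhraseSketch(in text: String) -> String? {
    // Assumed pattern: lazily capture letters, digits, %, spaces and hyphens.
    let pattern = "\\bwith\\s+([a-z0-9% -]+?)\\s+milk\\b"
    guard let regex = try? NSRegularExpression(pattern: pattern, options: [.caseInsensitive]) else {
        return nil
    }
    let range = NSRange(text.startIndex..., in: text)
    guard let match = regex.firstMatch(in: text, options: [], range: range),
          let captured = Range(match.range(at: 1), in: text) else {
        return nil
    }
    return text[captured].lowercased()
}
```

For "a latte with oat milk" the capture is "oat"; an utterance without the "with ... milk" frame returns `nil` and falls back to plain alias lookup.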

##### [ip] Caffeine Extraction

swift

let caffeineAliases: [String: CaffeineOption] = [
    "regular": .regular, "normal": .regular, "caffeinated": .regular,
    "decaf": .decaf, "decaffeinated": .decaf, "no caffeine": .decaf,
    "half caf": .halfCaf, "half-caf": .halfCaf, "halfcaf": .halfCaf,
    "half caff": .halfCaf, "split shot": .halfCaf,
]

##### [ip] Syrup Extraction

swift

let syrupKeywords = [
    "vanilla", "caramel", "hazelnut", "mocha", "chocolate",
    "lavender", "honey", "maple", "pumpkin spice", "pumpkin",
    "cinnamon", "peppermint", "mint", "raspberry", "almond"
]

##### [ip] Shots Extraction

swift

"triple shot" / "3 shots" → 3
"double shot" / "2 shots" / "extra shot" → 2
"single shot" / "1 shot" → 1
"quad" / "4 shots" → 4

##### [ip] Extras Extraction

swift

"whipped cream" / "whip" → "Whipped Cream"
"extra foam" → "Extra Foam"
"light foam" → "Light Foam"
"no foam" → "No Foam"
"light ice" → "Light Ice"
"no ice" → "No Ice"
"extra ice" → "Extra Ice"

##### [ip] Quantity Extraction

swift

Number words: one/a/an→1, two/couple/pair→2, three→3, four→4,
              five→5, six→6, seven→7, eight→8, nine→9, ten→10
Digits: regex \\b(\\d+)\\b, capped at config.maxQuantity
Default: 1
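
A minimal sketch of these quantity rules follows. Checking digits before number words, and skipping the articles "a"/"an" so that "a couple" resolves to 2, are assumptions of this sketch; `maxQuantity` stands in for `config.maxQuantity`.

```swift
import Foundation

let sketchNumberWords: [String: Int] = [
    "one": 1, "a": 1, "an": 1, "two": 2, "couple": 2, "pair": 2,
    "three": 3, "four": 4, "five": 5, "six": 6, "seven": 7,
    "eight": 8, "nine": 9, "ten": 10,
]

func extractQuantitySketch(from text: String, maxQuantity: Int = 10) -> Int {
    let lowered = text.lowercased()
    // Digits: \b(\d+)\b, clamped to 1...maxQuantity. "8oz" does not match
    // because there is no word boundary between "8" and "oz".
    if let regex = try? NSRegularExpression(pattern: "\\b(\\d+)\\b"),
       let match = regex.firstMatch(in: lowered, range: NSRange(lowered.startIndex..., in: lowered)),
       let range = Range(match.range(at: 1), in: lowered),
       let value = Int(lowered[range]) {
        return min(max(value, 1), maxQuantity)
    }
    // Number words, scanning tokens left to right.
    for token in lowered.split(whereSeparator: { !$0.isLetter }) {
        let word = String(token)
        if word == "a" || word == "an" { continue }  // articles mean 1, same as the default
        if let value = sketchNumberWords[word] { return min(value, maxQuantity) }
    }
    return 1  // Default
}
```

One caveat this makes visible: a spaced size such as "12 oz" would be read as a quantity by the digit rule, so size extraction probably needs to consume those tokens first.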

5.4.2 Full Slot Extraction (Verified)

swift

func extractAllSlots(from text: String, itemIndex: Int)
    -> (order: VoiceParsedOrder, predictions: [String: SlotPrediction])

Returns a `VoiceParsedOrder` with ALL slots filled + `SlotPrediction` objects for confidence tracking.

### 5.5 Voice NLU Engine
File: `BWBCore/Voice/VoiceNLUEngine.swift` (1,321 LOC)

5.5.1 Processing Pipeline

process(transcript:) → NLUResult
  │
  ├── 1. Normalize transcript
  ├── 2. IntentClassifier.classify(text, hasMenuMatches)
  ├── 3. MenuAliasMatcher.findMatches(text)
  ├── 4. EntityExtractor.extractAllSlots(text)
  ├── 5. QuantityExtractor.extract(text)
  ├── 6. ModifierDetector.detect(text)
  ├── 7. Assemble VoiceParsedOrder[]
  ├── 8. ConfidenceScorer.score(items, intent)
  └── 9. Return NLUResult { transcript, intent, confidence, parsedOrders, slotPredictions }

### 5.6 Enhanced NLU Engine
File: `BWBCore/Voice/EnhancedVoiceNLUEngine.swift` (745 LOC)

#### 5.6.1 Enhancements Over Base NLU
- Multi-item extraction (split on "and", "also", "plus")
- Relative modifiers ("make it bigger" → upsize)
- Better scoring with `ConfidenceScorer`
- Slot prediction with alternatives

### 5.7 AI Transcript Parser
File: `BWBCore/Voice/AITranscriptParser.swift` (1,343 LOC)

5.7.1 Provider Abstraction

swift

AIOrderParser (130 LOC) → wraps AITranscriptParser
  ├── OpenAI prompts: OpenAIPrompts.swift (209 LOC)
  └── Gemini prompts: GeminiPrompts.swift (162 LOC)

5.7.2 AI Parse Intent (Separate from OrderParseIntent)

swift

public enum AIParseIntent: String, Codable, Sendable {
    case order, clearOrder, readCart, help, confirm, decline, modify
}

5.7.3 Context Injection (Verified)

swift

struct OrderParsingContext {
    let cartItems: [VoiceParsedOrder]
    let sessionHistory: [String]
    let constraintsSummary: String?  // From YAMLConstraintEngine
    let userPreferences: [String: String]
    let lastClarification: (question: String, answer: String)?
}

### 5.8 Slot Classification System
Files: `BWBCore/Voice/Slots/` (3 files, ~886 LOC total)

5.8.1 Slot Types (Verified Enum)

swift

public enum SlotType: String, Codable, Sendable, CaseIterable {
    case size, temperature, milk, caffeine, shots, syrup, quantity, menuItem

    var isHighImportance: Bool {
        self == .milk || self == .caffeine  // Allergen/health
    }

    var confidenceThreshold: Double {
        switch self {
        case .milk, .caffeine: return 0.7
        case .menuItem: return 0.6
        default: return 0.5
        }
    }
}

5.8.2 Slot Definition Structure

swift

struct SlotDefinition {
    let type: SlotType
    let classes: [SlotClass]       // Possible values with keywords
    let defaultValue: String?
    let isRequired: Bool
    let isHighImportance: Bool
}

struct SlotClass {
    let value: String              // "small"
    let keywords: [String]         // ["small", "short", "tall"]
    let confidence: Double         // Base confidence for this class
}

5.8.3 Enhanced Slot Prediction

swift

struct EnhancedSlotPrediction {
    let slotType: SlotType
    let predictedValue: String
    let displayValue: String
    let confidence: Double
    let gapToSecond: Double        // Confidence gap to #2 candidate
    let isExplicit: Bool           // Was explicitly mentioned
    let alternatives: [Alternative]
}

### 5.9 Constraint Engine
File: `BWBCore/Voice/ConstraintEngine.swift` (776 LOC)

5.9.1 Constraint Validation Pipeline

Parsed order → ConstraintEngine.validate(order, against: constraints)
  │
  ├── Item exists in menu?
  ├── Size available for this item?
  ├── Temperature valid? (e.g., no hot cold brew)
  ├── Milk compatible?
  ├── Modifier compatible?
  ├── Quantity within limits? (capped at 10)
  │
  ├── ALL PASS → continue to cart
  ├── SOFT VIOLATION → auto-correct + inform
  └── HARD VIOLATION → clarification needed
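
The three outcomes at the bottom of the pipeline can be sketched as a fold over the collected violations. The types below are hypothetical, and letting any hard violation take precedence over soft ones is an assumption of this sketch rather than documented behavior.

```swift
enum ViolationSketch: Equatable {
    case soft(String)   // can be auto-corrected
    case hard(String)   // needs the customer's input
}

enum OutcomeSketch: Equatable {
    case pass
    case autoCorrected([String])
    case needsClarification([String])
}

func resolveSketch(_ violations: [ViolationSketch]) -> OutcomeSketch {
    var soft: [String] = []
    var hard: [String] = []
    for violation in violations {
        switch violation {
        case .soft(let message): soft.append(message)
        case .hard(let message): hard.append(message)
        }
    }
    // Assumed precedence: a single hard violation blocks the order.
    if !hard.isEmpty { return .needsClarification(hard) }
    return soft.isEmpty ? .pass : .autoCorrected(soft)
}
```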

### 5.10 YAML Constraint Engine
File: `BWBCore/Voice/Constraints/YAMLConstraintEngine.swift` (746 LOC)

#### 5.10.1 Purpose
- Data-driven constraint definitions (no code changes for menu updates)
- `generateConstraintSummary()` → injected into AI parse context
- Hot-reloadable constraints

### 5.11 Constraint Types
File: `BWBCore/Voice/Constraints/ConstraintTypes.swift` (623 LOC)

5.11.1 Violation Types

swift

// Soft violations → auto-correct
// Hard violations → block + clarify
// Warnings → proceed but inform

### 5.12 Menu Matching & Embeddings
Directory: `BWBCore/Voice/Embeddings/` (4 files, ~1,400 LOC)

#### 5.12.1 Menu Alias Matching
File: `BWBCore/Voice/Detection/MenuAliasMatcher.swift` (256 LOC)

"flat white" → Flat White (exact match)
"flat wite"  → Flat White (fuzzy, Levenshtein ≤ 2)
"cortado"    → Cortado (exact)

#### 5.12.2 Text Embeddings
File: `BWBCore/Voice/Embeddings/TextEmbedder.swift` (357 LOC)

Text → Embedding vector
Used for semantic similarity when exact/fuzzy matching fails

#### 5.12.3 Vector Index
File: `BWBCore/Voice/Embeddings/VectorIndex.swift` (297 LOC)

Query embedding → Cosine similarity search → Top-K menu matches
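
Taken together, §5.12.1 to §5.12.3 describe a three-tier fallback: exact match, then fuzzy match (Levenshtein ≤ 2), then cosine similarity over embeddings. A minimal sketch of that cascade follows. All names are hypothetical, the query vector is passed in because embedding is out of scope, `minSimilarity` is an invented threshold, and only the single best match is returned rather than Top-K.

```swift
func levenshteinSketch(_ a: String, _ b: String) -> Int {
    let s = Array(a), t = Array(b)
    if s.isEmpty { return t.count }
    if t.isEmpty { return s.count }
    var previous = Array(0...t.count)
    for i in 1...s.count {
        var current = [i] + Array(repeating: 0, count: t.count)
        for j in 1...t.count {
            let cost = s[i - 1] == t[j - 1] ? 0 : 1
            current[j] = min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost)
        }
        previous = current
    }
    return previous[t.count]
}

func cosineSketch(_ a: [Double], _ b: [Double]) -> Double {
    precondition(a.count == b.count, "vectors must have equal length")
    let dot = zip(a, b).map { $0 * $1 }.reduce(0, +)
    let normA = a.map { $0 * $0 }.reduce(0, +).squareRoot()
    let normB = b.map { $0 * $0 }.reduce(0, +).squareRoot()
    guard normA > 0, normB > 0 else { return 0 }
    return dot / (normA * normB)
}

struct MenuEntrySketch { let name: String; let vector: [Double] }
enum MatchTierSketch: Equatable { case exact, fuzzy, semantic }

func matchMenuSketch(spoken: String, queryVector: [Double], menu: [MenuEntrySketch],
                     maxDistance: Int = 2, minSimilarity: Double = 0.8)
    -> (name: String, tier: MatchTierSketch)? {
    let query = spoken.lowercased()
    // Tier 1: exact alias match.
    if let hit = menu.first(where: { $0.name.lowercased() == query }) { return (hit.name, .exact) }
    // Tier 2: closest name within the edit-distance budget.
    let byDistance = menu.map { (entry: $0, distance: levenshteinSketch(query, $0.name.lowercased())) }
    if let best = byDistance.min(by: { $0.distance < $1.distance }), best.distance <= maxDistance {
        return (best.entry.name, .fuzzy)
    }
    // Tier 3: most similar embedding above the (assumed) similarity floor.
    let bySimilarity = menu.map { (entry: $0, score: cosineSketch(queryVector, $0.vector)) }
    if let best = bySimilarity.max(by: { $0.score < $1.score }), best.score >= minSimilarity {
        return (best.entry.name, .semantic)
    }
    return nil
}
```

Returning the tier alongside the name lets the confidence scorer weight an exact hit above a fuzzy or semantic one.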

#### 5.12.4 Menu Document
File: `BWBCore/Voice/Embeddings/MenuDocument.swift` (365 LOC)

swift

struct MenuDocument {
    let itemName: String
    let description: String
    let category: String
    let tags: [String]
    // + embedding vector
}

struct MenuSearchResult {
    let document: MenuDocument
    let confidence: Double
}

### 5.13 Modifier & Quantity Detection
File: `BWBCore/Voice/Detection/ModifierDetector.swift` (261 LOC)
File: `BWBCore/Voice/Detection/QuantityExtractor.swift` (123 LOC)

5.13.1 Modifier Categories

Addition: "with...", "add...", "extra..."
Removal: "without...", "no...", "hold the..."
Substitution: "instead of...", "swap..."

5.13.2 Quantity Safety (Verified in Orchestrator)

swift

// VoiceOrderingOrchestrator.handleParseResult():
if validated.quantity > 10 {
    Logger.voice.warning("⚠️ Suspicious quantity: \(validated.quantity). Capping at 10.")
    validated.quantity = 10
}
if validated.quantity < 1 {
    validated.quantity = 1
}

### 5.14 Confidence Scoring
File: `BWBCore/Voice/Parsing/ConfidenceScorer.swift` (230 LOC)

5.14.1 Scoring Factors

Base confidence from intent classification
+ Slot fill rate bonus (more slots filled → higher)
+ Menu match quality bonus
+ Consensus bonus (AI + NLU agree)
- Ambiguity penalty (close alternatives)
- Correction penalty (self-correction detected)

---

6. Layer 4: Dialogue Engine

### 6.1 Voice Dialogue Manager
File: `BWBCore/Voice/VoiceDialogueManager.swift` (572 LOC)
Class: `VoiceDialogueManager` — `@MainActor`, singleton, `ObservableObject`

6.1.1 State Machine (Verified)

swift

public enum VoiceDialogueState: String, Codable, Sendable {
    case idle, listening, processing, clarifying, confirming, complete, error
}

6.1.2 Dependencies

swift

private let nluEngine: VoiceNLUEngine
private let constraintEngine: ConstraintEngine

6.1.3 Intent Handlers (Verified)

swift

processTranscript(_:) → VoiceDialogueState
  switch result.intent:
    case .order    → handleOrderIntent(result)
    case .confirm  → handleConfirmIntent(previousState:)
    case .cancel   → handleCancelIntent(previousState:)
    case .modify   → handleModifyIntent(result, previousState:)
    case .remove   → handleRemoveIntent(result)
    case .checkout → handleCheckoutIntent()
    case .help     → handleHelpIntent()
    case .repeat_  → handleRepeatIntent()
    case .unknown  → handleUnknownIntent(result)

Note: The `VoiceDialogueManager` is a standalone component in BWBCore that can manage its own cart. In the Kiosk app, the `VoiceOrderingOrchestrator` handles dialogue flow directly using its own components (CartCoordinator, ConfirmationCoordinator) rather than delegating to VoiceDialogueManager. This creates a partial duplication — see Gap Analysis.

### 6.2 Clarification Policy
File: `BWBCore/Voice/Dialogue/ClarificationPolicy.swift` (336 LOC)
Class: `ClarificationPolicy` — `@unchecked Sendable`

6.2.1 Configuration Presets (Verified)

swift

// Default
confidenceThreshold: 0.6
gapThreshold: 0.1        // Gap between #1 and #2 candidate
highImportanceSlots: [.milk, .caffeine]
highImportanceThreshold: 0.75
maxClarificationsPerTurn: 2
clarifyMenuItem: true

// Strict
confidenceThreshold: 0.7, gapThreshold: 0.15, highImportanceThreshold: 0.85, max: 3

// Lenient
confidenceThreshold: 0.45, gapThreshold: 0.05, highImportanceThreshold: 0.6, max: 1

6.2.2 Clarification Decision Logic (Verified)

swift

shouldClarify(_ prediction: EnhancedSlotPrediction) -> Bool:
  let threshold = isHighImportance ? highImportanceThreshold : confidenceThreshold

  Rule 1: prediction.confidence < threshold → YES
  Rule 2: prediction.gapToSecond < gapThreshold → YES (too ambiguous)
  Rule 3: isHighImportance && isExplicit && confidence < 0.9 → YES

  Otherwise: NO
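
The three rules translate directly into a small predicate. The sketch below uses the Default preset values from §6.2.1; the struct names are hypothetical simplifications of `ClarificationPolicy` and `EnhancedSlotPrediction`.

```swift
struct SketchPolicy {
    var confidenceThreshold = 0.6
    var gapThreshold = 0.1
    var highImportanceThreshold = 0.75
}

struct SketchPrediction {
    let confidence: Double
    let gapToSecond: Double
    let isExplicit: Bool
    let isHighImportance: Bool
}

func shouldClarifySketch(_ p: SketchPrediction, policy: SketchPolicy = SketchPolicy()) -> Bool {
    let threshold = p.isHighImportance ? policy.highImportanceThreshold : policy.confidenceThreshold
    if p.confidence < threshold { return true }                                   // Rule 1
    if p.gapToSecond < policy.gapThreshold { return true }                        // Rule 2
    if p.isHighImportance && p.isExplicit && p.confidence < 0.9 { return true }   // Rule 3
    return false
}
```

Under the Default preset, an explicitly mentioned milk at 0.8 confidence still triggers a question through Rule 3, whereas a size at 0.65 with a clear lead does not.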

6.2.3 Question Generation (Verified Strategies)

HIGH IMPORTANCE (allergen):
  Milk: "What type of milk would you like? This is important for allergen information."
  Caffeine: "Would you like regular, decaf, or half-caf? Just want to make sure."

AMBIGUOUS (gap < threshold):
  "Did you want [option1] or [option2]?"

LOW CONFIDENCE:
  "I heard [value]. Is that right?"

DEFAULT:
  "What [slotName] would you like?"

6.2.4 Clarification Context Tracking (Verified)

swift

struct ClarificationContext: Sendable {
    var askedClarifications: [String: ClarificationRequest]
    var receivedResponses: [String: String]
    var turnCount: Int

    func alreadyAsked(_ slotType: SlotType) -> Bool
    var pendingClarifications: [ClarificationRequest]
}

6.2.5 Response Processing (Verified)

swift

processResponse(_ response:, for clarification:, slotDefinitions:)
    -> (value: String, confidence: Double)?

1. Direct match: response == option → (option, 0.95)
2. Partial match: contains → (option, 0.8)
3. Slot keyword match: → (value, 0.75)
4. No match: → nil
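
The four tiers can be sketched as a cascade that returns on the first hit. The signature below is a simplification (plain option strings and a keyword dictionary instead of `ClarificationRequest` and `SlotDefinition`), and reading "contains" as "the response contains the option" is an assumption.

```swift
import Foundation

func processResponseSketch(_ response: String, options: [String],
                           slotKeywords: [String: [String]]) -> (value: String, confidence: Double)? {
    let answer = response.lowercased().trimmingCharacters(in: .whitespacesAndNewlines)
    // 1. Direct match: the response is exactly one of the offered options.
    if let option = options.first(where: { $0.lowercased() == answer }) {
        return (option, 0.95)
    }
    // 2. Partial match: the response contains an offered option.
    if let option = options.first(where: { answer.contains($0.lowercased()) }) {
        return (option, 0.8)
    }
    // 3. Slot keyword match: any keyword of any slot value (sorted for determinism).
    for (value, keywords) in slotKeywords.sorted(by: { $0.key < $1.key }) {
        if keywords.contains(where: { answer.contains($0.lowercased()) }) {
            return (value, 0.75)
        }
    }
    // 4. No match.
    return nil
}
```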

### 6.3 Confirmation Generation
File: `BWBCore/Voice/Dialogue/ConfirmationGenerator.swift` (334 LOC)

6.3.1 Confirmation Styles

Single item, high confidence:
  "Got it — one large iced latte with vanilla. Anything else?"

Multiple items:
  "So that's two large lattes and a blueberry muffin. Sound right?"

After correction:
  "Changed to medium. One medium iced latte with vanilla. Good?"

Cart summary (for readCart intent):
  "You have: one large iced oat latte with vanilla, and one chocolate croissant."

### 6.4 Confirmation Coordinator
File: `BWBCore/Voice/Coordination/ConfirmationCoordinator.swift` (626 LOC)
Class: `ConfirmationCoordinator` — `@MainActor`, `ObservableObject`

6.4.1 Confirmation Response Types (Verified)

swift

enum ConfirmationResponse {
    case confirmed                    // "Yes", "correct"
    case rejected                     // "No", "cancel"
    case modified(String)             // "Change the size..."
    case additionalOrder(String)      // "And also a croissant"
    case unclear                      // Can't classify
    case ignored                      // No response (timeout)
}

6.4.2 Auto-Confirm Logic (Verified)

swift

// Configuration:
autoConfirmMinConfidence: 0.85     // Minimum confidence to auto-confirm
autoConfirmCountdownDuration: 3.0  // Seconds of visual countdown
autoConfirmEnabled: false          // DISABLED by default

// Trigger conditions (all must be true):
//   autoConfirmEnabled == true
//   confidence >= minConfidence
//   no pending clarification
//   no speech detected for countdown duration
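
The trigger conditions reduce to a single conjunction. In the sketch below the config struct mirrors the three documented settings; the function name and the `secondsSinceLastSpeech` parameter are assumptions about how "no speech detected for countdown duration" might be measured.

```swift
struct AutoConfirmConfigSketch {
    var autoConfirmEnabled = false              // disabled by default
    var autoConfirmMinConfidence = 0.85
    var autoConfirmCountdownDuration = 3.0      // seconds
}

func shouldAutoConfirmSketch(config: AutoConfirmConfigSketch,
                             confidence: Double,
                             hasPendingClarification: Bool,
                             secondsSinceLastSpeech: Double) -> Bool {
    // All four conditions must hold.
    config.autoConfirmEnabled
        && confidence >= config.autoConfirmMinConfidence
        && !hasPendingClarification
        && secondsSinceLastSpeech >= config.autoConfirmCountdownDuration
}
```

With the default configuration the predicate is always false, so every order waits for an explicit response.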

6.4.3 Delegate Pattern (Verified)

swift

public protocol ConfirmationCoordinatorDelegate: AnyObject {
    func confirmationDidAccept(_ orders: [VoiceParsedOrder])
    func confirmationDidReject(_ orders: [VoiceParsedOrder])
    func confirmationDidRequestModification(_ reason: String)
    func confirmationCountdownDidUpdate(_ remaining: TimeInterval)
    func confirmationDidAutoConfirm(_ orders: [VoiceParsedOrder])
}

### 6.5 Context-Aware Recovery
File: `BWBCore/Voice/Coordination/ContextAwareRecoveryService.swift` (476 LOC)
Class: `ContextAwareRecoveryService` — singleton

6.5.1 Recovery Types (Verified — 11 Types)

swift

public enum RecoveryType: String, Sendable {
    case unknownItem           // Can't identify item
    case ambiguousItem         // Multiple interpretations
    case invalidCombination    // Bad item+modifier combo
    case unavailableItem       // Out of stock/seasonal
    case emptyTranscript       // Nothing recognized
    case lowConfidence         // Parsed but very uncertain
    case generalHelp           // User asked for help
    case menuHelp              // Wants to see menu
    case sizeHelp              // Help with sizes
    case milkHelp              // Help with milk options
    case customizationHelp     // Help with customizations
}

6.5.2 Recovery Response (Verified)

swift

public struct RecoveryResponse: Sendable {
    let message: String
    let suggestions: [String]
    let shouldContinueListening: Bool
    let isFatal: Bool
    let recoveryType: RecoveryType
}

6.5.3 Escalation Model (Verified in Orchestrator)

consecutiveFailures: 0 → Normal parsing
consecutiveFailures: 1 → "I didn't catch that. Could you repeat?"
consecutiveFailures: 2 → "Try saying something like 'a medium iced latte'"
consecutiveFailures: 3+ → "Let me show you the menu. You can tap items to order."
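
The escalation ladder maps a failure counter to a prompt. A minimal sketch, using the prompt strings listed above (the function name is hypothetical, and how the orchestrator resets the counter is not shown here):

```swift
/// Returns the recovery prompt for the current failure streak, or nil when parsing is normal.
func recoveryPromptSketch(consecutiveFailures: Int) -> String? {
    switch consecutiveFailures {
    case ..<1:
        return nil  // 0 failures: normal parsing, no recovery prompt
    case 1:
        return "I didn't catch that. Could you repeat?"
    case 2:
        return "Try saying something like 'a medium iced latte'"
    default:
        return "Let me show you the menu. You can tap items to order."
    }
}
```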

### 6.6 Session Management
File: `BWBCore/Voice/Coordination/SessionManager.swift` (270 LOC)

6.6.1 Session Lifecycle (Verified)

swift

startSession() → UUID
  → isSessionActive = true
  → Start timeout timer (120 seconds)
  → Delegate: sessionDidStart(sessionId:)

endSession()
  → isSessionActive = false
  → Cancel timeout timer
  → Delegate: sessionDidEnd(sessionId:, duration:)

// Timeout:
sessionTimeoutInterval: 120.0  // 2 minutes of inactivity
  → Delegate: sessionDidTimeout(sessionId:)
  → Orchestrator calls endSession()

6.6.2 Session Tracking in Orchestrator (Verified)

swift

// lastSessionId: UUID? — tracks current session
// isNewSession check prevents cart clearing on resume:
let isNewSession = lastSessionId != sessionId
if isNewSession {
    cartCoordinator.clearAll()  // Fresh start
} else {
    // Resuming — preserve cart
}

---

7. Layer 5: Order State Machine

### 7.1 Cart Coordinator
File: `BWBCore/Voice/Coordination/CartCoordinator.swift` (452 LOC)
Class: `CartCoordinator` — `@MainActor`, `ObservableObject`

7.1.1 Dual-Track Cart Model (Verified)

swift

@Published var confirmedOrders: [VoiceParsedOrder]   // Accepted items
@Published var pendingOrders: [VoiceParsedOrder]      // Awaiting confirmation
@Published var pendingClarification: ClarificationRequest?

7.1.2 Operations (Verified)

swift

addToPending([items])           // New parsed items → pending
confirmPending()                // Pending → confirmed
clearPending()                  // Discard pending (rejected)
clearAll()                      // Empty everything
removeItems(at: IndexSet)       // Remove by index
getAllOrders() → [VoiceParsedOrder]  // All confirmed
exportToOrderItems() → [OrderItem]  // For checkout
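
The dual-track model keeps unconfirmed items out of the cart proper until the customer accepts them. The value-type sketch below illustrates the pending/confirmed movement only; it is a simplified stand-in (generic over the item type, no `@Published`, no delegate calls), not the real `CartCoordinator`.

```swift
struct CartSketch<Item: Equatable> {
    private(set) var confirmed: [Item] = []
    private(set) var pending: [Item] = []

    // New parsed items land in pending first.
    mutating func addToPending(_ items: [Item]) { pending.append(contentsOf: items) }

    // Acceptance moves everything pending into confirmed.
    mutating func confirmPending() {
        confirmed.append(contentsOf: pending)
        pending.removeAll()
    }

    // Rejection discards pending without touching confirmed items.
    mutating func clearPending() { pending.removeAll() }

    mutating func clearAll() {
        confirmed.removeAll()
        pending.removeAll()
    }
}
```

A rejected confirmation therefore never disturbs items the customer already accepted earlier in the session.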

7.1.3 Delegate (Verified)

swift

protocol CartCoordinatorDelegate: AnyObject {
    func cartDidAddPendingItems(_ items: [VoiceParsedOrder])
    func cartDidConfirmItems(_ items: [VoiceParsedOrder])
    func cartDidRejectItems(_ items: [VoiceParsedOrder])
    func cartDidClear()
    func cartDidEncounterViolations(_ violations: [String])
}

7.2 VoiceParsedOrder (Verified — Complete Model)

swift

public struct VoiceParsedOrder: Codable, Sendable {
    let itemId: String?
    let itemName: String
    var size: DrinkSize?            // .small/.medium/.large/.extraLarge
    var temperature: DrinkTemperature?  // .hot/.iced/.blended
    var milk: MilkType?             // 9 types including .none
    var caffeine: CaffeineOption?   // .regular/.decaf/.halfCaf
    var syrups: [String]            // ["vanilla", "caramel"]
    var shots: Int?                 // 1-4
    var modifiers: [String]         // ["No Foam", "Extra Ice"]
    var quantity: Int               // 1-10 (capped)
    var confidence: Double          // 0.0-1.0
    var metadata: [String: String]  // Extensible
}

7.3 Drink Attribute Enums (Verified — All Aliases)

7.3.1 DrinkSize (with dual alias systems)

DrinkSize:
  .small (8oz) — aliases: small, s, 8, 8oz, eight, short, tall
  .medium (12oz) — aliases: medium, m, 12, 12oz, twelve, regular, grande
  .large (16oz) — aliases: large, l, 16, 16oz, sixteen, big, venti
  .extraLarge (20oz) — aliases: extra large, xl, 20, 20oz, twenty, trenta

NOTE: Two alias systems exist:
  1. DrinkSize.fromAlias(_:): static method on the enum (comprehensive)
  2. EntityExtractor.sizeAliases: dictionary (subset, with no entries that map to .extraLarge)
  GAP: Inconsistency. EntityExtractor maps "extra large", "20oz", and "20 oz" to .large, whereas fromAlias(_:) maps "extra large" and "20oz" to .extraLarge. "venti" maps to .large in both systems.

7.3.2 DrinkTemperature

DrinkTemperature:
  .hot — aliases: hot, warm, heated, steaming
  .iced — aliases: iced, ice, cold, on ice, chilled, over ice
  .blended — aliases: blended, frozen, frappe, frappuccino, smoothie

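GAP: EntityExtractor maps only hot/warm/heated/steaming → .hot and iced/ice/cold/chilled/frozen/on ice → .iced.
     "blended", "frappe", "frappuccino", "smoothie" are NOT in EntityExtractor (only in the enum's fromAlias()); "frozen" resolves to .iced in EntityExtractor but to .blended in fromAlias(); "over ice" is enum-only.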

7.3.3 MilkType (9 types)

MilkType:
  .whole, .skim, .twoPercent, .oat, .almond, .soy, .coconut, .lactoseFree, .none

Aliases are defined in both enum.fromAlias() and EntityExtractor.milkAliases. The EntityExtractor listing in §5.4.1 covers eight of the nine types; it has no alias for .none.

7.3.4 CaffeineOption

CaffeineOption:
  .regular — aliases: regular, normal, caffeinated, full caffeine
  .decaf — aliases: decaf, decaffeinated, no caffeine, caffeine free
  .halfCaf — aliases: half caf, half-caf, halfcaf, half caffeine, split

7.4 Checkout Flow (Verified)

swift

// In Orchestrator:
func exportCartForCheckout() -> [OrderItem] {
    cartCoordinator.exportToOrderItems()
}

func finalizeOrder() -> [OrderItem] {
    let items = exportCartForCheckout()
    clearCart()
    endSession()
    return items
}

// In View:
.sheet(isPresented: $showingPayment) {
    KioskPaymentView(orderItems: voiceService.finalizeOrder()) { completedOrder in
        order = completedOrder
        dismiss()
    }
}

---

8. Layer 6: Synthesis & Feedback

### 8.1 Feedback Coordinator
File: `BWBCore/Voice/Coordination/FeedbackCoordinator.swift` (358 LOC)
Class: `FeedbackCoordinator` — `@MainActor`, `ObservableObject`

8.1.1 Audio Feedback (Verified — System Sound IDs)

swift

public enum VoiceAudioFeedback {
    case wakeWordDetected       // SystemSound 1057 (Tink)
    case itemAdded              // SystemSound 1104 (Pop)
    case orderConfirmed         // SystemSound 1025 (Success)
    case clarificationNeeded    // SystemSound 1315 (Alert)
    case clarificationReceived  // SystemSound 1104 (Pop)
    case helpProvided           // SystemSound 1114 (Informational)
    case error                  // SystemSound 1053 (Error)
    case errorRecovery          // SystemSound 1007 (Soft notification)
    case listening              // SystemSound 1306 (Click)
    case stopped                // SystemSound 1306 (Click)
}

8.1.2 Haptic Feedback (Verified)

swift

public enum VoiceHapticFeedback {
    case selection       // Haptics.selection()
    case itemAdded       // Haptics.light()
    case orderConfirmed  // Haptics.success()
    case error           // Haptics.error()
    case warning         // Haptics.warning()
    case wakeWord        // Haptics.success()
    case prompt          // Haptics.light()
}

8.1.3 Convenience Methods (Verified)

swift

playStartListening()      // medium haptic + listening sound
playStopListening()       // light haptic
playWakeWordDetected()    // sound 1057 + success haptic
playItemAdded()           // sound 1104 + light haptic
playOrderConfirmed()      // sound 1025 + success haptic
playClarificationNeeded() // sound 1315 + medium haptic
playClarificationReceived() // sound 1104 + medium haptic
playHelpProvided()        // sound 1114 + light haptic
playError()               // sound 1053 + error haptic
playErrorRecovery()       // sound 1007 + warning haptic

8.1.4 TTS Delegation (Verified)

swift

useAudioSessionManager: Bool = true  // DEFAULT: delegate to AudioSessionManager

speakAndWait(_ text:, thenListen:):
  if useAudioSessionManager:
    → AudioSessionManager.shared.playTTS(text:)  // Proper isolation
  else:
    → Local AVSpeechSynthesizer (legacy, no isolation)

### 8.2 Audio Session Manager TTS (Verified)
See Layer 1 §3.1 — handles full isolation lifecycle.

---

9. Layer 7: Interaction Surface

9.1 View Hierarchy (Verified)

KioskVoiceOrderingView (root) ← @StateObject VoiceOrderingOrchestrator.shared
├── Background: LinearGradient(vinyl._900 → vinyl._800)
│
├── KioskVoiceHeader
│   ├── Cart badge (cartItemCount)
│   ├── Settings button → showingSettings sheet
│   └── Close button → dismiss()
│
├── Main Content (@ViewBuilder, state-dependent)
│   ├── [idle && !isPushToTalkRecording]
│   │   └── KioskVoiceWelcomeView
│   │       ├── Push-to-talk button
│   │       ├── Wake word status indicator
│   │       └── "Or browse the menu" → showTouchOrdering
│   │
│   ├── [isPushToTalkRecording]
│   │   └── KioskVoiceWelcomeView (recording mode)
│   │
│   └── [active session]
│       └── KioskVoiceActiveOrderingView
│           ├── VoiceWaveformView (audio visualization)
│           ├── TranscriptDisplayView (streaming text + countdown)
│           ├── Live preview cards
│           └── ConfirmationOverlayView (when confirming)
│
├── KioskVoiceActionBar
│   ├── Voice mode toggle (continuous/push-to-talk/wake-word)
│   ├── Force process button
│   └── Checkout shortcut
│
├── Checkout prompt overlay (after 5s inactivity with items in cart)
│   ├── "Ready to checkout?"
│   ├── Checkout button → showingPayment
│   └── Continue ordering → dismiss overlay
│
└── Sheets
    ├── .sheet(showingCart) → KioskVoiceCartSheet
    ├── .sheet(showingPayment) → KioskPaymentView
    ├── .sheet(showingSettings) → KioskVoiceSettingsSheet
    └── .fullScreenCover(showTouchOrdering) → KioskTouchOrderingView

9.2 Input Modes (Verified)

swift

public enum VoiceInputMode: String, CaseIterable, Sendable {
    case continuous     // "Continuous" — auto-detect end of speech
    case pushToTalk     // "Push to Talk" — hold button, release to process
    case wakeWord       // "Wake Word" — "Hey Brews" activation
}

9.2.1 Push-to-Talk Lifecycle (Verified)

startPushToTalkRecording():
  → Request auth
  → Reset transcript/detection state
  → phase = .listening
  → startListening()
  → Play listening sound + selection haptic

stopPushToTalkAndProcess():
  → isPushToTalkRecording = false
  → Selection haptic
  → Get final displayTranscript
  → stopListening()
  → If empty: speak error prompt
  → If content: processUtterance(transcript)
  → If confirming: resume listening for response

cancelPushToTalkRecording():
  → stopListening()
  → Reset state
  → Warning haptic

9.3 Speech Phase Mapping (Verified)

swift

var speechPhase: SpeechPhase {
    // Checked in priority order; the first true condition wins.
    if isSpeaking { return .confirming }
    if isProcessing { return .processing }
    if isSpeechDetected { return .speaking }
    if phase == .confirming { return .confirming }
    if isListening {
        return processingCountdown != nil ? .paused : .waiting
    }
    return .idle
}

9.4 Checkout Prompt Timing (Verified)

swift

// Triggers when cart count increases:
onChange(of: voiceService.cartItemCount) { old, new in
    if new > old && new > 0 {
        checkoutPromptDismissed = false
        Task {
            try? await Task.sleep(nanoseconds: 5_000_000_000)  // 5 seconds
            if voiceState == .listening && !cart.isEmpty && !isSpeechDetected {
                showCheckoutPrompt = true  // with spring animation
            }
        }
    }
}

// Dismissed by speech detection or confirming state

9.5 Help Message Generation (Verified)

swift

func generateHelpMessage() -> String {
    // Context-aware:
    if cartItems.isEmpty:
        "To order, just say what you'd like. For example: 'I'll have a medium latte'..."
    else:
        "Say 'repeat my order' to hear your current items..."
        "Say 'clear order' or 'start over' to remove all items..."
        "Say 'checkout' when you're ready to pay..."

    // Always:
    "You can say sizes like small, medium, or large"
    "You can say 'iced' or 'hot' for temperature"
    "Say 'yes' or 'no' to confirm items"
}

---

10. Layer 8: Learning & Telemetry

### 10.1 Pattern Learner
File: `BWBCore/Voice/Learning/PatternLearner.swift` (318 LOC)
Class: `PatternLearner` — `@MainActor`, singleton, `ObservableObject`

10.1.1 Learned Alias Storage

swift

struct LearnedAliasStore {
    var menuItems: [String: LearnedAlias]
    var sizes: [String: LearnedAlias]
    var temperatures: [String: LearnedAlias]
    var milks: [String: LearnedAlias]
    var syrups: [String: LearnedAlias]
}

struct LearnedAlias {
    let spokenPhrase: String  // What user said
    let mappedValue: String   // What it maps to
    let slotType: String      // Which slot category
}

10.1.2 Persistence

Storage: [home-path]
Load: on init
Save: after applying approved suggestions

10.1.3 Learning Loop

FeedbackCollector tracks corrections → generates suggestions
PatternLearner.applyApprovedSuggestions() → loads approved
  → Stores as LearnedAlias
  → Marks as applied in FeedbackCollector
  → Saves to disk

### 10.2 Feedback Collector
File: `BWBCore/Voice/Learning/FeedbackCollector.swift` (512 LOC)
Class: `FeedbackCollector` — `@MainActor`, singleton, `ObservableObject`

10.2.1 Feedback Tracking

Implicit signals:
  - Confirmation accepted → parsing was correct
  - Confirmation rejected → parsing error
  - Clarification resolved → slot was ambiguous
  - Session abandoned → overall failure

Stored corrections:
  - Original transcript
  - Original parse
  - Corrected value
  - Slot type
  → Generates suggested aliases for PatternLearner

### 10.3 Learning Types
File: `BWBCore/Voice/Learning/LearningTypes.swift` (179 LOC)

10.3.1 Type Definitions

swift

struct FeedbackEvent { ... }       // Single feedback instance
struct AliasSuggestion { ... }     // Suggested new alias
struct LearningStats { ... }       // Accuracy metrics

---

11. Cross-Cutting: Error Taxonomy & Recovery

11.1 Error Categories by Layer

Layer 1 Errors

MIC_UNAVAILABLE → Show permission alert (openSettingsURLString)
AUDIO_SESSION_INTERRUPTED → Pause gracefully, resume on restoration
AUDIO_SESSION_CONFIG_FAIL → Log error, continue with previous config
WAKE_WORD_AUTH_DENIED → Disable wake word, log warning

Layer 2 Errors

NO_SPEECH_DETECTED → 5s silence → promptForOrder()
RECOGNITION_ERROR → Log, reset recognizer, continue
RECOGNITION_UNAVAILABLE → Switch to NLU-only parsing
ASSET_DOWNLOAD_FAIL (iOS 26) → Fall back to LegacyVoiceService

Layer 3 Errors

LOW_CONFIDENCE_PARSE (< 0.6) → Clarification flow
AI_API_TIMEOUT (> 5s) → Use NLU result only
AI_API_ERROR → Fall back to NLU-only
UNKNOWN_ITEM → ContextAwareRecoveryService.generateRecovery(.unknownItem)
AMBIGUOUS_ITEM → generateRecovery(.ambiguousItem)
INVALID_COMBINATION → generateRecovery(.invalidCombination)
EMPTY_PARSE → Escalating recovery (failures 1→2→3+)

Layer 4 Errors

CLARIFICATION_TIMEOUT → Re-ask or proceed with default
CONFIRMATION_TIMEOUT → Auto-confirm (if enabled) or re-ask
REPEATED_FAILURES (3+) → Offer touch fallback

Layer 5 Errors

CONSTRAINT_VIOLATION_SOFT → Auto-correct + inform
CONSTRAINT_VIOLATION_HARD → Block + clarify
QUANTITY_OVERFLOW (>10) → Cap at 10, warn

Layer 6 Errors

TTS_FAILURE → Skip speech, show text only
AUDIO_ROUTING_ERROR → Reset audio session
VOICE_NOT_AVAILABLE → Use system default voice

---

12. Cross-Cutting: Performance & Latency Budget

12.1 Latency Breakdown

Component                     Current      Target
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Utterance detection           ≤1.5s        ≤1.0s
NLU parsing                   <100ms       <50ms
AI parsing                    1.0-3.0s     <1.5s
Result merging                <10ms        <10ms
Confirmation generation       <10ms        <10ms
TTS isolation overhead        ~500ms       0ms (with AEC)
TTS speech                    variable     variable
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Total (end-of-speech → TTS):  2.0-5.0s     <2.0s
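
As a sanity check on the totals, the sketch below sums the upper bounds from the table, treating "<100ms" as 100 ms and leaving out the variable TTS speech row. Summing the stages serially is an assumption; the table does not say whether stages overlap or whether utterance detection counts toward the end-of-speech total.

```swift
// Upper bounds in milliseconds, read off the latency table.
struct StageBudgetSketch {
    let name: String
    let currentMs: Int
    let targetMs: Int
}

let stageBudgetsSketch = [
    StageBudgetSketch(name: "Utterance detection", currentMs: 1500, targetMs: 1000),
    StageBudgetSketch(name: "NLU parsing", currentMs: 100, targetMs: 50),
    StageBudgetSketch(name: "AI parsing", currentMs: 3000, targetMs: 1500),
    StageBudgetSketch(name: "Result merging", currentMs: 10, targetMs: 10),
    StageBudgetSketch(name: "Confirmation generation", currentMs: 10, targetMs: 10),
    StageBudgetSketch(name: "TTS isolation overhead", currentMs: 500, targetMs: 0),
]

func totalsSketch(_ stages: [StageBudgetSketch],
                  skipping skipped: Set<String> = []) -> (current: Int, target: Int) {
    let kept = stages.filter { !skipped.contains($0.name) }
    return (kept.reduce(0) { $0 + $1.currentMs }, kept.reduce(0) { $0 + $1.targetMs })
}

let withDetectionSketch = totalsSketch(stageBudgetsSketch)
// (current: 5120, target: 2570)
let afterDetectionSketch = totalsSketch(stageBudgetsSketch, skipping: ["Utterance detection"])
// (current: 3620, target: 1570)
```

Summed serially with utterance detection included, the target column comes to about 2.57 s, above the stated <2.0s total; without it the target is about 1.57 s. The stated target therefore seems to assume either that detection is excluded or that some stages run concurrently.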

12.2 Accuracy Targets

Metric                        Current      Target
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Intent classification         ~90%         >95%
Entity extraction             ~85%         >92%
Overall order accuracy        ~85%         >92%
First-try success rate        ~80%         >88%
Session completion rate       Unknown      >90%

---

13. Cross-Cutting: Privacy, Security, Offline

13.1 Privacy

✓ Audio processed in real-time, not stored after session
✓ Transcripts cleared on endSession()
✓ No biometric voiceprints
✓ No PII extraction
✗ No opt-in analytics yet

13.2 Security

✓ API keys in xcconfig (Phase 1 completed 2026-01-19)
✓ fatalError() replaced with throws/Result (Phase 1 completed)
✓ HTTPS for all API calls
✓ No hardcoded secrets

13.3 Offline Capability

WORKS OFFLINE:
  ✓ NLU parsing (IntentClassifier + EntityExtractor + SlotClassifier)
  ✓ Wake word detection
  ✓ Audio feedback + haptics
  ✓ Cart management (all local state)
  ✓ All UI (SwiftUI, no network)
  ✓ Speech recognition (iOS 26+ on-device mode)
  ✓ Constraint validation
  ✓ Learned alias lookup

REQUIRES NETWORK:
  ✗ AI parsing (OpenAI/Gemini API calls)
  ✗ Speech recognition (LegacyVoiceService server mode)
  ✗ Menu embedding updates
  ✗ Order submission to backend

DEGRADED MODE:
  → ParsingStrategy falls to .nluOnly
  → Lower accuracy but functional
  → Queue orders locally (Phase 7 — not yet implemented)

---

14. State Machine Formal Specification

14.1 Orchestrator Phase State Machine (Verified)

States: {idle, listening, processing, confirming, clarifying, complete, error}

Transitions:
  idle → listening        [startSession(), resume successful]
  listening → processing  [utterance complete, transcript non-empty]
  listening → idle        [endSession()]
  listening → error       [startListening() failed]
  processing → confirming [items parsed, pending confirmation]
  processing → listening  [meta-command handled, or no items + recovery]
  processing → error      [unhandled exception]
  confirming → listening  [confirmed → items added, or rejected → items cleared]
  confirming → processing [additional order detected in confirmation response]
  clarifying → listening  [clarification resolved]
  clarifying → confirming [clarification leads to confirmed order]
  * → idle               [endSession() from any state]
  * → error              [unrecoverable failure]
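One way to make the transition list above checkable is a table-driven guard. The sketch below encodes only the transitions listed in 14.1; the `Phase` type name and `canTransition` helper are illustrative and may not match the Orchestrator's actual types.

```swift
// Illustrative phase enum using the state names from 14.1.
enum Phase: Hashable {
    case idle, listening, processing, confirming, clarifying, complete, error
}

// Explicit transitions copied from the list above.
let explicitTransitions: [Phase: Set<Phase>] = [
    .idle: [.listening],
    .listening: [.processing, .idle, .error],
    .processing: [.confirming, .listening, .error],
    .confirming: [.listening, .processing],
    .clarifying: [.listening, .confirming],
]

/// Returns true when the move is allowed by the explicit table or by the
/// two wildcard rules (`* → idle`, `* → error`).
func canTransition(from: Phase, to: Phase) -> Bool {
    if to == .idle || to == .error { return true }
    return explicitTransitions[from]?.contains(to) ?? false
}
```

A guard like this could be asserted in debug builds wherever the phase is assigned, so an unlisted transition is caught during testing.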

14.2 Audio Session State Machine (Verified)

States: {idle, listening, speaking, transitioning}

Transitions:
  idle → listening        [configureAudioSession(.listening)]
  listening → transitioning [pauseListening()]
  transitioning → speaking  [configureAudioSession(.speaking)]
  speaking → transitioning  [TTS complete, resumeListening()]
  transitioning → listening  [configureAudioSession(.listening), voiceService.resume()]

14.3 Wake Word State Machine (Verified)

States: {disabled, enabled, listening, paused}

Transitions:
  disabled → enabled    [enable()]
  enabled → listening   [startListening()]
  listening → paused    [pause() — during active session]
  paused → listening    [resume() — after session ends]
  * → disabled          [disable()]

14.4 Confirmation State Machine (Verified)

States: {idle, awaiting, auto_confirming, resolved}

Transitions:
  idle → awaiting         [startConfirmation(items, transcript, confidence)]
  awaiting → resolved     [processResponse() → .confirmed/.rejected]
  awaiting → auto_confirming [autoConfirm conditions met, countdown starts]
  auto_confirming → resolved [countdown reaches 0]
  auto_confirming → awaiting [speech detected → cancelAutoConfirm]
  resolved → idle         [reset()]
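The confirmation transitions above, including the auto-confirm countdown, can be sketched as a pure reducer. The event names and the one-second tick are assumptions made for illustration; the real ConfirmationCoordinator may model the countdown differently.

```swift
enum ConfirmationState: Equatable {
    case idle
    case awaiting
    case autoConfirming(secondsLeft: Int)
    case resolved
}

// Hypothetical events, one per transition trigger listed in 14.4.
enum ConfirmationEvent {
    case start                     // startConfirmation(...)
    case response                  // processResponse() gave .confirmed/.rejected
    case autoConfirmEligible(Int)  // countdown length in seconds (assumed unit)
    case tick                      // one second elapsed (assumed granularity)
    case speechDetected            // cancelAutoConfirm
    case reset                     // reset()
}

func reduce(_ state: ConfirmationState, _ event: ConfirmationEvent) -> ConfirmationState {
    switch (state, event) {
    case (.idle, .start):
        return .awaiting
    case (.awaiting, .response):
        return .resolved
    case (.awaiting, .autoConfirmEligible(let seconds)):
        return seconds > 0 ? .autoConfirming(secondsLeft: seconds) : .resolved
    case (.autoConfirming(let left), .tick):
        return left <= 1 ? .resolved : .autoConfirming(secondsLeft: left - 1)
    case (.autoConfirming, .speechDetected):
        return .awaiting
    case (.resolved, .reset):
        return .idle
    default:
        return state  // events that do not apply in this state are ignored
    }
}
```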

---

15. End-to-End Flow Traces

15.1 Happy Path: Single Item Order

t=0.0s  Customer: "Hey Brews"
        → WakeWordDetector.onWakeWordDetected("Hey Brews")
        → SystemSound 1057 (Tink) + Haptics.success()
        → startSession() → UUID created
        → wakeWordDetector.pause()
        → transcriptPipeline.reset(), cartCoordinator.clearAll()
        → startListening() → SpeechAnalyzerService or LegacyVoiceService
        → phase = .listening

t=0.5s  System TTS: "Hi! What can I get you?" (if greeting enabled)
        → AudioSessionManager.pauseListening()
          → voiceService.suspend()
          → sleep(100ms)
          → configureAudioSession(.speaking)
        → AudioSessionManager.playTTS("Hi! What can I get you?")
          → AVSpeechSynthesizer.speak(utterance)
        → [TTS plays ~2s]
        → AudioSessionManager.resumeListening()
          → sleep(300ms)
          → configureAudioSession(.listening)
          → sleep(100ms)
          → voiceService.resume()

t=3.0s  Customer: "Can I get a large iced oat milk latte with vanilla"
        → voiceActivityResults: isSpeechDetected = true
          → transcriptPipeline.recordSpeechActivity()
          → utteranceDetector.recordSpeechActivity()
          → cancelSilencePromptTimer()
        → transcriptionResults (streaming):
          t+0.2s: "can I"
          t+0.5s: "can I get a"
          t+0.8s: "can I get a large"
          t+1.0s: "can I get a large iced"
          t+1.3s: "can I get a large iced oat milk latte"
          t+1.5s: "can I get a large iced oat milk latte with vanilla"

        Each partial → handleTranscriptionResult():
          → transcriptPipeline.processIncoming()
          → livePreviewGenerator.generatePreview()
            → At "large iced": preview = OrderPreview(itemName: "?", size: "large", temp: "iced")
            → At "oat milk latte": preview = OrderPreview(itemName: "Latte", size: "large", ...)
            → At "with vanilla": preview = OrderPreview(itemName: "Latte", syrups: ["vanilla"])
          → utteranceDetector.analyze()
            → NOT complete (speech still active)

t=4.5s  Customer stops speaking
        → voiceActivityResults: isSpeechDetected = false
        → utteranceDetector begins:
          → silence timer starts
          → stability checks begin (every 0.3s)

t=5.5s  utteranceDetector.analyze():
        → Silence: 1.0s ≥ silenceTimeout (1.0s) ✓
        → Stability: stabilityCount >= 1 ✓
        → reason: .silenceTimeout, confidence: 1.0
        → isComplete = true

t=5.5s  processUtterance("can I get a large iced oat milk latte with vanilla")
        → addToSessionHistory()
        → phase = .processing, isProcessing = true

        parsingPipeline.parseWithAutoContext(transcript, cart: [], history: [...])
        → buildContext(cartItems: [], sessionHistory: [...], includeConstraints: true)

        HYBRID STRATEGY:
          AI Parser: send to OpenAI/Gemini with context
          NLU Parser:
            IntentClassifier: "can i get" matches orderPatterns → (.order, 0.9)
            EntityExtractor:
              size: "large" → .large (0.85)
              temperature: "iced" → .iced (0.85)
              milk: "oat milk" → .oat (0.85)
              syrups: ["vanilla"] (0.85)
              quantity: implicit → 1
            MenuAliasMatcher: "latte" → "Latte" (exact, 1.0)

          AI returns: { intent: "order", items: [{name: "Latte", size: "large", ...}], confidence: 0.95 }
          NLU returns: { intent: .order, items: [{name: "Latte", size: .large, ...}], confidence: 0.88 }

          OrderResultMerger.merge():
            → Both found same item → consensusBoost (+0.1)
            → AI confidence (0.95) > aiConfidenceThreshold (0.85) → prefer AI
            → Final: confidence = 0.95, source = .hybrid

t=6.0s  handleParseResult(result):
        → result.items = [VoiceParsedOrder(name: "Latte", size: .large, temp: .iced, milk: .oat, syrups: ["vanilla"], quantity: 1, confidence: 0.95)]
        → Validate quantities (1 ≤ 10 ✓)
        → cartCoordinator.addToPending(items)
        → confirmationCoordinator.startConfirmation(items, "can I get...", 0.95)
          → generates: "One large iced oat latte with vanilla — anything else?"
        → phase = .confirming

        speakWithIsolation("One large iced oat latte with vanilla — anything else?", thenListen: true)
        → pauseListening() → sleep(100ms) → configureAudioSession(.speaking)
        → playTTS(text) → ~3s speech
        → resumeListening() → sleep(300ms) → configureAudioSession(.listening) → sleep(100ms) → voiceService.resume()

t=9.5s  Customer: "That's it"
        → handleTranscriptionResult → utterance detected
        → processUtterance("that's it")
        → confirmationCoordinator.isAwaitingConfirmation = true
        → handleConfirmationResponse("that's it")
          → "that's it" matches nothing in confirmPatterns but matches checkoutPatterns → handled as checkout

        OR, if interpreted as a confirmation response:
        → confirmationCoordinator.processResponse("that's it")
          → checks patterns → "that's" matches the "that's all" checkout pattern
          → returns .additionalOrder, or is handled as a meta-command

t=10.0s handleMetaCommand(.checkout):
        → cartItems not empty ✓
        → shouldProceedToCheckout = true
        → speakWithIsolation("Taking you to checkout.", thenListen: false)

t=10.5s View: onChange(shouldProceedToCheckout):
        → showingPayment = true
        → KioskPaymentView(orderItems: voiceService.finalizeOrder())
          → exportCartForCheckout() → [OrderItem]
          → clearCart()
          → endSession()
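The completion decision at t=5.5s in the trace above combines a silence check with a transcript-stability check. A minimal sketch follows, assuming an inclusive comparison (the trace treats 1.0 s of silence as meeting a 1.0 s timeout); the type and parameter names are illustrative rather than the UtteranceCompletionDetector's real signature.

```swift
import Foundation

struct CompletionConfig {
    var silenceTimeout: TimeInterval = 1.0  // value shown in the trace
    var minStabilityCount: Int = 1          // "stabilityCount >= 1" in the trace
}

enum CompletionReason: Equatable { case silenceTimeout }

struct CompletionResult: Equatable {
    let isComplete: Bool
    let reason: CompletionReason?
}

/// Declares the utterance complete only when speech has stopped, silence has
/// lasted at least the timeout, and the transcript has been stable long enough.
func analyze(silence: TimeInterval,
             stabilityCount: Int,
             isSpeechActive: Bool,
             config: CompletionConfig = CompletionConfig()) -> CompletionResult {
    guard !isSpeechActive,
          silence >= config.silenceTimeout,
          stabilityCount >= config.minStabilityCount else {
        return CompletionResult(isComplete: false, reason: nil)
    }
    return CompletionResult(isComplete: true, reason: .silenceTimeout)
}
```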

15.2 Error Recovery Flow: Unknown Item

t=0s  Customer: "I'd like a mackiado"
      → Parsing:
        NLU: MenuAliasMatcher("mackiado") → fuzzy → "macchiato" (Levenshtein=3, score 0.72)
        AI: "macchiato" (confidence: 0.78)
        Merge: {name: "macchiato", confidence: 0.75}

      → confidence 0.75 ≥ 0.6 → proceed (but below 0.85 → won't auto-confirm)
      → confirmationCoordinator.startConfirmation(items, ..., 0.75)
      → "Did you mean a macchiato?"

t=2s  Customer: "Yes"
      → handleConfirmationResponse("yes")
      → .confirmed
      → cartCoordinator.confirmPending()
      → "Added to your order. Anything else?"
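The fuzzy match in this trace rests on edit distance. Below is a standard two-row Levenshtein implementation as a sketch; it is not taken from MenuAliasMatcher, and how that class converts a distance into the 0.72 score is not covered here.

```swift
/// Classic dynamic-programming Levenshtein distance using two rows.
func levenshtein(_ a: String, _ b: String) -> Int {
    let s = Array(a), t = Array(b)
    if s.isEmpty { return t.count }
    if t.isEmpty { return s.count }

    // previous[j] holds the distance between the first i-1 characters of s
    // and the first j characters of t.
    var previous = Array(0...t.count)
    for i in 1...s.count {
        var current = [i] + Array(repeating: 0, count: t.count)
        for j in 1...t.count {
            let substitution = previous[j - 1] + (s[i - 1] == t[j - 1] ? 0 : 1)
            current[j] = min(previous[j] + 1,     // deletion
                             current[j - 1] + 1,  // insertion
                             substitution)
        }
        previous = current
    }
    return previous[t.count]
}

// "mackiado" → "macchiato" takes three edits: k→c, insert h, d→t.
let distance = levenshtein("mackiado", "macchiato")
```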

15.3 Error Recovery Flow: Total Parse Failure

t=0s  Customer: [unintelligible mumble]
      → Parsing: NLU → no items, AI → no items
      → consecutiveFailures = 1

      ContextAwareRecoveryService.generateRecovery(
        type: .unknownItem,
        context: RecoveryContext(failureCount: 1)
      )
      → "I didn't catch that. Could you repeat your order?"

t=3s  Customer: [still unclear]
      → consecutiveFailures = 2
      → "I'm having trouble understanding. Try saying something like 'a medium iced latte'"

t=6s  Customer: [still unclear]
      → consecutiveFailures = 3
      → "Let me show you the menu. You can also tap items to order."
      → [Touch ordering suggested]
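The three-step escalation above maps a failure counter to a prompt. A minimal sketch using the messages from the trace; the function name and the `suggestTouch` flag are hypothetical, and the real ContextAwareRecoveryService takes a richer context.

```swift
/// Maps the consecutive-failure count to a recovery prompt and a flag that
/// tells the UI whether to surface touch ordering.
func recoveryPrompt(consecutiveFailures: Int) -> (message: String, suggestTouch: Bool) {
    switch consecutiveFailures {
    case ...1:
        return ("I didn't catch that. Could you repeat your order?", false)
    case 2:
        return ("I'm having trouble understanding. Try saying something like 'a medium iced latte'", false)
    default:
        // Third failure and beyond: hand off to the touch interface.
        return ("Let me show you the menu. You can also tap items to order.", true)
    }
}
```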

---

16. Gap Analysis: Current vs Next-Gen

16.1 Verified Gaps

| ID | Component | Current State | Issue | Severity |
|----|-----------|---------------|-------|----------|
| G1 | Audio isolation | pause/resume cycle | 500ms+ overhead per TTS, no simultaneous talk/listen | Critical |
| G2 | Dialogue context | VoiceDialogueManager (572 LOC) exists but Orchestrator manages its own flow | Duplication, VDM not used by Kiosk | Medium |
| G3 | Hybrid parsing | Sequential not truly parallel | `parseHybrid` awaits AI then NLU, not concurrent | Medium |
| G4 | EntityExtractor aliases | Missing "blended", "frappe", "frappuccino", "smoothie" | Temperature `.blended` not extractable | Low |
| G5 | Size alias inconsistency | EntityExtractor maps "extra large"→.large, enum has .extraLarge | xl orders misclassified | Low |
| G6 | Analytics pipeline | No telemetry | Zero visibility into real-world accuracy | High |
| G7 | Multi-language | English only | No French, N'Ko | Medium |
| G8 | Branded TTS | AVSpeechSynthesizer | Generic voice, no personality | Medium |
| G9 | Audio feedback fix | Phase 2 marked "not started" in roadmap | But AudioSessionManager IS implemented | Stale doc |
| G10 | Slot inference | No cross-item inference | "And a cappuccino" doesn't inherit previous item's size/milk | Medium |
| G11 | User profiles | No cross-session memory | "The usual" not supported | Low priority |
| G12 | Streaming AI parse | AI parses after utterance complete | Could parse incrementally as user speaks | High |
| G13 | Sentiment detection | None | Can't detect frustration or satisfaction | Medium |
| G14 | A/B testing | None | Can't compare NLU improvements | Medium |
| G15 | Edge NLU model | Pattern matching only | No ML classifier on-device | High |
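Gap G5 is a typical shortest-match bug: "large" wins before "extra large" is considered. One possible fix is to test longer aliases first, sketched below. The alias table is hypothetical (only `.large` and `.extraLarge` are named in this section), and the real lists live in EntityExtractor and DrinkSize.

```swift
import Foundation

// `.small` and `.medium` are assumed cases added for illustration.
enum DrinkSize: Equatable { case small, medium, large, extraLarge }

// Hypothetical alias table, not the project's actual list.
let sizeAliases: [(phrase: String, size: DrinkSize)] = [
    ("extra large", .extraLarge), ("xl", .extraLarge),
    ("large", .large), ("medium", .medium), ("small", .small),
]

/// Returns the size for the longest alias found as a whole-word phrase.
func extractSize(from transcript: String) -> DrinkSize? {
    let words = transcript.lowercased().split(separator: " ").map(String.init)
    // Pad with spaces so matches only occur on word boundaries.
    let text = " " + words.joined(separator: " ") + " "
    // Longest phrases first, so "extra large" is not consumed by "large".
    for alias in sizeAliases.sorted(by: { $0.phrase.count > $1.phrase.count }) {
        if text.contains(" " + alias.phrase + " ") { return alias.size }
    }
    return nil
}
```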

### 16.2 Stale Roadmap Items (Verified)
- Phase 2 (Audio Feedback Fix): AudioSessionManager IS implemented with `speakWithIsolation()`. Phase may be partially complete.
- Phase 5, Plan 05-01 (Split KioskVoiceOrderingView): Already done — extracted to 6 component files in `Components/`
- Phase 8, Plan 08-02 (Audit @unchecked Sendable): `ClarificationPolicy` still marked `@unchecked Sendable`

---

17. Evolution Roadmap

17.1 Phase A: Audio Excellence (Critical Path)

A.1 Acoustic Echo Cancellation

Problem: 500ms+ overhead per TTS turn
Solution: AVAudioEngine with custom AEC tap
  → Run TTS + mic simultaneously
  → Subtract TTS waveform from mic input
  → Remove pause/resume cycle entirely
Impact: -500ms latency per turn, more natural conversation
Complexity: High (audio DSP, timing synchronization)

A.2 Noise-Robust VAD

Problem: Basic energy-threshold VAD is unreliable in a noisy coffee shop
Solution: Spectral analysis VAD + speaker diarization
  → CoreML on-device VAD model
  → Trained on coffee shop noise profiles (grinder, steamer, music)
  → Focus on nearest speaker
Impact: Better utterance detection in real environments
Complexity: Medium

A.3 Branded TTS Voice

Problem: Generic AVSpeechSynthesizer
Solution: ElevenLabs custom voice
  → Train on barista-style speech
  → Warm, friendly, slightly upbeat persona
  → Fallback chain: ElevenLabs → Apple Neural → AVSpeech
Impact: Distinctive brand identity, more engaging
Complexity: Low (API integration)
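The fallback chain in A.3 can be sketched as an ordered list of providers tried in turn. Everything here is illustrative: `SpeechProvider`, `StubProvider`, and `speakWithFallback` are hypothetical names, and real providers would wrap the ElevenLabs API, an Apple neural voice, and AVSpeechSynthesizer.

```swift
enum TTSError: Error { case unavailable }

// Hypothetical abstraction over a text-to-speech backend.
protocol SpeechProvider {
    var name: String { get }
    func speak(_ text: String) throws
}

// Stand-in provider that either succeeds silently or reports unavailability.
struct StubProvider: SpeechProvider {
    let name: String
    let isAvailable: Bool
    func speak(_ text: String) throws {
        if !isAvailable { throw TTSError.unavailable }
    }
}

/// Tries each provider in order; returns the name of the first one that
/// speaks successfully, or nil when every provider fails.
func speakWithFallback(_ text: String, chain: [SpeechProvider]) -> String? {
    for provider in chain {
        do {
            try provider.speak(text)
            return provider.name
        } catch {
            continue  // fall through to the next provider
        }
    }
    return nil
}
```

Returning the provider name gives telemetry a cheap way to record how often the branded voice was actually used.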

17.2 Phase B: Intelligence Upgrade

B.1 Fix Hybrid Parsing to True Parallel

Problem: parseHybrid() is sequential
Solution: Use TaskGroup for true concurrent execution
Impact: Faster response when both AI and NLU available
Complexity: Low (restructure async calls)
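A sketch of the TaskGroup approach from B.1, assuming both parsers can be called as independent async functions. `parseWithAI`, `parseWithNLU`, and `ParseResult` are stand-ins, not the pipeline's real API, and the simulated latency exists only to show the two calls overlapping.

```swift
struct ParseResult: Equatable {
    let source: String
    let confidence: Double
}

// Hypothetical stand-in for the AI parser, with simulated network latency.
func parseWithAI(_ transcript: String) async -> ParseResult {
    try? await Task.sleep(nanoseconds: 50_000_000)
    return ParseResult(source: "ai", confidence: 0.95)
}

// Hypothetical stand-in for the on-device NLU parser.
func parseWithNLU(_ transcript: String) async -> ParseResult {
    ParseResult(source: "nlu", confidence: 0.88)
}

/// Starts both parsers as child tasks so neither waits on the other, then
/// collects both results (sorted by source for a deterministic order).
func parseHybridConcurrently(_ transcript: String) async -> [ParseResult] {
    await withTaskGroup(of: ParseResult.self) { group in
        group.addTask { await parseWithAI(transcript) }
        group.addTask { await parseWithNLU(transcript) }
        var results: [ParseResult] = []
        for await result in group { results.append(result) }
        return results.sorted { $0.source < $1.source }
    }
}
```

With this shape the total wait is roughly the slower parser's latency rather than the sum of both.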

B.2 Streaming AI Parse

Problem: AI parses only after utterance complete
Solution: Send partial transcripts to AI with streaming response
  → Token-level intent detection
  → Progressive slot filling
  → Cancel/correct mid-stream
Impact: Sub-second apparent response time
Complexity: High

B.3 Multi-Turn Context Engine

Problem: No anaphora resolution, no implicit inference
Solution: Dialogue state tracker with frame semantics
  → "Make it bigger" → upsize current
  → "Same thing" → repeat last order
  → "And a cappuccino" → infer preferences from previous item
Impact: Natural multi-turn conversations
Complexity: High
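The "And a cappuccino" case in B.3 amounts to filling unspecified slots from the previous item. A minimal sketch under a stated assumption: only size and milk carry over, and values the customer says explicitly always win. `DraftItem` and `inheritingSlots` are hypothetical names.

```swift
// Hypothetical, simplified item model with optional slots.
struct DraftItem: Equatable {
    var name: String
    var size: String?
    var milk: String?
}

/// Copies size and milk from `previous` into `next` wherever `next` left
/// them unspecified. Explicit values on `next` are never overwritten.
func inheritingSlots(_ next: DraftItem, from previous: DraftItem?) -> DraftItem {
    guard let previous = previous else { return next }
    var merged = next
    merged.size = next.size ?? previous.size
    merged.milk = next.milk ?? previous.milk
    return merged
}
```

Because inherited values are guesses, a production version would likely lower the item's confidence so the confirmation step reads the inferred slots back to the customer.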

B.4 Edge NLU Model

Problem: Pattern matching has an accuracy ceiling
Solution: Fine-tuned DistilBERT → CoreML
  → Train on coffee ordering data
  → Sub-100ms inference
  → Intent + entity joint model
Impact: Major accuracy improvement offline
Complexity: High (data collection + training)

17.3 Phase C: Personalization

C.1 User Profiles

"The usual" → SwiftData profile → one-tap reorder
Dietary prefs → auto-filter incompatible items
Session learning persists → cross-visit intelligence

C.2 Smart Upselling

Context-aware: "Want a pastry?" after drink order
Time-aware: Breakfast combos before 11am
Weather-aware: Iced drinks on hot days
Never pushy, always value-adding

C.3 Multi-Language

French: Parisian coffee vocabulary
N'Ko: Manding language family
Auto-detect from first utterance
Code-switching: "Un latte grande please"

17.4 Phase D: Observability

D.1 Pipeline Telemetry

Every utterance logged:
  → Transcript, intent, entities, confidence, latency
  → Parse source (AI/NLU/hybrid)
  → Confirmation response
  → Recovery events
Dashboard: accuracy by item, slot, intent over time
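The per-utterance record described in D.1 could be modeled as a Codable struct. The field names below follow the bullet list above, but the struct itself, the millisecond latency unit, and the string-typed fields are assumptions for illustration.

```swift
import Foundation

// Hypothetical telemetry record; one instance per processed utterance.
struct UtteranceEvent: Codable, Equatable {
    enum ParseSource: String, Codable { case ai, nlu, hybrid }

    let transcript: String
    let intent: String
    let entities: [String: String]
    let confidence: Double
    let latencyMs: Int                 // assumed unit: milliseconds
    let parseSource: ParseSource
    let confirmationResponse: String?  // nil when no confirmation occurred
    let recoveryEvents: [String]
}

/// Encodes an event as JSON with sorted keys so log lines diff cleanly.
func encodeEvent(_ event: UtteranceEvent) throws -> Data {
    let encoder = JSONEncoder()
    encoder.outputFormatting = [.sortedKeys]
    return try encoder.encode(event)
}
```

Since section 13.1 states that transcripts are cleared on `endSession()`, any real logging of the transcript field would need to sit behind the opt-in analytics that section lists as not yet available.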

D.2 Active Learning

Flag low-confidence utterances → human review queue
Corrections → PatternLearner integration
A/B test NLU improvements

---

18. Codebase Map (Verified — All Files with Accurate LOC)

18.1 BWBCore/Sources/BWBCore/Voice/ (19,703 LOC total)

Voice/
├── AITranscriptParser.swift              1,343 LOC  — AI parsing + context
├── AudioSessionManager.swift               373 LOC  — Audio routing + TTS
├── ConstraintEngine.swift                  776 LOC  — Order validation
├── EnhancedVoiceNLUEngine.swift            745 LOC  — Advanced NLU
├── LegacyVoiceService.swift                373 LOC  — SFSpeechRecognizer
├── SpeechAnalyzerService.swift             575 LOC  — iOS 26+ speech
├── VoiceDialogueManager.swift              572 LOC  — Dialogue state (unused by Kiosk)
├── VoiceNLUEngine.swift                  1,321 LOC  — Core NLU
├── VoiceServiceProtocol.swift              300 LOC  — Service abstraction
├── VoiceTypes.swift                        447 LOC  — Shared types
├── WakeWordDetector.swift                  540 LOC  — Wake word
│
├── Constraints/
│   ├── ConstraintParser.swift              253 LOC
│   ├── ConstraintTypes.swift               623 LOC
│   └── YAMLConstraintEngine.swift          746 LOC
│
├── Coordination/
│   ├── CartCoordinator.swift               452 LOC
│   ├── ConfirmationCoordinator.swift       626 LOC
│   ├── ContextAwareRecoveryService.swift   476 LOC
│   ├── FeedbackCoordinator.swift           358 LOC
│   └── SessionManager.swift                270 LOC
│
├── Detection/
│   ├── LiveOrderPreviewGenerator.swift     310 LOC
│   ├── MenuAliasMatcher.swift              256 LOC
│   ├── ModifierDetector.swift              261 LOC
│   ├── QuantityExtractor.swift             123 LOC
│   ├── TranscriptStabilityTracker.swift    125 LOC
│   └── UtteranceCompletionDetector.swift   344 LOC
│
├── Dialogue/
│   ├── ClarificationPolicy.swift           336 LOC
│   └── ConfirmationGenerator.swift         334 LOC
│
├── Embeddings/
│   ├── MenuDocument.swift                  365 LOC
│   ├── MenuEmbeddingService.swift          387 LOC
│   ├── TextEmbedder.swift                  357 LOC
│   └── VectorIndex.swift                   297 LOC
│
├── Learning/
│   ├── FeedbackCollector.swift             512 LOC
│   ├── LearningTypes.swift                 179 LOC
│   └── PatternLearner.swift                318 LOC
│
├── Parsing/
│   ├── ConfidenceScorer.swift              230 LOC
│   ├── EntityExtractor.swift               314 LOC
│   ├── IntentClassifier.swift              186 LOC
│   ├── TranscriptPreprocessor.swift        226 LOC
│   └── Prompts/
│       ├── GeminiPrompts.swift             162 LOC
│       └── OpenAIPrompts.swift             209 LOC
│
├── Pipeline/
│   ├── AIOrderParser.swift                 130 LOC
│   ├── NLUOrderParser.swift                 90 LOC
│   ├── OrderParsingPipeline.swift          531 LOC
│   ├── OrderResultMerger.swift             390 LOC
│   ├── TranscriptNormalizer.swift          228 LOC
│   ├── TranscriptPipeline.swift            188 LOC
│   └── TranscriptState.swift               136 LOC
│
└── Slots/
    ├── SlotClassifier.swift                362 LOC
    ├── SlotDefinitions.swift               315 LOC
    └── SlotPrediction.swift                209 LOC

18.2 BWB_Kiosk/BWB_Kiosk/ (Kiosk-Specific)

BWB_Kiosk/
├── BWB_KioskApp.swift
├── Services/
│   ├── VoiceOrderingOrchestrator.swift    ~750 LOC  — Central coordinator
│   └── VoiceOrderingTypes.swift           ~280 LOC  — Kiosk types + config
└── Views/
    ├── Setup/
    │   ├── KioskSetupView.swift
    │   └── KioskSettingsView.swift
    └── Ordering/
        ├── KioskMainView.swift
        ├── KioskVoiceOrderingView.swift    — Voice root (uses components)
        ├── KioskTouchOrderingView.swift    — Touch fallback
        └── Components/
            ├── VoiceWaveformView.swift
            ├── TranscriptDisplayView.swift
            ├── CartSummaryView.swift
            ├── ConfirmationOverlayView.swift
            ├── KioskVoiceSubviews.swift    — Header, Welcome, Active, ActionBar
            └── KioskPaymentViews.swift     — Payment, CartSheet, SettingsSheet

18.3 Tests

BWBCore/Tests/BWBCoreTests/
├── VoiceNLUEngineTests.swift
├── VoicePipelineTests.swift
├── Integration/KioskVoiceTests.swift
└── Mocks/MockVoiceService.swift

BWB_Kiosk/BWB_KioskTests/
└── BWB_KioskTests.swift

BWB_Kiosk/BWB_KioskUITests/
├── BWB_KioskUITests.swift
└── BWB_KioskUITestsLaunchTests.swift

---

This document was verified against the actual codebase on 2026-02-10.
Every LOC count, enum variant, method signature, and pattern list was confirmed by reading source files.
Update this document when code changes. It is the single source of truth for the voice ordering architecture.

Promotion Decision

Promote into a technical note or architecture paper with implementation anchors.

Source Anchor

BWB/BWB_Kiosk/docs/VOICE-ORDERING-ARCHITECTURE.md

Detected Structure

Method · Evaluation · References · Figures · Code Anchors · Architecture
