# Stage 3: EXPAND + MASTER PLAN
## LUME Commerce -- Experiential Commerce Infrastructure
---
## 3a. RISK AUDIT
### R1: GPU Contention During Voice Processing [CRITICAL]
- Failure scenario: When Whisper.cpp fallback activates (WiFi down), the CUDA inference burst (~2-3 seconds) steals GPU cycles from the visual pipeline. Frame rate drops from 60fps to 20-30fps. All customers see the stutter, not just the person ordering. The entertainment experience degrades.
- Probability: MEDIUM (35
- Impact: MEDIUM-HIGH. Visual stutter during ordering creates a jarring experience. If WiFi is frequently unreliable, this becomes a persistent problem.
- Mitigation: (a) Pre-warm the Whisper model at boot, but don't run inference until needed. (b) During Whisper inference, cap the visual pipeline at 30fps (skip every other frame, roughly halving the render load). A steady 30fps reads as smooth to customers, while an uneven ~45fps looks like stutter. (c) Batch audio: accumulate 5-10s of audio, process it in one burst, then return to 60fps. (d) Long-term: Whisper-TensorRT optimization gives 3x speedup, reducing the contention window.
- Validation criteria: Visual pipeline holds a steady 30fps or better during the Whisper inference burst. No visible stutter reported by any of 3 test observers in a blind test.
### R2: Voice Ordering Accuracy in Coffee Shop Noise [CRITICAL]
- Failure scenario: Background noise (espresso machine 70-80 dB, conversations 55-65 dB, music 60-70 dB) degrades STT accuracy below 70
- Probability: LOW with Deepgram (15
- Impact: CRITICAL. If voice ordering doesn't work, the entire Commerce tier value proposition fails.
- Mitigation: (a) Audio preprocessing: 3-mic beamforming (directional pickup toward customer, null toward espresso machine), noise gate, band-pass 200 Hz to 4 kHz, AGC. (b) Deepgram Nova-3 primary (trained on millions of hours of noisy real-world audio). (c) Domain-specific vocabulary hints sent to Deepgram (coffee menu terms as keywords parameter). (d) Local Whisper as degraded fallback only. (e) Touch fallback: LUME display shows order suggestions from partial transcript, customer taps to confirm. Never force voice-only.
- Validation criteria: >85
### R3: Content Flywheel Adoption Rate [MEDIUM]
- Failure scenario: Customers don't interact with LUME visuals during queue time. They stare at their phones. QR scan rate is <5
- Probability: MEDIUM (30
- Impact: MEDIUM. Commerce and analytics still work without the flywheel. Revenue is not dependent on content sharing. But the marketing moat doesn't build.
- Mitigation: (a) Prominent placement: LUME must face the queue, not be behind the counter. (b) Staff encouragement: "Have you tried the wall display while you wait?" (c) Group dynamics: one person interacting draws others. (d) Attract mode tuned to catch peripheral attention. (e) Content incentive: "Show your LUME clip for 10
- Validation criteria: >15
### R4: Deepgram API Reliability and Latency [MEDIUM]
- Failure scenario: Deepgram WebSocket connection drops mid-order. Or latency spikes to >1 second during peak hours (their server load). Customer says "large latte" and waits 3 seconds for response. Feels broken.
- Probability: LOW (15
- Impact: MEDIUM. Local Whisper fallback activates, degrading to slower but functional voice ordering.
- Mitigation: (a) WebSocket heartbeat monitoring: if ping >500ms, switch to local Whisper within 1 second. (b) Pre-buffer: start processing audio locally on Whisper while waiting for Deepgram response. Use whichever returns first. (c) Deepgram SLA: enterprise tier includes uptime guarantee. (d) Alternative cloud STT providers (AssemblyAI, Google Chirp) as second cloud fallback.
- Validation criteria: Total STT response time (cloud or fallback) <3 seconds for 99
### R5: BWB NLU Pattern Port Accuracy [MEDIUM]
- Failure scenario: Porting 80K lines of Swift NLU patterns to Rust introduces subtle regex/matching bugs. "Large" matches differently. Modifier detection misses "oat" in "oat milk latte." The ported code is 90
- Probability: MEDIUM (30
- Impact: MEDIUM. Incorrect parsing means wrong orders, customer frustration, barista overrides.
- Mitigation: (a) Port with test suite: extract 200+ test cases from BWB_KioskTests (202 existing tests + 217 voice ordering type tests). Every test must pass in Rust before shipping. (b) Run parallel validation: send same transcripts through both Swift (on iPad) and Rust (on LUME) NLU pipelines, compare outputs. Flag divergences. (c) Start with the 50 most common coffee orders only. Expand vocabulary incrementally.
- Validation criteria: 100
### R6: Kitchen Display Adoption [LOW-MEDIUM]
- Failure scenario: Baristas already have a workflow with their existing POS kitchen display. Adding LUME's web-based kitchen display means monitoring two screens. They ignore LUME orders.
- Probability: MEDIUM (35
- Impact: MEDIUM. Voice orders not fulfilled = customers waiting longer = bad experience.
- Mitigation: (a) Default: Route LUME orders to the EXISTING POS kitchen display via API (Square Orders API, Toast Kitchen API). LUME adds orders to the barista's existing workflow. (b) Fallback: LUME announces orders via speakers ("New order: large oat milk latte!"). Audio alert is impossible to miss. (c) Web kitchen display is for shops that DON'T have an existing KDS.
- Validation criteria: 95
### R7: Privacy Perception with Cameras in Commerce Space [LOW-MEDIUM]
- Failure scenario: Customers feel surveilled by a camera-equipped device in a payment/ordering area. Social media backlash: "This coffee shop is filming you to sell your data." Venue owner removes LUME.
- Probability: LOW (15
- Impact: MEDIUM-HIGH. One viral negative post could damage the brand.
- Mitigation: (a) "Privacy Mode" signage: "This device uses depth sensors only. No photos, no facial recognition, no personal data." (b) LED indicator: green light when analytics active (no images). Red light only during content clip capture (customer-initiated). (c) Depth-only mode: toggle that disables all RGB cameras, using depth for visuals and analytics. (d) Published privacy policy specific to LUME Commerce.
- Validation criteria: Zero unresolved privacy complaints in 90-day pilot. Customer survey: >80
### R8: Menu Configuration Complexity [LOW-MEDIUM]
- Failure scenario: Setting up a new venue requires manually entering every menu item, size, modifier, and price. For a coffee shop with 40 drinks and 15 modifiers, this takes 2+ hours. Shop owner gives up.
- Probability: MEDIUM (30
- Impact: LOW-MEDIUM. Bad setup = bad NLU accuracy = bad orders.
- Mitigation: (a) Square/Toast menu sync: auto-import menu from existing POS via API. One click. (b) CSV/JSON import for other POS systems. (c) Template menus: pre-built "Coffee Shop Standard" menu with 50+ items, shop owner enables/disables. (d) Companion app menu editor with search + categorize.
- Validation criteria: New venue menu configured in <30 minutes. Template menu covers >80
### R9: Zone Configuration for Analytics [LOW]
- Failure scenario: Shop owner defines queue zone incorrectly (too large, overlapping with seating). Analytics data is meaningless. "Your queue length is 47" when actual queue is 4 people.
- Probability: LOW (15
- Impact: LOW. Bad zones = bad analytics, but voice ordering and entertainment still work.
- Mitigation: (a) Companion app with live depth view. Shop owner draws zones on the depth camera view with finger. Instant visual feedback: "LUME sees 3 people in your queue zone." (b) Auto-suggest: LUME's depth map identifies the most common body positions during first 24 hours and suggests zones. (c) Manual override always available.
- Validation criteria: Zone setup completes in <10 minutes with visual validation. Auto-suggest accuracy >70
### R10: Single-Person Scaling for Commerce Support [MEDIUM]
- Failure scenario: Commerce introduces new failure modes: failed orders, incorrect charges, payment disputes, kitchen sync issues. Each venue generates 2-3 support tickets/month. At 50 venues, that's 100-150 tickets/month. Mohamed drowns in support instead of building.
- Probability: MEDIUM-HIGH (40
- Impact: HIGH. Support backlog kills customer retention. Venues churn because issues aren't resolved.
- Mitigation: (a) Self-diagnosing system: LUME logs every order attempt, STT confidence, NLU result, and payment status. Support dashboard shows full order trace. (b) Auto-recovery: failed STT -> retry with Whisper. Failed payment -> retry or route to counter. Failed kitchen push -> audio announcement fallback. (c) FAQ bot: LLM-powered chat support on lume.dance/support trained on LUME knowledge base. (d) Hire first support contractor at 30 commerce venues ($1.5K/month).
- Validation criteria: <5
---
## 3b. EXPANDED SPECIFICATIONS
### Spec 1: On-Device Voice Model Decision

**Primary STT: Deepgram Nova-3 (Cloud Streaming)**
- Protocol: WebSocket to wss://api.deepgram.com/v1/listen
- Parameters: encoding=linear16, sample_rate=16000, channels=1, model=nova-3, language=en, smart_format=true, keywords=["latte:2","cappuccino:2","espresso:2","oat:1.5","almond:1.5","large:1","medium:1","small:1"]
- Partial results: interim_results=true (for live preview)
- Latency: ~300ms for first partial, ~500ms for stable transcript
- Cost: $0.0043/minute. At 500 orders/month, 30s average = $1.08/month.
- Accuracy: >95
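
The connection parameters and the cost figure above can be checked with a small sketch. This is illustrative only: the function names `listen_url` and `monthly_cost_usd` are hypothetical, and sending each keyword as its own repeated `keywords=term:boost` query pair is an assumption about how the list is serialized. Which boosting parameter a given Deepgram model accepts should be confirmed against Deepgram's current documentation.

```rust
/// Keyword boosts copied from the spec above: (term, intensifier).
const KEYWORDS: [(&str, &str); 8] = [
    ("latte", "2"), ("cappuccino", "2"), ("espresso", "2"),
    ("oat", "1.5"), ("almond", "1.5"),
    ("large", "1"), ("medium", "1"), ("small", "1"),
];

/// Builds the streaming URL from the parameters listed in the spec.
/// Assumption: each keyword is sent as a separate `keywords=` pair.
fn listen_url() -> String {
    let mut url = String::from(
        "wss://api.deepgram.com/v1/listen?encoding=linear16&sample_rate=16000&channels=1\
         &model=nova-3&language=en&smart_format=true&interim_results=true",
    );
    for &(term, boost) in KEYWORDS.iter() {
        url.push_str(&format!("&keywords={}:{}", term, boost));
    }
    url
}

/// Monthly STT cost: orders * (average seconds / 60) * price per minute.
fn monthly_cost_usd(orders: u32, avg_seconds: f64, price_per_min: f64) -> f64 {
    orders as f64 * (avg_seconds / 60.0) * price_per_min
}
```

With the spec's numbers, `monthly_cost_usd(500, 30.0, 0.0043)` gives 250 minutes at $0.0043, which is $1.075 and rounds to the $1.08/month quoted above.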
**Fallback STT: Whisper.cpp base.en (Local CUDA)**
- Model: ggml-base.en.bin (~142 MiB on disk, 74M params)
- Inference: whisper.cpp with CUDA backend on Jetson Orin Nano Super
- Expected: ~2-3 seconds for 10 seconds of audio
- Accuracy: ~85
- Trigger: WiFi unreachable OR Deepgram latency >500ms for 3 consecutive requests
- GPU impact: ~3ms/frame for 5-10 frames during inference. Visual pipeline drops to 30fps temporarily.
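
The fallback trigger above (WiFi unreachable, or Deepgram latency over 500ms for 3 consecutive requests) can be sketched as a small selector. The type and method names here are hypothetical. The recovery rule (one fast response switches back to Deepgram) is an assumption, since the spec only defines when to fall back, and it presumes some background probe keeps measuring cloud latency while Whisper is active.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum SttTier {
    Deepgram,
    WhisperLocal,
}

struct SttSelector {
    slow_streak: u32, // consecutive cloud requests slower than the limit
    tier: SttTier,
}

impl SttSelector {
    const LATENCY_LIMIT_MS: u64 = 500;
    const STREAK_LIMIT: u32 = 3;

    fn new() -> Self {
        SttSelector { slow_streak: 0, tier: SttTier::Deepgram }
    }

    /// Records one cloud request and returns the tier to use next.
    /// `latency_ms` is `None` when WiFi or Deepgram was unreachable.
    fn observe(&mut self, latency_ms: Option<u64>) -> SttTier {
        match latency_ms {
            // Unreachable: fall back immediately.
            None => self.tier = SttTier::WhisperLocal,
            // Slow: fall back only after three slow requests in a row.
            Some(ms) if ms > Self::LATENCY_LIMIT_MS => {
                self.slow_streak += 1;
                if self.slow_streak >= Self::STREAK_LIMIT {
                    self.tier = SttTier::WhisperLocal;
                }
            }
            // Fast: reset the streak and return to the cloud tier (assumed policy).
            Some(_) => {
                self.slow_streak = 0;
                self.tier = SttTier::Deepgram;
            }
        }
        self.tier
    }
}
```

Keeping the streak counter separate from the tier means a single latency spike never triggers the GPU-heavy Whisper path on its own.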
**NLU: BWB Pattern Engine (Local, CPU Only)**
- No LLM. The ported BWB patterns handle:
  - Drink names: 50+ items with 3-5 aliases each
  - Sizes: small/medium/large/tall/grande/venti + numeric (12oz, 16oz, 20oz)
  - Modifiers: milks (oat, almond, soy, coconut, whole, skim), temperatures (hot, iced, blended), extras (extra shot, light ice, no whip)
  - Quantities: "two," "a couple," "three," explicit numbers
- Confidence: 0.0-1.0 per parsed item based on fuzzy match score
- Processing time: <50ms on Jetson CPU for typical order transcript
- Memory: ~10MB (compiled Rust + vocabulary data)
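
To make the pattern categories above concrete, here is a minimal token-matching sketch using only the standard library. It is not the BWB engine: `ParsedItem`'s fields and `parse_order` are hypothetical, the vocabulary is a tiny subset, and fuzzy matching, numeric sizes, extras, and confidence scoring are all left out.

```rust
#[derive(Debug, Default, PartialEq)]
struct ParsedItem {
    quantity: u32,
    size: Option<String>,
    milk: Option<String>,
    temperature: Option<String>,
    drink: Option<String>,
}

/// Scans the transcript word by word and fills in whichever slots match.
fn parse_order(transcript: &str) -> ParsedItem {
    let lowered = transcript.to_lowercase();
    let mut item = ParsedItem { quantity: 1, ..Default::default() };
    for raw in lowered.split_whitespace() {
        // Strip punctuation such as the comma in "lattes,".
        let word = raw.trim_matches(|c: char| !c.is_alphanumeric());
        match word {
            "two" | "couple" | "2" => item.quantity = 2,
            "three" | "3" => item.quantity = 3,
            "small" | "medium" | "large" | "tall" | "grande" | "venti" => {
                item.size = Some(word.to_string())
            }
            "oat" | "almond" | "soy" | "coconut" | "whole" | "skim" => {
                item.milk = Some(word.to_string())
            }
            "hot" | "iced" | "blended" => item.temperature = Some(word.to_string()),
            // Plural forms map back to the canonical drink name.
            "latte" | "lattes" => item.drink = Some("latte".to_string()),
            "cappuccino" | "cappuccinos" => item.drink = Some("cappuccino".to_string()),
            "espresso" | "espressos" => item.drink = Some("espresso".to_string()),
            _ => {}
        }
    }
    item
}
```

For example, `parse_order("Two large iced oat milk lattes, please.")` fills quantity 2, size `large`, temperature `iced`, milk `oat`, and drink `latte`. A single-word matcher like this cannot handle multi-word aliases or "extra shot", which is why the real port relies on the alias table and regex patterns described above.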
**TTS: Piper (Local, CPU Only)**
- Model: en_US-lessac-medium (50MB, VITS architecture)
- Latency: <100ms per sentence
- Quality: natural-sounding, suitable for confirmations and announcements
- Output: 22050Hz WAV -> LUME speakers or HDMI audio
**Benchmark Summary:**

| Metric | Cloud (Deepgram) | Local (Whisper) | Target |
|---|---|---|---|
| STT latency | ~500ms | ~2500ms | <3000ms |
| STT accuracy (70dB noise) | ~95 | ||
| NLU processing | 50ms | 50ms | <100ms |
| TTS latency | 100ms | 100ms | <200ms |
| Total order time | ~2s | ~4s | <5s |
| GPU impact | 0 | ||
| Monthly cost (500 orders) | $1.08 | $0 | <$5 |
### Spec 2: Queue Analytics Data Model
-- Supabase tables for LUME Commerce analytics
CREATE TABLE lume_venues (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name TEXT NOT NULL,
slug TEXT UNIQUE NOT NULL,
lume_device_id TEXT UNIQUE,
timezone TEXT DEFAULT 'America/New_York',
menu_config JSONB, -- menu items, prices, aliases
zone_config JSONB, -- zone definitions (queue, service, exit, etc.)
branding JSONB, -- logo URL, watermark config, colors
payment_config JSONB, -- payment method, Square/Stripe creds
created_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE TABLE lume_analytics_snapshots (
id BIGSERIAL PRIMARY KEY,
venue_id UUID REFERENCES lume_venues(id),
captured_at TIMESTAMPTZ NOT NULL,
current_occupancy INT,
queue_length INT,
service_count INT,
estimated_wait_seconds INT,
avg_service_time_seconds FLOAT,
throughput_per_hour FLOAT,
conversion_rate FLOAT, -- reached service / entered
abandonment_rate FLOAT, -- exited without service
heatmap JSONB -- 20x20 grid of density values
);
CREATE INDEX idx_analytics_venue_time ON lume_analytics_snapshots(venue_id, captured_at);
CREATE TABLE lume_orders (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
venue_id UUID REFERENCES lume_venues(id),
order_number INT, -- venue-specific sequential number
items JSONB NOT NULL, -- [{name, quantity, modifiers, price}]
total_amount DECIMAL(10,2),
status TEXT DEFAULT 'pending', -- pending, preparing, ready, completed, cancelled
payment_status TEXT DEFAULT 'unpaid', -- unpaid, processing, paid, refunded
payment_method TEXT, -- counter, square, stripe, apple_pay
stt_method TEXT, -- deepgram, whisper_local
stt_confidence FLOAT,
nlu_confidence FLOAT,
customer_name TEXT, -- optional, from voice (e.g., "for Sarah")
created_at TIMESTAMPTZ DEFAULT NOW(),
confirmed_at TIMESTAMPTZ,
ready_at TIMESTAMPTZ,
completed_at TIMESTAMPTZ
);
CREATE INDEX idx_orders_venue_status ON lume_orders(venue_id, status);
CREATE TABLE lume_content_clips (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
venue_id UUID REFERENCES lume_venues(id),
clip_url TEXT, -- CDN URL for web access
thumbnail_url TEXT,
duration_seconds FLOAT,
visual_preset TEXT,
audio_mode TEXT, -- listen, generate
qr_scanned BOOLEAN DEFAULT false,
downloaded BOOLEAN DEFAULT false,
shared_platform TEXT, -- tiktok, instagram, etc. (if tracked)
created_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE TABLE lume_track_events (
id BIGSERIAL PRIMARY KEY,
venue_id UUID REFERENCES lume_venues(id),
track_id INT, -- local centroid tracker ID (not persistent)
event_type TEXT NOT NULL, -- enter, zone_change, dwell, exit
zone TEXT, -- queue, service, entrance, exit, seating
position_x FLOAT,
position_y FLOAT,
dwell_seconds FLOAT, -- for dwell events
captured_at TIMESTAMPTZ NOT NULL
);
-- Partitioned by date for efficient cleanup
CREATE INDEX idx_track_events_venue_time ON lume_track_events(venue_id, captured_at);

### Spec 3: BWB Code Port Map (Swift -> Rust)
Principle: Port the LOGIC, not the syntax. The BWB Swift code is organized around iOS patterns (@MainActor, ObservableObject, Combine). The Rust code uses async channels (tokio::mpsc) and message-passing instead of observation.
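
As a rough illustration of that principle, the sketch below replaces an observed `@Published phase` property with a phase enum whose changes are pushed over a channel. It uses `std::sync::mpsc` as a stand-in so it runs without dependencies; the real code would presumably use `tokio::mpsc`. The phase list follows the Idle -> Listening -> Processing -> Confirming -> Complete sequence used elsewhere in this plan, and the wrap-around from `Complete` back to `Idle` is an assumption.

```rust
use std::sync::mpsc;

#[derive(Debug, Clone, Copy, PartialEq)]
enum Phase {
    Idle,
    Listening,
    Processing,
    Confirming,
    Complete,
}

struct VoiceOrchestrator {
    phase: Phase,
    // Listeners (display, TTS, analytics) receive phase changes as messages.
    updates: mpsc::Sender<Phase>,
}

impl VoiceOrchestrator {
    fn new(updates: mpsc::Sender<Phase>) -> Self {
        VoiceOrchestrator { phase: Phase::Idle, updates }
    }

    /// Moves to the next phase and notifies listeners.
    fn advance(&mut self) -> Phase {
        self.phase = match self.phase {
            Phase::Idle => Phase::Listening,
            Phase::Listening => Phase::Processing,
            Phase::Processing => Phase::Confirming,
            Phase::Confirming => Phase::Complete,
            Phase::Complete => Phase::Idle, // assumed: ready for the next customer
        };
        // A dropped receiver should not crash the orchestrator, so the error is ignored.
        let _ = self.updates.send(self.phase);
        self.phase
    }
}
```

The orchestrator is an owned value handed a `Sender`, which matches the "owned by CommerceEngine, not global" mapping for the Swift singleton.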
BWB SWIFT LUME RUST
--------- ---------
VoiceOrderingOrchestrator voice::orchestrator::VoiceOrchestrator
@Published phase -> enum Phase { Idle, Listening, Processing, ... }
ObservableObject -> struct with tokio::mpsc::Sender
Singleton shared -> owned by CommerceEngine, not global
TranscriptPipeline voice::orchestrator (inline)
processIncoming() -> process_transcript(raw: &str, is_final: bool)
TranscriptNormalizer -> nlu::normalize(text: &str) -> String
TranscriptStabilityTracker -> stability_count field in orchestrator
LiveOrderPreviewGenerator nlu::menu_matcher::MenuMatcher
MenuAliasMatcher -> HashMap<String, MenuItem> + fuzzy match
ModifierDetector -> regex patterns for milk/temp/extras
QuantityExtractor -> regex patterns for numbers
CartCoordinator cart::coordinator::CartCoordinator
pendingOrders / confirmedCart -> Vec<ParsedItem> / Vec<ConfirmedItem>
addPendingOrder() -> add_pending(item: ParsedItem)
confirmAll() -> confirm_all() -> Vec<ConfirmedItem>
ConfirmationCoordinator cart::confirmation::ConfirmationFlow
AutoConfirmConfig (3s, 0.8 conf) -> ConfirmConfig { delay_ms: 3000, min_confidence: 0.8 }
confirmationTimer -> tokio::time::sleep(Duration::from_secs(3))
SessionManager cart::session::SessionManager
120s timeout -> tokio::time::timeout(Duration::from_secs(120))
recordActivity() -> reset_timeout()
FeedbackCoordinator voice::piper_bridge::PiperTTS
speak() / speakAndWait() -> speak(text: &str) -> impl Future
playAudio() / playHaptic() -> play_sound(sound: SoundType)
AVSpeechSynthesizer -> Piper subprocess via stdin/stdout
QueueService analytics::exporter::DataExporter
pendingOrders / inProgressOrders -> in lume_orders table, not in-memory
Supabase realtime subscription -> Supabase REST + WebSocket (same)
refreshInterval: 30s -> export_interval: 5s
KitchenDisplayView kitchen::web_display (Axum + htmx)
SwiftUI View -> HTML template with htmx SSE updates
Dark background -> CSS dark theme
Order cards grid -> CSS Grid layout
Status buttons -> htmx hx-post for status changes

Test Parity Strategy:
Extract test cases from these BWB test files:
- `BWB_KioskTests/VoiceOrchestratorTests.swift` (112 lines)
- `BWB_KioskTests/VoiceOrderingTypesTests.swift` (217 lines)
- `BWB_KioskTests/CartManagerTests.swift` (190 lines)
- `BWB_KioskTests/BWB_KioskTests.swift` (202 lines)
- `BWB_KioskTests/TTSIntegrationTests.swift` (94 lines)
- `BWBCore` 434 passing tests (extract NLU-relevant subset)
Target: 150+ Rust tests covering the same cases before Commerce goes live.
### Spec 4: Content Flywheel Technical Pipeline
CLIP CAPTURE PIPELINE:
1. Engagement Detection (runs every frame, ~0.1ms)
Input: centroid tracks in QUEUE zone from analytics engine
Logic:
IF any track in QUEUE zone for >15 seconds
AND track motion_energy > 0.3 (normalized 0-1)
AND no clip captured for this track in last 60 seconds
THEN trigger clip capture for this track
2. Clip Capture (triggered, 15-second duration)
- Start recording from circular buffer (last 5s already buffered)
- Record 10 more seconds (total 15s)
- Sources: tight camera (portrait 9:16) + visual overlay + audio
- Compositor: blend visual frame over camera feed (same as existing)
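- Encode: H.265, 1080x1920 @ 30fps, ~15MB per clip. The plan assumes NVENC, but the Jetson Orin Nano has no hardware encoder block, so on that module this step would run as a CPU software encode.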
- Audio: room audio (LISTEN mode) or generated (GENERATE mode)
- Metadata: venue_slug, timestamp, visual_preset, duration
3. QR Code Generation (immediate after capture)
- URL: https://lume.dance/c/{venue_slug}/{clip_id_short}
- QR code rendered as overlay on LUME display (corner, non-intrusive)
- QR visible for 30 seconds, then fades
- Same QR also available via NFC tap (if V2 hardware)
4. Content Upload (background, WiFi)
- Upload clip to CDN (Cloudflare R2 or Supabase Storage)
- ~15MB per clip, WiFi 6E throughput: ~5 seconds
- Generate thumbnail (best frame by visual energy)
- Create entry in lume_content_clips table
5. Web Clip Page (served at lume.dance/c/...)
- Mobile-optimized page
- Auto-play preview (HLS, muted, 720p for quick load)
- Download button (full 1080p file to camera roll)
- Share buttons: TikTok, Instagram Stories, X, WhatsApp
- Venue credit: logo + name + location
- LUME credit: small "Made with LUME" badge
6. Analytics (tracked per clip)
- QR scanned: boolean (link opened)
- Preview watched: boolean (video loaded)
- Downloaded: boolean (download button tapped)
- Shared: platform (if detected via share intent)
- Impressions: estimated from platform APIs (future)

### Spec 5: Enterprise Fleet Dashboard
DASHBOARD: https://dashboard.lume.dance (web app, Supabase + Nexus-style)
PAGES:
1. Fleet Overview
- Map view: all locations with real-time status (green/yellow/red)
- Summary cards: total orders today, total queue time saved, total clips shared
- Alerts: devices offline, unusual queue lengths, low order accuracy
2. Location Detail (per venue)
- Real-time metrics: occupancy, queue length, wait time
- Heatmap overlay on venue floor plan
- Orders: list with status, time, items, voice accuracy
- Content: clips generated today, QR scans, downloads, shares
- Analytics: throughput chart (orders/hour), wait time trend, peak hours
3. Analytics Comparison
- Side-by-side: Location A vs Location B vs Fleet Average
- Metrics: wait time, throughput, conversion rate, content engagement
- Time range selector: today, 7 days, 30 days, custom
4. Voice Performance
- STT accuracy breakdown: Deepgram vs Whisper fallback rate
- Most misunderstood items (NLU confidence <0.7)
- Order correction rate (manual override by barista)
- Top 20 ordered items with voice accuracy per item
5. Content Performance
- Total clips: captured, scanned, downloaded, shared
- Top-performing clips (most shares)
- Engagement funnel: captured -> scanned -> downloaded -> shared
- Estimated social impressions per location
6. Settings
- Venue management: add/edit/deactivate locations
- Menu management: bulk update across fleet
- Zone configuration: templates by venue type
- User management: roles (owner, manager, support)
- Billing: subscription tier, invoices, payment method
TECH:
- Next.js on Vercel (or Nexus-style self-hosted)
- Supabase for data + auth + realtime
- Charts: Recharts or D3
- Map: Mapbox GL JS
- Responsive: works on desktop and iPad

---
## 3c. MASTER EXECUTION CHECKLIST
### Wave 0: FOUNDATION (Days 1-5)
Set up the Rust Commerce Engine skeleton and validate STT on Jetson.
| # | Task | Input | Output | Validation | Auto | Status |
|---|---|---|---|---|---|---|
| 0.1 | Create `lume-commerce` Rust crate with module structure | Architecture from Stage 2 | Cargo.toml + src/ with all module files (empty stubs) | `cargo check` passes | yes | TODO |
| 0.2 | Integrate whisper.cpp as Rust FFI dependency | whisper.cpp repo + Jetson CUDA | `whisper_bridge.rs` with `transcribe(audio: &[f32]) -> String` | Unit test: transcribe a WAV file, get text | yes | TODO |
| 0.3 | Integrate Piper TTS as Rust subprocess bridge | Piper binary for aarch64 | `piper_bridge.rs` with `speak(text: &str) -> AudioBuffer` | Unit test: speak "hello", get WAV output | yes | TODO |
| 0.4 | Benchmark Whisper base.en on Jetson (CUDA) | 10 test audio files (coffee orders in noise) | Benchmark report: latency + accuracy per file | <3s for 10s audio, >80 | ||
| 0.5 | Set up Deepgram WebSocket streaming client | Deepgram API key | `deepgram_client.rs` with streaming + partial results | Unit test: stream audio, receive partial + final transcript | yes | TODO |
| 0.6 | Create `VoiceOrchestrator` state machine (empty transitions) | BWB VoiceOrderingOrchestrator.swift | `orchestrator.rs` with Phase enum + transition logic | Unit test: Idle -> Listening -> Processing -> Confirming -> Complete | yes | TODO |
| 0.7 | Port domain models from BWB (Order, MenuItem, Customization) | BWBCore/Models/*.swift | `cart/models.rs` with Rust structs + serde | Unit test: serialize/deserialize order JSON | yes | TODO |
Wave 0 Gate: Whisper.cpp runs on Jetson with CUDA. Deepgram streaming works. State machine compiles. Domain models serialize.
---
### Wave 1: NLU ENGINE PORT (Days 5-10)
Port the BWB voice NLU patterns from Swift to Rust.
| # | Task | Input | Output | Validation | Auto | Status |
|---|---|---|---|---|---|---|
| 1.1 | Extract coffee menu vocabulary from BWB `VoiceNLUEngine+Vocabulary.swift` | 17K lines Swift vocabulary | `nlu/vocabulary.json` (50+ items, 200+ aliases, modifiers) | Manual review: all major items present | no | TODO |
| 1.2 | Port `MenuAliasMatcher` to Rust | BWB MenuAliasMatcher logic | `nlu/menu_matcher.rs` with fuzzy string matching | Port 50 test cases from BWB, all pass | yes | TODO |
| 1.3 | Port `ModifierDetector` to Rust | BWB ModifierDetector patterns | `nlu/modifier_detector.rs` with regex patterns | Port 30 test cases: milk types, temp, extras all detected | yes | TODO |
| 1.4 | Port `QuantityExtractor` to Rust | BWB QuantityExtractor logic | `nlu/quantity_extractor.rs` | Port 20 test cases: "two," "a couple," "three" all extract correctly | yes | TODO |
| 1.5 | Port `ConfidenceScorer` to Rust | BWB OrderParsingPipeline confidence | `nlu/confidence_scorer.rs` | Confidence scores match BWB within 0.05 for 50 test transcripts | yes | TODO |
| 1.6 | Build `OrderResultMerger` (combines menu match + modifiers + qty) | BWB OrderResultMerger | `nlu/order_merger.rs` | End-to-end: "large iced oat milk latte" -> correct ParsedItem | yes | TODO |
| 1.7 | Create test corpus: 100 real coffee order transcripts with expected output | Manual creation + BWB logs | `tests/corpus/orders.json` | Reviewed for correctness | no | TODO |
| 1.8 | Run NLU pipeline against test corpus | Corpus + NLU pipeline | Accuracy report | >90 |
Wave 1 Gate: NLU pipeline parses "large iced oat milk latte with an extra shot" correctly. >90
---
### Wave 2: VOICE ORDERING FLOW (Days 10-16)
Wire STT -> NLU -> Cart -> Confirmation -> TTS into the complete ordering flow.
| # | Task | Input | Output | Validation | Auto | Status |
|---|---|---|---|---|---|---|
| 2.1 | Build audio preprocessor (noise gate + bandpass + AGC) | SpeakFlow AudioProcessingService as reference | `voice/audio_preprocessor.rs` | Test: 70dB pink noise input -> voice frequencies isolated | yes | TODO |
| 2.2 | Build STT tier selection (Deepgram primary, Whisper fallback) | deepgram_client + whisper_bridge | `voice/stt_bridge.rs` with auto-fallback | Test: kill WiFi -> switches to Whisper within 2s | yes | TODO |
| 2.3 | Wire VoiceOrchestrator: mic -> preprocessor -> STT -> NLU -> cart | All voice + NLU components | Full voice ordering pipeline | End-to-end test: speak order -> cart populated correctly | no | TODO |
| 2.4 | Port CartCoordinator from BWB | BWB CartCoordinator logic | `cart/coordinator.rs` | Test: add/remove/modify items, calculate total | yes | TODO |
| 2.5 | Port ConfirmationCoordinator from BWB | BWB ConfirmationCoordinator | `cart/confirmation.rs` with auto-confirm | Test: auto-confirm at 0.8 confidence after 3s | yes | TODO |
| 2.6 | Port SessionManager with 120s timeout | BWB SessionManager | `cart/session.rs` | Test: session times out after 120s of inactivity | yes | TODO |
| 2.7 | Build TTS feedback flow (confirmations, order-ready) | Piper bridge | TTS integrated into orchestrator | Speaks "Large oat milk latte, $5.50. Anything else?" | yes | TODO |
| 2.8 | Build order overlay renderer for Unity visual engine | Visual engine API | Text cards in particle field (from Stage 2, Step 5 design) | Order text visible on LUME display during voice ordering | no | TODO |
| 2.9 | Integration test: 20 consecutive voice orders on Jetson | Jetson dev kit + mic | 20 orders processed without crash or hang | All 20 complete, >85 |
Wave 2 Gate: A person can walk up to LUME, say a coffee order, see it confirmed on the visual display, and hear TTS confirmation. 20 consecutive orders without failure.
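
The auto-confirm rule from task 2.5 (confirm at 0.8 confidence after 3s) can be sketched as a pure decision function. `ConfirmConfig` mirrors the port map entry; the `Confirmation` enum, the `decide` function, and the "ask the customer" branch for low-confidence items are assumptions based on the touch fallback described in R2.

```rust
#[derive(Debug, PartialEq)]
enum Confirmation {
    AutoConfirm,  // confident and unchanged for long enough
    WaitForTimer, // confident, but the delay has not elapsed yet
    AskCustomer,  // below the confidence floor: show suggestions to tap
}

struct ConfirmConfig {
    delay_ms: u64,
    min_confidence: f32,
}

/// Decides what to do with a parsed item that has been stable for `stable_for_ms`.
fn decide(cfg: &ConfirmConfig, confidence: f32, stable_for_ms: u64) -> Confirmation {
    if confidence < cfg.min_confidence {
        Confirmation::AskCustomer
    } else if stable_for_ms >= cfg.delay_ms {
        Confirmation::AutoConfirm
    } else {
        Confirmation::WaitForTimer
    }
}
```

Keeping the decision separate from the timer makes it testable without waiting 3 seconds; the orchestrator would call it each time the timer ticks or the transcript changes.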
---
### Wave 3: QUEUE ANALYTICS (Days 14-19)
Build the depth-based queue analytics engine.
| # | Task | Input | Output | Validation | Auto | Status |
|---|---|---|---|---|---|---|
| 3.1 | Build BodyDetector from depth silhouette mask | Depth pipeline silhouette output | `analytics/body_detector.rs` | Correctly counts 1-5 people in test depth frames | yes | TODO |
| 3.2 | Build CentroidTracker with Kalman filter | BodyDetector output | `analytics/centroid_tracker.rs` | Tracks 3 people moving for 30s with <2 ID swaps | yes | TODO |
| 3.3 | Build ZoneClassifier with configurable regions | Zone config JSON + centroids | `analytics/zone_classifier.rs` | Correctly assigns centroids to queue/service/exit zones | yes | TODO |
| 3.4 | Build MetricsAggregator with rolling windows | Track events | `analytics/metrics.rs` | Calculates wait time, throughput, conversion for test data | yes | TODO |
| 3.5 | Build HeatmapGenerator (20x20 grid) | Track events | `analytics/heatmap.rs` | Generates plausible 2D density map from 5 minutes of track data | yes | TODO |
| 3.6 | Build DataExporter (SQLite local + Supabase cloud) | Metrics + track events | `analytics/exporter.rs` | Data visible in SQLite after 60s of operation | yes | TODO |
| 3.7 | Create Supabase schema for analytics (Spec 2) | SQL from Stage 3 | Supabase tables live | All tables created, test insert succeeds | yes | TODO |
| 3.8 | Integration test: analytics running alongside visual pipeline | Jetson + Femto Bolt + 2-3 people | Real-time queue count on LUME display + Supabase data | Count matches actual people. No visual stuttering. | no | TODO |
Wave 3 Gate: LUME correctly counts 1-5 people, tracks dwell time, and exports analytics to Supabase. Zero impact on visual pipeline performance.
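
A minimal sketch of the zone classification and metrics steps (tasks 3.3 and 3.4) follows. It assumes zones are axis-aligned rectangles in the depth camera's floor coordinates, which the plan does not specify; `ZoneRect`, `classify`, `queue_length`, and `conversion_rate` are hypothetical names. The conversion formula follows the schema comment in Spec 2 (reached service / entered).

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Zone {
    Queue,
    Service,
    Exit,
}

/// Assumed zone shape: an axis-aligned rectangle in floor coordinates.
struct ZoneRect {
    zone: Zone,
    x_min: f32,
    y_min: f32,
    x_max: f32,
    y_max: f32,
}

/// Returns the first zone containing the centroid, if any.
fn classify(zones: &[ZoneRect], x: f32, y: f32) -> Option<Zone> {
    zones
        .iter()
        .find(|z| x >= z.x_min && x < z.x_max && y >= z.y_min && y < z.y_max)
        .map(|z| z.zone)
}

/// Counts the centroids currently inside the queue zone.
fn queue_length(zones: &[ZoneRect], centroids: &[(f32, f32)]) -> usize {
    centroids
        .iter()
        .filter(|&&(x, y)| classify(zones, x, y) == Some(Zone::Queue))
        .count()
}

/// Conversion rate as defined in Spec 2: reached service / entered.
fn conversion_rate(entered: u32, reached_service: u32) -> f32 {
    if entered == 0 {
        0.0
    } else {
        reached_service as f32 / entered as f32
    }
}
```

Because `classify` returns the first match, overlapping zones (the failure case in R9) resolve by list order rather than double counting.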
---
### Wave 4: PAYMENT + KITCHEN ROUTING (Days 18-23)
Connect the ordering flow to payment and kitchen display.
| # | Task | Input | Output | Validation | Auto | Status |
|---|---|---|---|---|---|---|
| 4.1 | Build PaymentBridge (Square Terminal REST API) | Square Terminal API docs | `payment/square_terminal.rs` | Test: create order -> Terminal displays -> mock payment -> confirmation callback | yes | TODO |
| 4.2 | Build PaymentBridge (Stripe Terminal server-driven) | Stripe Terminal API docs | `payment/stripe_terminal.rs` | Same as 4.1 but with Stripe | yes | TODO |
| 4.3 | Build pay-at-counter flow (order push only, no payment on LUME) | Order confirmed event | WebSocket push to POS with order details | Barista receives order on their POS display | yes | TODO |
| 4.4 | Build KitchenRouter (WebSocket push to kitchen) | Order confirmed event | `kitchen/router.rs` | Kitchen display receives order within 2s of confirmation | yes | TODO |
| 4.5 | Build web-based Kitchen Display (htmx + Axum) | Kitchen display spec | `kitchen/web_display.rs` + HTML/CSS | Browser-based KDS shows orders, status buttons work | yes | TODO |
| 4.6 | Build "Order Ready" TTS announcement | Kitchen status update | LUME speakers announce customer name + order | "Sarah, your oat milk latte is ready!" plays clearly | yes | TODO |
| 4.7 | Wire order status from kitchen -> LUME -> visual celebration | Kitchen "ready" event | Visual celebration effect on LUME display | Particle burst when order marked ready | no | TODO |
| 4.8 | Integration test: full order lifecycle | Jetson + Square Terminal (or mock) | Voice order -> kitchen -> payment -> ready announcement | End-to-end in <60 seconds total | no | TODO |
Wave 4 Gate: Full order lifecycle works: voice order -> kitchen display -> payment (or counter pay) -> order-ready announcement with visual celebration.
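
The order lifecycle in this gate maps onto the `status` values of `lume_orders` (pending, preparing, ready, completed, cancelled). A small sketch of the allowed transitions is shown below; the enum and method names are hypothetical, and the choice that only pending or preparing orders can be cancelled is an assumption, not something the schema states.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum OrderStatus {
    Pending,
    Preparing,
    Ready,
    Completed,
    Cancelled,
}

impl OrderStatus {
    /// Text value stored in the `lume_orders.status` column.
    fn as_str(self) -> &'static str {
        match self {
            OrderStatus::Pending => "pending",
            OrderStatus::Preparing => "preparing",
            OrderStatus::Ready => "ready",
            OrderStatus::Completed => "completed",
            OrderStatus::Cancelled => "cancelled",
        }
    }

    /// Returns true if moving from `self` to `next` is a valid step.
    fn can_transition_to(self, next: OrderStatus) -> bool {
        match (self, next) {
            // Normal forward path through the kitchen.
            (OrderStatus::Pending, OrderStatus::Preparing)
            | (OrderStatus::Preparing, OrderStatus::Ready)
            | (OrderStatus::Ready, OrderStatus::Completed) => true,
            // Assumed: cancellation is allowed only before the drink is ready.
            (OrderStatus::Pending, OrderStatus::Cancelled)
            | (OrderStatus::Preparing, OrderStatus::Cancelled) => true,
            _ => false,
        }
    }
}
```

Rejecting invalid transitions at this layer keeps a stray kitchen display click from moving a completed order back to pending.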
---
### Wave 5: CONTENT FLYWHEEL (Days 22-27)
Build the auto-capture, QR, and content sharing pipeline.
| # | Task | Input | Output | Validation | Auto | Status |
|---|---|---|---|---|---|---|
| 5.1 | Build InteractionDetector (motion energy threshold in queue zone) | Analytics track data | `content/interaction_detector.rs` | Triggers clip capture when person interacts >15s with motion | yes | TODO |
| 5.2 | Build ClipCapture (15s, camera + visual overlay + audio) | Content compositor (existing) | `content/clip_capture.rs` | 15s H.265 clip on NVMe, visual overlay blended | yes | TODO |
| 5.3 | Build QRCodeGenerator for clip URLs | Clip metadata | `content/qr_generator.rs` | QR code rendered to framebuffer overlay | yes | TODO |
| 5.4 | Build ContentServer web pages for clip download | Clip files + CDN | `content/content_server.rs` + web pages | Mobile web page: preview + download + share buttons | yes | TODO |
| 5.5 | Build clip upload to CDN (Cloudflare R2) | Clip files on NVMe | `content/content_server.rs` upload path | Clip accessible at lume.dance/c/{slug}/{id} within 30s | yes | TODO |
| 5.6 | Add venue branding to clips (watermark, location tag) | Venue branding config | Branded clips | Venue logo visible in clip corner, location in metadata | yes | TODO |
| 5.7 | Build content analytics tracking (scans, downloads, shares) | Web page events | Supabase lume_content_clips updates | Dashboard shows clip funnel metrics | yes | TODO |
| 5.8 | Integration test: full content flywheel | Jetson + person interacting with LUME | Interaction -> clip -> QR -> scan -> download on phone | Complete flow in <60 seconds including upload | no | TODO |
Wave 5 Gate: A customer interacts with LUME visuals, a 15-second clip is captured, QR code appears, customer scans and downloads the clip on their phone with venue branding.
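
The capture trigger behind this gate is the engagement rule from Spec 4: a track in the queue zone for more than 15 seconds, with motion energy above 0.3, and no clip for that track in the last 60 seconds. A sketch of that check is below; `TrackState` and `should_capture` are hypothetical names, and the thresholds are the ones stated in the spec.

```rust
struct TrackState {
    seconds_in_queue: f32,
    motion_energy: f32, // normalized 0-1
    // Seconds since the last clip for this track; None if no clip was captured yet.
    seconds_since_last_clip: Option<f32>,
}

/// Returns true when all three engagement conditions from Spec 4 hold.
fn should_capture(t: &TrackState) -> bool {
    let cooled_down = match t.seconds_since_last_clip {
        None => true,
        Some(s) => s >= 60.0,
    };
    t.seconds_in_queue > 15.0 && t.motion_energy > 0.3 && cooled_down
}
```

Since the check runs every frame, keeping it to a few comparisons per track fits the ~0.1ms budget given for engagement detection.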
---
### Wave 6: DASHBOARD + PILOT DEPLOYMENT (Days 26-35)
Build the analytics dashboard and deploy to first coffee shop.
| # | Task | Input | Output | Validation | Auto | Status |
|---|---|---|---|---|---|---|
| 6.1 | Build analytics web dashboard (overview + location detail) | Supabase data + dashboard spec | Next.js app at dashboard.lume.dance | Real-time metrics visible for test venue | yes | TODO |
| 6.2 | Build voice performance dashboard page | Order data with STT/NLU confidence | Dashboard page | STT accuracy, misunderstood items, correction rate visible | yes | TODO |
| 6.3 | Build content performance dashboard page | Content clip analytics | Dashboard page | Clip funnel (captured -> scanned -> downloaded -> shared) visible | yes | TODO |
| 6.4 | Build venue setup wizard in companion app | Zone config + menu config | Companion app screens (or web wizard) | Venue configured in <30 minutes | no | TODO |
| 6.5 | Build menu import from Square POS API | Square API credentials | Auto-populated menu in LUME | Menu items match Square catalog within 95 | ||
| 6.6 | Create operations playbook (install guide, troubleshooting, FAQ) | All system knowledge | PDF/web document | Non-technical person can install LUME following the guide | no | TODO |
| 6.7 | Deploy LUME Commerce to pilot coffee shop (BWB venue) | LUME device + configured venue | Live deployment | System running in real environment | no | TODO |
| 6.8 | Pilot monitoring: Day 1-7 (supervised) | Live deployment | Daily metrics report | No critical failures, >80 | ||
| 6.9 | Pilot monitoring: Day 8-14 (remote) | Live deployment | Daily metrics report | Auto-recovery handles >80 | ||
| 6.10 | Pilot review: Day 14 (kill criteria check) | 14 days of metrics | Go/no-go decision | Voice accuracy >80 |
Wave 6 Gate: LUME Commerce running in a real coffee shop for 14 days. Voice ordering, queue analytics, content flywheel, and dashboard all operational. Pilot passes the Day 14 kill-criteria check.
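The clip funnel shown on the content dashboard (captured -> scanned -> downloaded -> shared, tasks 5.7 and 6.3) reduces to three step-to-step conversion rates. The sketch below shows one way to compute them. The `ClipFunnel` struct and its field names are hypothetical and do not reflect the actual `lume_content_clips` schema.

```rust
/// Hypothetical aggregate counts for one venue and time window.
#[derive(Debug, Clone, Copy)]
pub struct ClipFunnel {
    pub captured: u32,
    pub scanned: u32,
    pub downloaded: u32,
    pub shared: u32,
}

impl ClipFunnel {
    /// Conversion rate of each funnel step relative to the step before it:
    /// [scanned/captured, downloaded/scanned, shared/downloaded].
    /// A step is None when the preceding stage has no events.
    pub fn step_rates(&self) -> [Option<f64>; 3] {
        [
            rate(self.scanned, self.captured),
            rate(self.downloaded, self.scanned),
            rate(self.shared, self.downloaded),
        ]
    }
}

/// Ratio of two counts, or None if the denominator is zero.
fn rate(numerator: u32, denominator: u32) -> Option<f64> {
    if denominator == 0 {
        None
    } else {
        Some(f64::from(numerator) / f64::from(denominator))
    }
}
```

Returning `None` for an empty stage keeps the dashboard from showing a misleading 0% when there is simply no data yet. The first element of `step_rates` is the QR scan rate that the Day 42 kill criterion refers to.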
---
### Wave 7: ENTERPRISE PREPARATION (Days 32-42)
Prepare for multi-location scaling.
| # | Task | Input | Output | Validation | Auto | Status |
|---|---|---|---|---|---|---|
| 7.1 | Build fleet management in dashboard (add/remove/monitor locations) | Dashboard + Supabase | Multi-venue dashboard | 3 test venues visible on map with independent metrics | yes | TODO |
| 7.2 | Build OTA menu update (push menu changes from dashboard to LUME) | Dashboard menu editor | LUME receives updated menu via HTTPS pull | Menu update reflects on LUME within 60 seconds | yes | TODO |
| 7.3 | Build multi-venue analytics comparison | Supabase aggregation queries | Dashboard comparison page | Side-by-side metrics for 2+ venues | yes | TODO |
| 7.4 | Build Stripe billing integration for Commerce tier | Stripe Billing API | Subscription management in dashboard | $149/month charges, upgrade/downgrade, invoices | yes | TODO |
| 7.5 | Create enterprise sales deck | Pilot metrics + product info | PDF/Keynote presentation | Compelling 10-slide deck with real pilot data | no | TODO |
| 7.6 | Create pricing page at lume.dance/commerce | Pricing tiers + CTA | Live web page | CTAs for demo booking + pilot signup | yes | TODO |
| 7.7 | Prepare 3 LUME Commerce units for enterprise pilot | Hardware + firmware | 3 configured devices ready to ship | All 3 boot, run full stack, connect to dashboard | no | TODO |
Wave 7 Gate: Ready to pitch multi-location chains with real pilot data, professional sales materials, and 3 ready-to-ship demo units.
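Task 7.2 specifies that LUME pulls menu changes over HTTPS and reflects them within 60 seconds. A minimal sketch of that pull loop's core logic follows. The HTTPS call is hidden behind a hypothetical `MenuSource` trait so the logic can be exercised offline. The `Menu` fields, the version counter, and all names here are assumptions, not the plan's actual design.

```rust
/// Hypothetical menu snapshot; the real menu schema is not given in this plan.
#[derive(Debug, Clone, PartialEq)]
pub struct Menu {
    pub version: u64,
    pub items: Vec<String>,
}

/// Abstracts the HTTPS request to the dashboard backend.
pub trait MenuSource {
    fn fetch_latest(&mut self) -> Result<Menu, String>;
}

pub struct MenuPoller<S: MenuSource> {
    source: S,
    current: Menu,
}

impl<S: MenuSource> MenuPoller<S> {
    pub fn new(source: S, current: Menu) -> Self {
        Self { source, current }
    }

    /// One poll cycle. Returns Ok(true) if a newer menu was applied,
    /// Ok(false) if the device was already up to date.
    /// On a fetch error the current menu is kept unchanged.
    pub fn poll_once(&mut self) -> Result<bool, String> {
        let latest = self.source.fetch_latest()?;
        if latest.version > self.current.version {
            self.current = latest;
            Ok(true)
        } else {
            Ok(false)
        }
    }

    pub fn current(&self) -> &Menu {
        &self.current
    }
}

/// Largest poll interval (seconds) that still meets the deadline in the worst
/// case, where a change is published just after a poll completes.
/// Returns None if fetching and applying alone already use up the deadline.
pub fn max_poll_interval_s(deadline_s: u64, fetch_and_apply_s: u64) -> Option<u64> {
    deadline_s
        .checked_sub(fetch_and_apply_s)
        .filter(|interval| *interval > 0)
}
```

With a pull model, the 60-second validation target bounds the poll interval: the interval plus the time to fetch and apply a menu must not exceed 60 seconds. Keeping the old menu when a fetch fails means a WiFi outage leaves ordering functional with the last known menu.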
---
EXECUTION SUMMARY
| Wave | Days | Tasks | Key Deliverable |
|---|---|---|---|
| 0: Foundation | 1-5 | 7 | Whisper + Deepgram + state machine + models |
| 1: NLU Port | 5-10 | 8 | BWB NLU patterns in Rust, >90 |
| 2: Voice Flow | 10-16 | 9 | End-to-end voice ordering on Jetson |
| 3: Analytics | 14-19 | 8 | Depth queue analytics, Supabase export |
| 4: Payment + Kitchen | 18-23 | 8 | Full order lifecycle with payment + kitchen |
| 5: Content Flywheel | 22-27 | 8 | Auto-capture + QR + download + share |
| 6: Dashboard + Pilot | 26-35 | 10 | Live pilot in real coffee shop |
| 7: Enterprise Prep | 32-42 | 7 | Sales materials + multi-location ready |
| TOTAL | 42 days | 65 tasks | Production Commerce in real venue |
Critical Path: Wave 0 -> Wave 1 -> Wave 2 -> Wave 4 -> Wave 6 (voice ordering is the critical dependency). Waves 3 and 5 can run in parallel with Waves 2 and 4.
Agent-Dispatchable Tasks: 48 of 65 (74%)
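The totals in the summary can be cross-checked mechanically. The sketch below copies the day ranges and task counts from the table and recomputes the task total, the final day, and the agent-dispatchable share. The constant and function names are illustrative only.

```rust
/// (wave, first day, last day, task count), copied from the summary table.
pub const WAVES: [(&str, u32, u32, u32); 8] = [
    ("0: Foundation", 1, 5, 7),
    ("1: NLU Port", 5, 10, 8),
    ("2: Voice Flow", 10, 16, 9),
    ("3: Analytics", 14, 19, 8),
    ("4: Payment + Kitchen", 18, 23, 8),
    ("5: Content Flywheel", 22, 27, 8),
    ("6: Dashboard + Pilot", 26, 35, 10),
    ("7: Enterprise Prep", 32, 42, 7),
];

/// Sum of the per-wave task counts.
pub fn total_tasks() -> u32 {
    WAVES.iter().map(|wave| wave.3).sum()
}

/// Last scheduled day across all waves.
pub fn last_day() -> u32 {
    WAVES.iter().map(|wave| wave.2).max().unwrap_or(0)
}

/// `part` as a percentage of `whole`, rounded to the nearest integer.
/// Returns 0 when `whole` is 0. Intended for small counts only.
pub fn percent_rounded(part: u32, whole: u32) -> u32 {
    if whole == 0 {
        return 0;
    }
    (part * 100 + whole / 2) / whole
}
```

Here `total_tasks()` gives 65, `last_day()` gives 42, and `percent_rounded(48, 65)` gives 74, matching the TOTAL row and the dispatchable share stated above. Wave day ranges overlap, so the 42-day figure is the end of the last wave, not the sum of wave durations.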
Kill Criteria:
- Day 5: If Whisper on Jetson <70
- Day 10: If NLU port <85
- Day 14 (pilot): If voice accuracy <80
- Day 35 (post-pilot): If barista satisfaction <7/10, rethink kitchen routing approach
- Day 42: If content QR scan rate <5
Source Anchor
evo-cube-output/lume-commerce-pos/stage3-expand-master-plan.md