proposal | experiment writeup candidate | score 34

Stage 3: EXPAND + MASTER PLAN

**Primary STT: Deepgram Nova-3 (Cloud Streaming)**

- Protocol: WebSocket to wss://api.deepgram.com/v1/listen
- Parameters: encoding=linear16, sample_rate=16000, channels=1, model=nova-3, language=en, smart_format=true, keywords=["latte:2","cappuccino:2","espresso:2","oat:1.5","almond:1.5","large:1","medium:1","small:1"]
- Partial results: interim_results=true (for live preview)
- Latency: ~300ms for first partial, ~500ms for stable transcript
- Cost: $0.0043/minute. At 500 orders/month, 30s average = $1.08/month.
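As a minimal sketch of the configuration above, the listed parameters can be assembled into a streaming URL and the cost line re-derived. The function names `build_listen_url` and `monthly_stt_cost` are hypothetical, and the sketch assumes keyword boosts are sent as repeated `keywords=term:boost` query parameters. Whether Nova-3 accepts `keywords` or expects a different vocabulary-hint parameter should be checked against Deepgram's current documentation. No network connection is opened here.

```python
from urllib.parse import urlencode

# Endpoint and parameter values are taken from the plan above.
DEEPGRAM_LISTEN_URL = "wss://api.deepgram.com/v1/listen"

# Menu terms and boost weights as listed in the plan.
KEYWORD_BOOSTS = {
    "latte": 2, "cappuccino": 2, "espresso": 2,
    "oat": 1.5, "almond": 1.5,
    "large": 1, "medium": 1, "small": 1,
}


def build_listen_url(keyword_boosts):
    """Assemble the streaming URL (hypothetical helper, not a Deepgram SDK call)."""
    params = [
        ("encoding", "linear16"),
        ("sample_rate", "16000"),
        ("channels", "1"),
        ("model", "nova-3"),
        ("language", "en"),
        ("smart_format", "true"),
        ("interim_results", "true"),
    ]
    # Assumption: one repeated `keywords` parameter per boosted term.
    params += [("keywords", f"{term}:{boost:g}") for term, boost in keyword_boosts.items()]
    # Keep ':' literal so the term:boost pairs stay readable in the URL.
    return f"{DEEPGRAM_LISTEN_URL}?{urlencode(params, safe=':')}"


def monthly_stt_cost(orders_per_month, avg_seconds_per_order, usd_per_minute=0.0043):
    """Monthly cost in USD = billed minutes * per-minute rate."""
    minutes = orders_per_month * avg_seconds_per_order / 60
    return minutes * usd_per_minute


if __name__ == "__main__":
    print(build_listen_url(KEYWORD_BOOSTS))
    print(monthly_stt_cost(500, 30))
```

With 500 orders of 30 seconds each, the billed audio is 250 minutes, which at $0.0043/minute comes to roughly $1.075 and matches the $1.08/month figure once rounded.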

Extracted abstract or opening context

# Stage 3: EXPAND + MASTER PLAN

## LUME Commerce -- Experiential Commerce Infrastructure

### R1: GPU Contention During Voice Processing [CRITICAL]

- **Failure scenario:** When the Whisper.cpp fallback activates (WiFi down), the CUDA inference burst (~2-3 seconds) steals GPU cycles from the visual pipeline. Frame rate drops from 60fps to 20-30fps. All customers see the stutter, not just the person ordering. The entertainment experience degrades.
- **Probability:** MEDIUM (35%). Only triggers when WiFi is down AND local Whisper is needed. In normal operation (Deepgram cloud STT), zero GPU impact.
- **Impact:** MEDIUM-HIGH. Visual stutter during ordering creates a jarring experience. If WiFi is frequently unreliable, this becomes a persistent problem.
- **Mitigation:** (a) Pre-warm the Whisper model at boot but don't run inference until needed. (b) During Whisper inference, reduce the visual pipeline to 30fps (skip every other frame, halving the rendering load). A steady 30fps is far less noticeable to customers than a stuttery 45fps. (c) Batch audio: accumulate 5-10s of audio, process in one burst, return to 60fps. (d) Long-term: Whisper-TensorRT optimization gives 3x speedup, reducing the contention window.
- **Validation criteria:** Visual pipeline maintains at least 30fps during a Whisper inference burst. No visible stutter reported by 3 out of 3 test observers in a blind test.

### R2: Voice Ordering Accuracy in Coffee Shop Noise [CRITICAL]

- **Failure scenario:** Background noise (espresso machine 70-80 dB, conversations 55-65 dB, music 60-70 dB) degrades STT accuracy below 70%. Customers repeat orders 2-3 times. Frustration kills adoption. Baristas override voice orders manually, defeating the purpose.
- **Probability:** LOW with Deepgram (15%), MEDIUM-HIGH with local Whisper (40%)
- **Impact:** CRITICAL. If voice ordering doesn't work, the entire Commerce tier value proposition fails.
- **Mitigation:** (a) Audio preprocessing: 3-mic beamforming (directional pickup toward customer, null toward espresso machine), noise gate, band-pass 200 Hz-4 kHz, AGC. (b) Deepgram Nova-3 primary (trained on millions of hours of noisy real-world audio). (c) Domain-specific vocabulary hints sent to Deepgram (coffee menu terms as keywords parameter). (d) Local Whisper as degraded fallback only. (e) Touch fallback: LUME display shows order suggestions from partial transcript, customer taps to confirm. Never force voice-only.
- **Validation criteria:** >85% order accuracy (correct items + modifiers) in 70 dB ambient noise, measured over 100 test orders with varied accents and speaking styles.

### R3: Content Flywheel Adoption Rate [MEDIUM]

- **Failure scenario:** Customers don't interact with LUME visuals during queue time. They stare at their phones. QR scan rate is <5%. Zero viral content. The entertainment-as-marketin
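
The R2 validation criterion (>85% order accuracy, counting an order as correct only when both the item and its modifiers match) could be scored along these lines. This is a sketch under stated assumptions: the `Order` type, `order_accuracy`, and `meets_r2_criterion` are hypothetical names, and exact-match scoring per order is one reading of "correct items + modifiers", not a method the plan specifies.

```python
from dataclasses import dataclass
from typing import FrozenSet, Sequence


@dataclass(frozen=True)
class Order:
    """One spoken order: a menu item plus an unordered set of modifiers (hypothetical type)."""
    item: str
    modifiers: FrozenSet[str] = frozenset()


def order_accuracy(expected: Sequence[Order], recognized: Sequence[Order]) -> float:
    """Fraction of orders where item AND modifier set both match exactly."""
    if not expected:
        raise ValueError("need at least one test order")
    if len(expected) != len(recognized):
        raise ValueError("expected and recognized must be paired one-to-one")
    correct = sum(1 for want, got in zip(expected, recognized) if want == got)
    return correct / len(expected)


def meets_r2_criterion(accuracy: float, threshold: float = 0.85) -> bool:
    """R2 asks for strictly more than 85% order accuracy."""
    return accuracy > threshold


if __name__ == "__main__":
    expected = [
        Order("latte", frozenset({"oat", "large"})),
        Order("espresso"),
    ]
    recognized = [
        Order("latte", frozenset({"large", "oat"})),  # modifier order is irrelevant
        Order("espresso", frozenset({"almond"})),     # spurious modifier: counted wrong
    ]
    accuracy = order_accuracy(expected, recognized)
    print(accuracy, meets_r2_criterion(accuracy))
```

Scoring whole orders rather than individual words is the stricter choice: a transcript that gets the drink right but drops "oat" still fails the order, which is what the barista would experience.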

Promotion decision

What has to happen next

Attach run IDs, datasets, metrics, and reproduction commands.

Why this is not always a full paper yet

Corpus pages are public-safe readers for discovered workspace artifacts. They are not automatically final papers. A corpus item becomes a polished paper only after the editable source, evidence checkpoints, references, figures, render path, and release status are attached through the paper schema.