Mohamed Diomande

Extracted abstract or opening context

**1. The unified router eliminates the triple-classifier problem.** Three intent classifiers with incompatible taxonomies are the root cause of inconsistent voice behavior across devices. One server-side router, shared by all clients, fixes this permanently. The ~55 merged intents cover all existing use cases.

**2. Mac Ear Daemon is the single biggest UX improvement.** Eliminating the phone dependency for voice interaction changes the relationship with the mesh. Walk to the desk, say "status", get a spoken briefing. No phone required. mlx-whisper on M2 handles transcription locally with no API cost.

**3. Voice memory closes the biggest persistence gap.** Every other interaction channel (text prompts, Discord, code, Obsidian) is persisted. Voice isn't. Storing transcripts in Supabase + RAG++ means "we discussed this earlier" works across modalities. This is a force multiplier for the entire knowledge system.

**4. The ElevenLabs integration is already production-ready.** Voice ID configured, API key active, streaming playback works in iOS. Extending this to Mac1 TTS is ~20 lines of Python (HTTP POST + audio playback).

**1. Whisper on Mac1 competes for compute.** Mac1 is already running 7 LaunchAgents, Xcode builds, SSH tunnels, the pane orchestrator, and terminal Claude sessions. Adding continuous audio capture + Whisper inference adds CPU/memory pressure.
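The single-router idea from the first point can be illustrated with a minimal sketch. It assumes a keyword-based matcher and a handful of made-up intent names (`status.briefing`, `memory.recall`, `unknown`); the actual ~55-intent taxonomy and the real classification method are not given in the source, so everything named below is hypothetical. The point it shows is only that every client calls the same function and therefore gets the same intent for the same utterance.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class RoutedIntent:
    name: str        # unified intent name (hypothetical taxonomy)
    client: str      # which device sent the utterance, e.g. "ios" or "mac"
    transcript: str  # original transcribed text


# Hypothetical keyword table standing in for the merged taxonomy.
# Order matters: the first intent with a matching keyword wins.
KEYWORDS = {
    "status.briefing": ("status", "briefing"),
    "memory.recall": ("discussed", "earlier", "remember"),
}


def route(transcript: str, client: str) -> RoutedIntent:
    """Map a transcript to one unified intent, independent of the client."""
    # Lowercase and strip punctuation so "Status." and "status" match alike.
    words = set(re.findall(r"[a-z']+", transcript.lower()))
    for name, keys in KEYWORDS.items():
        if words.intersection(keys):
            return RoutedIntent(name, client, transcript)
    return RoutedIntent("unknown", client, transcript)
```

Because the client name is carried only as metadata and never consulted during matching, the phone and the Mac daemon cannot drift apart the way three separate classifiers can.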

Promotion decision

What has to happen next

Promote into a technical note or architecture paper with implementation anchors.

Why this is not always a full paper yet

Corpus pages are public-safe readers for discovered workspace artifacts. They are not automatically final papers. A corpus item becomes a polished paper only after the editable source, evidence checkpoints, references, figures, render path, and release status are attached through the paper schema.