architecture · technical paper candidate · score 32

Stage 0: Research — Voice-First Agent Architecture

The mesh currently has 8 distinct voice subsystems spread across iOS apps, macOS services, and backend flows. They are architecturally isolated — no subsystem talks to another. The terminal agents (Claude panes, Prefect flows, Discord bots) communicate exclusively through text. Voice exists at the edge (phone, glasses) but doesn't penetrate the mesh core.

Extracted abstract or opening context

| # | System | Location | Voice Capability | State |
|---|--------|----------|------------------|-------|
| 1 | OpenClawHub DirectVoice | iOS app | Full STT→intent→Clawdbot→TTS loop | WORKING |
| 2 | OpenClawHub Fleet Voice | iOS (Glasses Gateway) | Fleet control: status, resume, inject, kill, spawn | WORKING |
| 3 | OpenClawHub QuadView | iOS | Push-to-talk → pane delegation | UI complete |
| 4 | SpeakFlow | macOS + iOS keyboard | Global hotkey→STT→text injection + "hey claw" trigger | WORKING |
| 5 | SecuriClaw WakeWord | iOS | Continuous wake phrase detection | WORKING |
| 6 | Spore VoiceCapture | iOS | Voice→idea creation with keyword extraction | WORKING |
| 7 | Voice Task Daemon | Mac1 LaunchAgent | Supabase mac_tasks → TTY pane injection | ACTIVE |
| 8 | Transcription Intel | Prefect flow | Video transcripts → intelligence extraction | ACTIVE |

| Model/API | Where | Purpose |
|-----------|-------|---------|
| Apple SFSpeechRecognizer (on-device) | All iOS/macOS apps | STT |
| ElevenLabs eleven_turbo_v2_5 | OpenClawHub (primary TTS) | High-quality agent speech |
| AVSpeechSynthesizer | SpeakFlow, OpenClawHub fallback | System TTS |
| Gemini 2.0 Flash Live (WebSocket) | OpenClawHub GeminiObserver | Ambient audio+video intelligence |
| OpenAI TTS (6 voices) | Speak CLI reader | File reading |
| macOS `say` + Edge TTS | LearnNKo | Language learning pronunciation |
| CoreML MohamedSpeakerID | OpenClawHub | Owner voice identification |
| Custom MFCC (vDSP FFT) | OpenClawHub | Speaker voiceprint matching |

**VoiceRouter (iOS)**: 35 intents mapping spoken keywords to ThreadCategory. Covers all projects (Koji, Milkmen, Spore, Serenity, CreativeDirector, CompCore, CogTwin, NKo, etc.). Also handles explicit channel routing.

**FleetVoiceRouter (iOS)**: Fleet-specific intents: `status`, `resume(target)`, `resumeAll`, `inject(target, command)`, `converge`, `diverge(prompt)`, `focus(target)`, `spawn(prompt)`, `kill(target)`, `fleetHealth`, `checkAbsorbing`, `unstickPane`, `checkFocus`, `checkTrajectory`. Has pronoun resolution (`lastTarget`) and destructive command confirmation (1.5s cancel window).
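The pronoun resolution and cancel-window behavior described for FleetVoiceRouter can be illustrated with a small sketch. This is not the iOS implementation. It is written in Python for brevity, and the class name `FleetVoiceRouterSketch`, the pronoun list, the choice of `kill` as the only destructive intent, and the `should_execute` helper are all assumptions made for illustration. Only the intent names, the `lastTarget` idea, and the 1.5 s window come from the text above.

```python
from dataclasses import dataclass
from typing import Optional

CANCEL_WINDOW_S = 1.5                    # cancel window stated in the source
DESTRUCTIVE_INTENTS = {"kill"}           # assumption: only `kill` needs confirmation
TARGETED_INTENTS = {"resume", "focus", "kill"}
PRONOUNS = {"it", "that", "him", "her"}  # assumption: illustrative pronoun list


@dataclass
class FleetCommand:
    intent: str
    target: Optional[str] = None
    needs_confirmation: bool = False


@dataclass
class FleetVoiceRouterSketch:
    """Hypothetical stand-in for FleetVoiceRouter; handles a subset of intents."""
    last_target: Optional[str] = None

    def parse(self, utterance: str) -> Optional[FleetCommand]:
        words = utterance.lower().split()
        if not words:
            return None
        intent = words[0]
        if intent == "status":
            return FleetCommand("status")
        if intent in TARGETED_INTENTS:
            token = words[1] if len(words) > 1 else None
            target = self._resolve(token)
            if target is None:
                return None  # pronoun used before any target was named
            return FleetCommand(intent, target, intent in DESTRUCTIVE_INTENTS)
        return None  # intents outside this sketch are ignored

    def _resolve(self, token: Optional[str]) -> Optional[str]:
        # A pronoun (or a missing target) falls back to the last named target.
        if token is None or token in PRONOUNS:
            return self.last_target
        self.last_target = token
        return token


def should_execute(cmd: FleetCommand, cancel_after_s: Optional[float]) -> bool:
    """Decide whether a command runs, given when (if ever) the user cancelled.

    cancel_after_s is the delay in seconds between the command and a spoken
    cancel, or None if no cancel arrived. A cancel inside the window aborts a
    destructive command; a later cancel is too late.
    """
    if not cmd.needs_confirmation:
        return True
    return cancel_after_s is None or cancel_after_s > CANCEL_WINDOW_S


router = FleetVoiceRouterSketch()
focus = router.parse("focus pane3")   # names a target and remembers it
kill = router.parse("kill it")        # "it" resolves to pane3
print(focus, kill, should_execute(kill, cancel_after_s=0.5))
```

Modelling the cancel window as a pure function of the cancel delay keeps the sketch deterministic. A real router would presumably drive the same decision from a timer.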

Promotion decision

What has to happen next

Promote into a technical note or architecture paper with implementation anchors.

Why this is not always a full paper yet

Corpus pages are public-safe readers for discovered workspace artifacts. They are not automatically final papers. A corpus item becomes a polished paper only after the editable source, evidence checkpoints, references, figures, render path, and release status are attached through the paper schema.