The mesh currently has 8 distinct voice subsystems spread across iOS apps, macOS services, and backend flows. They are architecturally isolated: no subsystem communicates with any other. The terminal agents (Claude panes, Prefect flows, Discord bots) communicate exclusively through text. Voice exists at the edge (phone, glasses) but does not reach the mesh core.
| # | System | Location | Voice Capability | State |
|---|--------|----------|------------------|-------|
| 1 | OpenClawHub DirectVoice | iOS app | Full STT→intent→Clawdbot→TTS loop | WORKING |
| 2 | OpenClawHub Fleet Voice | iOS (Glasses Gateway) | Fleet control: status, resume, inject, kill, spawn | WORKING |
| 3 | OpenClawHub QuadView | iOS | Push-to-talk → pane delegation | UI complete |
| 4 | SpeakFlow | macOS + iOS keyboard | Global hotkey→STT→text injection + "hey claw" trigger | WORKING |
| 5 | SecuriClaw WakeWord | iOS | Continuous wake phrase detection | WORKING |
| 6 | Spore VoiceCapture | iOS | Voice→idea creation with keyword extraction | WORKING |
| 7 | Voice Task Daemon | Mac1 LaunchAgent | Supabase mac_tasks → TTY pane injection | ACTIVE |
| 8 | Transcription Intel | Prefect flow | Video transcripts → intelligence extraction | ACTIVE |
| Model/API | Used by | Purpose |
|-----------|---------|---------|
| Apple SFSpeechRecognizer (on-device) | All iOS/macOS apps | STT |
| ElevenLabs eleven_turbo_v2_5 | OpenClawHub (primary TTS) | High-quality agent speech |
| AVSpeechSynthesizer | SpeakFlow, OpenClawHub fallback | System TTS |
| Gemini 2.0 Flash Live (WebSocket) | OpenClawHub GeminiObserver | Ambient audio+video intelligence |
| OpenAI TTS (6 voices) | Speak CLI reader | File reading |
| macOS `say` + Edge TTS | LearnNKo | Language learning pronunciation |
| CoreML MohamedSpeakerID | OpenClawHub | Owner voice identification |
| Custom MFCC (vDSP FFT) | OpenClawHub | Speaker voiceprint matching |
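The table lists ElevenLabs as OpenClawHub's primary TTS, with AVSpeechSynthesizer as the fallback. A minimal sketch of that kind of priority chain, assuming a hypothetical `SpeechEngine` protocol: `RemoteEngine` and `SystemEngine` are illustrative stand-ins that make no real network or AVFoundation calls, and none of these names come from the OpenClawHub codebase.

```swift
// Hypothetical protocol: an engine either speaks the text or throws.
protocol SpeechEngine {
    var name: String { get }
    func speak(_ text: String) throws
}

enum SpeechError: Error {
    case unavailable(String)
}

// Stand-in for a cloud TTS path (e.g. ElevenLabs); fails when unreachable.
struct RemoteEngine: SpeechEngine {
    let name: String
    let isReachable: Bool

    func speak(_ text: String) throws {
        guard isReachable else { throw SpeechError.unavailable(name) }
        // A real client would send `text` to the TTS service here.
    }
}

// Stand-in for on-device system TTS, assumed to be always available.
struct SystemEngine: SpeechEngine {
    let name = "AVSpeechSynthesizer"

    func speak(_ text: String) throws {
        // A real implementation would hand `text` to the system synthesizer.
    }
}

// Tries engines in priority order; returns the name of the engine that spoke,
// or nil if every engine failed.
func speakWithFallback(_ text: String, engines: [any SpeechEngine]) -> String? {
    for engine in engines {
        do {
            try engine.speak(text)
            return engine.name
        } catch {
            continue  // fall through to the next engine in the chain
        }
    }
    return nil
}

let chain: [any SpeechEngine] = [
    RemoteEngine(name: "ElevenLabs eleven_turbo_v2_5", isReachable: false),
    SystemEngine(),
]
print(speakWithFallback("Fleet status nominal", engines: chain) ?? "no engine")
// prints "AVSpeechSynthesizer"
```

Keeping the chain as an ordered array makes it easy to add a third tier (for example, OpenAI TTS) without changing the fallback logic.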
**VoiceRouter (iOS)**: 35 intents map spoken keywords to a `ThreadCategory`, covering all projects (Koji, Milkmen, Spore, Serenity, CreativeDirector, CompCore, CogTwin, NKo, etc.). It also handles explicit channel routing.
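Keyword-to-category routing of this kind can be sketched as a first-match scan over intent rules, with an explicit channel command taking priority. This is a hypothetical illustration, not VoiceRouter's actual code: the `ThreadCategory` cases shown are a small assumed subset, and the `channel <name>` phrasing for explicit routing is an assumption.

```swift
// Hypothetical subset of ThreadCategory; the real enum has more cases.
enum ThreadCategory: String {
    case koji, milkmen, spore, serenity, nko, general
}

// One spoken-keyword rule; several keywords may map to the same category.
struct VoiceIntent {
    let keywords: [String]
    let category: ThreadCategory
}

struct KeywordRouter {
    let intents: [VoiceIntent]

    func route(_ transcript: String) -> ThreadCategory {
        // Normalize the transcript into lowercase words, dropping punctuation.
        let words = transcript.lowercased()
            .split(whereSeparator: { !$0.isLetter && !$0.isNumber })
            .map(String.init)

        // Explicit routing ("channel <name> ...") overrides keyword matching.
        if words.count >= 2, words[0] == "channel",
           let explicit = ThreadCategory(rawValue: words[1]) {
            return explicit
        }

        // Otherwise the first intent with any matching keyword wins.
        for intent in intents where intent.keywords.contains(where: { words.contains($0) }) {
            return intent.category
        }
        return .general
    }
}

let keywordRouter = KeywordRouter(intents: [
    VoiceIntent(keywords: ["koji"], category: .koji),
    VoiceIntent(keywords: ["milkmen", "delivery"], category: .milkmen),
    VoiceIntent(keywords: ["spore", "idea"], category: .spore),
])
print(keywordRouter.route("Log an idea about onboarding"))      // spore
print(keywordRouter.route("channel serenity check the build"))  // serenity
```

Because the first match wins, rule order acts as a priority list; a scoring approach (counting matched keywords per intent) would be the alternative if overlapping keywords become common across 35 intents.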
**FleetVoiceRouter (iOS)**: Fleet-specific intents: `status`, `resume(target)`, `resumeAll`, `inject(target, command)`, `converge`, `diverge(prompt)`, `focus(target)`, `spawn(prompt)`, `kill(target)`, `fleetHealth`, `checkAbsorbing`, `unstickPane`, `checkFocus`, `checkTrajectory`. It resolves pronouns to the most recent target (`lastTarget`) and holds destructive commands for a 1.5 s cancel window before executing them.
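Pronoun resolution and the destructive-command cancel window can be sketched together as a small state machine. This is a hypothetical sketch covering only three of the intents: `FleetCommand`, `Outcome`, `tick(at:)`, and `cancel(at:)` are illustrative names, the pronoun list is assumed, and time is passed in as seconds so the 1.5 s window can be exercised without real timers.

```swift
// Hypothetical subset of the fleet intents; targets are pane names.
enum FleetCommand: Equatable {
    case status
    case resume(String)
    case kill(String)

    // Only destructive commands are held behind the cancel window.
    var isDestructive: Bool {
        if case .kill = self { return true }
        return false
    }
}

enum Outcome: Equatable {
    case executed(FleetCommand)
    case awaitingConfirmation(FleetCommand)
    case cancelled
    case noTarget
}

struct FleetRouter {
    static let cancelWindow = 1.5  // seconds, per the description above
    private(set) var lastTarget: String?
    private var pending: (command: FleetCommand, deadline: Double)?

    // Replaces a pronoun with the last explicitly named target.
    private mutating func resolve(_ spoken: String) -> String? {
        let pronouns: Set<String> = ["it", "that", "them"]
        if pronouns.contains(spoken.lowercased()) { return lastTarget }
        lastTarget = spoken
        return spoken
    }

    // `now` is injected so the window is deterministic and testable.
    mutating func handle(_ command: FleetCommand, at now: Double) -> Outcome {
        var resolved = command
        switch command {
        case .status:
            break
        case .resume(let spoken):
            guard let target = resolve(spoken) else { return .noTarget }
            resolved = .resume(target)
        case .kill(let spoken):
            guard let target = resolve(spoken) else { return .noTarget }
            resolved = .kill(target)
        }
        if resolved.isDestructive {
            pending = (command: resolved, deadline: now + FleetRouter.cancelWindow)
            return .awaitingConfirmation(resolved)
        }
        return .executed(resolved)
    }

    // Saying "cancel" before the deadline drops the pending command.
    mutating func cancel(at now: Double) -> Outcome? {
        guard let p = pending, now < p.deadline else { return nil }
        pending = nil
        return .cancelled
    }

    // Polled periodically; fires the pending command once the window closes.
    mutating func tick(at now: Double) -> Outcome? {
        guard let p = pending, now >= p.deadline else { return nil }
        pending = nil
        return .executed(p.command)
    }
}

var fleet = FleetRouter()
let first = fleet.handle(.resume("pane3"), at: 0)  // runs now, remembers "pane3"
let second = fleet.handle(.kill("it"), at: 1)      // "it" -> "pane3", held until t = 2.5
let fired = fleet.tick(at: 2.6)                    // window closed, kill(pane3) runs
```

In an app the `tick` call would come from a timer or run loop; keeping the clock outside the router is what makes the cancel-window behavior straightforward to test.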