# Agent Command Center — Voice-First Architecture
Status: 🌱 Design Phase (from Mo's 1:21 AM voice notes, Feb 16 2026)
Origin: Voice brainstorm → formalized spec
---
## Core Thesis
> "The voice itself is the most intuitive... Discord was our Agent Command Center replicate."
The ACC isn't just another iOS app — it's the voice-first command interface for the entire agent stack. Discord has been serving as the de facto command center. ACC formalizes that into a native experience where voice is primary, visual is secondary.
---
## Architecture Layers
### Layer 0: Voice Interface (Primary)
- Always-on voice capture → transcription → intent routing
- Google Speech-to-Text API (enabled on `speech-to-order-476916` project)
- Fallback: local mlx_whisper (on-device, offline-capable)
- ElevenLabs voice clone (Mohamed voice ID: `TmSgyk1vGAD9YzdtJV3V`) for responses
- Reference: VisionClaw's ambient voice pipeline patterns, implemented here as working code rather than stubs
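The primary-plus-fallback transcription path in Layer 0 could be sketched roughly as follows. This is a minimal illustration, not the ACC implementation: `transcribe_with_fallback`, `cloud_stt`, and `local_whisper` are hypothetical names, and the two backends are stand-in functions rather than real calls to Google Speech-to-Text or mlx_whisper.

```python
from typing import Callable, List, Sequence, Tuple

# A transcriber takes raw audio bytes and returns text, or raises on failure.
Transcriber = Callable[[bytes], str]


def transcribe_with_fallback(
    audio: bytes,
    backends: Sequence[Tuple[str, Transcriber]],
) -> Tuple[str, str]:
    """Try each backend in order and return (backend_name, transcript).

    Raises RuntimeError if every backend fails or returns empty text.
    """
    errors: List[str] = []
    for name, transcribe in backends:
        try:
            text = transcribe(audio).strip()
        except Exception as exc:  # e.g. network error, quota, missing model
            errors.append(f"{name}: {exc}")
            continue
        if text:
            return name, text
        errors.append(f"{name}: empty transcript")
    raise RuntimeError("all STT backends failed: " + "; ".join(errors))


# Hypothetical stand-ins for the real backends (assumptions for illustration).
def cloud_stt(audio: bytes) -> str:
    raise ConnectionError("offline")  # simulate having no network


def local_whisper(audio: bytes) -> str:
    return "start pulse chain"


backend, transcript = transcribe_with_fallback(
    b"\x00\x01",
    [("google_stt", cloud_stt), ("mlx_whisper", local_whisper)],
)
print(backend, transcript)
```

Keeping the backends behind a plain callable signature means the on-device path can be tested without network access, and the order of the list is the only place the "primary vs fallback" policy lives.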
### Layer 1: Command Center UI (Background)
- SwiftUI + TCA (already built through Phase 4)
- Mission Control view = visual confirmation of voice commands
- Thread visualization = see Pulse execution trees
- Infrastructure health = at-a-glance agent status
### Layer 2: Orchestration Engine
- Clawdbot Gateway API — 6 unified endpoints (sessions, tasks, status, config, health, dispatch)
- Thread-Based Pulse — Discord threads as execution plane (`thread-pulse.sh`, `sub-pulse.sh`)
- Thread Manager — spawned Feb 15, thread 1472767053592002582 in #thread-manager
- Voice commands route through the same Gateway → same Pulse chains
### Layer 3: Agent Backbone
| Component | Role | Status |
|-----------|------|--------|
| Claude Code (T1a/T1b) | Primary coding agents via dual-max | ✅ Running |
| OpenRouter | Model routing for diverse LLM access | 🟡 To integrate |
| Kimi | Synthesis engine, always running | ✅ Running |
| Cognitive Twin | Fine-tuned personal model (trained incrementally) | 🟡 Needs compute |
| Codex (T2) | OpenAI agent tier | ✅ Available |
| Gemini (T3) | Google agent tier | ✅ Available |
---
## OpenRouter Integration (New)
Mo's voice note raises OpenRouter as a new capability to explore.
### What OpenRouter Gives Us
- Single API → access to 100+ models (Llama, Mistral, Qwen, Claude, GPT, etc.)
- Fallback routing — if one provider is down, auto-route to another
- Cost optimization — pick cheapest model that meets quality bar per task
- Training data generation — run same prompt through multiple models, compare outputs
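The cost-optimization and fallback bullets above can be combined into one selection rule: pick the cheapest available model that clears a quality bar, and skip any model marked unavailable. The sketch below assumes a local catalog with made-up names, prices, and quality scores; none of these values come from OpenRouter, and `ModelOption` and `pick_model` are hypothetical helpers.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ModelOption:
    name: str             # placeholder label, not a real model ID
    usd_per_mtok: float   # illustrative price per million tokens
    quality: float        # illustrative 0..1 score from your own evals
    available: bool = True


def pick_model(options: List[ModelOption], min_quality: float) -> Optional[ModelOption]:
    """Return the cheapest available model whose quality meets the bar."""
    eligible = [m for m in options if m.available and m.quality >= min_quality]
    return min(eligible, key=lambda m: m.usd_per_mtok, default=None)


catalog = [
    ModelOption("cheap-model", 0.10, 0.55),
    ModelOption("mid-model", 0.60, 0.75),
    ModelOption("strong-model", 3.00, 0.92),
]

print(pick_model(catalog, 0.5).name)  # low bar, e.g. classification
print(pick_model(catalog, 0.9).name)  # high bar, e.g. review

# Fallback: if the mid tier is down, the next cheapest eligible model wins.
degraded = [
    ModelOption("cheap-model", 0.10, 0.55),
    ModelOption("mid-model", 0.60, 0.75, available=False),
    ModelOption("strong-model", 3.00, 0.92),
]
print(pick_model(degraded, 0.7).name)
```

The quality scores are the hard part in practice; this sketch assumes they already exist from some per-task evaluation.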
### How It Fits

    Voice Command → Gateway → Task Router
    ├── Claude Code (complex coding)
    ├── OpenRouter (flexible model selection)
    │   ├── cheap model for classification
    │   ├── mid model for drafting
    │   └── strong model for review
    ├── Kimi (synthesis, always on)
    └── Twin (when trained, personal model)

### OpenRouter as "Discord Replicate"
Mo asked: "Is that like the OpenRouter replicate [of Discord]?"
The parallel: Discord serves as our current multi-agent command center. OpenRouter serves as a multi-model command center. Same pattern — one interface, many backends.
---
## Kimi + Twin Training Pipeline
From the voice notes:
> "Kimi will still be running. And then maybe we'll run that other instance and increment — create our training run."
### The Pipeline
1. Kimi stays running: continuous synthesis, memory, dream seeding
2. Twin training runs incrementally: each batch of conversations generates new SFT/DPO data
3. OpenRouter enables cheap experimentation: test prompts across models before committing to a Twin training run
4. "Running for a dollar": likely a reference to Together.ai serverless inference pricing for the trained Twin
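To make the "SFT/DPO data" step concrete, here is one way a batch of conversations might be turned into training records. The field names (`prompt`, `completion`, `chosen`, `rejected`) follow a common convention and are an assumption; the exact schema the fine-tuning provider expects should be checked before use. All function names and the sample text are hypothetical.

```python
import json
from typing import Dict, Iterable, List


def to_sft_record(prompt: str, response: str) -> Dict[str, str]:
    """One supervised example: a prompt and the response to imitate."""
    return {"prompt": prompt, "completion": response}


def to_dpo_record(prompt: str, chosen: str, rejected: str) -> Dict[str, str]:
    """One preference example: a preferred and a dispreferred response."""
    if chosen == rejected:
        raise ValueError("chosen and rejected must differ")
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}


def to_jsonl(records: Iterable[Dict[str, str]]) -> str:
    """Serialize records as JSON Lines, one record per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


# Hypothetical batch: the same prompt answered by two different models,
# with the preferred answer marked as "chosen".
batch: List[Dict[str, str]] = [
    to_dpo_record(
        prompt="Summarize today's agent activity.",
        chosen="Three Pulse chains ran; two finished, one is waiting on review.",
        rejected="Stuff happened.",
    ),
]
sft_batch = [to_sft_record("What's running?", "Two agents are active.")]

dpo_jsonl = to_jsonl(batch)
print(dpo_jsonl)
```

Because each batch is just a small JSONL file, incremental runs can append new files rather than regenerate a single large dataset.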
### Training Architecture

    Daily conversations → Kimi synthesis → training data extraction
      ↓
    SFT/DPO pairs
      ↓
    Together.ai fine-tune job
      ↓
    Twin LoRA adapter (Qwen3 235B base)
      ↓
    Serverless inference ($0.20-0.60 per million tokens)

---
## Implementation Roadmap
### Phase A: Voice Pipeline (Priority)
- [ ] Wire Google STT into ACC (Google Cloud project: `speech-to-order-476916`)
- [ ] Intent classifier: voice → Gateway API endpoint mapping
- [ ] ElevenLabs response synthesis
- [ ] Ambient mode (always listening, wake word activation)
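As a starting point for the intent classifier item, a keyword-based mapping from transcript to one of the six Gateway endpoints might look like the sketch below. The endpoint names come from the Layer 2 description; the keyword lists and the `classify` function are assumptions for illustration, and a real classifier would likely be model-based rather than substring matching.

```python
from typing import Dict, Optional, Tuple

# The six Gateway endpoints named in the architecture.
ENDPOINTS = ("sessions", "tasks", "status", "config", "health", "dispatch")

# Hypothetical trigger phrases. Order matters: the first match wins, so
# "what's running" is checked under status before "run" under dispatch.
KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "status": ("what's running", "status", "progress"),
    "health": ("health", "is everything up", "uptime"),
    "dispatch": ("start", "run", "launch", "kick off"),
    "tasks": ("task", "todo", "queue"),
    "sessions": ("session",),
    "config": ("config", "setting"),
}


def classify(transcript: str) -> Optional[str]:
    """Return the matching endpoint name, or None if nothing matches."""
    text = transcript.lower()
    for endpoint, phrases in KEYWORDS.items():
        if any(phrase in text for phrase in phrases):
            return endpoint
    return None  # caller can ask the user to rephrase


for utterance in ("What's running?", "Start a pulse chain for the build"):
    print(utterance, "->", classify(utterance))
```

Returning `None` instead of guessing a default keeps an unrecognized voice command from triggering a dispatch by accident.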
### Phase B: OpenRouter Integration
- [ ] Sign up / get API key
- [ ] Add to Gateway as model provider
- [ ] Smart routing: task complexity → model selection
- [ ] Cost tracking per model per task type
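Cost tracking per model per task type could begin as a small in-memory ledger keyed by `(model, task_type)`. The sketch below is an assumption about shape only: `CostLedger` is a hypothetical class, the model names are placeholders, and the per-million-token prices are illustrative values rather than quoted rates.

```python
from collections import defaultdict
from typing import DefaultDict, Optional, Tuple


class CostLedger:
    """Accumulates spend in USD, keyed by (model, task_type)."""

    def __init__(self) -> None:
        self._usd: DefaultDict[Tuple[str, str], float] = defaultdict(float)

    def record(self, model: str, task_type: str, tokens: int, usd_per_mtok: float) -> float:
        """Add one call's cost and return it."""
        cost = tokens / 1_000_000 * usd_per_mtok
        self._usd[(model, task_type)] += cost
        return cost

    def total(self, model: Optional[str] = None, task_type: Optional[str] = None) -> float:
        """Sum spend, optionally filtered by model and/or task type."""
        return sum(
            usd
            for (m, t), usd in self._usd.items()
            if (model is None or m == model) and (task_type is None or t == task_type)
        )


ledger = CostLedger()
ledger.record("cheap-model", "classification", tokens=500_000, usd_per_mtok=0.20)
ledger.record("cheap-model", "drafting", tokens=250_000, usd_per_mtok=0.20)
ledger.record("strong-model", "review", tokens=1_000_000, usd_per_mtok=0.60)

print(round(ledger.total(model="cheap-model"), 4))
print(round(ledger.total(), 4))
```

A persistent version would write each `record` call to a log or database, but the same two-part key is enough to answer "which model is expensive for which kind of task".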
### Phase C: Unified Command Flow
- [ ] Voice → ACC → Gateway → Agent dispatch
- [ ] Thread visualization in ACC (mirror Discord thread trees)
- [ ] Pulse chain initiation via voice
- [ ] Status queries: "What's running?" → agent capacity check
### Phase D: Twin Training Loop
- [ ] Incremental training data export from Kimi
- [ ] Together.ai fine-tune job automation
- [ ] Twin deployment via serverless
- [ ] A/B testing: Twin vs Claude vs OpenRouter models
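For the A/B testing item, one simple option is deterministic assignment: hash a request ID and map it to an arm, so the same request always lands on the same model without storing any state. The arm labels and the `assign_arm` helper below are hypothetical; how results are scored is out of scope for this sketch.

```python
import hashlib
from typing import Sequence

# Hypothetical arm labels for the comparison described above.
ARMS = ("twin", "claude", "openrouter")


def assign_arm(request_id: str, arms: Sequence[str] = ARMS) -> str:
    """Map a request ID to an arm, stably and roughly uniformly."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(arms)
    return arms[bucket]


# The same ID always maps to the same arm.
print(assign_arm("voice-cmd-0001") == assign_arm("voice-cmd-0001"))

# Across many IDs, every arm receives traffic.
counts = {arm: 0 for arm in ARMS}
for i in range(300):
    counts[assign_arm(f"voice-cmd-{i:04d}")] += 1
print(counts)
```

Hash-based assignment avoids a shared counter or database, which suits a setup where voice and Discord commands both enter through the same Gateway.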
---
## Key Design Decisions
1. Voice is the primary interface, not a feature — ACC should work entirely by voice
2. Discord equivalence — ACC must do everything Discord does for agent control (Mo's Feb 14 insight)
3. OpenRouter for breadth, Claude for depth — use the right model for each task
4. Kimi never stops — continuous synthesis is the backbone of Twin training
5. Incremental training — don't wait for a massive dataset, train in batches
---
Formalized from Mo's 1:21-1:22 AM voice brainstorm. Architecture doc for #workshop exploration.
Source Anchor
AgentCommandCenter/ACC-VOICE-ARCHITECTURE.md