Grand Diomande Research · Full HTML Reader

Agent Command Center — Voice-First Architecture

The ACC isn't just another iOS app — it's the **voice-first command interface** for the entire agent stack. Discord has been serving as the de facto command center. ACC formalizes that into a native experience where **voice is primary, visual is secondary**.

Agents That Account for Themselves architecture technical paper candidate score 40 .md

Full Public Reader

# Agent Command Center — Voice-First Architecture
Status: 🌱 Design Phase (from Mo's 1:21 AM voice notes, Feb 16 2026)
Origin: Voice brainstorm → formalized spec

---

Core Thesis

> "The voice itself is the most intuitive... Discord was our Agent Command Center replicate."

The ACC isn't just another iOS app — it's the voice-first command interface for the entire agent stack. Discord has been serving as the de facto command center. ACC formalizes that into a native experience where voice is primary, visual is secondary.

---

Architecture Layers

### Layer 0: Voice Interface (Primary)
- Always-on voice capture → transcription → intent routing
- Google Speech-to-Text API (enabled on `speech-to-order-476916` project)
- Fallback: local mlx_whisper (on-device, offline-capable)
- ElevenLabs voice clone (Mohamed voice ID: `TmSgyk1vGAD9YzdtJV3V`) for responses
- VisionClaw's ambient voice pipeline patterns as reference (but actually functional, not stubbed)

### Layer 1: Command Center UI (Background)
- SwiftUI + TCA (already built through Phase 4)
- Mission Control view = visual confirmation of voice commands
- Thread visualization = see Pulse execution trees
- Infrastructure health = at-a-glance agent status

### Layer 2: Orchestration Engine
- Clawdbot Gateway API — 6 unified endpoints (sessions, tasks, status, config, health, dispatch)
- Thread-Based Pulse — Discord threads as execution plane (`thread-pulse.sh`, `sub-pulse.sh`)
- Thread Manager — spawned Feb 15, thread 1472767053592002582 in #thread-manager
- Voice commands route through the same Gateway → same Pulse chains

### Layer 3: Agent Backbone
| Component | Role | Status |
|-----------|------|--------|
| Claude Code (T1a/T1b) | Primary coding agents via dual-max | ✅ Running |
| OpenRouter | Model routing for diverse LLM access | 🟡 To integrate |
| Kimi | Synthesis engine, always running | ✅ Running |
| Cognitive Twin | Fine-tuned model (training runs incremental) | 🟡 Needs compute |
| Codex (T2) | OpenAI agent tier | ✅ Available |
| Gemini (T3) | Google agent tier | ✅ Available |

---

OpenRouter Integration (New)

Mo's voice note mentions OpenRouter as a new capability to explore:

### What OpenRouter Gives Us
- Single API → access to 100+ models (Llama, Mistral, Qwen, Claude, GPT, etc.)
- Fallback routing — if one provider is down, auto-route to another
- Cost optimization — pick cheapest model that meets quality bar per task
- Training data generation — run same prompt through multiple models, compare outputs

How It Fits

Voice Command → Gateway → Task Router
                            ├── Claude Code (complex coding)
                            ├── OpenRouter (flexible model selection)
                            │     ├── cheap model for classification
                            │     ├── mid model for drafting
                            │     └── strong model for review
                            ├── Kimi (synthesis, always on)
                            └── Twin (when trained, personal model)

### OpenRouter as "Discord Replicate"
Mo asked: "Is that like the OpenRouter replicate [of Discord]?"

The parallel: Discord serves as our current multi-agent command center. OpenRouter serves as a multi-model command center. Same pattern — one interface, many backends.

---

Kimi + Twin Training Pipeline

From the voice notes:
> "Kimi will still be running. And then maybe we'll run that other instance and increment — create our training run."

### The Pipeline
1. Kimi stays running — continuous synthesis, memory, dream seeding
2. Twin training runs increment — each batch of conversations generates new SFT/DPO data
3. OpenRouter enables cheap experimentation — test prompts across models before committing to Twin training
4. "Running for a dollar" — likely referring to Together.ai serverless inference pricing for the trained Twin

Training Architecture

Daily conversations → Kimi synthesis → training data extraction
                                           ↓
                                    SFT/DPO pairs
                                           ↓
                              Together.ai fine-tune job
                                           ↓
                              Twin LoRA adapter (Qwen3 235B base)
                                           ↓
                              Serverless inference ($0.20-0.60/MTk)

---

Implementation Roadmap

### Phase A: Voice Pipeline (Priority)
- [ ] Wire Google STT into ACC (API key: `speech-to-order-476916`)
- [ ] Intent classifier: voice → Gateway API endpoint mapping
- [ ] ElevenLabs response synthesis
- [ ] Ambient mode (always listening, wake word activation)

### Phase B: OpenRouter Integration
- [ ] Sign up / get API key
- [ ] Add to Gateway as model provider
- [ ] Smart routing: task complexity → model selection
- [ ] Cost tracking per model per task type

### Phase C: Unified Command Flow
- [ ] Voice → ACC → Gateway → Agent dispatch
- [ ] Thread visualization in ACC (mirror Discord thread trees)
- [ ] Pulse chain initiation via voice
- [ ] Status queries: "What's running?" → agent capacity check

### Phase D: Twin Training Loop
- [ ] Incremental training data export from Kimi
- [ ] Together.ai fine-tune job automation
- [ ] Twin deployment via serverless
- [ ] A/B testing: Twin vs Claude vs OpenRouter models

---

Key Design Decisions

1. Voice is the primary interface, not a feature — ACC should work entirely by voice
2. Discord equivalence — ACC must do everything Discord does for agent control (Mo's Feb 14 insight)
3. OpenRouter for breadth, Claude for depth — use the right model for each task
4. Kimi never stops — continuous synthesis is the backbone of Twin training
5. Incremental training — don't wait for a massive dataset, train in batches

---

Formalized from Mo's 1:21-1:22 AM voice brainstorm. Architecture doc for #workshop exploration.

Promotion Decision

Promote into a technical note or architecture paper with implementation anchors.

Source Anchor

AgentCommandCenter/ACC-VOICE-ARCHITECTURE.md

Detected Structure

Method · Evaluation · Architecture