Stage 0: Research — Twin Swarm + Cognitive Routing Architecture
> Date: 2026-03-10 | Evo-Cube #1 of 11
> Noosphere Context: Dreams gestating on orchestration + iteration concepts. KARL trajectory intelligence evo-cube at stage 2.
What Exists Today
Four Parallel Routing Systems (Uncoordinated)
1. Discord Gateway (`handlers.ts`, 906 lines)
- TaskType: code | research | debug | docs | ops | general
- Classification: regex keyword matching on prompt content
- Model override: `gemini:` / `codex:` / `claude:` prefix forces model
- Rate limit fallback: Claude unavailable + model='any' → force Gemini
- Platform affinity: Swift/Xcode keywords → `platform_required = 'darwin'`
- No complexity scoring. No learning. No feedback loop.
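The gateway behaviour listed above can be sketched in a few lines. This is a minimal Python illustration, not the code in `handlers.ts`: the keyword patterns, the `Route` dataclass, and the `route` function are hypothetical. Only the task types, the prefix override, the rate-limit fallback, and the `darwin` affinity come from the notes.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical keyword patterns; the real ones in handlers.ts are not shown in this note.
TASK_PATTERNS = [
    ("debug", re.compile(r"\b(bug|crash|error|stack ?trace|fix)\b", re.I)),
    ("code", re.compile(r"\b(implement|refactor|function|class|endpoint)\b", re.I)),
    ("research", re.compile(r"\b(research|compare|investigate|survey)\b", re.I)),
    ("docs", re.compile(r"\b(readme|document|docstring|changelog)\b", re.I)),
    ("ops", re.compile(r"\b(deploy|restart|cron|docker|server)\b", re.I)),
]
DARWIN_PATTERN = re.compile(r"\b(swift|xcode|swiftui|ios)\b", re.I)
PREFIX_PATTERN = re.compile(r"^(gemini|codex|claude):\s*", re.I)


@dataclass
class Route:
    task_type: str
    model: str
    platform_required: Optional[str]
    prompt: str


def route(prompt: str, claude_available: bool = True) -> Route:
    # A "gemini:" / "codex:" / "claude:" prefix forces the model.
    model = "any"
    prefix = PREFIX_PATTERN.match(prompt)
    if prefix:
        model = prefix.group(1).lower()
        prompt = prompt[prefix.end():]

    # First matching keyword pattern wins; otherwise the task is "general".
    task_type = next(
        (name for name, pattern in TASK_PATTERNS if pattern.search(prompt)),
        "general",
    )

    # Rate-limit fallback: Claude unavailable and no explicit model -> Gemini.
    if model == "any" and not claude_available:
        model = "gemini"

    # Platform affinity: Swift/Xcode keywords require a macOS host.
    platform = "darwin" if DARWIN_PATTERN.search(prompt) else None
    return Route(task_type, model, platform, prompt)
```

Note that nothing in this flow produces a score or records an outcome, which is the limitation the last bullet points at.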
2. NUMU Router (`numu-router/src/index.ts`, 375 lines)
- TaskType: creative | implementation | debugging | documentation | research | analysis
- Provider affinity: creative→gemini, implementation→claude, debugging→claude, documentation→gemini
- Complexity scoring: baseline 0.3, adjusted by word count (+0.2 for >200 words), keyword hits (+0.08 each high-complexity, -0.05 each low-complexity), code blocks, multi-step indicators
- Model tier: complexity > 0.7 → "high" (Opus/Pro), > 0.4 → "medium" (Sonnet/Flash), else "low" (Haiku/Flash-lite)
- Learned performance: tracks success rate per provider per task type, switches if delta > 15
- ACMP capacity check: if preferred provider exhausted → switch to available
- Decay: removes completions older than 30 days
- State: `[home-path]`
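To make the scoring arithmetic concrete, here is a hedged Python sketch of the complexity score, the tier thresholds, and the learned-performance switch. The keyword lists are hypothetical, the code-block and multi-step adjustments are omitted because their weights are not given here, clamping to [0, 1] is an assumption, and the switch threshold of 15 is assumed to be in percentage points of success rate.

```python
from typing import Dict

# Hypothetical keyword lists; the real ones live in numu-router/src/index.ts.
HIGH_KEYWORDS = ("architecture", "distributed", "concurrency", "migration")
LOW_KEYWORDS = ("typo", "rename", "format")


def complexity(prompt: str) -> float:
    text = prompt.lower()
    score = 0.3  # baseline
    if len(text.split()) > 200:
        score += 0.2  # long prompts
    score += 0.08 * sum(kw in text for kw in HIGH_KEYWORDS)
    score -= 0.05 * sum(kw in text for kw in LOW_KEYWORDS)
    # Assumption: the score is clamped to the 0.0-1.0 range.
    return min(1.0, max(0.0, score))


def tier(score: float) -> str:
    if score > 0.7:
        return "high"    # Opus/Pro
    if score > 0.4:
        return "medium"  # Sonnet/Flash
    return "low"         # Haiku/Flash-lite


def pick_provider(preferred: str, success_rates: Dict[str, float],
                  threshold: float = 15.0) -> str:
    """Switch away from the affinity default only if another provider's
    success rate (in percent, assumed) beats it by more than `threshold`."""
    best = max(success_rates, key=success_rates.get)
    if best != preferred and success_rates[best] - success_rates.get(preferred, 0.0) > threshold:
        return best
    return preferred
```

For example, a short prompt containing one low-complexity keyword scores 0.3 - 0.05 = 0.25 and lands in the "low" tier.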
3. Cortex Ops Trigger (`ops_trigger.py`, 249 lines)
- Matches prompt text against SKILL.md auto-trigger regex patterns
- 500ms hard timeout (SIGALRM), 29ms typical
- Pane claims conflict detection (if another pane has domain claimed → skip)
- Currently: all 30+ skills have `status=None` (not active)
- KARL integration: init_session_buffer on skill injection
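The timeout and claim-skipping behaviour can be illustrated with a short sketch. It assumes a Unix host and the main thread (both required by `signal.setitimer` with `SIGALRM`); `match_skills`, the `skill_patterns` shape, and `claimed_domains` are hypothetical names, not the contents of `ops_trigger.py`.

```python
import re
import signal
from typing import Dict, FrozenSet, List, Tuple


class TriggerTimeout(Exception):
    """Raised by the SIGALRM handler when the matching budget is spent."""


def _on_alarm(signum, frame):
    raise TriggerTimeout()


def match_skills(prompt: str,
                 skill_patterns: Dict[str, Tuple[str, str]],
                 claimed_domains: FrozenSet[str] = frozenset(),
                 timeout_s: float = 0.5) -> List[str]:
    """Return skills whose trigger regex matches, within a hard time budget.

    skill_patterns maps skill name -> (domain, regex source).
    """
    matched: List[str] = []
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, timeout_s)
    try:
        for name, (domain, pattern) in skill_patterns.items():
            if domain in claimed_domains:
                continue  # another pane already claimed this domain
            if re.search(pattern, prompt, re.I):
                matched.append(name)
    except TriggerTimeout:
        pass  # keep whatever matched before the deadline
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel the timer
        signal.signal(signal.SIGALRM, previous)  # restore the old handler
    return matched
```

One caveat of this approach: Python runs signal handlers between bytecode instructions, so a single long-running regex match is only interrupted once it returns.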
4. Cognitive Twin Brain-1 Router (`router.py`, 173 lines)
- Three tiers: IDENTITY (about Mo) → KNOWLEDGE (project/tech) → REASONING (multi-hop)
- Pre-compiled regex: 8 identity patterns, 10 multi-hop patterns, 9 graph patterns, 3 general patterns
- Confidence values: IDENTITY=0.95, REASONING=0.90, GENERAL=0.85, GRAPH=0.80, KNOWLEDGE+=0.65, KNOWLEDGE=0.60
- Maps to TwinConfig: A (bare), B (RAG only), C (RAG+Graph), D (RAG+Graph+RLM)
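A compact sketch of this style of router follows. The confidence values are the ones listed above; everything else is an assumption made for illustration: the regex patterns, the order in which rules are checked, and especially the mapping from label to TwinConfig letter, which this note does not spell out. The KNOWLEDGE+ case is left out because its trigger condition is not described here.

```python
import re
from typing import NamedTuple


class Decision(NamedTuple):
    tier: str
    confidence: float
    twin_config: str  # A (bare), B (RAG only), C (RAG+Graph), D (RAG+Graph+RLM)


# (label, hypothetical pattern, confidence from the note, ASSUMED config letter)
RULES = [
    ("IDENTITY", re.compile(r"\b(who is mo|mo's|about mo)\b", re.I), 0.95, "B"),
    ("REASONING", re.compile(r"\b(why did|how does .+ relate|compare)\b", re.I), 0.90, "D"),
    ("GENERAL", re.compile(r"^(hi|hello|thanks)\b", re.I), 0.85, "A"),
    ("GRAPH", re.compile(r"\b(depends on|connected to|related to)\b", re.I), 0.80, "C"),
]

# Fallback when nothing matches: plain knowledge lookup.
DEFAULT = Decision("KNOWLEDGE", 0.60, "B")


def classify(query: str) -> Decision:
    # Patterns are compiled once at import time; first match wins.
    for label, pattern, confidence, config in RULES:
        if pattern.search(query):
            return Decision(label, confidence, config)
    return DEFAULT
```

The confidence here is a fixed constant per rule, not a measured probability, which matters when comparing it with scores from the other routers.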
What These Four Systems Share
- All use regex keyword matching as primary classification
- None share classification results with each other
- None use semantic embeddings for classification
- Discord gateway and NUMU router both classify tasks but use different type taxonomies
- Only NUMU router has a learning/feedback loop
- Only Brain-1 router has confidence scores
- Only Cortex has pane awareness (domain conflict detection)
Cognitive Twin Infrastructure
Current state (v5 "Living Document"):
- `living_twin.py`: 593 lines, RAG+Graph+RLM pipeline
- Identity doc: 53 lines (`MO_IDENTITY.md`)
- KB: 4 JSONL files (knowledge_base v1, v2, v3_mined, architecture_knowledge)
- Training data: 16K+ lines (train_v4.jsonl), 909 valid, 909 test, 261 DPO pairs
- LoRA adapters: 7 checkpoints on Gemma 3 1B 4bit, rank 8, 500 iters
- Corrections: `twin_corrections` Supabase table, 5-min cache, top_k=5 per query
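The correction loop's caching behaviour (5-minute cache, top 5 per query) might look roughly like the sketch below. It is an illustration only: `CorrectionCache`, the injected `fetch_all` callable standing in for the Supabase read, the `question` field name, and the word-overlap ranking are all assumptions.

```python
import time
from typing import Callable, Dict, List, Optional


class CorrectionCache:
    """Caches the corrections table for ttl_s seconds and serves top-k matches."""

    def __init__(self, fetch_all: Callable[[], List[Dict[str, str]]],
                 ttl_s: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch_all = fetch_all  # stand-in for the twin_corrections query
        self._ttl_s = ttl_s
        self._clock = clock
        self._rows: List[Dict[str, str]] = []
        self._loaded_at: Optional[float] = None

    def _fresh_rows(self) -> List[Dict[str, str]]:
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self._ttl_s:
            self._rows = self._fetch_all()
            self._loaded_at = now
        return self._rows

    def top_k(self, query: str, k: int = 5) -> List[Dict[str, str]]:
        # Assumed ranking: number of shared lowercase words with the query.
        words = set(query.lower().split())
        scored = [
            (len(words & set(row["question"].lower().split())), row)
            for row in self._fresh_rows()
        ]
        scored = [item for item in scored if item[0] > 0]
        scored.sort(key=lambda item: item[0], reverse=True)
        return [row for _, row in scored[:k]]
```

Injecting the clock keeps the expiry logic testable without waiting five minutes.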
Benchmark (March 4, 2026 on Qwen3-Next-80B-A3B):
| Config | Accuracy (%) | Latency |
|---|---|---|
| A: Bare LLM | 29.5 | |
| B: + RAG | 87.2 | |
| C: + Graph | 89.7 | |
| D: Full RLM | 93.6 | |
**Critical finding:** RAG adds +57.7 accuracy points over the bare LLM (29.5 → 87.2).
Infrastructure status (current):
- MLX Server (Mac5:8100): OFFLINE
- Exo Cluster (Mac4:52415): OFFLINE
- Ollama (Mac4:11434): OFFLINE
- Graph Kernel (Mac1:8001): via SSH tunnel to cloud-vm
- RAG++ (Mac1:8000): via SSH tunnel to cloud-vm Docker
- Together AI: FREE serverless for Qwen3-235B-A22B
Twin Swarm Design (Planned, Unbuilt)
Three twins (from DEP 2026-02-14):
| Twin | Base | Tasks | Cost |
|---|---|---|---|
| Alpha "Coder" | Qwen3-235B + LoRA | Features, refactors, complex code | $0.20/$0.60 MTk |
| Beta "Architect" | Qwen3-235B + LoRA | Review, architecture, planning | $0.20/$0.60 MTk |
| Gamma "Runner" | Llama 3.1-8B + LoRA | Tests, docs, builds, simple fixes | $0.18/$0.18 MTk |
Budget: $180-315/mo projected. Cap: $500/mo.
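As a worked check on the budget arithmetic, the sketch below computes spend from token counts. It assumes the two prices in each Cost cell are input/output USD per million tokens, which the table does not state explicitly; the function names and the token volumes in the usage note are hypothetical.

```python
from typing import Dict, Tuple

# (input, output) USD per million tokens, read from the table above.
# Assumption: the first number is the input price, the second the output price.
PRICES_PER_MTOK: Dict[str, Tuple[float, float]] = {
    "alpha": (0.20, 0.60),
    "beta": (0.20, 0.60),
    "gamma": (0.18, 0.18),
}
MONTHLY_CAP_USD = 500.0


def cost_usd(twin: str, tokens_in: int, tokens_out: int) -> float:
    price_in, price_out = PRICES_PER_MTOK[twin]
    return (tokens_in * price_in + tokens_out * price_out) / 1_000_000


def monthly_total(usage: Dict[str, Tuple[int, int]]) -> Tuple[float, bool]:
    """Return (total spend, whether it stays within the cap)."""
    total = sum(cost_usd(twin, t_in, t_out) for twin, (t_in, t_out) in usage.items())
    return total, total <= MONTHLY_CAP_USD
```

With a hypothetical 100M input and 20M output tokens on Alpha, the cost is 100 × 0.20 + 20 × 0.60 = $32.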
Quality gates: Build check → Diff review (Beta reviews Alpha) → Pattern match (RAG++ comparison). Failure → escalate to frontier with twin output as context.
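The gate sequence can be expressed as an ordered list of checks with a single escalation hook. This is a sketch under assumptions, since the swarm is unbuilt: `run_gates`, `GateResult`, and the callable signatures are hypothetical, and each gate is reduced to a pass/fail predicate.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Gate = Tuple[str, Callable[[str], bool]]


@dataclass
class GateResult:
    accepted: bool
    failed_gate: Optional[str]
    output: str


def run_gates(twin_output: str,
              gates: List[Gate],
              escalate: Callable[[str, str], str]) -> GateResult:
    """Run gates in order; on the first failure, hand the twin's output
    to the frontier model as context and return its answer instead."""
    for name, check in gates:
        if not check(twin_output):
            frontier_output = escalate(twin_output, name)
            return GateResult(False, name, frontier_output)
    return GateResult(True, None, twin_output)
```

In use, `gates` would hold the build check, the Beta-reviews-Alpha diff review, and the RAG++ pattern match, in that order, so cheaper checks run before more expensive ones.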
CALC Architecture (Cross-Agent Coordination)
Three agents (production): Claude Code (COMMANDER, Opus 4.6), Codex CLI (ANALYST, GPT-5.4), Gemini CLI (REVIEWER, Gemini 3 Pro)
Agent routing matrix:
| Task | Primary | Fallback | SLA |
|---|---|---|---|
| iOS builds | Claude | Codex | 5 min |
| Architecture | Claude | Gemini | 10 min |
| Large-context review | Gemini | Claude | 15 min |
| Evo3 synthesis | Gemini | Claude | 20 min |
| Data analysis | Codex | Claude | 5 min |
| Infrastructure | Claude | Codex | 10 min |
| Creative content | Gemini | Claude | 10 min |
Phase 4 (Cognitive Routing) is explicitly called out as the "next major unsolved problem."
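The routing matrix above is effectively a static lookup with one fallback per task. A minimal sketch, assuming hypothetical task slugs and a caller-supplied set of currently available agents:

```python
from typing import Dict, Optional, Set, Tuple

# task slug (hypothetical) -> (primary, fallback, SLA in minutes), from the matrix above
ROUTING_MATRIX: Dict[str, Tuple[str, str, int]] = {
    "ios_builds": ("claude", "codex", 5),
    "architecture": ("claude", "gemini", 10),
    "large_context_review": ("gemini", "claude", 15),
    "evo3_synthesis": ("gemini", "claude", 20),
    "data_analysis": ("codex", "claude", 5),
    "infrastructure": ("claude", "codex", 10),
    "creative_content": ("gemini", "claude", 10),
}


def select_agent(task: str, available: Set[str]) -> Tuple[Optional[str], int]:
    """Return (agent, SLA minutes); agent is None if neither choice is available."""
    primary, fallback, sla_minutes = ROUTING_MATRIX[task]
    for agent in (primary, fallback):
        if agent in available:
            return agent, sla_minutes
    return None, sla_minutes
```

The table encodes no notion of task difficulty or past outcomes, which is the gap Phase 4 is meant to address.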
Twin KB Mining Pipeline
`twin_kb_miner.py` (546 lines, Prefect flow):
- 7 sources: memory_turns (Supabase), kimi_memory (SQLite), claude_md files, obsidian vault, git commits, plan learnings, memory files
- Output: JSONL at `knowledge_base_v3_mined.jsonl`
- Dedup: exact match on first 100 chars of question
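The dedup rule is simple enough to show directly. This sketch assumes each mined entry is a dict with a `question` field; the function name is hypothetical.

```python
from typing import Dict, Iterable, List


def dedup_by_question_prefix(entries: Iterable[Dict[str, str]],
                             prefix_len: int = 100) -> List[Dict[str, str]]:
    """Keep the first entry for each distinct question prefix."""
    seen = set()
    unique = []
    for entry in entries:
        key = entry["question"][:prefix_len]  # exact match, no normalisation
        if key in seen:
            continue
        seen.add(key)
        unique.append(entry)
    return unique
```

Because the match is exact, two questions that differ only in casing or whitespace both survive, and two long questions that share their first 100 characters collapse into one.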
Cognitive Twin V9 Evo-Cube (Completed 2026-03-07)
Key decisions from that run:
- Phase 1: Ship Living Document Twin (zero training, correction loop, $0/mo)
- Phase 2: KB explosion (466 → 5,000+ entries)
- Phase 3: Three-tier routing (local → API → frontier)
- Phase 4: Continuous learning (nightly KB mining, weekly graph rebuild)
- 28 tasks, 35 days, $0-15/mo ongoing
- Gate thresholds: ≥88
What V9 did NOT cover: Unifying the 4 parallel routing systems. Twin Swarm deployment. CALC cognitive routing. Cross-agent task handoff. Semantic classification.
Critical Gaps
1. Four routing systems, zero coordination. Discord gateway, NUMU router, Cortex, and Brain-1 each classify independently. A task entering via Discord is classified by `handlers.ts`; if it reaches a Claude session, Cortex classifies it again under a different taxonomy; and if it involves the twin, Brain-1 classifies it a third time.
2. No semantic classification. All four systems use regex keyword matching. The Agent Intelligence registry has a `tool_search` that does keyword matching. RAG++ has dense vector search (Gemini embeddings + pgvector). Neither is used for task classification.
3. No unified confidence model. Brain-1 has confidence scores (0.60-0.95). NUMU has complexity scores (0.0-1.0). Discord gateway has none. Confidence and complexity measure different things, so these numbers cannot be compared directly.
4. Twin Swarm is entirely unbuilt. Training data exists (16K+ JSONL). Together AI is ready (FREE for Qwen3-235B). Infrastructure specs exist. But no adapter has been trained on Together AI successfully (Feb 18 attempt reached epoch 2.0 step 90, not converged).
5. ACMP capacity is checked but not predicted. NUMU router checks current capacity. It does not predict when accounts will hit limits based on token consumption rate.
6. No task outcome tracking across routers. NUMU router tracks success/failure per provider. Discord gateway does not. There is no cross-system view of which routing decisions led to successful outcomes.
7. Escalation is binary. The only path is twin → frontier. There are no intermediate steps such as "try a different twin", "try same twin with more context", or "decompose and try sub-tasks."
8. Cost tracking is projected, not measured. $180-315/mo is a projection. No actual token metering exists for routing decisions.
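To illustrate what "predicted, not just checked" could mean for gap 5, here is a deliberately simple sketch: a linear extrapolation of time-to-limit from observed token consumption. Nothing like this exists in the NUMU router today; the function name, the sample format, and the linear-rate assumption are all hypothetical.

```python
from typing import List, Optional, Tuple


def seconds_until_limit(samples: List[Tuple[float, int]],
                        limit: int) -> Optional[float]:
    """Estimate seconds until `limit` tokens are consumed.

    samples: (timestamp in seconds, cumulative tokens used), oldest first.
    Returns None when there is not enough data or usage is not growing.
    """
    if len(samples) < 2:
        return None
    (t0, used0), (t1, used1) = samples[0], samples[-1]
    if t1 <= t0 or used1 <= used0:
        return None
    rate = (used1 - used0) / (t1 - t0)  # tokens per second over the window
    remaining = limit - used1
    return max(0.0, remaining / rate)
```

A router could compare this estimate against the expected task duration and switch providers before the limit is hit, rather than after.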
Promotion Decision
Promote into a technical note or architecture paper with implementation anchors.
Source Anchor
evo-cube-output/twin-swarm-cognitive-routing/stage0-research.md