Stage 2: Compound Architecture — Twin Swarm + Cognitive Routing
> Sequential synthesis: each step inherits all prior context.
Step 1: Twin-First Default + Cascade Escalation (inherits Stage 0 + Path D + Path B)
Decision: Path D's "twin-first, escalate on failure" is the right default because it eliminates routing latency for 80% of tasks and leverages the key finding that RAG achieves 87.2% accuracy alone. But Path D's blind escalation is wasteful. Path B's cascade adds intelligence without replacing the fast path.
Compound architecture:
**Default flow (80% of tasks):**
Task → Twin Alpha (Qwen3-235B via Together AI, RAG-augmented) → Output.
No classification. No routing. ~1500ms latency (RAG retrieval + generation).
Smart escalation (replaces Path D's blind retry):
When Twin Alpha fails (build error, test failure, user rejection), don't just retry or blindly escalate. Run Path B's Layer 1 feature classifier on the failed task to determine WHY it failed:
- Complexity > 0.7 → escalate to Claude (task was too hard for twin)
- Domain mismatch detected → route to domain specialist (Swift→Claude on Mac1, analysis→Codex)
- The failure reason itself becomes a classification signal
Result: Twin handles everything by default. Failures get intelligent triage. No upfront routing cost.
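The triage rules above can be sketched as a small function. This is a minimal illustration, not the production classifier: the `FailedTask` shape, the agent identifiers (`claude`, `claude-mac1`, `codex`), and the `"ambiguous"` fallback value are assumptions made for the example; only the 0.7 threshold and the domain examples come from the text.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

COMPLEXITY_THRESHOLD = 0.7  # from the text: complexity > 0.7 escalates to Claude

# Hypothetical agent identifiers, based on the domain examples in the text.
DOMAIN_SPECIALISTS = {"swift": "claude-mac1", "analysis": "codex"}


@dataclass
class FailedTask:
    complexity: float        # Layer 1 complexity score in [0, 1]
    domain: Optional[str]    # mismatched domain, or None if none was detected
    failure_reason: str      # e.g. "build_error", "test_failure", "user_rejection"


def triage(task: FailedTask) -> Tuple[str, str]:
    """Return (target, signal) for a task that Twin Alpha failed.

    The failure reason is passed through unchanged so it can be logged
    as a classification signal alongside the routing decision.
    """
    if task.complexity > COMPLEXITY_THRESHOLD:
        return "claude", task.failure_reason
    if task.domain in DOMAIN_SPECIALISTS:
        return DOMAIN_SPECIALISTS[task.domain], task.failure_reason
    # No clear signal: leave the decision to a later, slower layer.
    return "ambiguous", task.failure_reason
```

Checking complexity before domain is a design choice of this sketch; the text does not state which rule takes priority when both apply.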
Step 2: Embedding Index for Learned Routing (inherits Step 1 + Path E)
Decision: Path E's embedding-based k-NN is the right "slow but smart" layer for the cascade. It uses existing infrastructure (Gemini embeddings, pgvector, Supabase) and requires zero model training.
Integration with Step 1:
The cascade becomes:
1. Default: Twin Alpha (0ms routing overhead)
2. On failure — Feature check: Layer 1 complexity/domain scoring (5ms)
3. If ambiguous — Embedding lookup: k-NN over historical task outcomes (60ms)
4. If still ambiguous — Static matrix: CALC routing table (0ms, hardcoded)
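One way to read the cascade is as an ordered list of layers, each of which either names an agent or declines. The sketch below assumes that reading; the layer functions are stand-ins, and the routing table contents are hypothetical because the CALC matrix values are not given here.

```python
from typing import Callable, Optional

# Each layer returns an agent name, or None when it cannot decide.
Layer = Callable[[dict], Optional[str]]


def feature_check(task: dict) -> Optional[str]:
    # Layer 1 stand-in: only decides on clearly hard tasks.
    return "claude" if task.get("complexity", 0.0) > 0.7 else None


def embedding_lookup(task: dict) -> Optional[str]:
    # Layer 2 stand-in: a real version would query the k-NN index.
    return task.get("knn_agent")


def static_matrix(task: dict) -> Optional[str]:
    # Layer 3 stand-in for the hardcoded table (hypothetical values).
    table = {"code": "claude", "analysis": "codex"}
    return table.get(task.get("task_type"), "claude")


# Layer 0 is the Twin Alpha default; these layers run only after a failure.
ESCALATION_LAYERS = [(1, feature_check), (2, embedding_lookup), (3, static_matrix)]


def route_after_failure(task: dict):
    """Walk the layers in order and return (layer_id, agent) for the first hit."""
    for layer_id, layer in ESCALATION_LAYERS:
        agent = layer(task)
        if agent is not None:
            return layer_id, agent
    raise RuntimeError("the static matrix should always return an agent")
```

The returned layer id lines up with how a telemetry record could report which cascade layer made the decision.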
Bootstrap the embedding index:
- Phase 1 (Days 1-3): Tag 5,000 historical turns from `memory_turns` with agent + outcome metadata using Claude Opus batch labeling
- Phase 2 (Days 3-7): Generate Gemini embeddings for all 5,000, store in new pgvector table `task_routing_index`
- Phase 3 (Ongoing): Every completed task auto-adds its embedding + outcome to the index
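Once the index holds labeled outcomes, the lookup reduces to a similarity-weighted vote among the nearest neighbors. The following is an in-memory sketch of that vote, assuming rows of `(embedding, agent, success)`; in production the neighbor search would run in pgvector rather than in Python, and the confidence definition used here (the winning agent's share of total neighbor similarity) is an assumption, not something the design specifies.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn_route(query, index, k=10):
    """Return (agent, confidence) from the k most similar historical tasks.

    index: list of (embedding, agent, success) rows.
    Only successful neighbors vote; failed ones still count toward the
    total, so they lower the confidence of whichever agent wins.
    """
    neighbors = sorted(index, key=lambda row: cosine(query, row[0]), reverse=True)[:k]
    votes = defaultdict(float)
    total = 0.0
    for embedding, agent, success in neighbors:
        sim = max(cosine(query, embedding), 0.0)
        total += sim
        if success:
            votes[agent] += sim
    if not votes or total == 0.0:
        return None, 0.0
    best = max(votes, key=votes.get)
    return best, votes[best] / total
```

A caller would compare the returned confidence against a floor and fall back to the static matrix when it is too low.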
New pgvector table:
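```sql
-- pgvector must be enabled before the vector type is available.
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE task_routing_index (
    id UUID DEFAULT gen_random_uuid(),
    content TEXT NOT NULL,               -- task text that was embedded
    embedding vector(768),               -- Gemini embedding of the content
    agent TEXT NOT NULL,                 -- agent that handled the task
    task_type TEXT,
    success BOOLEAN NOT NULL,
    duration_ms INTEGER,
    complexity FLOAT,
    created_at TIMESTAMPTZ DEFAULT now()
);

-- Approximate nearest-neighbor index using cosine distance.
CREATE INDEX ON task_routing_index USING ivfflat (embedding vector_cosine_ops) WITH (lists = 50);
```
Step 3: Evolution World Routing Genome (inherits Steps 1-2 + Path F)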
Decision: Path F's evolutionary routing is the right meta-layer because it solves the threshold tuning problem. The cascade's confidence thresholds (0.85, 0.70, 0.50) and the provider weights should not be static — they should evolve based on actual outcomes.
What evolves:
Not the routing logic itself (that's the cascade from Steps 1-2), but its parameters:
- `twin_default_confidence`: How confident are we that Twin Alpha can handle any task? (starts at 0.80, evolves based on success rate)
- `escalation_complexity_threshold`: At what complexity score do we skip straight to frontier? (starts at 0.70)
- `embedding_knn_k`: How many neighbors to consider? (starts at 10)
- `embedding_confidence_floor`: Below what k-NN confidence do we fall back to static? (starts at 0.50)
- `provider_weights`: Per-task-type agent preference weights (starts at CALC matrix values)
Fitness function (from Path F, refined):
```
fitness = 0.40 * task_success_rate
        + 0.25 * (1 - avg_latency / 30000)   // normalized to 30s max
        + 0.20 * (1 - avg_cost / 0.01)       // normalized to $0.01 max per task
        + 0.15 * (1 - escalation_rate)       // fewer escalations = better
```
Mutation cadence: Every 200 tasks (approximately daily at current volume). L1 perturbs parameters. L2 adjusts mutation step size weekly. L3 is not needed at this scale: the search space is small enough for L1+L2.
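The fitness function translates directly into code. One detail the formula leaves open is what happens when average latency exceeds 30 s or average cost exceeds $0.01, since the corresponding terms would go negative. The sketch below clamps those terms to [0, 1]; that clamping is an assumption of this example, not part of the stated formula.

```python
def clamp01(x: float) -> float:
    """Restrict x to the range [0, 1]."""
    return max(0.0, min(1.0, x))


def fitness(success_rate: float, avg_latency_ms: float,
            avg_cost_usd: float, escalation_rate: float) -> float:
    """Weighted fitness of one genome over a window of tasks."""
    return (0.40 * success_rate
            + 0.25 * clamp01(1 - avg_latency_ms / 30_000)   # 30 s ceiling
            + 0.20 * clamp01(1 - avg_cost_usd / 0.01)       # $0.01 ceiling per task
            + 0.15 * (1 - escalation_rate))
```

For instance, a window with a 90% success rate, 1,500 ms average latency, $0.002 average cost, and a 20% escalation rate scores 0.36 + 0.2375 + 0.16 + 0.12 = 0.8775.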
Non-halting invariant: No agent's weight can drop below 0.05 for any task type. This prevents the router from "forgetting" that an agent exists.
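A minimal sketch of an L1 mutation step that respects this invariant follows. The Gaussian perturbation and the upper cap of 1.0 are assumptions made for illustration; the only value taken from the text is the 0.05 floor.

```python
import random

WEIGHT_FLOOR = 0.05  # from the text: no agent weight may drop below this


def mutate_weights(weights, step, rng):
    """Return a perturbed copy of {task_type: {agent: weight}}.

    Each weight gets Gaussian noise with standard deviation `step`
    (an assumed mutation operator), then is clamped to
    [WEIGHT_FLOOR, 1.0] so no agent is ever fully forgotten.
    """
    mutated = {}
    for task_type, agents in weights.items():
        mutated[task_type] = {
            agent: min(1.0, max(WEIGHT_FLOOR, w + rng.gauss(0.0, step)))
            for agent, w in agents.items()
        }
    return mutated
```

Passing in a seeded `random.Random` instance keeps mutation runs reproducible, which helps when comparing genome versions.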
Step 4: Agent Bidding for High-Stakes Tasks (inherits Steps 1-3 + Path C)
Decision: Path C's auction model is too expensive for every task (2s bid window), but it's exactly right for high-stakes tasks where the routing genome's evolved parameters might not be optimal.
When to auction:
- Complexity > 0.85 (the hardest tasks)
- Estimated cost > $0.05 per task (expensive enough to optimize)
- Task is decomposable (the auction can split it)
- Task affects shared infrastructure (deploys, migrations, schema changes)
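The list does not say whether these conditions combine with AND or OR. The sketch below assumes any single condition is enough to trigger the high-stakes path; the `TaskProfile` fields are hypothetical names for the four criteria.

```python
from dataclasses import dataclass


@dataclass
class TaskProfile:
    complexity: float            # Layer 1 complexity score in [0, 1]
    estimated_cost_usd: float    # projected cost of running the task
    decomposable: bool           # can the task be split across agents?
    touches_shared_infra: bool   # deploys, migrations, schema changes


def should_auction(task: TaskProfile) -> bool:
    """True if the task qualifies for high-stakes agent scoring.

    Assumption: the criteria are alternatives (any one suffices).
    """
    return (task.complexity > 0.85
            or task.estimated_cost_usd > 0.05
            or task.decomposable
            or task.touches_shared_infra)
```

If the intent is stricter, for example requiring high complexity plus one other condition, only the boolean expression needs to change.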
Simplified bidding:
Instead of a full 2-second window, each agent maintains a continuously-updated availability vector:
```jsonc
{
  "agent_id": "twin-alpha",
  "capacity": 0.8,            // 80% headroom
  "affinity_scores": {        // pre-computed from recent task history
    "code": 0.85,
    "debug": 0.70,
    "creative": 0.30
  },
  "rate_limit_status": "green",
  "current_queue_depth": 2
}
```
Published to NUMU topic `agent.status` every 30 seconds. The routing layer reads the latest status (no bid window needed) and combines it with the genome weights.
Scoring for high-stakes tasks:
```
score = genome_weight * 0.4
      + agent_affinity * 0.3
      + agent_capacity * 0.2
      + (1 - agent_cost) * 0.1
```
Result: Fast tasks (80
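The high-stakes scoring formula can be applied to the published availability vectors as follows. This sketch assumes `agent_cost` is already normalized to [0, 1], that agents whose `rate_limit_status` is not `"green"` are skipped, and that unknown agents default to the 0.05 weight floor; none of these three choices is stated in the text.

```python
def score_agent(genome_weight, affinity, capacity, normalized_cost):
    """Weighted score from the formula: higher is better."""
    return (genome_weight * 0.4
            + affinity * 0.3
            + capacity * 0.2
            + (1 - normalized_cost) * 0.1)


def pick_agent(task_type, genome_weights, statuses, normalized_costs):
    """Return (agent_id, score) for the best available agent.

    statuses: list of availability vectors shaped like the JSON example.
    genome_weights / normalized_costs: {agent_id: value} for this task type.
    """
    best_id, best_score = None, float("-inf")
    for status in statuses:
        if status.get("rate_limit_status") != "green":
            continue  # assumption: rate-limited agents do not compete
        agent_id = status["agent_id"]
        score = score_agent(
            genome_weights.get(agent_id, 0.05),
            status["affinity_scores"].get(task_type, 0.0),
            status["capacity"],
            normalized_costs.get(agent_id, 1.0),
        )
        if score > best_score:
            best_id, best_score = agent_id, score
    return best_id, best_score
```

With a genome weight of 0.6, affinity 0.85, capacity 0.8, and normalized cost 0.1, an agent scores 0.24 + 0.255 + 0.16 + 0.09 = 0.745.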
Step 5: Unified Telemetry + Feedback Loop (inherits Steps 1-4 + Path A's training data insight)
Decision: Path A's insight about using 112K turns for training data is correct, but instead of training a classifier, we use that data to populate the embedding index (Step 2) and to feed the evolution genome (Step 3). The training data fuels the routing infrastructure, not a model.
Unified telemetry schema:
Every routing decision produces a telemetry event:
```jsonc
{
  "task_id": "uuid",
  "timestamp": "ISO-8601",
  "source": "discord|cli|voice|pane",   // one of these values
  "cascade_layer_hit": 0,               // one of 0, 1, 2, 3
  "classification": { "task_type": "code", "complexity": 0.65, "confidence": 0.82 },
  "routed_to": "twin-alpha",
  "genome_version": "v47",
  "high_stakes_auction": false,
  "outcome": {
    "success": true,
    "duration_ms": 4500,
    "cost_usd": 0.002,
    "escalated": false,
    "user_rejected": false
  }
}
```
Storage: Supabase table `routing_telemetry`. Prometheus metrics for real-time dashboards. Nexus Portal `/routing` page.
Three feedback loops:
1. Embedding index growth: Every telemetry event with outcome → new vector in `task_routing_index` (Step 2)
2. Genome evolution: Every 200 telemetry events → L1 mutation of routing parameters (Step 3)
3. NUMU router learning: `recordCompletion()` already exists — wire it to the telemetry stream
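The three loops can hang off a single telemetry handler. The sketch below shows one possible wiring; the callback names are hypothetical, `record_completion` stands in for the existing `recordCompletion()` (whose actual signature is not shown here), and ignoring events that lack an outcome is an assumption.

```python
class FeedbackLoops:
    """Fan one telemetry event out to the three learning loops."""

    MUTATION_INTERVAL = 200  # from the text: mutate every 200 events

    def __init__(self, add_to_index, mutate_genome, record_completion):
        self._add_to_index = add_to_index            # loop 1: embedding index
        self._mutate_genome = mutate_genome          # loop 2: genome evolution
        self._record_completion = record_completion  # loop 3: NUMU router
        self._since_mutation = 0

    def on_event(self, event: dict) -> None:
        if event.get("outcome") is None:
            return  # assumption: nothing to learn until the outcome is known
        self._add_to_index(event)
        self._record_completion(event)
        self._since_mutation += 1
        if self._since_mutation >= self.MUTATION_INTERVAL:
            self._mutate_genome()
            self._since_mutation = 0
```

Keeping the counter inside the handler means the mutation cadence follows actual task volume rather than wall-clock time.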
Convergence target:
After 2,000 tasks (~2 weeks), the embedding index has enough density for 90
Final Compound Architecture
```
    ┌──────────────────────────────────┐
    │            TASK ENTRY            │
    │  (Discord/CLI/Voice/Pane/Cron)   │
    └──────────────┬───────────────────┘
                   │
      ┌────────────▼────────────┐
      │ Complexity Quick-Check  │
      │ (is this high-stakes?)  │
      └──┬─────────────────┬────┘
         │                 │
     (normal)        (high-stakes)
         │                 │
┌────────▼──────┐  ┌───────▼────────┐
│  Twin Alpha   │  │ Agent Scoring  │
│  (default)    │  │ (genome+live   │
└───┬──────┬────┘  │  capacity)     │
    │      │       └───────┬────────┘
(success) (fail)           │
    │      │        Route to winner
    ▼      │
 Output ┌──▼───────────┐
        │ Smart Triage │
        │ (Layer 1+2)  │
        └──────┬───────┘
               │
      Route to specialist
               │
               ▼
  ┌──────────────────────────┐
  │ Telemetry → Supabase     │
  │   → Embedding Index      │
  │   → Genome Evolution     │
  │   → NUMU Learning        │
  └──────────────────────────┘
```
Cost: $0-15/mo for twin serving (Together AI free tier) + $0 for Claude/Codex/Gemini (existing subscriptions) + ~$5/mo Supabase compute for pgvector index + existing infrastructure.
Engineering effort: ~2 weeks for the core (twin deployment + embedding index + telemetry). Genome evolution adds ~3 days. Agent status broadcasting adds ~2 days. Total: ~3 weeks to production.
Promotion Decision
Promote into a technical note or architecture paper with implementation anchors.
Source Anchor
evo-cube-output/twin-swarm-cognitive-routing/stage2-compound.md