Grand Diomande Research · Full HTML Reader

Stage 2: Compound Architecture — Twin Swarm + Cognitive Routing

**Decision:** Path D's "twin-first, escalate on failure" is the right default because it eliminates routing latency for 80% of tasks and leverages the key finding that RAG achieves 87.2% accuracy alone. But Path D's blind escalation is wasteful. Path B's cascade adds intelligence without replacing the fast path.

Agents That Account for Themselves architecture technical paper candidate score 22 .md

> Sequential synthesis: each step inherits all prior context.

Step 1: Twin-First Default + Cascade Escalation (inherits Stage 0 + Path D + Path B)

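Decision: Path D's "twin-first, escalate on failure" is the right default because it eliminates routing latency for 80% of tasks and leverages the key finding that RAG achieves 87.2% accuracy alone. But Path D's blind escalation is wasteful. Path B's cascade adds intelligence without replacing the fast path.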

Compound architecture:

Default flow (80% of tasks):
Task → Twin Alpha (Qwen3-235B via Together AI, RAG-augmented) → Output.
No classification. No routing. ~1500ms latency (RAG retrieval + generation).

Smart escalation (replaces Path D's blind retry):
When Twin Alpha fails (build error, test failure, user rejection), don't just retry or blindly escalate. Run Path B's Layer 1 feature classifier on the failed task to determine WHY it failed:
- Complexity > 0.7 → escalate to Claude (task was too hard for twin)
- Domain mismatch detected → route to domain specialist (Swift→Claude on Mac1, analysis→Codex)
- The failure reason itself becomes a classification signal

Result: Twin handles everything by default. Failures get intelligent triage. No upfront routing cost.
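
The triage rules above can be sketched as a small function. This is an illustrative sketch, not the production classifier: the `FailedTask` fields, the specialist names in `DOMAIN_SPECIALISTS`, the `"undecided"` fallback, and the choice to check complexity before domain are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

COMPLEXITY_ESCALATION_THRESHOLD = 0.7  # threshold stated in the text

# Hypothetical mapping built from the two examples in the text.
DOMAIN_SPECIALISTS = {"swift": "claude-mac1", "analysis": "codex"}


@dataclass
class FailedTask:
    complexity: float               # Layer 1 complexity score in [0, 1]
    mismatched_domain: Optional[str]  # domain Twin Alpha was a poor fit for, if any
    failure_reason: str             # e.g. "build_error", "test_failure", "user_rejection"


@dataclass
class TriageResult:
    agent: str
    signal: str  # the failure reason, kept as a classification signal


def triage(task: FailedTask) -> TriageResult:
    """Decide where a task goes after Twin Alpha has failed on it."""
    if task.complexity > COMPLEXITY_ESCALATION_THRESHOLD:
        # Too hard for the twin: escalate to the frontier model.
        return TriageResult("claude", task.failure_reason)
    if task.mismatched_domain in DOMAIN_SPECIALISTS:
        # Wrong specialist rather than too hard: route by domain.
        return TriageResult(DOMAIN_SPECIALISTS[task.mismatched_domain], task.failure_reason)
    # Assumed fallback: no clear signal, so defer to the next cascade layer.
    return TriageResult("undecided", task.failure_reason)
```

The failure reason is passed through unchanged so that downstream layers can treat it as one more feature.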

Step 2: Embedding Index for Learned Routing (inherits Step 1 + Path E)

Decision: Path E's embedding-based k-NN is the right "slow but smart" layer for the cascade. It uses existing infrastructure (Gemini embeddings, pgvector, Supabase) and requires zero model training.

Integration with Step 1:
The cascade becomes:
1. Default: Twin Alpha (0ms routing overhead)
2. On failure — Feature check: Layer 1 complexity/domain scoring (5ms)
3. If ambiguous — Embedding lookup: k-NN over historical task outcomes (60ms)
4. If still ambiguous — Static matrix: CALC routing table (0ms, hardcoded)
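
One way to read the cascade is as an ordered list of layers where each layer either names an agent or reports "ambiguous". The sketch below assumes that reading; the function name, the callable signatures, and the layer numbering (1 = feature check, 2 = embedding lookup, 3 = static matrix, with 0 reserved for the twin default) are assumptions for illustration.

```python
from typing import Callable, Optional, Tuple

# A layer returns an agent name, or None when its answer is ambiguous.
Layer = Callable[[str], Optional[str]]


def route_after_failure(
    task: str,
    feature_check: Layer,
    embedding_lookup: Layer,
    static_matrix: Callable[[str], str],
) -> Tuple[str, int]:
    """Walk the cascade after a Twin Alpha failure.

    Returns (agent, layer_number) so the caller can log which layer decided.
    """
    for layer_number, layer in ((1, feature_check), (2, embedding_lookup)):
        agent = layer(task)
        if agent is not None:
            return agent, layer_number
    # The static matrix always answers, so the cascade always terminates.
    return static_matrix(task), 3
```

Because the cheaper layers run first, the 60ms embedding lookup is only paid when the 5ms feature check cannot decide.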

Bootstrap the embedding index:
- Phase 1 (Day 1-3): Tag 5,000 historical turns from `memory_turns` with agent + outcome metadata using Claude Opus batch labeling
- Phase 2 (Day 3-7): Generate Gemini embeddings for all 5,000, store in new pgvector table `task_routing_index`
- Phase 3 (Ongoing): Every completed task auto-adds its embedding + outcome to the index

New pgvector table:

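```sql
-- pgvector must be enabled before the vector type can be used.
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE task_routing_index (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  content TEXT NOT NULL,             -- task text that was embedded
  embedding vector(768),             -- Gemini embedding of the task
  agent TEXT NOT NULL,               -- agent that handled the task
  task_type TEXT,
  success BOOLEAN NOT NULL,
  duration_ms INTEGER,
  complexity FLOAT,
  created_at TIMESTAMPTZ DEFAULT now()
);

-- Approximate nearest-neighbor index for cosine distance.
CREATE INDEX ON task_routing_index USING ivfflat (embedding vector_cosine_ops) WITH (lists = 50);
```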
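
To make the k-NN lookup concrete, here is an in-memory sketch of the vote over neighbors. In the real system the nearest rows would come from pgvector; here they are plain dictionaries. The voting rule (count successful neighbors per agent, confidence = winner's successes divided by neighbors examined) is an assumption, as are the function names.

```python
import math
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn_route(
    query: Sequence[float], rows: List[Dict], k: int = 10
) -> Tuple[Optional[str], float]:
    """Vote among the k most similar historical tasks.

    Each row mirrors task_routing_index: 'embedding', 'agent', 'success'.
    Returns (agent, confidence), or (None, 0.0) if no neighbor succeeded.
    """
    nearest = sorted(
        rows, key=lambda r: cosine_similarity(query, r["embedding"]), reverse=True
    )[:k]
    successes: Dict[str, int] = defaultdict(int)
    for row in nearest:
        if row["success"]:
            successes[row["agent"]] += 1
    if not successes:
        return None, 0.0
    best = max(successes, key=successes.get)
    return best, successes[best] / len(nearest)
```

A confidence below `embedding_confidence_floor` would then send the task on to the static matrix.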

Step 3: Evolution World Routing Genome (inherits Steps 1-2 + Path F)

Decision: Path F's evolutionary routing is the right meta-layer because it solves the threshold tuning problem. The cascade's confidence thresholds (0.85, 0.70, 0.50) and the provider weights should not be static — they should evolve based on actual outcomes.

What evolves:
Not the routing logic itself (that's the cascade from Steps 1-2), but its parameters:
- `twin_default_confidence`: How confident are we that Twin Alpha can handle any task? (starts at 0.80, evolves based on success rate)
- `escalation_complexity_threshold`: At what complexity score do we skip straight to frontier? (starts at 0.70)
- `embedding_knn_k`: How many neighbors to consider? (starts at 10)
- `embedding_confidence_floor`: Below what k-NN confidence do we fall back to static? (starts at 0.50)
- `provider_weights`: Per-task-type agent preference weights (starts at CALC matrix values)
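
The evolving parameters can be held in one small record. The sketch below uses the starting values listed above; the class name `RoutingGenome` is hypothetical, and `provider_weights` is left empty because the CALC matrix values are not given in this section.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class RoutingGenome:
    twin_default_confidence: float = 0.80
    escalation_complexity_threshold: float = 0.70
    embedding_knn_k: int = 10
    embedding_confidence_floor: float = 0.50
    # {task_type: {agent: weight}}; seeded from the CALC matrix in practice.
    provider_weights: Dict[str, Dict[str, float]] = field(default_factory=dict)
```

Keeping the genome as plain data makes it easy to version (see `genome_version` in the telemetry schema) and to roll back a bad mutation.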

Fitness function (from Path F, refined):

```
fitness = 0.40 * task_success_rate
        + 0.25 * (1 - avg_latency / 30000)  // normalized to 30s max
        + 0.20 * (1 - avg_cost / 0.01)       // normalized to $0.01 max per task
        + 0.15 * (1 - escalation_rate)        // fewer escalations = better
```
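
As a worked example, the fitness function translates directly into code. One assumption is added here that the formula does not state: the latency and cost terms are clamped to [0, 1], so a genome that exceeds the 30s or $0.01 ceiling scores zero on that term instead of going negative.

```python
def _clamp01(x: float) -> float:
    return max(0.0, min(1.0, x))


def fitness(
    task_success_rate: float,
    avg_latency_ms: float,
    avg_cost_usd: float,
    escalation_rate: float,
) -> float:
    """Weighted fitness of one genome over an evaluation window."""
    latency_term = _clamp01(1 - avg_latency_ms / 30_000)  # normalized to 30s max
    cost_term = _clamp01(1 - avg_cost_usd / 0.01)         # normalized to $0.01 max
    return (
        0.40 * task_success_rate
        + 0.25 * latency_term
        + 0.20 * cost_term
        + 0.15 * (1 - escalation_rate)
    )
```

For a window with 90% success, 4,500ms average latency, $0.002 average cost, and a 20% escalation rate, the terms are 0.36 + 0.2125 + 0.16 + 0.12, giving a fitness of 0.8525.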

Mutation cadence: Every 200 tasks (approximately daily at current volume). L1 perturbs parameters. L2 adjusts mutation step size weekly. L3 not needed at this scale — the search space is small enough for L1+L2.

Non-halting invariant: No agent's weight can drop below 0.05 for any task type. This prevents the router from "forgetting" that an agent exists.
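
A sketch of an L1 mutation step that enforces this invariant follows. The text only says that L1 "perturbs parameters", so the Gaussian perturbation, the upper clamp at 1.0, and the function name are assumptions; the 0.05 floor is from the text.

```python
import random
from typing import Dict

WEIGHT_FLOOR = 0.05  # non-halting invariant from the text

Weights = Dict[str, Dict[str, float]]  # {task_type: {agent: weight}}


def mutate_weights(weights: Weights, step: float, rng: random.Random) -> Weights:
    """Return a perturbed copy of the provider weights.

    Each weight gets Gaussian noise with standard deviation `step`, then is
    clamped so no agent ever drops below WEIGHT_FLOOR for any task type.
    """
    return {
        task_type: {
            agent: min(1.0, max(WEIGHT_FLOOR, w + rng.gauss(0.0, step)))
            for agent, w in agents.items()
        }
        for task_type, agents in weights.items()
    }
```

Returning a copy rather than mutating in place lets the caller compare the fitness of parent and child genomes before committing to the change.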

Step 4: Agent Bidding for High-Stakes Tasks (inherits Steps 1-3 + Path C)

Decision: Path C's auction model is too expensive for every task (2s bid window), but it's exactly right for high-stakes tasks where the routing genome's evolved parameters might not be optimal.

When to auction:
- Complexity > 0.85 (the hardest tasks)
- Estimated cost > $0.05 per task (expensive enough to optimize)
- Task is decomposable (the auction can split it)
- Task affects shared infrastructure (deploys, migrations, schema changes)
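
The criteria above can be expressed as a single predicate. The text does not say whether the criteria combine with AND or OR; this sketch assumes that any one of them is enough to trigger scoring, and the `TaskProfile` fields are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class TaskProfile:
    complexity: float           # Layer 1 complexity score in [0, 1]
    estimated_cost_usd: float   # estimated cost of running the task
    decomposable: bool          # can the task be split across agents?
    touches_shared_infra: bool  # deploys, migrations, schema changes


def is_high_stakes(task: TaskProfile) -> bool:
    """True if the task should bypass the twin-first path (assumes OR semantics)."""
    return (
        task.complexity > 0.85
        or task.estimated_cost_usd > 0.05
        or task.decomposable
        or task.touches_shared_infra
    )
```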

Simplified bidding:
Instead of a full 2-second window, each agent maintains a continuously-updated availability vector:

```json
{
  "agent_id": "twin-alpha",
  "capacity": 0.8,
  "affinity_scores": {
    "code": 0.85,
    "debug": 0.70,
    "creative": 0.30
  },
  "rate_limit_status": "green",
  "current_queue_depth": 2
}
```

Here `capacity` is the fraction of headroom (0.8 means 80% free), and `affinity_scores` are pre-computed from recent task history.

Published to NUMU topic `agent.status` every 30 seconds. The routing layer reads the latest status (no bid window needed) and combines it with the genome weights.

Scoring for high-stakes tasks:

```
score = genome_weight * 0.4
      + agent_affinity * 0.3
      + agent_capacity * 0.2
      + (1 - agent_cost) * 0.1
```
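
A minimal sketch of this scoring applied to the latest status messages is shown below. It assumes `agent_cost` is already normalized to [0, 1] and that an agent with no affinity entry for a task type scores 0 on that term; `pick_agent` and its argument shapes are hypothetical.

```python
from typing import Dict, List


def score_agent(genome_weight: float, affinity: float, capacity: float, cost: float) -> float:
    """Score one agent for a high-stakes task; all inputs assumed in [0, 1]."""
    return genome_weight * 0.4 + affinity * 0.3 + capacity * 0.2 + (1 - cost) * 0.1


def pick_agent(
    task_type: str,
    statuses: List[Dict],              # latest agent.status payloads
    genome_weights: Dict[str, float],  # agent_id -> evolved weight for task_type
    costs: Dict[str, float],           # agent_id -> normalized cost
) -> str:
    """Return the agent_id with the highest score for this task type."""

    def score(status: Dict) -> float:
        agent_id = status["agent_id"]
        return score_agent(
            genome_weights[agent_id],
            status["affinity_scores"].get(task_type, 0.0),
            status["capacity"],
            costs[agent_id],
        )

    return max(statuses, key=score)["agent_id"]
```

For example, an agent with genome weight 0.5, code affinity 0.85, capacity 0.8, and cost 0.1 scores 0.705, which beats a stronger but busier and more expensive agent at 0.8, 0.9, 0.3, and 0.9 (score 0.66).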

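Result: Fast tasks (80%) stay on the twin-first path with no routing overhead. High-stakes tasks are scored against live agent status and genome weights, with no bid window.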

Step 5: Unified Telemetry + Feedback Loop (inherits Steps 1-4 + Path A's training data insight)

Decision: Path A's insight about using 112K turns for training data is correct, but instead of training a classifier, we use that data to populate the embedding index (Step 2) and to feed the evolution genome (Step 3). The training data is the fuel, not for a model, but for the routing infrastructure.

Unified telemetry schema:
Every routing decision produces a telemetry event:

```json
{
  "task_id": "uuid",
  "timestamp": "ISO-8601",
  "source": "discord",
  "cascade_layer_hit": 0,
  "classification": { "task_type": "code", "complexity": 0.65, "confidence": 0.82 },
  "routed_to": "twin-alpha",
  "genome_version": "v47",
  "high_stakes_auction": false,
  "outcome": {
    "success": true,
    "duration_ms": 4500,
    "cost_usd": 0.002,
    "escalated": false,
    "user_rejected": false
  }
}
```

`source` is one of `discord`, `cli`, `voice`, or `pane`, and `cascade_layer_hit` is an integer from 0 to 3.

Storage: Supabase table `routing_telemetry`. Prometheus metrics for real-time dashboards. Nexus Portal `/routing` page.
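
A typed version of the telemetry event can catch malformed records before they reach storage. This is a sketch under assumptions: the class names are hypothetical, and the validation rules simply encode the allowed `source` values and the 0 to 3 range of `cascade_layer_hit` from the schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict

VALID_SOURCES = {"discord", "cli", "voice", "pane"}


@dataclass
class Outcome:
    success: bool
    duration_ms: int
    cost_usd: float
    escalated: bool = False
    user_rejected: bool = False


@dataclass
class RoutingEvent:
    task_id: str
    timestamp: str            # ISO-8601 string
    source: str
    cascade_layer_hit: int
    classification: Dict
    routed_to: str
    genome_version: str
    high_stakes_auction: bool
    outcome: Outcome

    def __post_init__(self) -> None:
        if self.source not in VALID_SOURCES:
            raise ValueError(f"unknown source: {self.source!r}")
        if self.cascade_layer_hit not in (0, 1, 2, 3):
            raise ValueError(f"cascade_layer_hit out of range: {self.cascade_layer_hit}")

    def to_json(self) -> str:
        # asdict() recurses into the nested Outcome dataclass.
        return json.dumps(asdict(self))
```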

Three feedback loops:
1. Embedding index growth: Every telemetry event with outcome → new vector in `task_routing_index` (Step 2)
2. Genome evolution: Every 200 telemetry events → L1 mutation of routing parameters (Step 3)
3. NUMU router learning: `recordCompletion()` already exists — wire it to the telemetry stream
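
The three loops can share one entry point that fans each telemetry event out. In this sketch the three callbacks stand in for the real index writer, genome mutator, and NUMU `recordCompletion()`; the class name, callback signatures, and counter-based batching are assumptions.

```python
from typing import Callable, Dict


class FeedbackLoops:
    """Fan one telemetry event out to the three feedback loops."""

    def __init__(
        self,
        add_to_index: Callable[[Dict], None],
        mutate_genome: Callable[[], None],
        record_completion: Callable[[Dict], None],
        batch_size: int = 200,  # mutation cadence from the text
    ) -> None:
        self.add_to_index = add_to_index
        self.mutate_genome = mutate_genome
        self.record_completion = record_completion
        self.batch_size = batch_size
        self.pending = 0

    def on_event(self, event: Dict) -> None:
        self.add_to_index(event)       # loop 1: embedding index growth
        self.record_completion(event)  # loop 3: NUMU router learning
        self.pending += 1
        if self.pending >= self.batch_size:
            self.mutate_genome()       # loop 2: genome evolution per batch
            self.pending = 0
```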

Convergence target:
After 2,000 tasks (~2 weeks), the embedding index has enough density for 90% …

Final Compound Architecture

```
                        ┌──────────────────────────────────┐
                        │         TASK ENTRY               │
                        │  (Discord/CLI/Voice/Pane/Cron)   │
                        └──────────────┬───────────────────┘
                                       │
                          ┌────────────▼────────────┐
                          │  Complexity Quick-Check │
                          │  (is this high-stakes?) │
                          └──┬─────────────────┬────┘
                             │                 │
                        (normal)         (high-stakes)
                             │                 │
                    ┌────────▼──────┐  ┌───────▼────────┐
                    │  Twin Alpha   │  │  Agent Scoring │
                    │  (default)    │  │  (genome+live  │
                    └───┬──────┬────┘  │   capacity)    │
                        │      │       └───────┬────────┘
                   (success) (fail)            │
                        │      │         Route to winner
                        ▼      │
                    Output  ┌──▼───────────┐
                            │ Smart Triage │
                            │ (Layer 1+2)  │
                            └──────┬───────┘
                                   │
                          Route to specialist
                                   │
                                   ▼
                    ┌──────────────────────────┐
                    │  Telemetry → Supabase    │
                    │  → Embedding Index       │
                    │  → Genome Evolution      │
                    │  → NUMU Learning         │
                    └──────────────────────────┘
```

Cost: $0-15/mo for twin serving (Together AI free tier) + $0 for Claude/Codex/Gemini (existing subscriptions) + ~$5/mo Supabase compute for pgvector index + existing infrastructure.

Engineering effort: ~2 weeks for the core (twin deployment + embedding index + telemetry). Genome evolution adds ~3 days. Agent status broadcasting adds ~2 days. Total: ~3 weeks to production.

Promotion Decision

Promote into a technical note or architecture paper with implementation anchors.

Source Anchor

evo-cube-output/twin-swarm-cognitive-routing/stage2-compound.md
