Grand Diomande Research · Full HTML Reader

Stage 0: RESEARCH -- Multi-Agent Survival Algorithms: Nash Equilibrium Escape

**Date**: 2026-03-10 **Source**: code4AI video rTwrFdyRmww (score: 8.0/10) **Target Systems**: Evolution World (primary), Multi-Agent Mesh (primary), KARL (secondary)

---

1. What Exists Today

1a. Evolution World (EW) -- The Existing Multi-Agent Evolution Engine

Codebase: `[home-path]` (~17 Python files, ~2,500 lines)

Architecture:
- L1 (App Evolution): Individual app genomes mutate via TIE techniques toward milestones
- L2 (Meta-Evolution): The evolution process itself adapts (technique weights, selection strategy, decomposition method, risk tolerance, exploration rate)
- L3 (Meta-Meta-Evolution): Evolves L2's constants every 30 L1 steps
- 4 Non-Halting Invariants:
  1. Minimum Entropy Production: Every step must produce >= epsilon (0.01) bits of novelty (KL divergence). Prevents stalling.
  2. Bounded Divergence: Divergence rate capped at M (2.0) over sliding window. Prevents chaos.
  3. Cross-Layer Forcing: L1 stall (3 steps) forces L2 mutation. L2 stall (5 steps) forces L1 phase transition. Neither layer can stall independently.
  4. No Absorbing States: Every configuration must have >= 2 viable transitions. Prevents dead ends.
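
As a rough illustration of how invariants 1, 2, and 4 could be checked per step, here is a minimal sketch. It is not EW's code: the function names, the direction of the KL divergence (new distribution against previous), the reading of "divergence rate" as the mean over the window, and the window length are all assumptions. Invariant 3 needs stall counters across layers and is left out.

```python
import math
from collections import deque

EPSILON = 0.01   # minimum novelty per step in bits (value from the text)
M_CAP = 2.0      # divergence cap over the sliding window (value from the text)

def kl_bits(p, q, floor=1e-12):
    """KL divergence D(p || q) in bits between two discrete distributions."""
    return sum(pi * math.log2(pi / max(qi, floor)) for pi, qi in zip(p, q) if pi > 0)

def check_invariants(prev_dist, new_dist, window, viable_transitions):
    """Return the names of the invariants violated by this step.

    `window` holds recent per-step divergences (hypothetical bookkeeping).
    """
    novelty = kl_bits(new_dist, prev_dist)
    window.append(novelty)
    violations = []
    if novelty < EPSILON:                      # invariant 1: stalling
        violations.append("min_entropy")
    if sum(window) / len(window) > M_CAP:      # invariant 2: assumed mean-over-window
        violations.append("bounded_divergence")
    if viable_transitions < 2:                 # invariant 4: dead end
        violations.append("no_absorbing_states")
    return violations

window = deque(maxlen=5)  # window length is an assumption
print(check_invariants([0.5, 0.5], [0.5, 0.5], window, viable_transitions=1))
```

An unchanged distribution with a single viable transition trips both `min_entropy` and `no_absorbing_states`, which is the co-violation pattern the immune system treats as a dead end.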

Immune System: 4-tier escalation (Innate -> Soft Quarantine -> Hard Quarantine + Exploration Burst -> Threshold Recalibration). Sliding window of 10 heartbeats. Detects dead-ends via co-violation patterns (min_entropy + no_absorbing_states).
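
The escalation ladder can be pictured with a small state machine. This is a sketch under assumptions, not the EW implementation: the class name, the trigger count of 3 dead-end heartbeats per window, and the choice to clear the window after each escalation are all hypothetical. Only the four tiers, the 10-heartbeat window, and the co-violation signal come from the description above.

```python
from collections import deque
from enum import IntEnum

class Tier(IntEnum):
    INNATE = 0
    SOFT_QUARANTINE = 1
    HARD_QUARANTINE = 2   # paired with an exploration burst in the text
    RECALIBRATE = 3       # threshold recalibration

class ImmuneMonitor:
    def __init__(self, window=10, dead_end_hits=3):
        self.history = deque(maxlen=window)  # last 10 heartbeats (from the text)
        self.dead_end_hits = dead_end_hits   # assumed trigger count
        self.tier = Tier.INNATE

    def heartbeat(self, violations):
        """Record one heartbeat and return the current tier."""
        # A dead end is signalled when both invariants are violated together.
        dead_end = {"min_entropy", "no_absorbing_states"} <= set(violations)
        self.history.append(dead_end)
        if sum(self.history) >= self.dead_end_hits:
            self.tier = Tier(min(self.tier + 1, Tier.RECALIBRATE))
            self.history.clear()             # assumed: fresh window after escalating
        return self.tier
```

De-escalation is not modeled here because the source does not describe it.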

Population Genetics: Crossover operators (capability graft, architecture allele, operator transfer), species distance matrix (L2-evolved), within-species and between-species crossover with convergence gates.

Selection Strategies: tournament, roulette, rank_based, elitist, diversity_driven -- L2 mutates between these.

Key Fact: EW already has mechanisms to detect stalling (invariant 1), chaos (invariant 2), dead ends (invariant 4), and layer decoupling (invariant 3). But these operate at the INDIVIDUAL genome level and the L1/L2 coupling level. They do NOT detect SYSTEM-LEVEL equilibrium traps where the entire population converges to a suboptimal basin.

1b. Multi-Agent Mesh -- Pane Orchestration + Work Claiming

Pane Orchestrator: `[home-path]` (~900+ lines)
- Runs persistent loop checking pane states every 60s
- Detects: idle (>15 min), stuck (>30 min), plateau (same output hash for 6 cycles), context exhaustion (>4 hours)
- Injects context-aware prompts based on project state, GSD, Pulse anchors, git status
- Cooldown: 10 min between injections per pane
- Protected panes system (manual lock files)
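
The detection thresholds above can be read as a simple classifier. The sketch below is illustrative only: the source does not say how idle and stuck are distinguished beyond their durations, so treating both as "minutes without activity" and checking the most severe condition first is an assumption, as are the function and parameter names.

```python
import hashlib

IDLE_MIN = 15          # minutes (from the text)
STUCK_MIN = 30         # minutes (from the text)
PLATEAU_CYCLES = 6     # identical output hashes in a row (from the text)
EXHAUSTION_HOURS = 4   # session age (from the text)

def output_hash(text):
    """Hash a pane's visible output so cycles can be compared cheaply."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def classify_pane(minutes_inactive, session_hours, recent_hashes):
    """Return one of: context_exhaustion, stuck, idle, plateau, active.

    Precedence order is an assumption; the orchestrator may differ.
    """
    if session_hours > EXHAUSTION_HOURS:
        return "context_exhaustion"
    if minutes_inactive > STUCK_MIN:
        return "stuck"
    if minutes_inactive > IDLE_MIN:
        return "idle"
    last = list(recent_hashes)[-PLATEAU_CYCLES:]
    if len(last) == PLATEAU_CYCLES and len(set(last)) == 1:
        return "plateau"
    return "active"
```

With a 60s loop, six identical hashes correspond to roughly six minutes of unchanged output.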

Mesh Coordination: 4 Supabase tables
- `machine_work_queue`: Centralized task queue with domain routing + complexity scoring
- `work_claims`: Distributed mutex (UNIQUE(task_id) prevents double-claiming)
- `machine_context_cache`: Shared state
- `machine_domains`: Machine identity, specializations, rate limit status
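
The `UNIQUE(task_id)` mutex can be demonstrated with any SQL engine. The sketch below uses an in-memory SQLite database as a stand-in for Supabase so that it runs without a network; the `pane_id` column and the helper names are hypothetical, and only the table name and the uniqueness constraint come from the description above.

```python
import sqlite3

def make_db():
    """Create an in-memory stand-in for the work_claims table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE work_claims ("
        " task_id TEXT NOT NULL UNIQUE,"   # the constraint that acts as the mutex
        " pane_id TEXT NOT NULL)"          # hypothetical column
    )
    return db

def try_claim(db, task_id, pane_id):
    """Return True if this pane won the claim, False if the task was already taken."""
    try:
        with db:  # commits on success, rolls back on error
            db.execute(
                "INSERT INTO work_claims (task_id, pane_id) VALUES (?, ?)",
                (task_id, pane_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False

db = make_db()
print(try_claim(db, "task-1", "pane-a"))  # first claim
print(try_claim(db, "task-1", "pane-b"))  # second claim on the same task
```

The first insert wins and every later insert for the same `task_id` fails, which is exactly first-come-first-served: nothing in the claim path considers who is best placed to take the task.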

Domain Classification: Regex patterns classify CWD into: ios, infra, creative, systems, general. Confidence threshold 0.6.
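
A minimal sketch of regex-based classification with a confidence threshold follows. The patterns are invented for illustration, and computing confidence as the fraction of a domain's patterns that match is an assumption; the source gives only the five domain names and the 0.6 threshold.

```python
import re

# Hypothetical patterns; the real regexes are not shown in the source.
DOMAIN_PATTERNS = {
    "ios": [r"\.xcodeproj", r"/ios/", r"swift"],
    "infra": [r"/infra/", r"terraform", r"docker"],
    "creative": [r"/design/", r"/video/"],
    "systems": [r"/daemon/", r"/kernel/"],
}
THRESHOLD = 0.6  # confidence threshold (from the text)

def classify_cwd(cwd):
    """Return (domain, confidence); fall back to 'general' below the threshold."""
    best, best_conf = "general", 0.0
    for domain, patterns in DOMAIN_PATTERNS.items():
        hits = sum(bool(re.search(p, cwd, re.IGNORECASE)) for p in patterns)
        conf = hits / len(patterns)   # assumed confidence measure
        if conf > best_conf:
            best, best_conf = domain, conf
    if best_conf >= THRESHOLD:
        return best, best_conf
    return "general", best_conf
```

For example, `classify_cwd("/work/ios/MyApp.xcodeproj")` matches two of the three hypothetical `ios` patterns and clears the threshold, while a path matching only one `infra` pattern falls back to `general`.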

Key Fact: The work-claiming system is a FIRST-COME-FIRST-SERVED mutex. There is NO strategic behavior in how tasks are claimed. Panes do not reason about what OTHER panes are doing or will do. This is precisely the gap where equilibrium traps can form -- panes converge on familiar task types and avoid unfamiliar ones without any mechanism to detect or correct the pattern.
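
One statistical way to make this convergence visible is to measure how concentrated recent claims are across task types. The sketch below is a possible starting point, not an existing component: it assumes claim history is available as a flat list of task-type labels, and the function names and the 5% share cutoff are hypothetical.

```python
import math
from collections import Counter

def claim_entropy(claims, task_types):
    """Shannon entropy of claimed task types, normalized to [0, 1].

    0 means every claim went to one type; 1 means claims are spread evenly.
    """
    counts = Counter(c for c in claims if c in task_types)
    total = sum(counts.values())
    if total == 0 or len(task_types) < 2:
        return 0.0
    h = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return h / math.log2(len(task_types))

def underserved(claims, task_types, min_share=0.05):
    """Task types whose share of claims is below `min_share` (assumed cutoff)."""
    counts = Counter(c for c in claims if c in task_types)
    total = sum(counts.values())
    return [t for t in task_types if total and counts[t] / total < min_share]
```

Low entropy alone does not prove a trap, since the queue itself may be skewed; comparing claim shares against queue shares would be needed before acting on it.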

1c. KARL -- Trajectory-Based Learning

Codebase: `[home-path]` (8 Python files, 67+ trajectories)
- Trajectory Tap: 4 tap points (A: session init, B: tool events, C: session flush, D: cross-turn annotation)
- Reward Engine: 3-signal composite (outcome 0.40, process 0.35, efficiency 0.25)
- Weight Updater: EMA updates to skill embeddings from reward data. Bounds [0.5, 1.5].
- Embedding Cache: 13 skill embeddings (3072-dim)
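
The reward composite and bounded EMA update can be sketched as follows. The three signal weights and the [0.5, 1.5] bounds come from the list above; the assumption that each signal lies in [0, 1], the smoothing factor `alpha`, and the linear mapping from reward to a target weight are illustrative guesses rather than KARL's actual rules.

```python
WEIGHTS = {"outcome": 0.40, "process": 0.35, "efficiency": 0.25}  # from the text
LOW, HIGH = 0.5, 1.5                                              # from the text

def composite_reward(signals):
    """Weighted sum of the three signals; each is assumed to lie in [0, 1]."""
    return sum(WEIGHTS[name] * signals[name] for name in WEIGHTS)

def ema_update(weight, reward, alpha=0.1):
    """Move a skill weight toward a reward-derived target, then clamp to bounds.

    `alpha` and the reward-to-target mapping are assumptions.
    """
    target = LOW + (HIGH - LOW) * reward          # reward 0 -> 0.5, reward 1 -> 1.5
    updated = (1 - alpha) * weight + alpha * target
    return min(HIGH, max(LOW, updated))
```

Because the update only looks backward at observed rewards, it cannot say what a skill's weight would have been under a different task mix, which is the limitation noted in the Key Fact that follows.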

Key Fact: KARL records trajectories and updates skill weights, but it does NOT have a world model. It cannot predict "what would happen if pane X changed strategy from task-type-A to task-type-B." It is purely reactive -- it learns from past trajectories but cannot reason counterfactually about future strategies.

1d. NUMU FARE Bus

- WebSocket bus at :7890, HTTP at :8500
- 16 packages, 8.1K lines TS
- Fire-and-forget event distribution
- Already carries mesh events, pane state changes, health signals

---

2. Academic Ground Truth: Nash Equilibrium in Multi-Agent Systems

2a. Nash Equilibrium Traps

A Nash equilibrium is a state where no individual agent can improve its outcome by unilaterally changing its strategy. In cooperative multi-agent settings, this becomes problematic when the equilibrium is suboptimal -- all agents are "rationally" stuck even though a coordinated strategy shift would benefit everyone.

The trap in our system: If 16 panes each develop preferences for certain task types (e.g., ios-build, infra-deploy) based on past success, they reach a Nash equilibrium where no single pane benefits from switching. But the SYSTEM would benefit if some panes diversified into under-served task types (e.g., testing, documentation, refactoring).
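
A two-pane toy game makes the trap concrete. The payoff numbers below are invented for illustration (a stag-hunt-style structure), not measured from the mesh: each pane either stays with familiar work or diversifies, and diversifying only pays off if the other pane does too.

```python
from itertools import product

ACTIONS = ("familiar", "diversify")

# Illustrative payoffs (pane 0, pane 1); not derived from real data.
PAYOFF = {
    ("familiar", "familiar"): (3, 3),
    ("familiar", "diversify"): (3, 1),
    ("diversify", "familiar"): (1, 3),
    ("diversify", "diversify"): (5, 5),
}

def is_nash(profile):
    """True if no player can gain by changing only their own action."""
    for player in (0, 1):
        current = PAYOFF[profile][player]
        for alt in ACTIONS:
            deviated = list(profile)
            deviated[player] = alt
            if PAYOFF[tuple(deviated)][player] > current:
                return False
    return True

def pure_nash_equilibria():
    """Enumerate all pure-strategy Nash equilibria of the toy game."""
    return [p for p in product(ACTIONS, repeat=2) if is_nash(p)]

print(pure_nash_equilibria())
```

Both `("familiar", "familiar")` and `("diversify", "diversify")` are equilibria, but the first pays 3 each and the second 5 each. From the familiar profile, a lone deviator drops to 1, so no pane moves on its own even though a joint move would help both.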

2b. Performative Prediction (Perdomo et al., 2020; Multiplayer extensions 2022-2025)

Core concept: In performative prediction, the act of making a prediction changes the distribution being predicted. Applied to multi-agent settings:

- Each agent's model deployment affects other agents' data distributions
- Standard equilibrium concepts (Nash, performatively stable) may not coincide
- Competition can cause phase transitions from stability to instability to chaos
- Online Performative Gradient Descent (OPGD) learns Nash equilibria in decision-dependent games

Application to our system: When a pane claims a task, it changes the distribution of available tasks for all other panes. The pane's "prediction" of which task is best for it is performative -- it shapes the task landscape. If panes use stale models of the task distribution, they converge on suboptimal equilibria.
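
The feedback loop can be illustrated with a one-dimensional toy model. Everything here is an assumption made for the sake of a runnable example: a single pane holds an estimate `theta` of a task type's share of the queue, acts on it, and the share left afterwards falls linearly with `theta`. Refitting the estimate to the observed share each round is repeated risk minimization under squared loss, the procedure analyzed in the performative prediction literature.

```python
def observed_share(theta, base=0.6, k=0.5):
    """Share of a task type left in the queue after a pane acts on estimate `theta`.

    `base` and `k` are hypothetical: the undisturbed share and the strength
    of the pane's own effect on the queue.
    """
    return base - k * theta

def repeated_retraining(theta0, steps=50, base=0.6, k=0.5):
    """Refit the estimate to the distribution it just induced, `steps` times."""
    theta = theta0
    for _ in range(steps):
        theta = observed_share(theta, base=base, k=k)
    return theta

def stale_gap(theta, base=0.6, k=0.5):
    """How far a never-updated estimate is from what it actually induces."""
    return abs(observed_share(theta, base=base, k=k) - theta)
```

With these numbers the iteration settles at `base / (1 + k) = 0.4`, the performatively stable point where the estimate matches the distribution it causes. A pane that keeps using the undisturbed value 0.6 stays 0.3 away from reality, which is the stale-model situation described above. Convergence here relies on `|k| < 1`; stronger feedback can oscillate or diverge.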

2c. Subjective Embedded Equilibrium

From the video transcript analysis: each agent has a different view of the equilibrium state. In our mesh:
- Each pane sees only its own project context, GSD state, and recent commits
- No pane has a global view of the task distribution across ALL panes
- Each pane's "equilibrium" is subjective -- it thinks it is making the best choice, but it is making the best choice given INCOMPLETE information about what other panes are doing

2d. Counterfactual World Models via Sequence Models

Concept: Agents use sequence models (transformers trained on action-outcome trajectories) to predict "what would happen if I changed strategy." This enables:
- Counterfactual reasoning: "If I switched from ios-build to testing, what would my reward be?"
- Strategy exploration without risk: Test strategy changes in simulation before committing
- Coordination: If each agent can predict how others would respond to its strategy change, they can find Pareto-improving deviations from Nash equilibria

Application: KARL already records trajectories. A sequence model trained on KARL trajectories could predict outcomes of strategy changes. But KARL currently has no such model.
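
Before any sequence model, a purely statistical baseline could be fit from the same data. The sketch below assumes each trajectory can be reduced to an ordered list of `(task_type, reward)` pairs, which is a hypothetical schema and not KARL's actual record format. It estimates first-order transition probabilities between task types and a naive counterfactual gain from historical mean rewards.

```python
from collections import defaultdict

def fit_world_model(trajectories):
    """Fit transition probabilities and mean rewards per task type.

    trajectories: list of sequences of (task_type, reward) pairs (assumed schema).
    """
    transitions = defaultdict(lambda: defaultdict(int))
    reward_sum = defaultdict(float)
    reward_n = defaultdict(int)
    for seq in trajectories:
        for task, reward in seq:
            reward_sum[task] += reward
            reward_n[task] += 1
        for (a, _), (b, _) in zip(seq, seq[1:]):
            transitions[a][b] += 1
    mean_reward = {t: reward_sum[t] / reward_n[t] for t in reward_n}
    probs = {
        a: {b: c / sum(nxt.values()) for b, c in nxt.items()}
        for a, nxt in transitions.items()
    }
    return probs, mean_reward

def counterfactual_gain(mean_reward, current, alternative):
    """Difference in historical mean reward; None if either type is unseen."""
    if current not in mean_reward or alternative not in mean_reward:
        return None
    return mean_reward[alternative] - mean_reward[current]
```

This estimate is observational, not causal: it ignores which pane did the work and how other panes would react, so it is best read as a lower bar that a learned model would need to beat. With only 67+ trajectories, many cells of the transition table will be empty.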

2e. Trust Mechanics Between Agents

From the literature: Formalized trust computation for cooperative scenarios involves:
- Reliability trust: Has this agent delivered on past commitments? (KARL has this via reward scores)
- Competence trust: Is this agent good at THIS type of task? (KARL has skill embeddings for this)
- Intention trust: Does this agent cooperate or defect? (NOT measured in our system)

Application: When panes claim work, there is no trust model. A pane that consistently fails at certain task types is not penalized or redirected. The immune system operates at the EW level (app genomes), not at the mesh level (pane behavior).
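
Reliability and competence trust could both be derived from reward history, as in the sketch below. The class, its method names, and the shrinkage toward a neutral prior of 0.5 are assumptions for illustration; rewards are assumed to lie in [0, 1]. Intention trust is omitted because nothing in the system currently measures it.

```python
from collections import defaultdict

class TrustLedger:
    """Hypothetical per-pane trust scores computed from reward history."""

    def __init__(self):
        self._rewards = defaultdict(list)  # (pane_id, task_type) -> list of rewards

    def record(self, pane_id, task_type, reward):
        self._rewards[(pane_id, task_type)].append(reward)

    def competence(self, pane_id, task_type, prior=0.5, prior_weight=2):
        """Mean reward on one task type, shrunk toward the prior when data is thin."""
        rs = self._rewards.get((pane_id, task_type), [])
        return (prior * prior_weight + sum(rs)) / (prior_weight + len(rs))

    def reliability(self, pane_id, prior=0.5, prior_weight=2):
        """Mean reward across all task types for this pane, with the same shrinkage."""
        rs = [r for (p, _), v in self._rewards.items() if p == pane_id for r in v]
        return (prior * prior_weight + sum(rs)) / (prior_weight + len(rs))
```

The shrinkage keeps a pane with one bad result from being written off immediately, which matters given how costly false positives are in this setting.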

---

3. Real Constraints

### Technical Constraints
- 16-40 concurrent panes across 3 Macs with 3 Claude Max accounts
- NUMU bus can carry additional event types without schema change
- Supabase already has mesh coordination tables; adding columns or tables is cheap
- KARL trajectories.jsonl is append-only, 67+ records; enough for a small world model but not enough for a large sequence model
- EW invariants are well-tested but operate per-genome, not per-population

### Resource Constraints
- Claude Max subscriptions are rate-limited; cannot probe them via API
- Each pane runs independently; no shared memory between panes
- Pane orchestrator injects prompts via AppleScript; latency ~5-15s per injection
- EW runs as a daemon with heartbeat loop; adding new invariants is straightforward

### Human Constraints
- Mohamed is the only human operator
- System must be autonomous once deployed
- False positives in equilibrium detection are costly (unnecessary disruption)
- Must be observable via Nexus Portal

---

4. What Has Been Tried

### EW's Existing Anti-Stall Mechanisms (Working)
- Cross-layer forcing: Detects L1/L2 stalls, forces mutations. Works for individual genomes.
- Immune system quarantine: Escalates from soft to hard quarantine. Works for technique-level traps.
- Exploration burst: Injects 3 diverse techniques when a technique is quarantined. Works for local exploration.

### What Has NOT Been Tried
- Population-level equilibrium detection: No mechanism measures whether the POPULATION of genomes has converged to a suboptimal basin
- Pane behavior modeling: No system tracks which panes claim which task types over time
- Counterfactual reasoning: No world model predicts outcomes of strategy changes
- Trust-weighted task routing: No system routes tasks based on pane competence
- Performative awareness: No system accounts for the fact that claiming a task changes the distribution for everyone

### CALC (Cross-Agent Collaboration)
- EW's invariants have already been extended with CALC-awareness (calc_results boost entropy, calc_pending increases divergence pressure)
- This proves the invariant framework CAN be extended with cross-agent signals
- But CALC measures "did cross-agent work produce novelty" -- it does NOT measure "are agents stuck in suboptimal coordination patterns"

---

5. Open Questions for Stage 1

1. Where does equilibrium detection live? In EW (as a 5th invariant)? In the pane orchestrator? In a new module? Or distributed across all three?

2. What is the right granularity? Should we detect equilibrium at the pane level (work claiming), the genome level (EW population), or the task-type level (category distribution)?

3. How much history is needed? The sliding window for the immune system is 10 heartbeats. What window size detects equilibrium traps without false positives?

4. Can we build a world model from 67 KARL trajectories? Or do we need to accumulate more data first? Is a simple statistical model (transition matrix) sufficient, or is a neural sequence model required?

5. What is the cost of false positive equilibrium detection? Disrupting a productive equilibrium (where all panes are efficiently working on the right things) is worse than tolerating a suboptimal one.

6. How does this interact with the 4 existing invariants? Does equilibrium escape conflict with bounded divergence? (Escape requires increasing divergence; bounded divergence caps it.)

7. Should trust be explicit or implicit? Explicit trust scores are observable but require maintenance. Implicit trust (derived from KARL rewards) is cheaper but harder to debug.

---

6. Constraints That MUST Carry Forward to Every Path

- C1: Must not break the 4 existing non-halting invariants
- C2: Must be observable via Nexus Portal and/or Prometheus metrics
- C3: Must work with current pane infrastructure (AppleScript injection, 60s loop)
- C4: Must survive daemon restarts (Supabase persistence)
- C5: Must not require neural model training as a prerequisite (can be a future enhancement, but the base system must work with statistical methods)
- C6: False positive rate for equilibrium detection must be < 10%
- C7: Equilibrium escape actions must respect bounded divergence -- escape force is channeled, not chaotic
- C8: Must integrate with KARL trajectory data (read trajectory history for behavior modeling)
- C9: Must emit NUMU events for mesh-wide visibility
- C10: Implementation must be < 1,500 new lines across all affected files (keep it tractable)
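
One way to read C7 is as a budget: an escape action may add divergence only up to what the bounded-divergence cap still allows. The sketch below is a hypothetical helper, not part of EW. It assumes the cap M = 2.0 applies to the mean divergence over the window and that `recent` holds the earlier steps that will share the window with the escape step.

```python
def channel_escape(requested, recent, cap=2.0):
    """Clip an escape perturbation so the windowed mean divergence stays <= cap.

    requested: divergence the escape action would like to inject
    recent:    divergences of the steps sharing the window with this one
    cap:       bounded-divergence limit M (2.0 in the text)
    """
    n = len(recent) + 1
    budget = cap * n - sum(recent)   # largest next-step divergence the cap allows
    return max(0.0, min(requested, budget))
```

After a quiet stretch the budget is large and a strong escape is permitted; after a turbulent stretch the same request is clipped or refused. This suggests the conflict raised in open question 6 may be one of timing and not a hard contradiction.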

---

Research Sources

- [Multi-agent Performative Prediction: From Global Stability and Optimality to Chaos](https://dl.acm.org/doi/10.1145/3580507.3597759) -- ACM EC 2023
- [Multiplayer Performative Prediction: Learning in Decision-Dependent Games](https://www.jmlr.org/papers/volume24/22-0131/22-0131.pdf) -- JMLR vol. 24 (2023)
- [Online Performative Gradient Descent for Learning Nash Equilibria](https://openreview.net/forum?id=IdF7VT6eEs) -- OpenReview
- [Performative Prediction on Games and Mechanism Design](https://arxiv.org/html/2408.05146v1) -- arXiv 2024
- [Game Theory and Multi-Agent Reinforcement Learning: From Nash Equilibria to Evolutionary Dynamics](https://arxiv.org/html/2412.20523v1) -- arXiv 2024
- [Advanced Game-Theoretic Frameworks for Multi-Agent AI](https://arxiv.org/pdf/2506.17348) -- arXiv 2025
- [General Agents Contain World Models](https://arxiv.org/pdf/2506.01622) -- arXiv 2025
- [Paths to Equilibrium in Games](https://proceedings.neurips.cc/paper_files/paper/2024/file/b6e271e596574f2b2dfadec6b3ba22a4-Paper-Conference.pdf) -- NeurIPS 2024

Source Anchor

evo-cube-output/multi-agent-survival-algorithms/stage0-research.md
