Stage 0: RESEARCH — AutoMesh Self-Healing Architecture
Exploration Tree (depth=3)
[ROOT] AutoMesh Self-Healing
├── [D1] Code-Level Healing (self-healing-code/)
│   ├── healer.py (1580 lines, Gen 6-7)
│   │   ├── Wound/Antibody/ImmuneMemory (SQLite, reactive)
│   │   ├── HealingStrategies (5: null_coalesce, type_coerce, key_fuzzy, index_bounds, retry_backoff)
│   │   ├── CellularRegeneration (Gen 6, AST scan, 6 vuln patterns, fortify)
│   │   └── WatchMode (Gen 7, file polling, VitalityTimeline, auto-fortify)
│   └── KEY: Local-only. No cross-machine propagation.
│
├── [D1] Mesh Coordination (mesh-node-agent/)
│   ├── mesh_node_agent.py (337 lines)
│   │   ├── TmuxInjector (list_panes, read_pane, inject, respawn_claude)
│   │   ├── MeshNodeAgent (heartbeat 30s, claim 15s, active_tasks dict)
│   │   ├── Supabase: mesh_pane_state, mesh_task_queue, mesh_task_events
│   │   └── Health endpoint :9300
│   └── reconciler.py (258 lines)
│       ├── 3 patterns: orphaned claims (10m), stuck injections (15m), task expiry (24h)
│       ├── Max retries: 3
│       └── ReconcilerRun event per 60s cycle
│
├── [D2] Evolution World Invariants (evolution_world/)
│   ├── invariants.py — 4 invariant checks + check_all_invariants() convenience
│   │   ├── check_min_entropy_production (KL divergence, epsilon=0.01)
│   │   │   └── CALC extension: cross-agent results boost novelty
│   │   ├── check_bounded_divergence (divergence_rate, max=2.0, window=5)
│   │   │   └── CALC extension: pending tasks add 0.5x pressure per dispatch
│   │   ├── check_cross_layer_forcing (L1 stall→L2 mutate, L2 stall→L1 force)
│   │   │   └── emergency_dual_forcing when both stall
│   │   └── check_no_absorbing_states (count_viable_transitions >= 2)
│   │
│   ├── metrics.py — Information-theoretic primitives
│   │   ├── kl_divergence(p, q) — Laplace-smoothed, log2 bits
│   │   ├── jaccard_distance(set_a, set_b) — Structural divergence
│   │   ├── state_novelty(prev, curr) — Weighted: structural(0.3) + milestone(0.5) + fitness(0.2)
│   │   ├── method_novelty(prev, curr) — KL on technique weights + strategy/decomposition change
│   │   ├── divergence_rate(states, window=5) — Average novelty per step
│   │   └── count_viable_transitions(genome, operators) — G/R/D viability check
│   │
│   ├── immune.py — 4-tier escalation defense system
│   │   ├── Tier 0: Innate (apply corrective, resume)
│   │   ├── Tier 1: Soft quarantine (weight -50%, 5 heartbeats)
│   │   ├── Tier 2: Hard quarantine (exclude 20 heartbeats, exploration burst)
│   │   ├── Tier 3: Recalibrate (mutate EWConfig thresholds, Discord alert)
│   │   └── Dead-end detection: min_entropy + no_absorbing co-violation
│   │
│   ├── forcing.py — CrossLayerForcer
│   │   ├── check_and_force(l1_steps, l2_steps) → forcing event
│   │   ├── _apply_forcing: l2_process_mutation / l1_force_phase_transition / emergency_dual
│   │   └── handle_dream_storm: Noosphere storm → force L2 exploration boost
│   │
│   └── feedback.py — 5-channel feedback metabolism
│       ├── Channels: build_health(0.25), crash_rate(0.20), engagement(0.25), store_presence(0.15), design_health(0.15)
│       ├── Temporal decay per channel (build: 1h, crash: 3d, engagement: 6h, store: 7d, design: 4h)
│       ├── compute_grounded_fitness() — weighted sum with decay
│       ├── compute_metabolism() — freshness(0.4) + diversity(0.4) + volume(0.2)
│       └── compute_momentum() — 14-day linear regression slope
│
├── [D2] Pane Orchestrator ([home-path])
│   ├── controller.py (~450 lines) — 5-phase cycle: SENSE→SELECT→MUTATE→CHECK→ADAPT
│   ├── Uses EW4 invariants for pane management
│   ├── Adaptive interval: MIN=30s, MAX=300s based on metabolism
│   └── backlog.json — task queue with priority/status/prompt
│
├── [D3] Hook Infrastructure ([home-path])
│   ├── orchestration/orchestration_detector.py — detects GSD/Evo3 completion
│   ├── orchestration/context_budget.py — tracks context window usage, handoff at 85%
│   ├── orchestration/handoff_writer.py — writes session-end handoff signals
│   ├── memory-guardian/guardian.py — prevents file shrinkage (invariance lock)
│   └── prompt-logger/ — session tracking, orbit sync, pane orchestrator loop
│
└── [D3] Adjacent Systems
    ├── KARL trajectory intelligence — reward engine (outcome/process/efficiency), shadow router
    ├── Cortex behavioral intelligence — skill forging, correction detection, rule promotion
    ├── Prefect flows — spawn_monitor, heartbeat_pulse, infra_watchdog
    └── NUMU/Mesh Event Bus — transport layers for cross-machine events
Key Findings
### 1. Two Immune Systems, Not Connected
- EW immune.py: 4-tier quarantine for evolution techniques (G/R/D), sliding window, exploration burst
- Healer ImmuneMemory: wound/antibody pattern matching for Python runtime errors
- Gap: The two systems operate independently. EW immune.py handles evolution-process health; the healer handles code-execution health. Neither covers infrastructure fault tolerance.
### 2. EW4 Invariants Already CALC-Aware
- `check_min_entropy_production` accepts `calc_results` param, boosts novelty from cross-agent discoveries
- `check_bounded_divergence` accepts `calc_pending_count`, factors pending dispatches into divergence pressure
- The invariant framework therefore already accommodates multi-agent coordination; no new parameters are needed to feed it cross-agent signals.
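As a rough illustration of the two CALC-aware checks, the sketch below uses the thresholds listed in the exploration tree (epsilon=0.01, max=2.0, 0.5 pressure per pending dispatch). The function bodies are assumptions: the real checks compute novelty and divergence from state history, whereas here both are passed in as plain floats, and the additive `RESULT_BOOST` per cross-agent result is a hypothetical value chosen only to make the idea concrete.

```python
from typing import Optional, Sequence

EPSILON = 0.01           # minimum entropy production (from the tree)
MAX_DIVERGENCE = 2.0     # bounded-divergence ceiling (from the tree)
PENDING_PRESSURE = 0.5   # pressure added per pending dispatch (from the tree)
RESULT_BOOST = 0.05      # ASSUMPTION: novelty credit per cross-agent result


def check_min_entropy_production(
    novelty: float,
    calc_results: Optional[Sequence[dict]] = None,
) -> bool:
    """Pass when novelty, boosted by cross-agent results, reaches EPSILON."""
    boosted = novelty + RESULT_BOOST * len(calc_results or [])
    return boosted >= EPSILON


def check_bounded_divergence(
    divergence_rate: float,
    calc_pending_count: int = 0,
) -> bool:
    """Pass when divergence plus pending-dispatch pressure stays under the cap."""
    effective = divergence_rate + PENDING_PRESSURE * calc_pending_count
    return effective <= MAX_DIVERGENCE
```

Under these assumptions, a stalled agent (novelty 0.0) passes the entropy check as soon as one cross-agent result arrives, while a divergence rate of 1.2 fails the bound once two dispatches are pending.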
### 3. Feedback Metabolism = Heartbeat Speed Control
- `feedback.py` computes metabolism (0.0-1.0) from signal freshness/diversity/volume
- High metabolism → fast heartbeat → aggressive exploration
- Low metabolism → slow heartbeat → conservative refinement
- This ports directly to the mesh: compute a per-machine metabolism from task success rate, pane utilization, and failure diversity.
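A minimal sketch of that port, assuming the mesh reuses the 0.4/0.4/0.2 weighting of `compute_metabolism()` and the orchestrator's 30s to 300s interval bounds. The pairing of each mesh signal with a weight, the function names, and the linear mapping from metabolism to interval are all assumptions made for illustration.

```python
MIN_INTERVAL_S = 30.0    # orchestrator adaptive interval floor (from the tree)
MAX_INTERVAL_S = 300.0   # orchestrator adaptive interval ceiling (from the tree)


def _clamp01(value: float) -> float:
    """Clamp a value into the range [0.0, 1.0]."""
    return min(max(value, 0.0), 1.0)


def machine_metabolism(
    task_success_rate: float,
    pane_utilization: float,
    failure_diversity: float,
) -> float:
    """Weighted sum of mesh signals, each expected in [0, 1].

    ASSUMPTION: the 0.4/0.4/0.2 weights are borrowed from
    compute_metabolism(); which signal gets which weight is a guess.
    """
    return (
        0.4 * _clamp01(task_success_rate)
        + 0.4 * _clamp01(pane_utilization)
        + 0.2 * _clamp01(failure_diversity)
    )


def heartbeat_interval(metabolism: float) -> float:
    """High metabolism gives a short interval, low metabolism a long one."""
    m = _clamp01(metabolism)
    return MAX_INTERVAL_S - m * (MAX_INTERVAL_S - MIN_INTERVAL_S)
```

With this mapping a machine at metabolism 0.5 would beat every 165 seconds; a non-linear curve may suit the mesh better once real data exists.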
### 4. Cross-Layer Forcing Has Dream Storm Hook
- `forcing.py::handle_dream_storm()` already handles external event injection
- The same pattern can inject mesh events such as rate-limit hits, cascading failures, and machine dropouts.
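One way such an injection hook could look is sketched below. The signature of `handle_dream_storm()` is not shown in this note, so `handle_mesh_event`, `MeshEvent`, and `MeshEventKind` are hypothetical names. The forcing labels are the three listed under `_apply_forcing`, but which label answers which event is an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class MeshEventKind(Enum):
    RATE_LIMIT_HIT = "rate_limit_hit"
    CASCADING_FAILURE = "cascading_failure"
    MACHINE_DROPOUT = "machine_dropout"


@dataclass(frozen=True)
class MeshEvent:
    kind: MeshEventKind
    source: str       # machine, pane, or account identifier
    severity: float   # 0.0 (mild) to 1.0 (critical)


# ASSUMPTION: this event-to-forcing mapping is illustrative only.
FORCING_FOR_EVENT: Dict[MeshEventKind, str] = {
    MeshEventKind.RATE_LIMIT_HIT: "l1_force_phase_transition",
    MeshEventKind.MACHINE_DROPOUT: "l2_process_mutation",
    MeshEventKind.CASCADING_FAILURE: "emergency_dual",
}


def handle_mesh_event(event: MeshEvent) -> dict:
    """Translate an external mesh event into a forcing event record."""
    intensity = min(max(event.severity, 0.0), 1.0)
    return {
        "forcing": FORCING_FOR_EVENT[event.kind],
        "trigger": event.kind.value,
        "source": event.source,
        "intensity": intensity,
    }
```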
### 5. Missing Layer: Mesh-Level Immune System
The EW immune system (4-tier quarantine) operates only on evolution techniques. The mesh needs an equivalent that operates on:
- Machines (quarantine a flaky machine)
- Panes (quarantine a crashing pane)
- Task types (quarantine prompts that consistently fail)
- Accounts (quarantine rate-limited Claude accounts)
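The sketch below shows how the EW tier parameters (weight halved for 5 heartbeats, exclusion for 20 heartbeats) might carry over to these four target kinds. `MeshImmune`, `Quarantine`, and the rule that each repeated violation escalates one tier are assumptions; Tier 3 recalibration and alerting are reduced to a continued exclusion here.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

TARGET_KINDS = {"machine", "pane", "task_type", "account"}


@dataclass
class Quarantine:
    violations: int = 0
    tier: int = 0
    weight: float = 1.0        # scheduling weight; 0.5 while soft-quarantined
    excluded: bool = False     # True while hard-quarantined
    heartbeats_left: int = 0   # heartbeats until the quarantine lifts


class MeshImmune:
    def __init__(self) -> None:
        self._state: Dict[Tuple[str, str], Quarantine] = {}

    def report_violation(self, kind: str, target_id: str) -> Quarantine:
        """Record a violation; ASSUMPTION: each repeat escalates one tier."""
        if kind not in TARGET_KINDS:
            raise ValueError(f"unknown target kind: {kind}")
        q = self._state.setdefault((kind, target_id), Quarantine())
        q.violations += 1
        q.tier = min(q.violations - 1, 3)
        if q.tier == 1:    # soft quarantine: weight -50% for 5 heartbeats
            q.weight, q.excluded, q.heartbeats_left = 0.5, False, 5
        elif q.tier >= 2:  # hard quarantine: excluded for 20 heartbeats
            q.weight, q.excluded, q.heartbeats_left = 0.0, True, 20
        return q

    def heartbeat(self) -> None:
        """Advance one heartbeat and lift quarantines that have expired."""
        for q in self._state.values():
            if q.heartbeats_left > 0:
                q.heartbeats_left -= 1
                if q.heartbeats_left == 0:
                    q.weight, q.excluded = 1.0, False

    def is_claimable(self, kind: str, target_id: str) -> bool:
        """A target can take work unless it is currently excluded."""
        q = self._state.get((kind, target_id))
        return q is None or not q.excluded
```

Keying state on `(kind, target_id)` lets one table cover machines, panes, task types, and accounts; the violation count is kept after release so a relapse escalates instead of starting over.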
Constraints
1. Transport: Supabase is the shared state substrate. All mesh coordination goes through it.
2. Latency: Supabase roundtrip ~100ms. Heartbeat/claim cycles must tolerate this.
3. Single reconciler: Currently runs on cloud-vm only. Mac1 could serve as a backup, but that adds complexity.
4. Agent memory: mesh_node_agent has no persistent state. The active_tasks dict is lost on restart.
5. Pane output reading: tmux `capture-pane` is the only way to inspect agent behavior. Limited signal.
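For constraint 5, a pane read could be sketched as below. `tmux capture-pane -p -t <target> -S -<n>` prints the pane's last lines to stdout; the wrapper names, the 200-line default, and the choice to return an empty string on any failure are assumptions, and the actual `TmuxInjector.read_pane` may differ.

```python
import subprocess
from typing import List


def build_capture_cmd(target: str, history_lines: int = 200) -> List[str]:
    """Build the tmux command that prints a pane's recent output to stdout."""
    return ["tmux", "capture-pane", "-p", "-t", target, "-S", f"-{history_lines}"]


def read_pane(target: str, history_lines: int = 200, timeout_s: float = 5.0) -> str:
    """Return the pane text, or an empty string if tmux fails or times out."""
    try:
        result = subprocess.run(
            build_capture_cmd(target, history_lines),
            capture_output=True,
            text=True,
            timeout=timeout_s,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        # tmux missing, not executable, or hung: treat as "no signal".
        return ""
    return result.stdout if result.returncode == 0 else ""
```

Because the text is the only signal, any health check built on it has to infer state from output changes between two reads rather than from structured status.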
Open Questions for Stage 1
1. Should mesh immune tiers mirror EW tiers (0-3) or use different escalation?
2. How can "slow" panes be distinguished from "stuck" panes, so that performance degradation is caught before failure?
3. Should antibody propagation be push (on heal) or pull (on claim)?
4. How does KARL's reward engine integrate? (trajectory-based healing assessment)
5. What's the forcing equivalent for mesh? (L1=tasks, L2=machine allocation?)
6. Can WatchMode monitor mesh agent Python files for regressions?
Promotion Decision
Promote into a technical note or architecture paper with implementation anchors.
Source Anchor
evo-cube-output/automesh-self-healing/stage0-research.md