Grand Diomande Research · Full HTML Reader

Layer 4 — Evo3 Architectural Exploration of the SOOP-2 Convergence Problem

> **Status:** Scrutiny layer 4 of 4. Peer architectures vs ELP-1, not critique of ELP-1.
> **Date:** 2026-05-13
> **Inputs:** ELP-1 v1 draft (05-everlasting-loop-protocol.md), 10 SOOP-2 criteria, current scoreboard (14/295 typed, 3/10 criteria met).
> **Output role:** Compete with ELP-1 as a peer design. Verdict at the end.


---

Stage 0 — RESEARCH (light, depth=2)

The honest bottleneck of SOOP-2 right now is not compute, not orchestration, not multi-machine resilience. It is the body of work itself: 281 more SKILL.md files need typed frontmatter (Track B), one router needs a labeled benchmark plus a `type_compatibility_weight` knob (Track C), one Tier 2 endpoint needs a flip from MiniMax to Mac4:8100 with fallback (Track D), four feedback components need wiring at `[home-path]` (Track E), and two more skills need silent_capable=true (Track G). Total estimated effort from EXECUTION-PLAN.md is 13.5 days, mostly mechanical edits that any Claude or Codex pane can do.

The reason ELP-1 even exists is to defend against the four caveats in §0 of that doc: opaque ScheduleWakeup queue, Claude Code closure killing wake, single-driver pattern, no external stall escalation. Each caveat is real. None of them is currently causing a stall — the loop is advancing, just slowly and only when Mohamed's main session is alive. The cost of waiting (3 single-Claude sessions over 2 weeks) is comparable to the cost of building ELP-1 (1-2 sessions, per the doc's claim; realistically 2-4 once Supabase migrations, launchd plists on two machines, worker registration, and verifier scripts are all shipped and debugged).

Mesh primitives genuinely shipped today: `aura-gateway` (mac1:8095) with cross-machine `/inject`, `codex-gateway` (mac4:8096), Syncthing across mesh, Supabase with several active projects, `pulse` skill (Pulse-eligible tracks live in EXECUTION-PLAN), `ops:autopilot` (referenced in the SOOP-2 dispatch table), `meshd` running but with Codex slot "unborn" per mac4-codex-control-audit-2026-05-06.md, telegram + sms bridges (per ELP-1 §8.2 calling them "already shipped per memory"), Grafana on cloud-vm. Hypothetical or not-fully-wired: cortex:rules quarantine integration, the SOOP-2 verifier suite (the 10 per-criterion checks don't exist as code yet), worker heartbeat tables in Supabase, the supervisor.py and worker.py themselves.

The genuine question Evo3 must answer: does the work scale better if you build a multi-driver platform first, or if you just keep typing skills with a slightly hardened single-driver? ELP-1 assumes platform-first. The 6 paths below test that assumption.

---

Stage 1 — EXPLORE: 6 divergent architectures

Path A — Stateless Dispatch (Family 1)

Core mechanism. No persistent queue. Every supervisor cycle regenerates work from scratch by reading the linter output and the SKILL.md files directly. The "state" is the filesystem of skills. A supervisor cycle does: (1) run linter, (2) scan all 295 SKILL.md files for missing types, (3) pick the next N untyped skills by lexicographic order or category, (4) inject one prompt per skill to the next available worker, (5) wait for the linter to flip the typed_count up, (6) sleep 5min and repeat. No Supabase, no `soop2_queue` table, no claim semantics, no batch attempt counters.
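
Steps (2) and (3) of that cycle can be sketched as follows. This is a minimal illustration, assuming that "typed" means the leading `---` frontmatter block of a SKILL.md contains a `type:` key and that skills live at `<skills_root>/<skill>/SKILL.md`. The key name, the layout, and the helper names are assumptions for illustration, not taken from the actual linter.

```python
from pathlib import Path

TYPE_KEY = "type"  # assumed frontmatter key; the real linter's rule may differ


def has_typed_frontmatter(text: str) -> bool:
    """Return True if the leading '---' frontmatter block contains TYPE_KEY."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return False  # no frontmatter at all
    for line in lines[1:]:
        if line.strip() == "---":
            return False  # frontmatter ended without the key
        if ":" in line and line.split(":", 1)[0].strip() == TYPE_KEY:
            return True
    return False  # unterminated frontmatter without the key


def untyped_skills(skills_root: Path, batch_size: int) -> list[Path]:
    """Pick the next batch of untyped SKILL.md files in lexicographic order."""
    candidates = sorted(skills_root.glob("*/SKILL.md"))
    untyped = [p for p in candidates if not has_typed_frontmatter(p.read_text())]
    return untyped[:batch_size]
```

Because the batch is recomputed from the files on every call, re-running it after a crash needs no recovery step: already-typed skills simply drop out of the result.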

Mesh layout. One launchd plist on mac1 runs `supervisor.py`. It reads the skill directory locally (Syncthing mirrors it to all mesh nodes, so cloud-vm and mac4 see the same files). Workers are just Claude/Codex panes; the supervisor `/inject`s "type these 5 skills" prompts via aura-gateway. No worker registration, no worker DB row.

State model. The skill files ARE the state. `[home-path]` is the source of truth. Filesystem mtime + parsed frontmatter is the scoreboard. Nothing to lose because nothing is stored.

Failure recovery. Anything dies, next cycle starts fresh. A worker that hangs mid-skill: the next cycle re-scans and re-dispatches that skill (which is idempotent: typed frontmatter only gets written once, because the linter detects that the skill is already typed). Supabase down: doesn't care, doesn't use it. Mac1 down: install the plist on mac2 with one ssh + cp + launchctl load.

Build cost. ~4 hours. One supervisor.py (~200 lines), one launchd plist, one aura-gateway integration. No DB schema. No worker registration. No verifier daemon (just run the linter inline).

Strengths vs ELP-1. (1) Zero state-migration risk — nothing to migrate. (2) Idempotent by construction — re-running the supervisor never corrupts anything because the filesystem is canonical.

Weaknesses vs ELP-1. (1) No retry tracking — if a skill keeps failing the linter, we'll keep dispatching it forever instead of quarantining. Need a small `failures.json` file. (2) No visibility into what's "in flight" right now — supervisor regenerates blindly each cycle, may dispatch the same skill twice if a worker is mid-edit. Mitigated by a 1-minute file-lock convention (`mtime < 60s = "in flight"`).
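
Both mitigations are small enough to sketch. The version below assumes a quarantine threshold of three failures and a flat `{skill: count}` layout for `failures.json`; neither is specified in the source, and the function names are hypothetical.

```python
import json
import time
from pathlib import Path
from typing import Optional

IN_FLIGHT_WINDOW_S = 60  # the "mtime < 60s" convention from the text
MAX_FAILURES = 3         # assumed quarantine threshold


def is_in_flight(skill_file: Path, now: Optional[float] = None) -> bool:
    """Treat a file modified within the last minute as being edited."""
    now = time.time() if now is None else now
    return (now - skill_file.stat().st_mtime) < IN_FLIGHT_WINDOW_S


def record_failure(failures_file: Path, skill: str) -> int:
    """Increment and persist the failure count for one skill."""
    counts = json.loads(failures_file.read_text()) if failures_file.exists() else {}
    counts[skill] = counts.get(skill, 0) + 1
    failures_file.write_text(json.dumps(counts, indent=2))
    return counts[skill]


def is_quarantined(failures_file: Path, skill: str) -> bool:
    """Skip skills that have failed the linter MAX_FAILURES times or more."""
    if not failures_file.exists():
        return False
    return json.loads(failures_file.read_text()).get(skill, 0) >= MAX_FAILURES
```

The supervisor would filter its candidate list through `is_in_flight` and `is_quarantined` before dispatching. The mtime check is a convention rather than a real lock, so it narrows the double-dispatch window without closing it.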

---

Path B — Human-in-Loop Dashboard (Family 2)

Core mechanism. Mohamed is the supervisor. A static HTML dashboard at `Desktop/soop-2-dashboard.html` (Syncthing-shared, opens on any device) shows the scoreboard, next-up batches, per-worker status, and big buttons for `Dispatch B.1 batch to mac1`, `Dispatch B.2 batch to mac4`, `Run audit`, `Wire reaction_logger`. Each button fires a curl to aura-gateway with a pre-built prompt. No launchd, no autonomous cycle. The "loop" is Mohamed clicking buttons whenever he's at a device.

Mesh layout. Dashboard generated by a small cron job on mac1 (every 60s, rebuilds HTML from current state). Buttons POST to aura-gateway. Aura-gateway routes to the chosen mesh node. Phone, iPad, laptop all open the same file.

State model. Scoreboard JSON in `[home-path]`. Worker prompts in a `prompts/` directory. Dashboard reads these and renders. No DB.

Failure recovery. Everything dies, Mohamed re-opens the dashboard and re-clicks. The "loop" was never autonomous so there's nothing to recover. The work that finished is durable in the SKILL.md files.

Build cost. ~3 hours. One HTML template (Jinja2 + alpine.js), one regen cron, one prompts/ directory. Reuses aura-gateway entirely.

Strengths vs ELP-1. (1) Zero stall risk — there's no autonomous component that can silently stop, because there's no autonomous component. Stalls are visible as "Mohamed didn't click for 6 hours" which is fine. (2) Excellent for the messy non-mechanical tracks (E.1-E.4 feedback wiring, C.3 labeled benchmark generation) where Mohamed wants to inspect each step.

Weaknesses vs ELP-1. (1) Loses the "operates while Mohamed sleeps" property entirely. SOOP-2 takes 13.5 days of Mohamed waking hours instead of 3-4 days of mesh-time. (2) High click-fatigue — 281 skills means 281 button presses if batches are size 1. With batches of 30, still 10 click sessions, but each requires Mohamed to babysit until it returns.

---

Path C — Pulse-as-Engine (Family 3)

Core mechanism. The `pulse` skill already exists and can spawn agent sessions. EXECUTION-PLAN.md flags 20+ tracks as Pulse-eligible. Stop building a parallel system; make Pulse the engine. Wrap each SOOP-2 track as a Pulse spawn invocation with a deterministic checklist. A small `soop2_spawner.py` cron job (mac1, every 10min) reads `soop2_spawn_queue.yaml` (Mohamed-editable), pops the next entry, and calls `pulse spawn` with the matching prompt template. Pulse handles all the agent lifecycle, rate-limit awareness, and session telemetry it was built for.

Mesh layout. mac1 runs the spawner cron. Pulse spawns agents on whichever mesh node Pulse already decides (Pulse has its own routing). Aura-gateway is untouched. Codex-gateway is untouched.

State model. `[home-path]` (priority-ordered list). `[home-path]` (append-only, what Pulse returned). Pulse's own state tables in Supabase already exist; SOOP-2 just becomes another consumer.

Failure recovery. If Pulse is healthy, SOOP-2 is healthy. If Pulse breaks, Mohamed already knows about it because Pulse is critical to many other workflows. No SOOP-2-specific failure mode to design around.

Build cost. ~2 hours. One spawner cron (~80 lines), one YAML queue file, ~20 prompt templates (one per track sub-step). No new schema, no new launchd plist beyond the cron.

Strengths vs ELP-1. (1) Massive code reuse — Pulse already solves worker spawn, rate-limit awareness, telemetry, and session boundaries. ELP-1 reimplements all four. (2) Lower maintenance — when Pulse improves, SOOP-2 inherits the improvement automatically. ELP-1 maintains its own worker model forever.

Weaknesses vs ELP-1. (1) Couples SOOP-2 to Pulse's reliability. If Pulse has a bug, SOOP-2 inherits it. ELP-1's bespoke worker model is at least debuggable in isolation. (2) Pulse's session model is one-shot per spawn — long-horizon SOOP-2 batches (e.g., type 30 skills in one session) may not map cleanly. Need to verify Pulse's session-length envelope before betting on this.

---

Path D — Single launchd, Single Mac (Family 4)

Core mechanism. Forget the mesh. SOOP-2's work is 95

Mesh layout. Mac1 only. Other machines untouched. Aura-gateway not used (the script just talks to tmux directly on mac1).

State model. `[home-path]` (single file, atomic write). `[home-path]` (which skills were attempted in the last cycle). No Supabase.

Failure recovery. Mac1 dies: the work stops (everything was on mac1). When mac1 comes back, launchd auto-restarts the plist. Claude Code rate-limited: script detects the rate-limit pattern in tmux output, sleeps 30min. Linter fails to advance: script logs the failed cycle, picks a different batch next time.

Build cost. ~5 hours. One Python script (~250 lines), one launchd plist, tmux automation. Slightly more code than Path A because it has retry/rate-limit logic that Path A skips.

Strengths vs ELP-1. (1) Single point of complexity — everything lives on mac1, no mesh-coordination bugs possible. (2) Honest about what the work actually is — Claude editing SKILL.md files locally — and refuses to invent distribution where there's no distribution payoff.

Weaknesses vs ELP-1. (1) Single point of failure on mac1. If mac1 disk dies, no progress until restored. ELP-1 has Mac2 hot-standby. (2) Can't use mac4 Codex sessions in parallel even when they'd be useful (e.g., 30 mass-typing batches running across mac1-Claude + mac4-Codex + cloud-vm-Claude in parallel could finish Track B in 1 day instead of 4).

---

Path E — Event-Driven Supabase Realtime (Family 5)

Core mechanism. Instead of a polling supervisor, use Supabase's realtime subscriptions as the dispatch trigger. When a skill flips from typed=false to typed=true, that triggers a postgres function which writes a new row to `soop2_dispatch_queue` for the next skill. Workers subscribe to that table via supabase-py realtime. When a new row appears, the first idle worker claims it (the Postgres `FOR UPDATE SKIP LOCKED` pattern, available through Supabase), processes it, and marks it done, which triggers the next dispatch via the same DB function. No supervisor at all. The cycle is event-driven, not time-driven.

Mesh layout. Each mesh node runs a tiny `soop2_listener.py` that maintains a Supabase realtime subscription. When triggered, it injects a prompt into its local Claude/Codex pane via tmux. mac1's aura-gateway is bypassed (each node listens directly).

State model. Supabase tables: `soop2_skills` (mirror of the SKILL.md typed status, updated by a per-cycle linter-runner cron), `soop2_dispatch_queue` (next-up work, generated by postgres functions on skill-flipped events), `soop2_workers` (heartbeats). Postgres functions encode the business logic ("when typed count crosses 95

Failure recovery. Supabase has 99.9

Build cost. ~7 hours. Postgres functions are tricky to get right (1-2 hours each, need 4-5). Realtime subscription wiring in Python (~1 hour). Per-node listener daemon (~2 hours). Linter cron (~1 hour).

Strengths vs ELP-1. (1) Reactive, not polling — dispatches happen the millisecond a skill flips, not on the next 5-min boundary. Better latency. (2) Encodes the criteria-passing logic in SQL (postgres functions), which is easier to reason about and audit than the Python-spread-across-5-files alternative in ELP-1.

Weaknesses vs ELP-1. (1) Hard dependency on Supabase realtime, which is a Supabase-specific feature; ELP-1's filesystem fallback is genuinely portable. (2) Postgres functions are write-once-debug-hard — when one misfires, finding the bug across SQL + python listeners is harder than reading a 300-line Python supervisor.

---

Path F — Codex-Led Supervisor (Family 6)

Core mechanism. Mac4 Codex.app already runs long sessions with its own native queue (mac4-codex-gateway-shipped-2026-05-06.md notes "Codex's native queue accepts prompts even mid-response"). Promote Codex from "one of many workers" to "the supervisor". Mohamed's Claude sessions become workers when alive. Mac4 Codex runs continuously with a `codex-soop2-supervisor` prompt that re-reads the scoreboard every N minutes, decides the next batch, and either does the work itself in Codex or `/inject`s a prompt to mac1's Claude pane via aura-gateway. Codex's session persistence is the durability mechanism — no launchd needed.

Mesh layout. mac4 Codex is the brain. mac1 Claude is the muscle for skills that benefit from Claude's larger context. cloud-vm is unused unless Codex decides to delegate there. aura-gateway routes Codex's outbound `/inject`s.

State model. Codex's own conversation memory holds the supervisor state. A small `[home-path]` file is the projection Codex reads each cycle. Supabase optional (for cross-session continuity if Codex restarts).

Failure recovery. Codex restarts: re-read scoreboard.json, resume. Mac4 reboots: codex-gateway plist relaunches Codex; Codex reads its checkpoint. mac4 dies hard: Mohamed promotes a Claude session manually to supervisor by reading the scoreboard and continuing. The fallback is "Mohamed becomes Codex" which is fine.

Build cost. ~3 hours. One supervisor-prompt template (~100 lines markdown), one cron job to re-read scoreboard and notify Codex (or rely on Codex to ask), aura-gateway re-use.

Strengths vs ELP-1. (1) Uses a session that's already designed for long-horizon — Codex.app's session model is 24-hour-plus with native queueing. ELP-1 fights against Claude Code's session-closure problem by adding launchd; Path F sidesteps the problem by using Codex which doesn't have it. (2) Lower code volume — most of the "supervisor logic" lives in a prompt, not a Python script. Easier to iterate on.

Weaknesses vs ELP-1. (1) Hard dependency on Codex's session staying alive — if Codex.app crashes and codex-gateway doesn't auto-recover, the loop stalls. ELP-1's launchd plist is more deterministic. (2) Codex's behavior is non-deterministic in a way that a Python supervisor isn't. Hard to assert "supervisor will do X exactly every 5 minutes" when X is a prompt to an LLM.

---

Stage 2 — COMPOUND (sequential synthesis)

Transition A → A+B

Start with Path A (stateless dispatch, filesystem-canonical). It's the simplest base and survives almost everything. Add Path B (human-in-loop dashboard) not as a replacement but as the observability and intervention surface that ELP-1 puts in Layer 4. The dashboard reads the same scoreboard.json that Path A's supervisor reads/writes, plus emits buttons for manual override (`Force-dispatch this batch`, `Quarantine this skill`, `Pause loop`).

Compatible: Filesystem state is shared. Dashboard is read-only from the supervisor's perspective; supervisor is read-only from the dashboard's perspective except for explicit override files.
Incompatible: Nothing. They share state but don't compete.
Emergent: Dashboard is now the "Layer 4 observability" of ELP-1 without needing Grafana on cloud-vm. Mohamed has the supervisor running autonomously AND a clear visual + override surface. Best of both: autonomy + human control.
Lost: Nothing meaningful. Slight extra cron job (dashboard regen) added.

Transition A+B → A+B+C

Layer Path C (Pulse-as-engine) onto A. Replace the inline `aura-gateway /inject` with `pulse spawn` calls in the supervisor. Now the supervisor's job is reduced to: read scoreboard, decide next batch, call `pulse spawn` with the right prompt, log the spawn_id. Pulse handles rate-limit, routing, session lifecycle.

Compatible: Path A's stateless model + Pulse's spawn model are both stateless dispatch patterns. They click together cleanly.
Incompatible: Direct aura-gateway control loses some precision over which mesh node runs which batch. We accept Pulse's routing decision.
Emergent: Less code to maintain. ELP-1's Section 5 (worker dispatch, ~50 lines of supervisor + worker registration tables) collapses into "pulse spawn with prompt".
Lost: Some control over worker assignment. If mac4 Codex is genuinely better at one batch kind than mac1 Claude, Pulse may not know that. Mitigation: encode hints in the spawn prompt (`prefer: mac4-codex`).

Transition A+B+C → A+B+C+D

Path D (single-launchd-on-mac1) suggests we don't need multi-machine for this work. Re-evaluate. Is the work actually parallelizable across mesh nodes, or is it sequential filesystem editing? Track B (mass typing) IS parallelizable — different skills are independent. Tracks E.1-E.4 (feedback wiring) are sequential on a single codebase. Tracks C/D (router work) involve specific endpoint configs.

The right compromise: default single-machine, escalate to mesh only for parallelizable tracks. The supervisor knows which tracks are parallelizable (a flag in the prompt template). For Track B, it dispatches 3 parallel spawns (mac1-Claude + mac4-Codex + cloud-vm-Claude). For Track E, one spawn on mac1 only.

Compatible: Yes. Single-machine is just a degenerate case of "parallelism=1".
Incompatible: Path D's claim that "no mesh ever" is too strong. Reject the absolute form, keep the principle.
Emergent: Concurrency knob per batch kind. No worker registration table needed (Pulse handles it).
Lost: Pure Path D's radical simplicity. We accept some mesh complexity in exchange for Track B speedup.
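
The per-batch-kind concurrency knob could be as small as a lookup table. In this sketch the parallelism values follow the text (3 for Track B, 1 for Track E); the worker labels, the round-robin split, and the `plan_dispatch` name are illustrative assumptions.

```python
# Parallelism per track; unknown tracks default to single-machine.
PARALLELISM = {"B": 3, "E": 1}
WORKERS = ["mac1-claude", "mac4-codex", "cloud-vm-claude"]  # illustrative labels


def plan_dispatch(track: str, items: list[str]) -> dict[str, list[str]]:
    """Split items round-robin across as many workers as the track allows."""
    n = min(PARALLELISM.get(track, 1), len(WORKERS))
    plan: dict[str, list[str]] = {w: [] for w in WORKERS[:n]}
    for i, item in enumerate(items):
        plan[WORKERS[i % n]].append(item)
    return plan
```

With `parallelism=1` the plan degenerates to a single mac1 worker, which is the sense in which Path D is a special case of the synthesis rather than a competitor to it.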

Transition A+B+C+D → A+B+C+D+E

Path E (Supabase realtime) offers reactivity. The current synthesis polls every 5min. Is reactivity worth the postgres-function complexity?

Honest assessment: for this workload, no. The linter takes 0.13s. A 5-min poll has 5min latency. A realtime trigger has 1s latency. Difference: 4min 59s. SOOP-2 has 13 days of work. The latency saving is 1 part in 4000. Not worth the postgres function complexity.
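
The "1 part in 4000" figure is easy to check, assuming the 13 days are counted as wall-clock time and the saving is one poll interval (both are assumptions made for this check):

```python
POLL_INTERVAL_MIN = 5
HORIZON_DAYS = 13  # assumed to be wall-clock days

horizon_min = HORIZON_DAYS * 24 * 60     # 18720 minutes
ratio = horizon_min / POLL_INTERVAL_MIN  # one poll interval is 1 part in `ratio`
print(f"1 part in {ratio:.0f}")
```

The result is 1 part in 3744, which rounds to the ~4000 quoted above.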

But: Path E's "encode criteria-passing logic as code" is good. Pull that out without the realtime infrastructure. Instead of postgres functions, write `verifier.py` (which ELP-1 already has in §7.2) and keep it as Python. The 10 per-criterion checks live there, idempotent, fast, easy to debug.

Compatible: Verifier as a separate concern is compatible with everything.
Incompatible: Supabase realtime infrastructure not pulled in.
Emergent: Clean separation: supervisor decides work, dispatcher fires it, verifier flips criteria. Three single-purpose components.
Lost: Reactivity. Accepted tradeoff.
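
One way to keep the per-criterion checks as plain Python is a small registry of pure functions over the scoreboard. The sketch below is not ELP-1's `verifier.py`; the criterion name and the `typed` / `total` scoreboard fields are hypothetical.

```python
from typing import Callable

# Registry of per-criterion checks: name -> function(scoreboard) -> bool.
CHECKS: dict[str, Callable[[dict], bool]] = {}


def criterion(name: str):
    """Decorator that registers one criterion check under `name`."""
    def register(fn: Callable[[dict], bool]):
        CHECKS[name] = fn
        return fn
    return register


@criterion("all_skills_typed")  # hypothetical criterion name
def all_skills_typed(board: dict) -> bool:
    return board["typed"] >= board["total"]


def run_checks(board: dict) -> dict[str, bool]:
    """Evaluate every registered check; a pure function of the scoreboard."""
    return {name: check(board) for name, check in CHECKS.items()}
```

Because each check only reads the scoreboard, running the verifier twice on the same input yields the same result, which is the idempotence property the text asks for.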

Transition A+B+C+D+E → A+B+C+D+E+F

Path F (Codex-led) suggests Codex's long session is a better supervisor host than launchd. Worth evaluating.

Honest assessment: Codex is great for the supervisor role on the days mac4 is alive and Codex isn't rate-limited. But a launchd plist is deterministic and free. The right move: dual supervisors with a precedence rule.

Primary: a small launchd plist on mac1 runs the supervisor every 5min. Idempotent — if Codex already did the cycle in the last 5min, this run is a no-op.
Augment: a Codex session at mac4 runs a "long-horizon advisor" prompt. It reads the scoreboard every hour, suggests prompt-template improvements, generates next-batch tactical refinements, and writes them to `[home-path]`. The launchd supervisor reads these hints when generating prompts.

This gives ELP-1's launchd determinism and Path F's "use the longest-context supervisor we have" advantage, without making either critical.

Compatible: Yes. Codex as advisor, not as supervisor.
Incompatible: Codex-as-sole-supervisor.
Emergent: Two minds — Codex thinks strategically, launchd thinks tactically. Codex can suggest "skip skill X for now, its preconditions aren't met"; launchd executes.
Lost: Some elegance. The system has 2 components instead of 1. But neither component is critical.

Final Compound Architecture — "ELP-Lite + Codex Advisor"

Components (4 total, named):

1. Supervisor (launchd, mac1, every 5min, ~200 lines Python).
- Reads scoreboard.json from filesystem.
- Runs linter.
- Generates next batch from a small priority table + Codex hints.
- Calls `pulse spawn` to dispatch (no bespoke worker DB).
- Writes log entry, writes updated scoreboard.json.
- Heartbeat file.

2. Verifier (launchd, mac1, every 5min offset 150s, ~100 lines Python).
- Reads scoreboard.json.
- Runs the 10 per-criterion checks.
- Flips `criteria_passed` in scoreboard.json on changes.
- Writes verifier log entry.

3. Dashboard (cron, mac1, every 60s, ~150 lines, regenerates HTML).
- Reads scoreboard.json + log files.
- Writes `Desktop/soop-2-dashboard.html` (Syncthing-shared).
- Has buttons that POST to a tiny `dashboard_handler.py` (~50 lines) for manual overrides.

4. Codex Advisor (Codex.app on mac4, persistent session, prompt-driven).
- Re-reads scoreboard every hour (or when Mohamed pings).
- Writes `codex_hints.json` with strategic suggestions.
- Not load-bearing — supervisor works fine without hints.
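
A single supervisor cycle, reduced to its control flow, might look like the sketch below. The `pick_batch` and `dispatch` callables stand in for the priority table and the Pulse spawn call, whose real interfaces are not given in the source; the `last_cycle_ts` and `last_spawn_id` fields are likewise hypothetical.

```python
import time
from typing import Callable, Optional


def run_cycle(
    board: dict,
    pick_batch: Callable[[dict], list],
    dispatch: Callable[[list], str],
    now: Optional[float] = None,
    interval_s: int = 300,
) -> dict:
    """Run one supervisor cycle and return the (possibly updated) scoreboard."""
    now = time.time() if now is None else now
    # Idempotence guard: a cycle already ran within the interval, so do nothing.
    if now - board.get("last_cycle_ts", 0.0) < interval_s:
        return board
    batch = pick_batch(board)
    updated = dict(board, last_cycle_ts=now)
    if batch:
        # `dispatch` is a placeholder for the Pulse spawn call.
        updated["last_spawn_id"] = dispatch(batch)
    return updated
```

Passing the dispatcher in as a callable also gives the "fallback to direct aura-gateway" path a natural seam: the fallback is a different function handed to the same loop.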

State plane. Single file: `[home-path]`. Append-only log: `[home-path]`. Syncthing replicates both. No Supabase. Supabase optional in v1.1 for cross-mesh durability if filesystem mirror proves insufficient.

Failure recovery.
- Claude Code closed: supervisor is outside Claude. No effect.
- Mac1 down: install plist on mac2 with one command. Or, manually run the supervisor on cloud-vm. Or, leave it — the work resumes when mac1 returns.
- Pulse broken: supervisor logs the spawn failure, dashboard shows it, Mohamed clicks "fallback to direct aura-gateway" button. Two-line code path.
- Codex offline: hints stale, supervisor uses defaults. No-op.
- Linter regression: supervisor detects exit-code-flipping, dashboard banner, Mohamed intervenes.

ELP-Lite vs ELP-1 on 5 axes

| Axis | ELP-1 | ELP-Lite | Winner |
| --- | --- | --- | --- |
| Build cost | 1-2 sessions claimed, realistically 3-4 (Supabase migrations, two-machine launchd, worker registration, verifier, dashboard, Grafana, telegram wiring) | 1 session for v1 (4 components × ~100-200 lines each, all single-machine, all reusing aura-gateway+Pulse) | ELP-Lite |
| Failure-mode coverage | Excellent (mac2 hot-standby, Supabase + filesystem dual write, claim TTL, rate-limit awareness, quarantine) | Good (single-machine but launchd auto-restarts; manual mac2 promotion via one command) | ELP-1 narrowly |
| Operational complexity | 4 tables, 2 plists, 2 cron jobs, Grafana panels, worker registration, claim semantics, observer-mode logic, Syncthing reconciler | 1 JSON file, 2 plists, 1 cron, Codex prompt, optional Supabase later | ELP-Lite |
| Mesh integration depth | Deep: every mesh node potentially a worker, every state write replicated | Moderate: mac1 primary, Pulse spawns to mesh as needed, scoreboard Syncthing-replicated | Tie (ELP-1 is deeper but ELP-Lite is enough) |
| Maintainability at 6 months | Risk of stale worker rows, Supabase schema drift, claim TTL tuning forever | One JSON file, four small scripts. Anyone can read it in 30 minutes. | ELP-Lite |

ELP-Lite wins 3, ties 1, loses 1.

---

Stage 3 — EXPAND + MASTER PLAN

Stress tests (5 failure scenarios)

1. Mac1 dies (hardware failure).
- ELP-1: mac2 observer promotes itself after 30min (heartbeat goes stale). Supabase + filesystem reconcile. Workers on mac4/cloud-vm continue.
- ELP-Lite: launchd plist on mac1 doesn't fire. Mohamed runs one command on mac2: `cp [home-path] [home-path] && launchctl load ...`. Scoreboard is Syncthing-mirrored, so mac2 starts from the same state. Manual but ~60 seconds. Pulse is unaffected (cloud service).
- Verdict: ELP-1's automatic failover is nicer but requires that Mohamed actually trust the observer-mode logic. ELP-Lite's manual failover is dumber but provably correct.

2. Supabase 503 for an hour.
- ELP-1: filesystem fallback kicks in. On heal, reconciler runs. The system continues but workers can't write atomic claims through Supabase, only filesystem.
- ELP-Lite: doesn't notice. Doesn't use Supabase in v1.
- Verdict: ELP-Lite trivially wins this. It's a non-event.

3. Mohamed `rm -rf [home-path]` by accident.
- ELP-1: Supabase still has the truth. Filesystem reconciles from Supabase. Work continues.
- ELP-Lite: Syncthing replicates the deletion to all mesh nodes (this is bad). Mitigation: an hourly snapshot of scoreboard.json to `[home-path]` (Syncthing-excluded local-only directory). Restoring is `cp snapshots/latest.json scoreboard.json`.
- Verdict: ELP-1 wins. Add snapshots to ELP-Lite to close the gap.

4. Tailscale partition (mac1 isolated from cloud-vm + mac4).
- ELP-1: mac1's supervisor still runs and dispatches to local panes only (mac1-claude-cli). Other workers go silent until the partition heals. Supabase reads go through the Tailscale exit-node fallback or fail over to the filesystem.
- ELP-Lite: mac1's supervisor still runs. Pulse spawn calls may fail (Pulse routes through aura-gateway which is local on mac1, but spawned sessions on mac4/cloud-vm fail to inject). Supervisor falls back to dispatching mac1-local sessions only.
- Verdict: Tie. Both degrade to "mac1 does the work it can do locally."

5. Model rate-limited everywhere simultaneously (the "5h cap fires for everyone" scenario).
- ELP-1: workers respect `soop2_workers.rate_limited_until`. Supervisor stops dispatching. Resumes when cap clears.
- ELP-Lite: Pulse already handles rate-limit awareness per session (per Pulse spec). Spawn calls return rate-limit status. Supervisor backs off.
- Verdict: Tie. Both handle this via their respective mechanisms.
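
The snapshot mitigation from scenario 3 can be sketched in a few lines. This version uses timestamped file names instead of a single `latest.json` so that older states stay recoverable; that naming scheme and the function names are assumptions. The snapshot directory must sit outside any Syncthing-shared folder for the mitigation to hold.

```python
import shutil
import time
from pathlib import Path
from typing import Optional


def snapshot(scoreboard: Path, snapshot_dir: Path, now: Optional[float] = None) -> Path:
    """Copy the scoreboard into a local-only directory under a timestamped name."""
    snapshot_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%dT%H%M%S", time.gmtime(now))
    target = snapshot_dir / f"scoreboard-{stamp}.json"
    shutil.copy2(scoreboard, target)
    return target


def restore_latest(snapshot_dir: Path, scoreboard: Path) -> Path:
    """Restore the newest snapshot; timestamped names sort chronologically."""
    latest = sorted(snapshot_dir.glob("scoreboard-*.json"))[-1]
    shutil.copy2(latest, scoreboard)
    return latest
```

Run hourly, this bounds the loss from an accidental delete to at most one hour of scoreboard updates. The typed SKILL.md files themselves are not covered by it.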

Top 3 risks for ELP-Lite

Risk 1: Pulse session-length envelope doesn't cover long batches.
- Probability: Medium.
- Impact: High — if Pulse spawns can only handle <30min, then Track B mass typing (which Mohamed might want as one 2-hour spawn) needs splitting.
- Mitigation: Before committing to Path C reuse, verify Pulse's session model on the largest planned batch. If it doesn't fit, fall back to direct aura-gateway `/inject` for those specific batch kinds. Add per-batch-kind dispatch override flag.

Risk 2: Single JSON scoreboard becomes a write-contention point.
- Probability: Low (only the supervisor writes; verifier reads).
- Impact: Medium — corrupted scoreboard would lose criteria-passed tracking.
- Mitigation: Atomic writes via `tmp + os.replace`. Verifier never writes the scoreboard directly — it writes to a separate `criteria_passed.json` file that the supervisor merges. Two writers never touch the same file.
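
The `tmp + os.replace` pattern and the one-writer-per-file rule can be sketched as follows. The `criteria_passed` key mirrors the field named in the component list; the function names are hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path


def atomic_write_json(path: Path, data: dict) -> None:
    """Write JSON to a temp file in the same directory, then rename over path."""
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f, indent=2)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # atomic when source and target share a filesystem
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)  # do not leave a stray temp file behind
        raise


def merge_criteria(scoreboard: Path, criteria_file: Path) -> dict:
    """Supervisor-side merge: fold the verifier's file into the scoreboard."""
    board = json.loads(scoreboard.read_text())
    if criteria_file.exists():
        board["criteria_passed"] = json.loads(criteria_file.read_text())
    atomic_write_json(scoreboard, board)
    return board
```

Readers see either the old file or the new one, never a half-written scoreboard. Creating the temp file in the target's own directory matters, because a rename across filesystems is not atomic.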

Risk 3: Codex hints actively misguide the supervisor.
- Probability: Medium.
- Impact: Low — hints are optional; supervisor has defaults.
- Mitigation: Hint format includes confidence + rationale. Supervisor ignores hints with confidence < 0.7. Dashboard shows latest hint with provenance so Mohamed can spot bad advice.
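
The confidence gate is a short filter. In this sketch the 0.7 threshold comes from the mitigation above, while the `confidence` and `rationale` field names and the decision to drop hints that lack a rationale are assumptions about the hint format.

```python
MIN_CONFIDENCE = 0.7  # threshold stated in the mitigation


def usable_hints(hints: list[dict]) -> list[dict]:
    """Keep only well-formed hints at or above the confidence threshold."""
    kept = []
    for hint in hints:
        confidence = hint.get("confidence")
        if not isinstance(confidence, (int, float)):
            continue  # malformed hint: ignore rather than guess
        if confidence >= MIN_CONFIDENCE and hint.get("rationale"):
            kept.append(hint)
    return kept
```

An empty result is the normal degraded case: the supervisor proceeds on its defaults, which keeps the advisor non-load-bearing.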

Build plan (ordered)

| # | Component | Effort | Blocks |
| --- | --- | --- | --- |
| 1 | `[home-path]`: read/write JSON with atomic + lockfile | 30 min | 2-5 |
| 2 | `[home-path]`: 10 per-criterion checks | 90 min | 3 (verifier is independent of supervisor) |
| 3 | `[home-path]`: main loop + Pulse spawn + hint reader | 120 min | 4, 5 |
| 4 | `com.diomande.soop2-supervisor.plist` + `com.diomande.soop2-verifier.plist` | 20 min | 5 |
| 5 | `[home-path]`: HTML regen + handler endpoint | 90 min | 6 |
| 6 | Codex advisor prompt + manual session bootstrap on mac4 | 30 min | 7 |
| 7 | `[home-path]`: hourly scoreboard snapshot | 15 min | 8 |
| 8 | Bootstrap + smoke test (one full cycle through Track B.1) | 45 min | done |

Total: ~7.5 hours of focused work. One long session or two short ones.

Compare to ELP-1's claimed 1-2 sessions. ELP-1 realistically needs:
- Supabase schema + migrations (~60 min)
- supervisor.py + worker.py + verifier.py + reconciler.py (~4 hours)
- 2 launchd plists + observer mode logic (~45 min)
- Worker registration + heartbeat loops (~90 min)
- Grafana dashboard setup (~60 min if cloud-vm is ready)
- Bootstrap + smoke test across 2+ machines (~90 min)

ELP-1 realistic total: ~9.5 hours. ELP-Lite: ~7.5 hours. **ELP-Lite is ~20% cheaper to build.**

Both are well within "ship in one focused day or two short sessions." The build-cost difference is real but not the dominant factor.

---

Stage 4 — VERDICT

Hybrid wins.

ELP-1 is genuinely good. It is over-built for the current shape of SOOP-2 but its over-building is in the right direction for any future SOOP-3, SOOP-4, or cross-project everlasting loop. The mistake would be to ship ELP-1 as-drafted now and pay the Supabase + worker-registration + observer-mode complexity tax when the work is 13 days of mostly-mechanical typing.

The mistake on the other side would be to ship pure ELP-Lite and lose ELP-1's hot-standby + Supabase-durability when SOOP-3 inevitably arrives with a longer horizon.

The amendment from Evo3 to ELP-1 (call it ELP-1.1):

1. Replace the bespoke worker model with Pulse (Path C insight). Delete `soop2_workers` table, delete worker-registration logic, delete claim semantics in the queue. Workers are Pulse spawns. The supervisor calls `pulse spawn` instead of `aura-gateway /inject`. This deletes ~30

2. Keep Supabase as the canonical state plane, but ship filesystem-only in v1.0 (Path D insight, Path E rejection). The filesystem mirror works for v1.0. Supabase becomes a v1.1 upgrade for cross-mesh durability. This defers a migration that may not be necessary if SOOP-2 closes in 13 days.

3. Add the Codex advisor (Path F insight, non-load-bearing form). Mac4 Codex session writes hints to a JSON file. Supervisor reads hints with low precedence. Costs almost nothing, adds optional strategic intelligence.

4. Add the static HTML dashboard (Path B insight, complementary form). Generated by cron, served via Syncthing. Replaces the Grafana dependency for v1.0. Grafana stays as v1.1 if dashboard reach proves limited.

5. Drop the per-criterion postgres functions (Path E rejection). Keep the verifier as Python (ELP-1 already does this — confirm and don't drift).

6. Drop mac2 observer-mode for v1.0; add a one-command failover script. Observer-mode is real complexity for a low-probability scenario in a 13-day project. Replace with `[home-path]` that does the cp+ssh+launchctl dance. v1.1 can re-introduce automatic observer-mode if SOOP-3 spans longer.

7. Defer the telegram/SMS escalation to v1.1. Dashboard banner is sufficient for v1.0 — Mohamed is going to look at the dashboard. v1.1 adds telegram if the dashboard miss-rate proves problematic.

Resulting build cost: ~6 hours instead of ELP-1's ~9.5 hours. Same failure-mode coverage for SOOP-2's actual 13-day horizon. Clean upgrade path to full ELP-1 if SOOP-3 demands it.

Delta from ELP-1 → ELP-1.1:
- Remove: `soop2_workers` table, claim TTL logic, mac2 observer-mode, postgres functions, Grafana dependency, telegram dependency (defer all).
- Replace: worker dispatch model with Pulse spawn calls.
- Add: Codex advisor (mac4), static HTML dashboard, hourly scoreboard snapshots.
- Keep: launchd supervisor, verifier daemon, filesystem state plane with Syncthing replication, escalation thresholds (just route to dashboard banner instead of telegram for now).

ELP-1's tiebreaker that survives the hybrid: the architecture document itself. ELP-1 §1 (failure model table), §3 (state plane spec), §6 (batch kinds), and §14 (acceptance criteria for ELP itself) are durable docs that ELP-Lite would have re-derived. Adopt ELP-1's documentation structure and just amend the components.

---

What direct scrutiny missed

The three layers attacking ELP-1 directly (meta-review, AMR, Codex adversarial) will probably catch the same things any review catches: claim-TTL races, schema versioning bugs, mac2 observer-mode quorum subtleties, Supabase reconciliation edge cases, telegram/SMS rate-limit-on-rate-limit-alerts loops.

What Evo3 saw by exploring sideways: the work itself (13 days of mostly-mechanical SKILL.md edits) is plausibly smaller than the protocol designed to manage it. ELP-1 is ~500 lines of spec; the actual SOOP-2 deliverable is ~281 SKILL.md frontmatter blocks (a few KB each) plus ~5 small Python files for Track E. Building a multi-machine fault-tolerant orchestration system to ship 281 frontmatter blocks is a Conway's Law slip: Mohamed is reaching for a platform shape because that's what feels Right at the seniority level, but the actual work shape is batch editing. The right response isn't "don't build the platform" (because SOOP-3 etc. will come), it's "build the smallest platform that does this job AND has a clean upgrade path." That's ELP-1.1.

The single most surprising finding: Pulse already exists and ELP-1 doesn't use it. EXECUTION-PLAN.md marks 20 of 38 sub-tracks as Pulse-eligible, but ELP-1 designs its own bespoke worker dispatch with `soop2_workers` table + claim semantics + heartbeat loops + rate-limit awareness — all of which Pulse already implements. The largest single code reduction in ELP-1.1 comes from deleting work that was already done elsewhere in the home repo. ELP-1 was authored without re-reading what Pulse already provides. This is the failure mode direct adversarial scrutiny rarely catches — it audits what's there, not what wasn't reused.

---

End of Layer 4 — Evo3 architectural exploration.
Recommendation: adopt ELP-1.1 (the hybrid amendment) as the SOOP-2 convergence engine. Defer full ELP-1 to SOOP-3 if and when a multi-week, multi-track horizon demands it.

Promotion Decision

Promote into a technical note or architecture paper with implementation anchors.

Source Anchor

crucible-output/soop-2/06d-layer4-evo3-architectures.md
