Grand Diomande Research · Full HTML Reader

Handoff: Mesh / V1.1 Event-Sourced Rail / KARL Reward Engine

**From:** the agent running the "Life of Leisure, mesh from phone" goal (mac1, main session) **To:** the SEA (skill-entity-architecture) agent **Date:** 2026-05-14 **Why you're getting this:** you've been working on SEA / the mac3-worker-config track. This session drifted from a phone-app build into a corpus-wide KARL reward-engine repair. The work is real and committed, but one piece (`karl train`) is blocked on a rate-limited machine, and you may be able to unblock it. Full context below so you can pick up cleanl

Agents That Account for Themselves technical note backlog reference score 22 .md

Full Public Reader

Handoff: Mesh / V1.1 Event-Sourced Rail / KARL Reward Engine

From: the agent running the "Life of Leisure, mesh from phone" goal (mac1, main session)
To: the SEA (skill-entity-architecture) agent
Date: 2026-05-14
Why you're getting this: you've been working on SEA / the mac3-worker-config track. This session drifted from a phone-app build into a corpus-wide KARL reward-engine repair. The work is real and committed, but one piece (`karl train`) is blocked on a rate-limited machine, and you may be able to unblock it. Full context below so you can pick up cleanly.

---

1. What this session was supposed to be

The active `/goal` is the "Life of Leisure, mesh from phone" build: run all 5 Macs (mac1-5) + K11 from an iPhone, hands-off, via two pieces:

  • aura-gateway — FastAPI service on `mac1:8095`, launchd label `com.openclaw.aura-gateway`, file at `[home-path]`. The control plane: inject prompts cross-machine over SSH+tmux, spawn sessions, autopilot loops, wake-on-LAN, and a flow runtime.
  • Pebble — iOS app at `Desktop/Pebble/`, bundle `com.diomande.pebble`. The phone surface. Conversations list + chat + a FlowsSection driving the gateway's flow runtime.

The leisure metric: "Tier-C taps per active hour, under 5 = leisure." The point is to walk away from the laptop.

2. The full work chain (how we got from there to KARL)

V1.0 W1   Mesh-RPC Flow Runtime
            gateway-hosted flows over a /capabilities matrix.
            Flows = ordered steps (captain_ask, inject, autopilot_*,
            mesh_wake, ntfy_push, delay, spawn_session). Manual + cron triggers.
   │
   ▼  user asked: "what if flows were event-sourced like Kafka?"
   │
V1.1      Event-sourced flow runtime (4 phases, all shipped)
   ├─ Phase 1  _emit_event ledger writer in gateway.py
   │             writes [home-path]
   │             envelope {seq, ts, stream, type, payload, flow_id?, step_idx?, machine, cascade_depth?}
   ├─ Phase 2  flows-karl-writer daemon  ◀── KARL FIRST ENTERS HERE
   │             [home-path] — tails the ledger,
   │             folds each completed flow into ONE KARL trajectory card,
   │             appends to Desktop/karl/karl/trajectories.jsonl
   ├─ Phase 3  declarative cross-flow event triggers
   │             FlowTrigger.type="event", _flow_event_listener, cascade-depth guard ≤3,
   │             self-loop guard. Verified: morning_brief flow:complete cascades evening_close.
   └─ Phase 4  Pebble FlowSheet "Recent events" footer + GET /flows/events endpoint
   │
   ▼  Phase 2 left reward_score:None on the cards. I proposed "backfill them." User said yes.
   │
V1.2      KARL reward engine repair  ◀── the rabbit hole
            backfill surfaced THREE latent bugs that affected ALL 3203 records,
            not just the 10 flow cards. Then an audit found a structural bias.
            Then karl export found two more of the same bug.

The honest summary: KARL was never a deliberate destination. The V1.1 architecture verdict (Path C + Path F from the divergent doc) said completed flows become KARL trajectory cards. Phase 2 wired that. "Backfill the scores" was a natural next-step — and once inside KARL, the backfill kept hitting pre-existing bugs that had nothing to do with flows. Every record in the corpus was scoring a flat 0.5. Fixing that was the right call (root-cause over symptom) but it drifted us well off the leisure/mesh goal.

3. Current state — what's shipped and where it lives

Mesh / V1.1 (home repo `~/`, branch `main`)

CommitWhat
`b07b1758`V1.1 Phase 1 — `_emit_event` ledger, 23 call sites in gateway.py
`142c65ad`V1.1 Phase 2 — flows-karl-writer daemon + launchd plist
`89fecf59`V1.1 Phase 3 — cross-flow event triggers, cascade guard, self-loop guard
`984c1a0d`V1.1 Phase 4 — `/flows/events` endpoint + in-memory ring
`f26b5ae2`V1.2 — flows-karl-writer score-at-emit-time integration

⚠ All home-repo commits are LOCAL-ONLY. `git push` is blocked by a 171MB gcode blob in pre-existing history (`Desktop/lume-commerce/hardware/cad/print/gcode-review/.../plate_1.gcode`, plus two 50MB+ siblings). GitHub's pre-receive hook declines any ref. The gateway runs off the local file so there's no operational impact, but `origin/main` is ~6 commits behind. This is task #23 — needs BFG-rewrite or relocating `gcode-review/` out of the repo. If you have spare cycles, this is a clean self-contained job.

Pebble (repo `Desktop/Pebble/`, branch `main`)

CommitWhatPushed?
`4088a44`V1.1 Phase 4 — FlowSheet "Recent events" footer, FlowEventEnvelope, /flows/events client✅ pushed to origin/main

Pebble repo has no LFS problem — it pushes fine.

KARL (repo `Desktop/karl/`, branch `main`) — all 3 commits PUSHED to origin

CommitWhat
`57ce8da``fix(reward)`: align backfill with on-disk schema + persist motion_score
`5a90705``fix(reward)`: session-length normalization + wasted-motion root-cause fix
`299f978``fix(sft)`: align exporter with on-disk schema (skill string, tool_calls)

4. The KARL bugs we fixed (so you understand the corpus state)

KARL = Knowledge Agents via RL. It records agent tool-use trajectories to `trajectories.jsonl`, scores them with a 6-signal reward engine, exports advantage-weighted SFT data, trains a LoRA via MLX on mac5.

Bug 1 — `skill` dict assumption. `record.get("skill", {}).get("domain")` in 3+ sites. On disk `skill` is a routing-label string and `domain` is a sibling top-level string. Backfill treated every record as `_global` domain. Fixed → `record.get("domain")`.

Bug 2 — schema mismatch. The scoring sub-functions read `trajectory.events / total_tools / tool_counts / successes / failures / bash_errors`. On-disk records only have `trajectory.tool_calls / tool_count / tool_types`. Result: every signal fell back to its 0.5 no-data branch. Fixed with a new `_derive_scoring_fields(trajectory)` helper in `reward_engine.py` that materializes the expected fields from the real schema (present keys win, so it's forward-compatible).

Bug 3 — `motion_score` not persisted. The composite reward weights `W_MOTION=0.14` but `backfill_rewards` never wrote `motion_score` to disk. Added it.

Bug 4 — structural bias (the audit finding). After Bugs 1-3 were fixed, long Bash-heavy investigation/deploy sessions scored ~0.54 while 3-step flow cards scored ~0.70. Signal decomposition pinned 87

Bug 5 — same `skill` dict bug in `sft_exporter.py` (3 sites + an `events`-vs-`tool_calls` mismatch). Fixed; `karl export` now runs.

Corpus state after all fixes

  • 3203 records re-backfilled (force=True, run 3 times across the fixes).
  • Score distribution: `[0.555, 0.7395]`, mean 0.654, stdev 0.055.
  • The bias is gone: flow-vs-long-mixed bucket spread collapsed `0.159 → 0.025`; all 5 trajectory-shape buckets within 0.04 of each other.
  • `karl export` produced 1049 SFT examples → `Desktop/karl/karl/train.jsonl` (944) + `valid.jsonl` (105) + `karl-sft.jsonl`. 493 filtered low-advantage, 1076 filtered too-short (empties + single-tool). Examples are well-formed `{messages:[system,user,assistant]}`.

Two known gaps we deliberately did NOT fix (flagged, not silently dropped)

1. Bottom-end compression — genuinely-bad trajectories (failures, undo loops) compress toward the empty-trajectory floor ~0.55. Bad work and no work score similarly. Separate reward-shaping decision.
2. Sparse prompts — many SFT `user` messages are `[Task in unknown project]`. The trajectory store captured tool sequences but not rich task descriptions. Training on this teaches tool-planning shape without task grounding.

5. What's BLOCKED and how you (SEA agent) can help

The payoff step is `karl train` — runs a LoRA fine-tune via MLX. It dispatches to a remote machine: `KARL_TRAIN_SSH_ALIAS=mac5` by default (see `Desktop/karl/karl/config.py`). mac5 is rate-limited per this session's start-hook mesh report (`cloud-vm, mac1, mac2, mac4, mac5` all rate-limited; mac3 was the only one NOT rate-limited — and mac3 is your machine, the SEA worker config).

Ways you can help, ranked:

1. Run `karl train` from / targeting mac3. You own `Desktop/skill-entity-architecture/mac3-worker-config/`. If mac3 has the MLX toolchain (or can get it), point `KARL_TRAIN_SSH_ALIAS=mac3` (env var or `config.py`) and kick the training run. The SFT data is already exported and sitting at `Desktop/karl/karl/train.jsonl`. This is the single highest-value unblock — without training, the reward fixes are correct-but-unvalidated. Base model default is `mlx-community/gemma-3-1b-it-4bit` (small, mac3 can handle it).
2. Home-repo LFS surgery (task #23). If mac3 has the cycles: `cd ~`, BFG-rewrite or `git filter-repo` to strip `Desktop/lume-commerce/hardware/cad/print/gcode-review/` (3 files >50MB, one 171MB) from history, or just `git rm -r --cached` that dir and gitignore it. Then 6 local-only commits can finally push. Self-contained, no coordination needed.
3. Sanity-check the reward shape. If you have an opinion on KARL's reward design (you work on skill-entity-architecture, adjacent domain), the two gaps in §4 are open questions. Don't change weights without checking with Mohamed — reward shaping is design-stakes.

If you do touch KARL: it's a clean git repo with its own remote (`Diomandeee/karl`), pushes fine, 164 tests via `pytest`. Run the suite after any change.

6. Open task ledger (TaskList on mac1 main session)

  • `#23` Home-repo LFS migration (push unblock) — you could take this
  • `#24` Populate `[home-path]` with real MAC addresses for mac3/mac4/mac5 — user-side, makes the `wake_mesh` flow actually fire WoL packets
  • `#25` V1.2 KARL reward backfill — completed

7. What I'm doing next (so we don't collide)

Waiting on Mohamed's call: keep going on KARL (needs the training unblock you might provide) vs. bank KARL and return to the leisure/mesh goal (tomorrow 06:30 UTC is the first live `morning_brief → evening_close` cron cascade — the real-world validation of V1.1). Either way I'm staying on `gateway.py` / `Pebble/` / the V1.x rail. If you take KARL training or the LFS job, ping the mac1 main session so we don't double-commit. KARL repo is safe for you to own; gateway.py and Pebble are mine.

---

Key paths cheat-sheet:
- aura-gateway: `[home-path]` (launchd `com.openclaw.aura-gateway`, mac1:8095)
- flows-karl-writer: `[home-path]` (launchd `com.diomande.flows-karl-writer`)
- event ledger: `[home-path]` + cursor `[home-path]`
- KARL: `Desktop/karl/` (repo), trajectories at `karl/trajectories.jsonl`, SFT out at `karl/{train,valid,karl-sft}.jsonl`
- Pebble: `Desktop/Pebble/`
- gateway [sensitive field redacted]`plutil -extract EnvironmentVariables.GLASSES_GATEWAY_TOKEN raw [home-path]`
- memory: `[home-path]` — see `karl-reward-backfill-2026-05-13.md` and `v11-event-sourced-rail-2026-05-13.md`

Promotion Decision

Keep in the searchable backlog until it intersects a live paper or system.

Source Anchor

Leisure/HANDOFF-mesh-karl-to-sea-2026-05-14.md

Detected Structure

Method · Code Anchors · Architecture