Grand Diomande Research · Full HTML Reader

meshd Wire Protocol — Reference Specification

1. [Overview](#1-overview) 2. [Authentication](#2-authentication) 3. [Slot API](#3-slot-api) 4. [Mesh API](#4-mesh-api) 5. [Slot Lifecycle](#5-slot-lifecycle) 6. [Configuration](#6-configuration) 7. [Topology](#7-topology) 8. [Building a Client](#8-building-a-client) 9. [Error Reference](#9-error-reference) 10. [Future Work](#10-future-work)

Agents That Account for Themselves research note experiment writeup candidate score 36 .md

Full Public Reader

meshd Wire Protocol — Reference Specification

Status: Informational
Version: 1.0
Source: `https://github.com/mohameddiomande/meshd`

---

Table of Contents

1. [Overview](#1-overview)
2. [Authentication](#2-authentication)
3. [Slot API](#3-slot-api)
4. [Mesh API](#4-mesh-api)
5. [Slot Lifecycle](#5-slot-lifecycle)
6. [Configuration](#6-configuration)
7. [Topology](#7-topology)
8. [Building a Client](#8-building-a-client)
9. [Error Reference](#9-error-reference)
10. [Future Work](#10-future-work)

---

1. Overview

meshd is a Rust daemon that manages a static registry of agent slots — named PTY processes (Claude, Codex, OpenCode, Gemini, or any CLI agent) — and exposes them over an HTTP/1.1 REST API. Each machine in a Tailscale mesh runs its own independent meshd instance. Machines discover each other via a hard-coded static node table and coordinate through peer-to-peer HTTP calls.

Design goals

  • Direct PTY ownership. Each slot is a real forked process with a master PTY. Injection writes bytes directly to the PTY master fd. No shell wrapper, no exec layer.
  • Zero inter-node dependencies at boot. Every node runs standalone. Mesh coordination is best-effort; a node failure does not affect the others.
  • Stateless HTTP. All slot state is in-process. No shared database is required for operation (Supabase is used for fleet visibility only, not for control-plane decisions).
  • ANSI-clean output. The PTY reader strips ANSI escape sequences before writing to the ring buffer, so consumers get plain text lines.

Transport

PropertyValue
ProtocolHTTP/1.1
Default port9451
Alternate mesh-api compat portConfigurable via `mesh_port`
Bind address`[ip]` (all interfaces)
Content type`application/json` (all endpoints except `/stream`)
Streaming`text/event-stream` (SSE, `/stream/:slot` only)

All request bodies must be valid JSON. All responses are JSON except the SSE stream.

---

2. Authentication

2.1 Bearer token

All endpoints except `/health` require a Bearer token in the `Authorization` header.

Authorization: Bearer <token>

The token is resolved in this order:

1. `mesh_auth_token` field in `meshd.toml` `[machine]` section
2. `MESHD_TOKEN` environment variable
3. Hardcoded fallback: `claw-glasses-2026`

A missing or incorrect token returns `401 Unauthorized`:

json
{"error": "unauthorized"}

2.2 IP allowlist

Mesh coordination endpoints (`/inject-tmux`, `/relay`, `/broadcast`, `/message`, `/task`) enforce an additional IP allowlist check. The allowed ranges are:

  • `[ip]` (IPv4 loopback)
  • `::1` (IPv6 loopback)
  • `[ip]/10` — the Tailscale CGNAT range (`100.64.x.x` through `100.127.x.x`)

The IP is extracted from `X-Forwarded-For` if present (first entry only), otherwise from the TCP connection's remote address.

A request from a disallowed IP returns `403 Forbidden`:

json
{"error": "IP [ip] not allowed"}

Slot management endpoints (`/inject`, `/stream`, `/read`, `/status`) do not apply the IP allowlist — only the bearer token check.

2.3 Rate limiting

Mesh coordination endpoints share a per-endpoint-key rate limiter with a 10-second cooldown window. The key is the endpoint name (e.g., `inject-tmux`, `broadcast`) or `relay-<machine>` for relay calls.

A request that arrives within the cooldown returns `429 Too Many Requests`:

json
{"error": "rate limited", "retry_after_secs": 7}

The rate limiter is in-process and resets on daemon restart. It is not shared across nodes.

---

3. Slot API

Slots are named PTY processes defined in `meshd.toml`. Each slot has a name, an agent binary, optional arguments, optional environment overrides, and a laziness flag.

3.1 POST /inject/:slot

Write a prompt to a slot's PTY. If the slot is `lazy` and has not yet been started, it boots automatically before injection.

Auth: Bearer token required. No IP allowlist.

Path parameter: `slot` — the slot name as defined in config.

Request body:

json
{"prompt": "string (required, max 10240 bytes)"}

Response — 200 OK:

json
{
  "ok": true,
  "gen": 3,
  "slot": "claude-max"
}

`gen` is the slot's generation counter, incremented each time the process is (re)spawned. Use it to detect unintended restarts between calls.

Response — 404 Not Found:

json
{"error": "slot 'foo' not found"}

Response — 500 Internal Server Error:

json
{"error": "write_to_pty failed: errno ..."}

Implementation note: The prompt is written verbatim to the PTY master fd via `libc::write`. A trailing newline is appended if the prompt does not already end with one. Slot state transitions to `Busy` immediately after write.

Example:

sh
curl -X POST http://[ip]:9451/inject/claude-max \
  -H "Authorization: Bearer claw-glasses-2026" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "summarize the last 10 commits in this repo"}'

---

3.2 GET /stream/:slot

Open a Server-Sent Events stream for a slot. On connect, the last 50 lines from the ring buffer are emitted as backlog, followed by live output as it arrives.

Auth: Bearer token required.

Path parameter: `slot` — the slot name.

Response headers:

Content-Type: text/event-stream
Cache-Control: no-cache
X-Accel-Buffering: no

Event format:

event: line
data: <plain text line>

Each event is separated by a blank line. There is no heartbeat/ping event. Each `data` value is a single line of ANSI-stripped text from the slot's stdout/stderr.

Reconnect behavior: The broadcast channel has a capacity of 256 events. If a consumer falls behind, the channel returns a lag error and the SSE stream closes. The client should detect the connection drop and reconnect. On reconnect, backlog is re-emitted from the current ring buffer state.

Lazy boot: If the slot is `Unborn` when a stream connection arrives, the slot is booted before the subscription is established.

503 Service Unavailable is returned if the slot exists but its PTY handle is not yet available (e.g., it is mid-spawn).

Example — curl:

sh
curl -N http://[ip]:9451/stream/claude-max \
  -H "Authorization: Bearer claw-glasses-2026"

Example — EventSource (browser/Node):

js
const es = new EventSource("http://[ip]:9451/stream/claude-max", {
  headers: { "Authorization": "Bearer claw-glasses-2026" }
});
es.addEventListener("line", (e) => console.log(e.data));

---

3.3 GET /read/:slot

Snapshot read of the last N lines from the slot's ring buffer. Does not open a persistent connection. The ring buffer holds up to 500 lines.

Auth: Bearer token required.

Path parameter: `slot` — the slot name.

Query parameter: `n` (optional, default `50`) — number of lines to return.

Response — 200 OK:

json
{
  "slot": "claude-max",
  "lines": [
    "Analyzing repository...",
    "Found 3 changed files."
  ]
}

If the slot has never been booted, `lines` is an empty array.

Example:

sh
curl "http://[ip]:9451/read/claude-max?n=100" \
  -H "Authorization: Bearer claw-glasses-2026"

---

3.4 GET /status

Returns the state of all configured slots on this node.

Auth: Bearer token required.

Response — 200 OK:

json
{
  "machine": "mac1",
  "slots": [
    {
      "name": "claude-max",
      "state": "ready",
      "gen": 3,
      "pid": 84201
    },
    {
      "name": "opencode",
      "state": "unborn",
      "gen": 0,
      "pid": null
    }
  ]
}

Slot states:

StateMeaning
`unborn`Never started (or was killed and reset)
`spawning``fork()` called, waiting for PTY setup
`ready`Process running, accepting input
`busy`Prompt written, output in progress

`pid` is the child process PID. It is `null` when the slot is `unborn`.

Slots are returned sorted by name.

Example:

sh
curl http://[ip]:9451/status \
  -H "Authorization: Bearer claw-glasses-2026"

---

3.5 GET /health

Liveness check. No authentication required.

Response — 200 OK:

json
{
  "machine": "mac1",
  "uptime_secs": 3600,
  "slots": 4
}

`uptime_secs` is seconds since the daemon started. `slots` is the total number of configured slots (regardless of state).

Example:

sh
curl http://[ip]:9451/health

---

4. Mesh API

Mesh endpoints operate across nodes. All require Bearer token and IP allowlist. All are subject to rate limiting (10-second cooldown per key).

4.1 POST /inject-tmux

Inject a prompt into a tmux session. If `machine` is the local node or `"local"`, injection is done directly via `tmux load-buffer` + `paste-buffer` + `send-keys Enter`. If `machine` is a remote node name, the prompt is written to a temp file, SCP'd to the target, and injected via SSH.

Auth: Bearer token + IP allowlist. Rate limit key: `inject-tmux`.

Request body:

json
{
  "machine": "mac2",
  "prompt": "string (required, max 10240 bytes)",
  "session": "main"
}

`machine` defaults to `"local"` if omitted. `session` defaults to `"main"` if omitted.

Response — 200 OK:

json
{"ok": true, "message": "injected remotely into mac2:[ip] session 'main'"}

Implementation note: Remote injection uses `scp -o StrictHostKeyChecking=no` followed by `ssh -o StrictHostKeyChecking=no`. The SSH command chains `load-buffer`, `paste-buffer`, `send-keys Enter`, and `rm` in a single call. This avoids shell-escaping the prompt; it is written to a temp file and fed through tmux's paste mechanism, which handles arbitrary content including special characters.

Example:

sh
curl -X POST http://[ip]:9451/inject-tmux \
  -H "Authorization: Bearer claw-glasses-2026" \
  -H "Content-Type: application/json" \
  -d '{"machine": "mac2", "prompt": "git status", "session": "main"}'

---

4.2 POST /relay

Forward an HTTP request to another mesh node's meshd. The forwarding node re-signs the request with its own auth token before sending.

Auth: Bearer token + IP allowlist. Rate limit key: `relay-<machine>`.

Request body:

json
{
  "machine": "mac2",
  "path": "/status",
  "body": {}
}

`body` is optional (defaults to `{}`). The relay always issues a POST to the target path. HTTP timeout is 5 seconds.

Response — 200 OK:

json
{"ok": true, "response": { ...target response... }}

Response — 502 Bad Gateway:

json
{"error": "relay to mac2 failed: connection refused"}

Example:

sh
curl -X POST http://[ip]:9451/relay \
  -H "Authorization: Bearer claw-glasses-2026" \
  -H "Content-Type: application/json" \
  -d '{"machine": "mac4", "path": "/status", "body": {}}'

---

4.3 POST /broadcast

Fan-out a message to all known mesh nodes except the excluded one. Calls `/message` on each peer concurrently. Results are collected and returned as a map keyed by machine name.

Auth: Bearer token + IP allowlist. Rate limit key: `broadcast`.

Request body:

json
{
  "prompt": "string (required)",
  "exclude": "mac1"
}

`exclude` defaults to the local machine ID if omitted.

Response — 200 OK:

json
{
  "ok": true,
  "results": {
    "mac2": {"ok": true, "message": "injected locally into session 'main'"},
    "mac3": {"error": "relay to mac3 failed: connection refused"},
    "mac4": {"ok": true, "message": "injected locally into session 'main'"},
    "mac5": {"ok": true, "message": "injected locally into session 'main'"}
  }
}

Individual node failures are reported per-entry in `results` and do not cause the overall request to fail.

Example:

sh
curl -X POST http://[ip]:9451/broadcast \
  -H "Authorization: Bearer claw-glasses-2026" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "all agents: check your inboxes", "exclude": "mac1"}'

---

4.4 POST /message

Shorthand for local tmux injection. Equivalent to `/inject-tmux` with `machine: "local"`. Useful when calling from a remote node via `/relay`.

Auth: Bearer token + IP allowlist. Rate limit key: `message`.

Request body:

json
{
  "prompt": "string (required)",
  "session": "main"
}

`session` defaults to `"main"`.

Response — 200 OK:

json
{"ok": true, "message": "injected locally into session 'main'"}

Example:

sh
curl -X POST http://[ip]:9451/message \
  -H "Authorization: Bearer claw-glasses-2026" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "what files changed in the last hour?", "session": "main"}'

---

4.5 POST /task

Queue a work item in the local task inbox (`[home-path]`). Tasks are persisted as JSON files on disk. Useful for async work distribution across nodes.

Auth: Bearer token + IP allowlist. Rate limit key: `task`.

Request body:

json
{
  "prompt": "string (required)",
  "from": "mac1"
}

`from` defaults to `"unknown"` if omitted.

Response — 200 OK:

json
{
  "ok": true,
  "task": {
    "id": "20260330142301-1a2b3c4d",
    "prompt": "run test suite and report failures",
    "from": "mac1",
    "created": "2026-03-30T14:23:01.000Z",
    "claimed_by": null
  }
}

Task IDs are `<timestamp>-<8-hex-char nonce>`. They sort lexicographically by creation time.

Example:

sh
curl -X POST http://[ip]:9451/task \
  -H "Authorization: Bearer claw-glasses-2026" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "run test suite and report failures", "from": "mac1"}'

---

4.6 GET /tasks

List all tasks in the local inbox, sorted by creation time ascending.

Auth: Bearer token required.

Response — 200 OK:

json
{
  "tasks": [
    {
      "id": "20260330142301-1a2b3c4d",
      "prompt": "run test suite",
      "from": "mac1",
      "created": "2026-03-30T14:23:01.000Z",
      "claimed_by": null
    },
    {
      "id": "20260330142410-5e6f7a8b",
      "prompt": "check deploy logs",
      "from": "mac3",
      "created": "2026-03-30T14:24:10.000Z",
      "claimed_by": "mac2"
    }
  ]
}

Example:

sh
curl http://[ip]:9451/tasks \
  -H "Authorization: Bearer claw-glasses-2026"

---

4.7 GET /tasks/claim

Claim the oldest unclaimed task. Uses a per-task lock file (`<id>.lock`) to prevent concurrent double-claims across multiple callers. If no unclaimed task exists, `task` is `null`.

Auth: Bearer token required.

Query parameter: `claimer` (optional) — identifier for the claiming node. Defaults to the local machine ID.

Response — 200 OK (task available):

json
{
  "ok": true,
  "task": {
    "id": "20260330142301-1a2b3c4d",
    "prompt": "run test suite",
    "from": "mac1",
    "created": "2026-03-30T14:23:01.000Z",
    "claimed_by": "mac2"
  }
}

Response — 200 OK (no tasks):

json
{"ok": true, "task": null}

Implementation note: Claiming is atomic at the filesystem level. `OpenOptions::create_new` on the lock file succeeds for exactly one concurrent caller. The task JSON is updated in-place with `claimed_by` set before the lock is released. A task with `claimed_by` already set is skipped.

Example:

sh
curl "http://[ip]:9451/tasks/claim?claimer=mac2" \
  -H "Authorization: Bearer claw-glasses-2026"

---

4.8 GET /panes

List local terminal panes — specifically, Claude processes visible to `ps aux`. Returns PID, TTY, and command string for each matching process.

Auth: Bearer token required.

Response — 200 OK:

json
{
  "panes": [
    {
      "pid": "84201",
      "tty": "ttys003",
      "command": "claude --model opus"
    }
  ]
}

Processes with no controlling TTY (`?` in ps output) are excluded.

Example:

sh
curl http://[ip]:9451/panes \
  -H "Authorization: Bearer claw-glasses-2026"

---

4.9 GET /read-pane/:tty

Read the last N lines from a local TTY via `tmux capture-pane`. The `tty` path parameter is the pane target name (e.g., `ttys003`).

Auth: Bearer token required.

Path parameter: `tty` — tmux pane target.

Query parameter: `lines` (optional, default `50`).

Response — 200 OK:

json
{
  "tty": "ttys003",
  "output": "last N lines of pane output as a single string"
}

Response — 500 Internal Server Error if `tmux capture-pane` fails (e.g., the pane target does not exist).

Example:

sh
curl "http://[ip]:9451/read-pane/ttys003?lines=100" \
  -H "Authorization: Bearer claw-glasses-2026"

---

4.10 GET /mesh

Discover all known mesh nodes by probing `/health` on each entry in the static node table concurrently. Timeout per node is 3 seconds.

Auth: Bearer token required.

Response — 200 OK:

json
{
  "machine": "mac1",
  "nodes": [
    {
      "machine": "mac1",
      "ip": "[ip]",
      "healthy": true,
      "detail": {"machine": "mac1", "uptime_secs": 7200, "slots": 4}
    },
    {
      "machine": "mac3",
      "ip": "[ip]",
      "healthy": false,
      "detail": null
    }
  ]
}

`detail` contains the raw `/health` response body if the node responded, otherwise `null`.

Example:

sh
curl http://[ip]:9451/mesh \
  -H "Authorization: Bearer claw-glasses-2026"

---

5. Slot Lifecycle

5.1 State machine

  Unborn
    |
    | boot() called (eager at start, or lazy on first inject/stream)
    v
  Spawning   (fork + openpty + execve in progress)
    |
    | PTY handle ready
    v
  Ready      <---- watchdog resets here after restart
    |
    | inject() called
    v
  Busy
    |
    | (state is not automatically reset — watchdog or next inject does it)
    v
  Ready      (after watchdog cycle or next successful inject)

`Busy` is a soft state set after write. The daemon does not track when the agent finishes responding; it does not attempt to parse output for completion signals. If you need to know when a response is complete, consume the SSE stream and apply your own heuristic (e.g., prompt-ending string detection).

5.2 Eager vs. lazy slots

  • Eager (`lazy = false`): Booted at daemon start. Process is always running.
  • Lazy (`lazy = true`): Booted on the first call to `/inject` or `/stream`. After `idle_timeout_secs` (default 1800) of no activity, the watchdog kills the process and resets the slot to `Unborn`. The next inject re-boots it.

5.3 Watchdog

The watchdog runs every 10 seconds and performs three checks:

1. Dead process detection. If a slot is not `Unborn` but `waitpid(WNOHANG)` returns `Exited` or `Signaled`, the slot is restarted. For eager slots, restart means re-boot. For lazy slots, restart means reset to `Unborn` (the slot will re-boot on the next inject).

2. Idle timeout. If a lazy slot is `Ready` and has been idle for longer than `idle_timeout_secs`, it is killed and reset to `Unborn`. The SIGTERM signal is sent to the child PID.

3. Stuck detection. If a slot has been `Busy` for more than 600 seconds with no output activity, a warning is logged. No automatic action is taken.

5.4 Generation counter

Each slot maintains an atomic `gen` counter. It starts at 0 and increments by 1 each time the slot successfully boots. The `/inject` response and `/status` response both include the current `gen` value. A client that injected at `gen=3` and later reads `gen=5` knows the process restarted twice in the interim and the earlier output is gone.

5.5 PTY configuration

Default PTY dimensions: 50 rows x 220 columns. Configurable per-slot via `rows` and `cols` in config. The child process inherits the parent environment with slot-specific overrides applied. `CLAUDECODE` is unconditionally removed from the child's environment to prevent the agent from detecting it is running inside another Claude session.

---

6. Configuration

Config file location: `[home-path]` (default). Can be overridden by passing a path as the first CLI argument.

6.1 Full example

toml
[machine]
id = "mac1"
tailscale_ip = "[ip]"
api_port = 9451
supabase_url = "https://yourproject.supabase.co"
mesh_port = 9450              # optional: bind a second port for mesh-api compat
mesh_auth_token = "secret"    # optional: overrides MESHD_TOKEN env var

[[slots]]
name = "claude-max"
agent = "claude"
args = ["--model", "opus"]
lazy = true
idle_timeout_secs = 1800
rows = 50
cols = 220
env = { CLAUDE_CONFIG_DIR = "[home-path] }

[[slots]]
name = "opencode"
agent = "opencode"
lazy = false

[[slots]]
name = "gemini"
agent = "gemini"
lazy = true
idle_timeout_secs = 3600

6.2 Field reference

[machine]

FieldTypeRequiredDescription
`id`stringyesMachine identifier. Used as the `machine` field in API responses and Supabase records.
`tailscale_ip`stringyesTailscale IP of this machine. Stored in Supabase for fleet-wide visibility.
`api_port`integeryesPort meshd listens on.
`supabase_url`stringyesSupabase project URL. Only used for fleet registry; not required for local operation.
`mesh_port`integernoIf set and different from `api_port`, meshd binds a second listener on this port. Useful for backward compat with older mesh-api clients.
`mesh_auth_token`stringnoAuth token for mesh operations. Overrides `MESHD_TOKEN` env var.

[[slots]]

FieldTypeRequiredDefaultDescription
`name`stringyesSlot identifier. Used as the `:slot` path parameter.
`agent`stringyesBinary name or path to exec.
`args`array of stringsno`[]`Arguments passed to the agent process.
`env`tableno`{}`Environment variable overrides. Merged on top of the inherited env.
`lazy`boolno`false`If true, slot boots on first use and dies after idle timeout.
`idle_timeout_secs`integerno`1800`Idle timeout in seconds for lazy slots.
`rows`integerno`50`PTY row count passed to `openpty`.
`cols`integerno`220`PTY column count passed to `openpty`.

6.3 Environment variables

VariableEffect
`MESHD_TOKEN`Auth token. Overridden by `mesh_auth_token` in config.
`SUPABASE_ANON_KEY`If set, enables Supabase registration and heartbeat. If not set, meshd runs without fleet registry.
`RUST_LOG`Log verbosity. Example: `RUST_LOG=meshd=debug`. Default: `meshd=info`.

---

7. Topology

7.1 Static node table

The static mesh node table is compiled into the binary in `mesh.rs`:

IDTailscale IP
mac1[ip]
mac2[ip]
mac3[ip]
mac4[ip]
mac5[ip]

All mesh operations that reference a `machine` field use these mappings to resolve the target IP. An unknown machine name returns an error immediately without a network call.

7.2 Communication model

Each meshd node is fully independent. There is no leader election and no consensus protocol. Coordination is done via direct HTTP calls:

  • `/relay` proxies a single request to a named node.
  • `/broadcast` fans out concurrently to all nodes except the excluded one. It uses `/message` on each target (local tmux injection at the remote node), not `/inject` against a slot.
  • `/inject-tmux` with a remote `machine` uses SCP + SSH, not HTTP. This is the mechanism for injecting into arbitrary tmux sessions on machines where the calling node has SSH key access.

7.3 Supabase fleet registry

When `SUPABASE_ANON_KEY` is set, meshd upserts a row per slot into the `agent_slots` table at boot and patches it every 60 seconds with current status and generation. This provides fleet-wide observability without being part of the control plane. A meshd instance operates correctly if Supabase is unreachable.

The `agent_slots` schema used by the registry:

ColumnSource
`id``<machine_id>:<slot_name>`
`machine_id``[machine].id`
`slot_name``[[slots]].name`
`agent``[[slots]].agent`
`account_tier`Last path component of `CLAUDE_CONFIG_DIR` env var, if set
`status`Slot state string (`offline`, `spawning`, `ready`, `busy`)
`generation`Current gen counter
`last_seen`RFC 3339 timestamp
`api_port``[machine].api_port`
`tailscale_ip``[machine].tailscale_ip`

7.4 Security boundaries

The IP allowlist enforces that only Tailscale peers and localhost can call mesh coordination endpoints. This means:

  • Anyone with a shell on a mesh node can reach all mesh endpoints on all other nodes.
  • External clients (outside Tailscale) can only reach `/health` without authentication and slot endpoints with the bearer token.
  • The bearer token is shared across the mesh. Rotating it requires a config update and restart on all nodes.

---

8. Building a Client

8.1 Inject and stream loop

The canonical pattern for sending a prompt and consuming the response:

python
import requests
import sseclient  # pip install sseclient-py

BASE = "http://[ip]:9451"
[sensitive field redacted]
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

def inject(slot: str, prompt: str) -> int:
    r = requests.post(
        f"{BASE}/inject/{slot}",
        json={"prompt": prompt},
        headers=HEADERS,
        timeout=10,
    )
    r.raise_for_status()
    data = r.json()
    return data["gen"]  # remember this for restart detection

def stream(slot: str, gen_at_inject: int):
    r = requests.get(
        f"{BASE}/stream/{slot}",
        headers=HEADERS,
        stream=True,
        timeout=None,
    )
    r.raise_for_status()

    client = sseclient.SSEClient(r)
    for event in client.events():
        if event.event == "line":
            yield event.data

# Usage
gen = inject("claude-max", "list all Python files in this repo")
for line in stream("claude-max", gen):
    print(line)
    # Apply your own completion detection here:
    # e.g., if line.startswith("> ") or "Human:" in line: break

8.2 Detecting restarts mid-stream

python
def safe_inject_and_stream(slot: str, prompt: str):
    gen_before = inject(slot, prompt)

    for line in stream(slot, gen_before):
        # Check if slot restarted during streaming
        status = requests.get(f"{BASE}/status", headers=HEADERS).json()
        slot_info = next(s for s in status["slots"] if s["name"] == slot)

        if slot_info["gen"] != gen_before:
            raise RuntimeError(f"Slot restarted mid-stream (gen {gen_before} -> {slot_info['gen']})")

        yield line

In practice, check gen on reconnect rather than on every line. The gen from `/inject` and `/status` are the same counter.

8.3 Task distribution across nodes

A worker on each node polls and claims tasks from its local inbox:

python
import time

def worker_loop(node_ip: str, node_id: str, slot: str):
    base = f"http://{node_ip}:9451"
    headers = {"Authorization": f"Bearer {TOKEN}"}

    while True:
        r = requests.get(
            f"{base}/tasks/claim",
            params={"claimer": node_id},
            headers=headers,
        )
        r.raise_for_status()
        task = r.json().get("task")

        if task is None:
            time.sleep(5)
            continue

        print(f"[{node_id}] claimed task {task['id']} from {task['from']}")
        inject(slot, task["prompt"])  # uses local BASE/HEADERS
        # consume stream or just fire-and-forget

# To queue work from any node:
def dispatch(prompt: str, target_ip: str):
    r = requests.post(
        f"http://{target_ip}:9451/task",
        json={"prompt": prompt, "from": "mac1"},
        headers={"Authorization": f"Bearer {TOKEN}"},
    )
    r.raise_for_status()
    return r.json()["task"]["id"]

8.4 Checking mesh health

python
def check_fleet():
    r = requests.get(f"{BASE}/mesh", headers=HEADERS)
    nodes = r.json()["nodes"]

    for node in nodes:
        status = "UP" if node["healthy"] else "DOWN"
        uptime = node.get("detail", {}).get("uptime_secs", "?") if node["detail"] else "?"
        print(f"{node['machine']} ({node['ip']})  {status}  uptime={uptime}s")

---

9. Error Reference

HTTP StatusMeaningWhen it occurs
200OKRequest succeeded
400Bad RequestMalformed JSON body
401UnauthorizedMissing or incorrect Bearer token
403ForbiddenClient IP not in allowlist
404Not FoundSlot name does not exist in config
429Too Many RequestsRate limit cooldown active. See `retry_after_secs` in body.
500Internal Server ErrorPTY write failed, tmux command failed, process spawn error
502Bad Gateway`/relay` target returned an error or the connection failed
503Service UnavailableSlot exists but PTY handle not ready (mid-spawn)

All error responses use this envelope:

json
{"error": "human-readable description"}

Rate limit responses include an additional field:

json
{"error": "rate limited", "retry_after_secs": 7}

---

10. Future Work

NATS transport

The current inter-node communication model is point-to-point HTTP with static IP resolution. NATS JetStream is already deployed in the mesh (`:4222` on all nodes). The planned evolution is to replace `/relay`, `/broadcast`, and `/message` with a NATS subject hierarchy:

mesh.inject.<machine>   -- targeted inject
mesh.broadcast          -- fan-out, all subscribers
mesh.task.<machine>     -- task delivery with ACK

This would eliminate the 5-second relay timeout, enable durable delivery for tasks, and allow new nodes to join without a binary recompile.

Skill schema

Slots currently accept arbitrary prompt strings. The planned skill schema would allow a slot to declare what it handles:

toml
[[slots]]
name = "claude-max"
skills = ["code-review", "refactor", "explain"]

A caller could then request `POST /inject-skill` with a `skill` field instead of a raw prompt, and the dispatcher would route to a slot that declares it. This enables capability-based routing across the mesh without the caller knowing which slot or machine handles each skill type.

OPA policy distribution

The OPA instance at `:8181` currently enforces per-machine policies. The planned extension is to have meshd serve its current slot state and recent injection log as input data to OPA, and to accept policy bundles via a new endpoint:

POST /policy          -- push a new Rego bundle to this node
GET  /policy/status   -- query current policy evaluation result against slot state

This would allow the orchestrator to distribute updated policies to all nodes via `/broadcast` rather than requiring SSH-based file pushes and OPA reloads.

Promotion Decision

Attach run IDs, datasets, metrics, and reproduction commands.

Source Anchor

meshd/marketing/protocol-spec.md

Detected Structure

Method · Evaluation · Figures · Code Anchors · Architecture