
Tri-Agent Coordination Protocol — AAO Wave 5

**Status:** LIVE as of 2026-02-26 **Authority:** Graph Kernel (GK) at `:8001` **Replaces:** `[home-path]` (file-based, one-sided, deprecated)

---

1. Why This Exists

Three AI CLIs coordinate in the OpenClaw mesh:

| Agent | Model | Strengths | Spawner |
|---|---|---|---|
| Claude Code | Opus 4.6 | Agentic dev, architecture, multi-file refactors | `clawdbot agent` |
| Gemini CLI | Gemini 3 Pro Preview | Research, creative, long-context (>50K tokens) | `gemini` CLI |
| Codex CLI | GPT-5.3 Codex | Structured output, data analysis, low hallucination | `codex` binary |

Problem: The file-based `cross_agent_handshake.json` was one-sided — Gemini wrote 3 messages, Codex acknowledged once, Claude Code never participated. The event bus (`event_bus.sock`) died after 1 test message. Two agents could grab the same task simultaneously with no conflict detection.

Solution: All coordination now flows through the Graph Kernel HTTP API. The GK is already the authority layer for task tickets, dedup, attestations, and reputation. Adding agent presence + claims makes it the single source of truth for the entire multi-agent system. No file watching, no Unix sockets, no race conditions.

---

2. GK Base URLs

| Context | URL | When to use |
|---|---|---|
| Local (Mac2, where GK runs) | `http://localhost:8001` | Agent running on Mac2 |
| Tailscale mesh (any Mac) | `http://[ip]:8001` | Agent running on Mac1/Mac3/Mac4 |
| Cloud Run (remote) | `https://graph-kernel-274020562532.us-central1.run.app` | Cloud VM or external |

Health check: `GET /health` returns `{"status":"healthy","version":"0.1.0","backend":"postgres",...}`
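
The table above can be read as a small lookup. The following is a minimal sketch, not code from the daemon or the GK: `resolveGkBaseUrl`, `isGkHealthy`, and the `tailscaleIp` parameter are hypothetical names, and the Tailscale address has to be supplied by the caller because it is not given here. It assumes Node 18+ for the global `fetch`.

```javascript
// Hypothetical helper: choose a GK base URL from the device an agent runs on.
const CLOUD_RUN_URL = 'https://graph-kernel-274020562532.us-central1.run.app';

function resolveGkBaseUrl(device, tailscaleIp) {
  if (device === 'mac2') return 'http://localhost:8001'; // the GK runs on Mac2
  if (['mac1', 'mac3', 'mac4'].includes(device)) {
    if (!tailscaleIp) throw new Error('tailscaleIp is required off Mac2');
    return `http://${tailscaleIp}:8001`;
  }
  return CLOUD_RUN_URL; // cloud VM or external caller
}

// Health probe: resolves to true only when the GK reports status "healthy".
async function isGkHealthy(baseUrl) {
  try {
    const res = await fetch(`${baseUrl}/health`);
    return res.ok && (await res.json()).status === 'healthy';
  } catch {
    return false; // connection refused, DNS failure, or invalid JSON
  }
}
```

An agent could call `isGkHealthy(resolveGkBaseUrl(device, ip))` once at startup and log the result before starting its heartbeat timer.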

---

3. The Four Endpoints

3.1 `POST /api/agent/heartbeat` — Register Presence

Send every 30 seconds while your process is running. This is how the mesh knows you exist.

Request:

json
{
  "agent_id": "gemini-cli",
  "device": "mac1",
  "capabilities": ["long-context", "research", "creative"],
  "model": "gemini-3-pro-preview",
  "status": "available"
}

Fields:

| Field | Required | Values | Notes |
|---|---|---|---|
| `agent_id` | Yes | `"claude-code"`, `"gemini-cli"`, `"codex-cli"` | Exactly one of these three |
| `device` | Yes | `"mac1"`, `"mac2"`, `"mac3"`, `"mac4"`, `"cloud-vm"` | Which machine you're on |
| `capabilities` | No | Array of strings | What you're good at. Used by SmartRouter for task matching |
| `model` | No | String | Your current model ID (e.g., `"gemini-3-pro-preview"`, `"gpt-5.3-codex"`, `"opus-4"`) |
| `status` | No (default: `"available"`) | `"available"`, `"busy"`, `"rate_limited"` | Critical for routing |
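
A client can check these field rules before sending a heartbeat. The sketch below is a hypothetical client-side builder (`buildHeartbeat` is not an existing function in the daemon), and the GK may validate differently on the server.

```javascript
const AGENT_IDS = ['claude-code', 'gemini-cli', 'codex-cli'];
const DEVICES = ['mac1', 'mac2', 'mac3', 'mac4', 'cloud-vm'];
const STATUSES = ['available', 'busy', 'rate_limited'];

// Hypothetical builder: enforces the required fields and applies the default status.
function buildHeartbeat({ agent_id, device, capabilities, model, status = 'available' }) {
  if (!AGENT_IDS.includes(agent_id)) throw new Error(`unknown agent_id: ${agent_id}`);
  if (!DEVICES.includes(device)) throw new Error(`unknown device: ${device}`);
  if (!STATUSES.includes(status)) throw new Error(`unknown status: ${status}`);
  const body = { agent_id, device, status };
  // Optional fields are only included when the caller provides them.
  if (Array.isArray(capabilities)) body.capabilities = capabilities;
  if (typeof model === 'string') body.model = model;
  return body;
}
```

The returned object can be passed to `JSON.stringify` and used as the request body for `POST /api/agent/heartbeat`.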

Response:

json
{
  "agent_id": "gemini-cli",
  "registered": true,
  "roster_size": 3,
  "last_seen": "2026-02-26T20:23:56.711904+00:00"
}

What happens server-side:
- Agent is stored in an in-memory roster (HashMap, 60s TTL)
- Stale entries (no heartbeat in 60s) are automatically evicted
- Presence is also persisted as knowledge triples in the graph DB:
  - `(agent:gemini-cli, has_status, available)` conf=1.0
  - `(agent:gemini-cli, on_device, mac1)` conf=1.0
  - `(agent:gemini-cli, has_capabilities, long-context,research,creative)` conf=1.0
  - `(agent:gemini-cli, has_model, gemini-3-pro-preview)` conf=1.0
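
The TTL rule is easier to reason about with a small model. This is an illustrative JavaScript sketch of the described behavior (a keyed roster with 60s eviction), not the GK's server-side implementation; the `Roster` class and its injectable clock are assumptions made for the example, and triple persistence is left out.

```javascript
// Illustrative model of the roster's 60s TTL; not the GK's actual code.
const TTL_MS = 60_000;

class Roster {
  constructor(now = () => Date.now()) {
    this.now = now;           // injectable clock, useful for testing
    this.entries = new Map(); // agent_id -> { ...heartbeat, lastSeen }
  }

  // Record a heartbeat and report the roster size after stale entries are dropped.
  heartbeat(hb) {
    this.evictStale();
    this.entries.set(hb.agent_id, { ...hb, lastSeen: this.now() });
    return { agent_id: hb.agent_id, registered: true, roster_size: this.entries.size };
  }

  // Remove every agent whose last heartbeat is older than the TTL.
  evictStale() {
    const cutoff = this.now() - TTL_MS;
    for (const [id, entry] of this.entries) {
      if (entry.lastSeen < cutoff) this.entries.delete(id);
    }
  }

  list() {
    this.evictStale();
    return [...this.entries.values()];
  }
}
```

With a 30s heartbeat interval, an agent has to miss two consecutive heartbeats before it disappears from `list()`.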

Status transitions and what they mean:

available ──[receive task]──> busy ──[task done]──> available
    │                           │
    │                           └──[task failed]──> available
    │
    └──[hit API quota]──> rate_limited ──[quota resets]──> available

When you set `status: "rate_limited"`:
- The daemon's SmartRouter will stop routing tasks to you
- Other available agents pick up the slack
- This replaces the old stderr-based rate limit detection in `gemini-spawner.js`
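
The diagram can be transcribed into a transition table so that an agent never reports an impossible status change. A minimal sketch follows; the event names (`receive_task`, `quota_hit`, and so on) are hypothetical labels chosen for this example and do not appear in the GK API.

```javascript
// Allowed transitions, transcribed from the diagram above.
const TRANSITIONS = {
  available: { receive_task: 'busy', quota_hit: 'rate_limited' },
  busy: { task_done: 'available', task_failed: 'available' },
  rate_limited: { quota_reset: 'available' },
};

// Returns the next status, or throws if the diagram has no such edge.
function nextStatus(current, event) {
  const next = TRANSITIONS[current]?.[event];
  if (!next) throw new Error(`no transition from ${current} on ${event}`);
  return next;
}
```

The value returned by `nextStatus` is what the agent would send as `status` in its next heartbeat.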

---

3.2 `GET /api/agent/roster` — Who's Alive Right Now

Check the roster before dispatching work or before deciding whether to claim a task.

Request: No body needed. Just `GET /api/agent/roster`.

Response:

json
{
  "agents": [
    {
      "agent_id": "claude-code",
      "device": "mac2",
      "capabilities": ["code", "research", "architecture"],
      "model": "opus-4",
      "status": "available",
      "last_heartbeat": "2026-02-26T20:23:54.765084+00:00",
      "tasks_completed": 12,
      "reputation_score": 0.85
    },
    {
      "agent_id": "gemini-cli",
      "device": "mac1",
      "capabilities": ["long-context", "research", "creative"],
      "model": "gemini-3-pro-preview",
      "status": "busy",
      "last_heartbeat": "2026-02-26T20:23:56.711904+00:00",
      "tasks_completed": 7,
      "reputation_score": 0.72
    },
    {
      "agent_id": "codex-cli",
      "device": "mac3",
      "capabilities": ["structured-output", "data-analysis", "math"],
      "model": "gpt-5.3-codex",
      "status": "available",
      "last_heartbeat": "2026-02-26T20:24:10.123456+00:00",
      "tasks_completed": 4,
      "reputation_score": 0.90
    }
  ],
  "total": 3,
  "as_of": "2026-02-26T20:24:15.000000+00:00"
}

Field details:

| Field | Meaning |
|---|---|
| `tasks_completed` | Counter since last GK restart (in-memory). Historical count lives in attestation triples. |
| `reputation_score` | 0.0-1.0, computed from W3 quality attestations. Weighted: high=1.0, medium=0.6, low=0.3, failed=0.0. Default 0.5 if no attestations yet. |
| `last_heartbeat` | ISO 8601. If older than 60s from `as_of`, agent won't appear (evicted). |
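
To make the `reputation_score` weights concrete, here is a sketch that assumes the score is the arithmetic mean of the per-attestation weights. The weights and the 0.5 default come from the table; the averaging step is an assumption, since only the weights are specified.

```javascript
const QUALITY_WEIGHT = new Map([
  ['high', 1.0],
  ['medium', 0.6],
  ['low', 0.3],
  ['failed', 0.0],
]);

// Assumption: the score is the plain average of the weights of all attestations.
function reputationScore(qualities) {
  if (qualities.length === 0) return 0.5; // documented default with no attestations
  let total = 0;
  for (const q of qualities) {
    if (!QUALITY_WEIGHT.has(q)) throw new Error(`unknown quality: ${q}`);
    total += QUALITY_WEIGHT.get(q);
  }
  return total / qualities.length;
}
```

Under this assumption, an agent with one `high` and one `failed` attestation sits at the same score as an agent with no history.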

Use cases:
- Before dispatching: check if target agent is alive and available
- Load balancing: route to agent with lowest `tasks_completed` or highest `reputation_score`
- Health dashboards: display mesh agent status
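
The dispatch and load-balancing use cases can be combined into one selection step. The function below is a hypothetical sketch over the roster response shape shown above; the tie-break on `tasks_completed` is one possible policy, not a rule stated by the protocol.

```javascript
// Hypothetical selector: the best available agent for a capability.
function pickAgent(roster, capability) {
  const candidates = roster.agents.filter(
    (a) => a.status === 'available' && a.capabilities.includes(capability)
  );
  if (candidates.length === 0) return null;
  // Highest reputation first; fewer completed tasks breaks ties.
  candidates.sort(
    (a, b) =>
      b.reputation_score - a.reputation_score ||
      a.tasks_completed - b.tasks_completed
  );
  return candidates[0].agent_id;
}
```

Applied to the example response above, `pickAgent(roster, 'research')` would skip `gemini-cli` because it is `busy` and return `claude-code`.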

---

3.3 `POST /api/agent/claim` — Atomic Task Ownership

This is the critical endpoint. Before executing any task from the Supabase `mac_tasks` queue, you MUST claim it through the GK. This prevents two agents from working on the same task simultaneously.

Request:

json
{
  "agent_id": "codex-cli",
  "task_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "device": "mac3"
}

Response (success — you own the task):

json
{
  "claimed": true,
  "task_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "claimed_by": "codex-cli",
  "conflict": null
}

Response (conflict — another agent got there first):

json
{
  "claimed": false,
  "task_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "claimed_by": "claude-code",
  "conflict": "claude-code"
}

Behavior:
- Claims are atomic — the GK uses a write lock internally, no race conditions
- Claims are idempotent — the same agent claiming the same task twice returns success
- Claims are stored as knowledge triples for persistence:
  - `(task:a1b2c3d4, claimed_by, codex-cli)` conf=1.0
  - `(task:a1b2c3d4, claimed_on_device, mac3)` conf=1.0
  - `(task:a1b2c3d4, claimed_at, 2026-02-26T20:24:02+00:00)` conf=1.0
- In-memory claim store holds up to 1000 recent claims (oldest evicted first)
- If `claimed: false`, do not execute the task — skip it and move to the next one in the queue
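
The claim semantics above (first writer wins, same-agent retries succeed, bounded store) can be modeled in a few lines. This is an illustrative in-process sketch, not the GK's implementation: the real service guards its store with a write lock, whereas single-threaded JavaScript gets atomicity for free. The `ClaimStore` name is hypothetical.

```javascript
// Illustrative model of the described claim semantics; not the GK's code.
class ClaimStore {
  constructor(capacity = 1000) {
    this.capacity = capacity;
    this.claims = new Map(); // task_id -> agent_id, kept in insertion order
  }

  claim(agent_id, task_id) {
    const owner = this.claims.get(task_id);
    if (owner !== undefined && owner !== agent_id) {
      // Another agent got there first: report the conflict.
      return { claimed: false, task_id, claimed_by: owner, conflict: owner };
    }
    if (owner === undefined) {
      this.claims.set(task_id, agent_id);
      if (this.claims.size > this.capacity) {
        // A Map iterates in insertion order, so the first key is the oldest claim.
        this.claims.delete(this.claims.keys().next().value);
      }
    }
    // New claim, or an idempotent repeat by the current owner.
    return { claimed: true, task_id, claimed_by: agent_id, conflict: null };
  }
}
```

The return values mirror the two response bodies shown earlier in this section.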

---

3.4 `GET /api/agent/handoff/:task_id` — Task Ownership History

Reconstruct the full chain of who touched a task. Useful for debugging, auditing, and context recovery when picking up someone else's work.

Request: `GET /api/agent/handoff/a1b2c3d4-e5f6-7890-abcd-ef1234567890`

Response:

json
{
  "task_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "history": [
    {
      "agent_id": "claude-code",
      "device": "mac2",
      "action": "claimed",
      "timestamp": "2026-02-26T20:24:02+00:00",
      "reason": null
    },
    {
      "agent_id": "gemini-cli",
      "device": "mac1",
      "action": "claimed",
      "timestamp": "2026-02-26T20:30:15+00:00",
      "reason": null
    }
  ],
  "current_owner": "gemini-cli",
  "context_summary": "Task has 6 triples: claimed_by, claimed_at, claimed_on_device, ..."
}

Use cases:
- Context recovery: see who worked on a task before you, what device they were on
- Audit trail: full provenance chain for task execution
- Debugging: identify if a task was claimed multiple times (possible timeout/relay)
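
For the debugging use case, a caller can reduce the handoff response to a few facts. `summarizeHandoff` below is a hypothetical helper over the response shape shown above and relies only on the `history`, `action`, `agent_id`, and `current_owner` fields.

```javascript
// Hypothetical helper: summarize who has claimed a task and whether it changed hands.
function summarizeHandoff(handoff) {
  const claims = handoff.history.filter((h) => h.action === 'claimed');
  const agents = [...new Set(claims.map((h) => h.agent_id))]; // first-seen order
  return {
    claimCount: claims.length,
    agents,
    changedHands: agents.length > 1, // more than one distinct claimant
    previousOwners: agents.filter((a) => a !== handoff.current_owner),
  };
}
```

For the example response, this reports two claims, `changedHands: true`, and `claude-code` as the only previous owner.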

---

4. Complete Protocol Flow

┌─────────────────────────────────────────────────────────────────┐
│                     AGENT LIFECYCLE                             │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  1. STARTUP                                                     │
│     POST /api/agent/heartbeat                                   │
│     {"agent_id":"YOUR_ID", "status":"available", ...}           │
│     Start 30s heartbeat timer                                   │
│                                                                 │
│  2. TASK ACQUISITION (from Supabase mac_tasks queue)            │
│     a) Poll mac_tasks WHERE status='pending'                    │
│        ORDER BY priority ASC, created_at ASC                    │
│     b) GET /api/agent/roster                                    │
│        Check if task is better suited for another agent         │
│     c) POST /api/agent/claim                                    │
│        {"agent_id":"YOUR_ID", "task_id":"...", "device":"..."}  │
│        ├─ claimed:true  → proceed to step 3                    │
│        └─ claimed:false → skip, go back to 2a for next task    │
│                                                                 │
│  3. EXECUTION                                                   │
│     a) POST /api/agent/heartbeat (status: "busy")               │
│     b) UPDATE mac_tasks SET status='running'                    │
│        WHERE id=task_id                                         │
│     c) Execute the task                                         │
│     d) Continue heartbeating every 30s (status: "busy")         │
│                                                                 │
│  4. COMPLETION                                                  │
│     a) UPDATE mac_tasks SET status='completed'                  │
│     b) POST /api/knowledge/attestation    ← existing W3 endpoint│
│        {"device":"YOUR_DEVICE",                                 │
│         "quality":"high|medium|low|failed",                     │
│         "task_id":"...", "task_type":"code"}                    │
│     c) POST /api/agent/heartbeat (status: "available")          │
│     d) Go back to step 2                                        │
│                                                                 │
│  5. RATE LIMIT HIT                                              │
│     a) POST /api/agent/heartbeat (status: "rate_limited")       │
│     b) SmartRouter will stop routing tasks to you               │
│     c) Wait for quota reset, then:                              │
│     d) POST /api/agent/heartbeat (status: "available")          │
│                                                                 │
│  6. SHUTDOWN                                                    │
│     Stop heartbeating → GK evicts you after 60s                 │
│     No explicit "disconnect" needed                             │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
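
Steps 2 through 4 of the lifecycle can be sketched as one function. Everything injected here is hypothetical: `gk` stands for a thin client around the HTTP endpoints, `queue` for a wrapper around the Supabase `mac_tasks` table, and `execute` for the agent's own task runner, assumed to return a quality label. The roster check in step 2b is left out to keep the sketch short.

```javascript
// One pass over the pending queue: claim the first available task and run it.
async function runOnce({ agentId, device, gk, queue, execute }) {
  const tasks = await queue.pollPending(device);                  // step 2a
  for (const task of tasks) {
    const claim = await gk.claim({ agent_id: agentId, task_id: task.id, device }); // 2c
    if (!claim.claimed) continue;                                 // conflict: next task

    await gk.heartbeat({ agent_id: agentId, device, status: 'busy' });        // 3a
    await queue.setStatus(task.id, 'running');                    // 3b
    let quality = 'failed';
    let finalStatus = 'failed';
    try {
      quality = await execute(task);                              // 3c
      finalStatus = 'completed';
    } catch {
      // Execution threw: keep both quality and status as 'failed'.
    }
    await queue.setStatus(task.id, finalStatus);                  // 4a
    await gk.attest({ device, quality, task_id: task.id });       // 4b
    await gk.heartbeat({ agent_id: agentId, device, status: 'available' });   // 4c
    return task.id;
  }
  return null; // nothing claimable in this pass
}
```

A daemon would call `runOnce` in a loop alongside the 30s heartbeat timer. Injecting `gk` and `queue` keeps the control flow testable without a running GK or database.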

---

5. Integration with Existing Infrastructure

5.1 Supabase `mac_tasks` — Task Queue (KEEP USING)

The task queue lives in Supabase. The GK does NOT replace the queue — it adds a coordination layer on top.

sql
-- Relevant columns for agent coordination:
mac_tasks.id                  -- UUID, use as task_id in claims
mac_tasks.status              -- 'pending', 'running', 'completed', 'failed'
mac_tasks.priority            -- 0=CRITICAL, 1=HIGH, 2=MEDIUM, 3=LOW
mac_tasks.description         -- Task content
mac_tasks.metadata            -- JSONB with task_type, project_path, etc.
mac_tasks.relay_from          -- Device that relayed the task
mac_tasks.relay_to            -- Target device (null = any)
mac_tasks.relay_reason        -- 'timeout', 'manual', 'capability'
mac_tasks.pickup_instructions -- JSONB context for receiving agent
mac_tasks.timeout_at          -- Auto-reclaim deadline
mac_tasks.admissibility_token -- HMAC from GK (W1)
mac_tasks.context_policy_ref  -- Policy scope: 'code', 'research', 'general'

Your task poll query:

sql
SELECT * FROM mac_tasks
WHERE status = 'pending'
  AND (relay_to IS NULL OR relay_to = 'YOUR_DEVICE')
ORDER BY priority ASC, created_at ASC
LIMIT 5;

5.2 SmartRouter — Routing Decisions (KEEP USING)

The daemon's SmartRouter (`[home-path]`) uses 4 weighted signals to route tasks:

| Signal | Weight | What it measures |
|---|---|---|
| Capacity | 0.4 | Remaining API quota in rolling window |
| Type affinity | 0.3 | Historical success rate per task type per provider |
| Load balancing | 0.2 | Active agent count |
| Recent performance | 0.1 | Success rate in last 5 tasks |

What changes: SmartRouter can now also read `GET /api/agent/roster` to factor in:
- Agent `status` (don't route to `rate_limited` agents)
- Agent `reputation_score` (prefer higher-reputation agents)
- Agent `capabilities` (route research tasks to agents with "research" capability)
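
One way to combine the four weighted signals with the roster status is sketched below. It assumes each signal is already normalized to the range 0 to 1 and that the weights are applied as a linear sum; neither detail is specified here, and the function and property names are hypothetical rather than taken from SmartRouter.

```javascript
const WEIGHTS = {
  capacity: 0.4,
  typeAffinity: 0.3,
  loadBalance: 0.2,
  recentPerformance: 0.1,
};

// Assumption: signals are normalized to [0, 1] and combined linearly.
function routeScore(signals) {
  return Object.entries(WEIGHTS).reduce(
    (sum, [name, weight]) => sum + weight * signals[name],
    0
  );
}

// Roster status acts as a hard filter before scoring.
function routable(agent) {
  return agent.status !== 'rate_limited';
}
```

A router built this way would first drop agents for which `routable` is false, then rank the rest by `routeScore`.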

5.3 Arena — Parallel Competition (KEEP USING)

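For CRITICAL/HIGH priority tasks, the Arena races Claude and Gemini in parallel and judges the winner via Sonnet. Arena operates above the coordination layer, so its own logic does not change. The only addition: both contestants should `POST /api/agent/claim` before executing. Inside the Arena the claim is advisory rather than a hard lock: the contestant that receives `claimed: false` still runs, and the loser's claim can be ignored. This is the one exception to the skip-on-conflict rule in section 3.3.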

5.4 Quality Attestations — Reputation (EXISTING W3 ENDPOINT)

After completing a task, post an attestation. This feeds into the `reputation_score` visible in the roster.

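bash
# Replace the elided task_id with the full task UUID.
curl -s -X POST localhost:8001/api/knowledge/attestation \
  -H "Content-Type: application/json" \
  -d '{
    "device": "mac1",
    "quality": "high",
    "task_id": "a1b2c3d4-...",
    "task_type": "research",
    "duration_ms": 45000,
    "output_length": 3200,
    "had_errors": false
  }'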

5.5 Task Tickets — Admissibility (EXISTING W1 ENDPOINT)

For policy-scoped tasks, get an execution ticket before running:

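bash
# Replace the elided task_id with the full task UUID.
curl -s -X POST localhost:8001/api/task_ticket \
  -H "Content-Type: application/json" \
  -d '{
    "task_id": "a1b2c3d4-...",
    "device": "mac1",
    "policy_id": "research",
    "ttl_seconds": 3600
  }'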

---

6. Implementation Guide Per Agent

6.1 Gemini CLI (`gemini-spawner.js` integration)

The heartbeat should be added to the spawn lifecycle in `[home-path]`:

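javascript
// In spawn(), before the Gemini process is started:
async spawn(task, model) {
  // ... existing spawn logic ...

  // NEW: Claim task through GK before execution
  let claimResp;
  try {
    const resp = await fetch('http://localhost:8001/api/agent/claim', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        agent_id: 'gemini-cli',
        task_id: task.id,
        device: process.env.DEVICE_ID || 'mac1'
      })
    });
    if (!resp.ok) throw new Error(`GK responded ${resp.status}`);
    claimResp = await resp.json();
  } catch (err) {
    // Claims are best-effort (see section 7): if the GK is unreachable,
    // proceed without coordination instead of failing the spawn.
    console.warn(`GK claim failed for task ${task.id}: ${err.message}`);
    claimResp = { claimed: true };
  }

  if (!claimResp.claimed) {
    console.log(`Task ${task.id} already claimed by ${claimResp.conflict}, skipping`);
    return null; // Don't spawn
  }

  // ... proceed with Gemini spawn ...
}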

Add a heartbeat loop to the spawner constructor or daemon main loop:

javascript
// In daemon_v3.js main loop or GeminiSpawner constructor:
setInterval(async () => {
  const activeCount = geminiSpawner.activeAgents.size;
  await fetch('http://localhost:8001/api/agent/heartbeat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      agent_id: 'gemini-cli',
      device: process.env.DEVICE_ID || 'mac1',
      capabilities: ['long-context', 'research', 'creative'],
      model: 'gemini-3-pro-preview',
      status: activeCount > 0 ? 'busy' : 'available'
    })
  }).catch(() => {}); // Best-effort, don't crash on GK downtime
}, 30_000);

6.2 Codex CLI (`openclaw-agent.md` integration)

Add to `[home-path]`:

markdown
## Agent Coordination Protocol

Before executing any task from the Supabase queue:
1. Heartbeat: `curl -X POST localhost:8001/api/agent/heartbeat -H "Content-Type: application/json" -d '{"agent_id":"codex-cli","device":"mac3","capabilities":["structured-output","data-analysis","math"],"model":"gpt-5.3-codex","status":"available"}'`
2. Claim: `curl -X POST localhost:8001/api/agent/claim -H "Content-Type: application/json" -d '{"agent_id":"codex-cli","task_id":"TASK_UUID","device":"mac3"}'`
3. If `claimed:false` → skip the task, another agent has it.
4. After completion: post attestation to `/api/knowledge/attestation`
5. If rate-limited: heartbeat with `"status":"rate_limited"`

For the `codex-main` wrapper or `codex-bootstrap`, add a heartbeat on startup:

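bash
# In [home-path] after validation:
# The device value must be one of those listed in section 3.1. Falling back to
# `hostname -s` assumes the short hostname matches (for example "mac3").
DEVICE="${DEVICE_ID:-$(hostname -s)}"
curl -s -X POST localhost:8001/api/agent/heartbeat \
  -H "Content-Type: application/json" \
  -d '{"agent_id":"codex-cli","device":"'"$DEVICE"'","capabilities":["structured-output","data-analysis","math"],"model":"gpt-5.3-codex","status":"available"}' \
  > /dev/null 2>&1 || true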

6.3 Claude Code (already integrated)

Claude Code runs on Mac2, where the GK is local, so it reaches the GK directly at `http://localhost:8001`. The daemon (`daemon_v3.js`) should claim each task before spawning a Claude agent and send heartbeats from its main loop. This work is handled separately from the Gemini and Codex changes above.

---

7. Error Handling

| Scenario | What to do |
|---|---|
| GK is down (connection refused) | Continue operating without coordination. Heartbeats and claims are best-effort. Don't crash or block on GK errors. |
| Claim returns `claimed:false` | Skip the task. Do NOT execute it. Move to the next pending task in the queue. |
| Heartbeat fails | Retry on the next 30s interval. After 60s without a heartbeat, the GK evicts you from the roster, but your process keeps running. |
| GK restarts | In-memory roster and claims reset. Agents re-register on their next heartbeat. Knowledge triples in PostgreSQL persist across restarts. |
| Two agents claim simultaneously | Both cannot succeed: the GK serializes claims behind a `parking_lot::RwLock` write lock. One agent wins and the other gets `claimed:false`. |
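
The "best-effort, never block" rule can be captured in a single wrapper. This is a sketch under assumptions: `bestEffort` is a hypothetical helper, and the 2-second default timeout is an arbitrary value chosen for illustration.

```javascript
// Hypothetical wrapper: run a GK call, but never let a GK failure stop the agent.
async function bestEffort(call, fallback, timeoutMs = 2000) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error('GK timeout')), timeoutMs);
  });
  try {
    // Whichever settles first wins: the GK call or the timeout.
    return await Promise.race([call(), timeout]);
  } catch {
    return fallback; // GK down, too slow, or the call threw
  } finally {
    clearTimeout(timer); // avoid a stray timer keeping the process alive
  }
}
```

For a claim, passing `{ claimed: true }` as the fallback matches the first row of the table: when the GK is unreachable, the agent continues without coordination. A `claimed: false` response from a healthy GK is returned unchanged, so the skip rule still applies.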

---

8. Querying Agent Data via Knowledge Graph

All coordination data is also stored as knowledge triples. You can query them directly:

bash
# All triples about claude-code
curl "localhost:8001/api/knowledge?subject=agent:claude-code"

# All triples about a specific task
curl "localhost:8001/api/knowledge?subject=task:a1b2c3d4-..."

# All claims (across all tasks)
curl "localhost:8001/api/knowledge?predicate=claimed_by"

# All status updates
curl "localhost:8001/api/knowledge?predicate=has_status"
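
When these queries are issued from code rather than curl, the subject values contain `:` and should be URL-encoded. The sketch below builds the same URLs with the standard `URL` API; `knowledgeUrl` is a hypothetical helper, and only the `subject` and `predicate` parameters shown above are assumed to exist.

```javascript
// Builds a /api/knowledge query URL; baseUrl defaults to the local GK.
function knowledgeUrl(filter, baseUrl = 'http://localhost:8001') {
  const url = new URL('/api/knowledge', baseUrl);
  for (const [key, value] of Object.entries(filter)) {
    url.searchParams.set(key, value); // percent-encodes ":" and other reserved characters
  }
  return url.toString();
}
```

For example, `knowledgeUrl({ subject: 'agent:claude-code' })` yields a URL whose query string is `subject=agent%3Aclaude-code`.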

---

9. Deprecation Notice

The following are deprecated and should no longer be used:

| Deprecated | Replacement |
|---|---|
| `[home-path]` | `POST /api/agent/heartbeat` + `GET /api/agent/roster` |
| `[home-path]` | Knowledge triples via `POST /api/knowledge` |
| `[home-path]` | GK HTTP endpoints (no Unix sockets needed) |
| `[home-path]` | `POST /api/agent/heartbeat` |
| `lock_registry` in handshake JSON | `POST /api/agent/claim` |
| Stderr-based rate limit detection | `POST /api/agent/heartbeat` with `status: "rate_limited"` |

The old files will not be deleted (they're historical reference), but agents should stop reading/writing them.

---

10. Verification Commands

Run these to confirm everything works:

bash
# 1. Health check
curl -s localhost:8001/health | python3 -m json.tool

# 2. Register all three agents
curl -s -X POST localhost:8001/api/agent/heartbeat \
  -H "Content-Type: application/json" \
  -d '{"agent_id":"claude-code","device":"mac2","capabilities":["code","research","architecture"],"model":"opus-4","status":"available"}'

curl -s -X POST localhost:8001/api/agent/heartbeat \
  -H "Content-Type: application/json" \
  -d '{"agent_id":"gemini-cli","device":"mac1","capabilities":["long-context","research","creative"],"model":"gemini-3-pro-preview","status":"available"}'

curl -s -X POST localhost:8001/api/agent/heartbeat \
  -H "Content-Type: application/json" \
  -d '{"agent_id":"codex-cli","device":"mac3","capabilities":["structured-output","data-analysis","math"],"model":"gpt-5.3-codex","status":"available"}'

# 3. Verify roster shows all 3
curl -s localhost:8001/api/agent/roster | python3 -m json.tool

# 4. Claim a task (should succeed)
curl -s -X POST localhost:8001/api/agent/claim \
  -H "Content-Type: application/json" \
  -d '{"agent_id":"claude-code","task_id":"test-verify","device":"mac2"}'

# 5. Conflict test (should fail with conflict)
curl -s -X POST localhost:8001/api/agent/claim \
  -H "Content-Type: application/json" \
  -d '{"agent_id":"gemini-cli","task_id":"test-verify","device":"mac1"}'

# 6. Handoff history
curl -s localhost:8001/api/agent/handoff/test-verify | python3 -m json.tool

---

11. Summary

| What | Before (W1-W4) | After (W5) |
|---|---|---|
| Agent discovery | File-based handshake (one-sided) | `GET /api/agent/roster` (real-time, TTL-based) |
| Task ownership | None (race conditions possible) | `POST /api/agent/claim` (atomic, conflict detection) |
| Rate limit signaling | Stderr parsing in spawner | `status: "rate_limited"` in heartbeat |
| Agent presence | PID in JSON file | 30s heartbeat with auto-eviction |
| Handoff context | `pickup_instructions` JSONB only | `/api/agent/handoff/:task_id` + knowledge triples |
| Authority | Scattered (files + Supabase + spawner state) | Graph Kernel is single source of truth |


Source Anchor

Comp-Core/docs/AGENT_COORDINATION_PROTOCOL.md
