Tri-Agent Coordination Protocol — AAO Wave 5
**Status:** LIVE as of 2026-02-26 **Authority:** Graph Kernel (GK) at `:8001` **Replaces:** `[home-path]` (file-based, one-sided, deprecated)
---
1. Why This Exists
Three AI CLIs coordinate in the OpenClaw mesh:
| Agent | Model | Strengths | Spawner |
|---|---|---|---|
| Claude Code | Opus 4.6 | Agentic dev, architecture, multi-file refactors | `clawdbot agent` |
| Gemini CLI | Gemini 3 Pro Preview | Research, creative, long-context (>50K tokens) | `gemini` CLI |
| Codex CLI | GPT-5.3 Codex | Structured output, data analysis, low hallucination | `codex` binary |
Problem: The file-based `cross_agent_handshake.json` was one-sided — Gemini wrote 3 messages, Codex acknowledged once, Claude Code never participated. The event bus (`event_bus.sock`) died after 1 test message. Two agents could grab the same task simultaneously with no conflict detection.
Solution: All coordination now flows through the Graph Kernel HTTP API. The GK is already the authority layer for task tickets, dedup, attestations, and reputation. Adding agent presence + claims makes it the single source of truth for the entire multi-agent system. No file watching, no Unix sockets, no race conditions.
---
2. GK Base URLs
| Context | URL | When to use |
|---|---|---|
| Local (Mac2, where GK runs) | `http://localhost:8001` | Agent running on Mac2 |
| Tailscale mesh (any Mac) | `http://[ip]:8001` | Agent running on Mac1/Mac3/Mac4 |
| Cloud Run (remote) | `https://graph-kernel-274020562532.us-central1.run.app` | Cloud VM or external |
Health check: `GET /health` returns `{"status":"healthy","version":"0.1.0","backend":"postgres",...}`
---
3. The Four Endpoints
3.1 `POST /api/agent/heartbeat` — Register Presence
Send every 30 seconds while your process is running. This is how the mesh knows you exist.
Request:
{
  "agent_id": "gemini-cli",
  "device": "mac1",
  "capabilities": ["long-context", "research", "creative"],
  "model": "gemini-3-pro-preview",
  "status": "available"
}
Fields:
| Field | Required | Values | Notes |
|---|---|---|---|
| `agent_id` | Yes | `"claude-code"`, `"gemini-cli"`, `"codex-cli"` | Exactly one of these three |
| `device` | Yes | `"mac1"`, `"mac2"`, `"mac3"`, `"mac4"`, `"cloud-vm"` | Which machine you're on |
| `capabilities` | No | Array of strings | What you're good at. Used by SmartRouter for task matching |
| `model` | No | String | Your current model ID (e.g., `"gemini-3-pro-preview"`, `"gpt-5.3-codex"`, `"opus-4"`) |
| `status` | No (default: `"available"`) | `"available"`, `"busy"`, `"rate_limited"` | Critical for routing |
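The field rules in the table can be checked on the client before a heartbeat is sent. The following is a minimal sketch: `buildHeartbeat` is a hypothetical helper name, not part of the GK API, and whether the GK itself rejects unknown values is not stated in this document.

```javascript
// Allowed values copied from the field table above.
const AGENT_IDS = ['claude-code', 'gemini-cli', 'codex-cli'];
const DEVICES = ['mac1', 'mac2', 'mac3', 'mac4', 'cloud-vm'];
const STATUSES = ['available', 'busy', 'rate_limited'];

// Hypothetical helper: validates the fields and applies the documented
// default of status = "available". Optional fields are omitted when unset.
function buildHeartbeat({ agent_id, device, capabilities, model, status = 'available' }) {
  if (!AGENT_IDS.includes(agent_id)) throw new Error(`unknown agent_id: ${agent_id}`);
  if (!DEVICES.includes(device)) throw new Error(`unknown device: ${device}`);
  if (!STATUSES.includes(status)) throw new Error(`unknown status: ${status}`);
  if (capabilities !== undefined &&
      !(Array.isArray(capabilities) && capabilities.every((c) => typeof c === 'string'))) {
    throw new Error('capabilities must be an array of strings');
  }
  const payload = { agent_id, device, status };
  if (capabilities !== undefined) payload.capabilities = capabilities;
  if (model !== undefined) payload.model = model;
  return payload;
}
```

The returned object can be passed to `JSON.stringify` and used as the request body.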
Response:
{
  "agent_id": "gemini-cli",
  "registered": true,
  "roster_size": 3,
  "last_seen": "2026-02-26T20:23:56.711904+00:00"
}
What happens server-side:
- Agent is stored in an in-memory roster (HashMap, 60s TTL)
- Stale entries (no heartbeat in 60s) are automatically evicted
- Presence is also persisted as knowledge triples in the graph DB:
  - `(agent:gemini-cli, has_status, available)` conf=1.0
  - `(agent:gemini-cli, on_device, mac1)` conf=1.0
  - `(agent:gemini-cli, has_capabilities, long-context,research,creative)` conf=1.0
  - `(agent:gemini-cli, has_model, gemini-3-pro-preview)` conf=1.0
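The roster behavior described above (a map keyed by agent with a 60s TTL) can be illustrated with a small in-memory model. This is only a sketch in JavaScript for readers of the client code: the real GK implementation is not shown in this document, the class and method names are hypothetical, and the triple persistence is left out.

```javascript
const ROSTER_TTL_MS = 60_000; // 60s TTL, as described above

class Roster {
  constructor(now = () => Date.now()) {
    this.now = now;          // injectable clock so the sketch can be tested
    this.agents = new Map(); // agent_id -> { ...heartbeat fields, lastSeen }
  }

  // Record a heartbeat and return the roster size after eviction.
  heartbeat(agent) {
    this.agents.set(agent.agent_id, { ...agent, lastSeen: this.now() });
    return this.list().length;
  }

  // Evict entries with no heartbeat in the last 60s, then list the rest.
  list() {
    const cutoff = this.now() - ROSTER_TTL_MS;
    for (const [id, entry] of this.agents) {
      if (entry.lastSeen < cutoff) this.agents.delete(id);
    }
    return [...this.agents.values()];
  }
}
```

With a 30s heartbeat interval, an agent can miss one heartbeat and still stay on the roster; missing two in a row gets it evicted.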
Status transitions and what they mean:
available ──[receive task]──> busy ──[task done]──> available
    │                         │
    │                         └──[task failed]──> available
    │
    └──[hit API quota]──> rate_limited ──[quota resets]──> available
When you set `status: "rate_limited"`:
- The daemon's SmartRouter will stop routing tasks to you
- Other available agents pick up the slack
- This replaces the old stderr-based rate limit detection in `gemini-spawner.js`
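The status diagram can be written down as a transition table, which makes illegal moves easy to catch in client code. This is a sketch only: the event names (`receive_task`, `hit_api_quota`, and so on) are labels invented here from the diagram's arrows, and the GK is not documented as enforcing these transitions.

```javascript
// Transition table transcribed from the diagram above.
// Event names are hypothetical labels for the diagram's arrows.
const TRANSITIONS = {
  available: { receive_task: 'busy', hit_api_quota: 'rate_limited' },
  busy: { task_done: 'available', task_failed: 'available' },
  rate_limited: { quota_resets: 'available' },
};

// Return the next status, or throw if the diagram has no such arrow.
function nextStatus(current, event) {
  const row = Object.hasOwn(TRANSITIONS, current) ? TRANSITIONS[current] : {};
  if (!Object.hasOwn(row, event)) {
    throw new Error(`no transition from ${current} on ${event}`);
  }
  return row[event];
}
```

The value returned by `nextStatus` is what would go into the `status` field of the next heartbeat.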
---
3.2 `GET /api/agent/roster` — Who's Alive Right Now
Check the roster before dispatching work or before deciding whether to claim a task.
Request: No body needed. Just `GET /api/agent/roster`.
Response:
{
  "agents": [
    {
      "agent_id": "claude-code",
      "device": "mac2",
      "capabilities": ["code", "research", "architecture"],
      "model": "opus-4",
      "status": "available",
      "last_heartbeat": "2026-02-26T20:23:54.765084+00:00",
      "tasks_completed": 12,
      "reputation_score": 0.85
    },
    {
      "agent_id": "gemini-cli",
      "device": "mac1",
      "capabilities": ["long-context", "research", "creative"],
      "model": "gemini-3-pro-preview",
      "status": "busy",
      "last_heartbeat": "2026-02-26T20:23:56.711904+00:00",
      "tasks_completed": 7,
      "reputation_score": 0.72
    },
    {
      "agent_id": "codex-cli",
      "device": "mac3",
      "capabilities": ["structured-output", "data-analysis", "math"],
      "model": "gpt-5.3-codex",
      "status": "available",
      "last_heartbeat": "2026-02-26T20:24:10.123456+00:00",
      "tasks_completed": 4,
      "reputation_score": 0.90
    }
  ],
  "total": 3,
  "as_of": "2026-02-26T20:24:15.000000+00:00"
}
Field details:
| Field | Meaning |
|---|---|
| `tasks_completed` | Counter since last GK restart (in-memory). Historical count lives in attestation triples. |
| `reputation_score` | 0.0-1.0, computed from W3 quality attestations. Weighted: high=1.0, medium=0.6, low=0.3, failed=0.0. Default 0.5 if no attestations yet. |
| `last_heartbeat` | ISO 8601. If older than 60s from `as_of`, agent won't appear (evicted). |
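The `reputation_score` row gives the quality weights and the default but not the exact aggregation. As an illustration, the sketch below assumes the score is the plain mean of the weights over all attestations; that assumption and the function name `reputationScore` are not confirmed by this document.

```javascript
// Weights and default copied from the reputation_score row above.
const QUALITY_WEIGHTS = { high: 1.0, medium: 0.6, low: 0.3, failed: 0.0 };
const DEFAULT_REPUTATION = 0.5;

// Assumption: the score is the unweighted mean of the per-attestation weights.
function reputationScore(qualities) {
  if (qualities.length === 0) return DEFAULT_REPUTATION;
  let total = 0;
  for (const q of qualities) {
    if (!Object.hasOwn(QUALITY_WEIGHTS, q)) throw new Error(`unknown quality: ${q}`);
    total += QUALITY_WEIGHTS[q];
  }
  return total / qualities.length;
}
```

Under this assumption, an agent with one `high` and one `failed` attestation lands on the same 0.5 as an agent with no history, so the score alone does not distinguish the two.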
Use cases:
- Before dispatching: check if target agent is alive and available
- Load balancing: route to agent with lowest `tasks_completed` or highest `reputation_score`
- Health dashboards: display mesh agent status
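The dispatch and load-balancing use cases can be sketched as a small selection function over the roster response. The ordering used here (highest `reputation_score` first, then lowest `tasks_completed`) is one possible reading of the bullets above, and `pickAgent` is a hypothetical name.

```javascript
// Pick an available agent from a /api/agent/roster response body.
// Assumption: prefer higher reputation, break ties on fewer completed tasks.
function pickAgent(roster, requiredCapability) {
  const candidates = roster.agents.filter(
    (a) => a.status === 'available' &&
      (!requiredCapability || (a.capabilities || []).includes(requiredCapability)),
  );
  candidates.sort(
    (a, b) => b.reputation_score - a.reputation_score ||
      a.tasks_completed - b.tasks_completed,
  );
  return candidates.length > 0 ? candidates[0].agent_id : null;
}
```

Applied to the sample roster above, asking for `"research"` selects `claude-code`, because `gemini-cli` is busy and `codex-cli` does not list that capability. With no capability filter, `codex-cli` wins on reputation.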
---
3.3 `POST /api/agent/claim` — Atomic Task Ownership
This is the critical endpoint. Before executing any task from the Supabase `mac_tasks` queue, you MUST claim it through the GK. This prevents two agents from working on the same task simultaneously.
Request:
{
  "agent_id": "codex-cli",
  "task_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "device": "mac3"
}
Response (success, you own the task):
{
  "claimed": true,
  "task_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "claimed_by": "codex-cli",
  "conflict": null
}
Response (conflict, another agent got there first):
{
  "claimed": false,
  "task_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "claimed_by": "claude-code",
  "conflict": "claude-code"
}
Behavior:
- Claims are atomic — the GK uses a write lock internally, no race conditions
- Claims are idempotent — the same agent claiming the same task twice returns success
- Claims are stored as knowledge triples for persistence:
  - `(task:a1b2c3d4, claimed_by, codex-cli)` conf=1.0
  - `(task:a1b2c3d4, claimed_on_device, mac3)` conf=1.0
  - `(task:a1b2c3d4, claimed_at, 2026-02-26T20:24:02+00:00)` conf=1.0
- In-memory claim store holds up to 1000 recent claims (oldest evicted first)
- If `claimed: false`, do not execute the task — skip it and move to the next one in the queue
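The claim semantics listed above (first claimant wins, repeat claims by the owner succeed, oldest claims evicted past a cap) can be modeled in a few lines. This is an illustrative sketch, not the GK's code: `ClaimStore` is a hypothetical name, and the write lock is implicit here because a synchronous JavaScript method runs to completion without interleaving.

```javascript
const MAX_CLAIMS = 1000; // capacity stated above

class ClaimStore {
  constructor(maxClaims = MAX_CLAIMS) {
    this.maxClaims = maxClaims;
    this.claims = new Map(); // task_id -> agent_id, kept in insertion order
  }

  // Mirrors the response shape of POST /api/agent/claim.
  claim(agentId, taskId) {
    const owner = this.claims.get(taskId);
    if (owner !== undefined && owner !== agentId) {
      return { claimed: false, task_id: taskId, claimed_by: owner, conflict: owner };
    }
    if (owner === undefined) {
      this.claims.set(taskId, agentId);
      if (this.claims.size > this.maxClaims) {
        // A Map iterates in insertion order, so the first key is the oldest claim.
        this.claims.delete(this.claims.keys().next().value);
      }
    }
    return { claimed: true, task_id: taskId, claimed_by: agentId, conflict: null };
  }
}
```

One consequence visible in the model: once a claim has been evicted from the in-memory store, a different agent can claim the same task again.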
---
3.4 `GET /api/agent/handoff/:task_id` — Task Ownership History
Reconstruct the full chain of who touched a task. Useful for debugging, auditing, and context recovery when picking up someone else's work.
Request: `GET /api/agent/handoff/a1b2c3d4-e5f6-7890-abcd-ef1234567890`
Response:
{
  "task_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "history": [
    {
      "agent_id": "claude-code",
      "device": "mac2",
      "action": "claimed",
      "timestamp": "2026-02-26T20:24:02+00:00",
      "reason": null
    },
    {
      "agent_id": "gemini-cli",
      "device": "mac1",
      "action": "claimed",
      "timestamp": "2026-02-26T20:30:15+00:00",
      "reason": null
    }
  ],
  "current_owner": "gemini-cli",
  "context_summary": "Task has 6 triples: claimed_by, claimed_at, claimed_on_device, ..."
}
Use cases:
- Context recovery: see who worked on a task before you, what device they were on
- Audit trail: full provenance chain for task execution
- Debugging: identify if a task was claimed multiple times (possible timeout/relay)
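For the debugging use case, a client can reduce the handoff response to the facts it usually needs: who has held the task and whether it changed hands. A minimal sketch, assuming the response shape shown above; `summarizeHandoff` is a hypothetical helper.

```javascript
// Summarize a /api/agent/handoff/:task_id response body.
function summarizeHandoff(handoff) {
  const claims = handoff.history
    .filter((h) => h.action === 'claimed')
    .sort((a, b) => Date.parse(a.timestamp) - Date.parse(b.timestamp));
  const owners = [...new Set(claims.map((h) => h.agent_id))];
  return {
    owners,                          // distinct agents, in claim order
    changedHands: owners.length > 1, // true if more than one agent claimed it
    lastClaim: claims.length > 0 ? claims[claims.length - 1] : null,
  };
}
```

For the sample response, `owners` is `["claude-code", "gemini-cli"]` and `changedHands` is `true`, which is the signal to check `mac_tasks.relay_reason` for a timeout or manual relay.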
---
4. Complete Protocol Flow
┌─────────────────────────────────────────────────────────────────┐
│ AGENT LIFECYCLE │
├─────────────────────────────────────────────────────────────────┤
│ │
│ 1. STARTUP │
│ POST /api/agent/heartbeat │
│ {"agent_id":"YOUR_ID", "status":"available", ...} │
│ Start 30s heartbeat timer │
│ │
│ 2. TASK ACQUISITION (from Supabase mac_tasks queue) │
│ a) Poll mac_tasks WHERE status='pending' │
│ ORDER BY priority ASC, created_at ASC │
│ b) GET /api/agent/roster │
│ Check if task is better suited for another agent │
│ c) POST /api/agent/claim │
│ {"agent_id":"YOUR_ID", "task_id":"...", "device":"..."} │
│ ├─ claimed:true → proceed to step 3 │
│ └─ claimed:false → skip, go back to 2a for next task │
│ │
│ 3. EXECUTION │
│ a) POST /api/agent/heartbeat (status: "busy") │
│ b) UPDATE mac_tasks SET status='running' │
│ WHERE id=task_id │
│ c) Execute the task │
│ d) Continue heartbeating every 30s (status: "busy") │
│ │
│ 4. COMPLETION │
│ a) UPDATE mac_tasks SET status='completed' │
│ b) POST /api/knowledge/attestation ← existing W3 endpoint│
│ {"device":"YOUR_DEVICE", │
│ "quality":"high|medium|low|failed", │
│ "task_id":"...", "task_type":"code"} │
│ c) POST /api/agent/heartbeat (status: "available") │
│ d) Go back to step 2 │
│ │
│ 5. RATE LIMIT HIT │
│ a) POST /api/agent/heartbeat (status: "rate_limited") │
│ b) SmartRouter will stop routing tasks to you │
│ c) Wait for quota reset, then: │
│ d) POST /api/agent/heartbeat (status: "available") │
│ │
│ 6. SHUTDOWN │
│ Stop heartbeating → GK evicts you after 60s │
│ No explicit "disconnect" needed │
│ │
└─────────────────────────────────────────────────────────────────┘
---
5. Integration with Existing Infrastructure
5.1 Supabase `mac_tasks` — Task Queue (KEEP USING)
The task queue lives in Supabase. The GK does NOT replace the queue — it adds a coordination layer on top.
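To show how the queue and the GK layer fit together, here is a sketch of one pass through steps 2 to 4 of the lifecycle in section 4. Everything in `deps` is a hypothetical async function supplied by the caller, so no Supabase or GK client API is assumed. The roster check (step 2b) is omitted, and marking the task `failed` with a `failed` attestation when execution throws is an assumption, since the flow only spells out the success path.

```javascript
// deps is a hypothetical bundle of async functions supplied by the caller:
//   pollTasks()           -> pending tasks, already ordered by priority
//   claim(taskId)         -> response body of POST /api/agent/claim
//   heartbeat(status)     -> posts a heartbeat with the given status
//   setTaskStatus(id, s)  -> updates mac_tasks.status
//   execute(task)         -> runs the task and returns a quality label
//   attest(taskId, q)     -> posts the quality attestation
async function runOnce(deps) {
  const tasks = await deps.pollTasks();                // step 2a
  for (const task of tasks) {
    const res = await deps.claim(task.id);             // step 2c
    if (!res.claimed) continue;                        // owned elsewhere, try the next task

    await deps.heartbeat('busy');                      // step 3a
    await deps.setTaskStatus(task.id, 'running');      // step 3b
    let quality;
    try {
      quality = await deps.execute(task);              // step 3c
    } catch {
      quality = 'failed';                              // assumption: see lead-in
    }
    await deps.setTaskStatus(task.id, quality === 'failed' ? 'failed' : 'completed'); // step 4a
    await deps.attest(task.id, quality);               // step 4b
    await deps.heartbeat('available');                 // step 4c
    return task.id;
  }
  return null; // nothing claimable in this batch
}
```

The periodic 30s heartbeat (step 3d) is assumed to run on its own timer, separate from this function.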
-- Relevant columns for agent coordination:
mac_tasks.id                   -- UUID, use as task_id in claims
mac_tasks.status               -- 'pending', 'running', 'completed', 'failed'
mac_tasks.priority             -- 0=CRITICAL, 1=HIGH, 2=MEDIUM, 3=LOW
mac_tasks.description          -- Task content
mac_tasks.metadata             -- JSONB with task_type, project_path, etc.
mac_tasks.relay_from           -- Device that relayed the task
mac_tasks.relay_to             -- Target device (null = any)
mac_tasks.relay_reason         -- 'timeout', 'manual', 'capability'
mac_tasks.pickup_instructions  -- JSONB context for receiving agent
mac_tasks.timeout_at           -- Auto-reclaim deadline
mac_tasks.admissibility_token  -- HMAC from GK (W1)
mac_tasks.context_policy_ref   -- Policy scope: 'code', 'research', 'general'
Your task poll query:
SELECT * FROM mac_tasks
WHERE status = 'pending'
  AND (relay_to IS NULL OR relay_to = 'YOUR_DEVICE')
ORDER BY priority ASC, created_at ASC
LIMIT 5;
5.2 SmartRouter — Routing Decisions (KEEP USING)
The daemon's SmartRouter (`[home-path]`) uses 4 weighted signals to route tasks:
| Signal | Weight | What it measures |
|---|---|---|
| Capacity | 0.4 | Remaining API quota in rolling window |
| Type affinity | 0.3 | Historical success rate per task type per provider |
| Load balancing | 0.2 | Active agent count |
| Recent performance | 0.1 | Success rate in last 5 tasks |
What changes: SmartRouter can now also read `GET /api/agent/roster` to factor in:
- Agent `status` (don't route to `rate_limited` agents)
- Agent `reputation_score` (prefer higher-reputation agents)
- Agent `capabilities` (route research tasks to agents with "research" capability)
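The weighted signals and the roster checks can be combined as in the sketch below. It assumes each signal has already been normalized to [0, 1] with higher meaning better, which this document does not specify, and it uses hypothetical names (`routeScore`, `chooseProvider`). `reputation_score` is left out of the score because no weight is given for it.

```javascript
// Weights copied from the signal table above.
const WEIGHTS = { capacity: 0.4, typeAffinity: 0.3, loadBalancing: 0.2, recentPerformance: 0.1 };

// Assumption: every signal is normalized to [0, 1], higher is better.
function routeScore(signals) {
  return Object.keys(WEIGHTS).reduce((sum, key) => sum + WEIGHTS[key] * signals[key], 0);
}

// candidates: [{ agent: <roster entry>, signals: {...} }]
// Drops rate_limited agents and agents missing the required capability,
// then returns the agent_id with the highest weighted score.
function chooseProvider(candidates, requiredCapability) {
  const eligible = candidates.filter(({ agent }) =>
    agent.status !== 'rate_limited' &&
    (!requiredCapability || (agent.capabilities || []).includes(requiredCapability)));
  if (eligible.length === 0) return null;
  eligible.sort((a, b) => routeScore(b.signals) - routeScore(a.signals));
  return eligible[0].agent.agent_id;
}
```

Filtering on roster status before scoring means a rate-limited agent is never chosen, however strong its historical signals are.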
5.3 Arena — Parallel Competition (KEEP USING)
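For CRITICAL/HIGH priority tasks, the Arena races Claude and Gemini in parallel and judges the winner via Sonnet. The Arena operates above the coordination layer, so its own logic does not change. The only addition: both contestants should `POST /api/agent/claim` before executing. Because the GK grants each task to a single agent, one contestant will receive `claimed: false`; inside the Arena that result can be ignored and both contestants still run. In this one case the claim is advisory rather than a hard lock, which is an exception to the skip rule in 3.3.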
5.4 Quality Attestations — Reputation (EXISTING W3 ENDPOINT)
After completing a task, post an attestation. This feeds into the `reputation_score` visible in the roster.
POST /api/knowledge/attestation
{
  "device": "mac1",
  "quality": "high",
  "task_id": "a1b2c3d4-...",
  "task_type": "research",
  "duration_ms": 45000,
  "output_length": 3200,
  "had_errors": false
}
5.5 Task Tickets — Admissibility (EXISTING W1 ENDPOINT)
For policy-scoped tasks, get an execution ticket before running:
POST /api/task_ticket
{
  "task_id": "a1b2c3d4-...",
  "device": "mac1",
  "policy_id": "research",
  "ttl_seconds": 3600
}
---
6. Implementation Guide Per Agent
6.1 Gemini CLI (`gemini-spawner.js` integration)
The heartbeat should be added to the spawn lifecycle in `[home-path]`:
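// In spawn(), before the Gemini process is started:
async spawn(task, model) {
  // ... existing spawn logic ...

  // NEW: claim the task through the GK before execution.
  // localhost reaches the GK only on Mac2; on other machines use the
  // Tailscale URL from section 2.
  let claimResp = null;
  try {
    const resp = await fetch('http://localhost:8001/api/agent/claim', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        agent_id: 'gemini-cli',
        task_id: task.id,
        device: process.env.DEVICE_ID || 'mac1'
      })
    });
    claimResp = await resp.json();
  } catch (err) {
    // GK unreachable: claims are best-effort (see section 7), so carry on.
    console.warn(`GK claim failed for task ${task.id}: ${err.message}`);
  }

  if (claimResp && claimResp.claimed === false) {
    console.log(`Task ${task.id} already claimed by ${claimResp.conflict}, skipping`);
    return null; // Don't spawn
  }

  // ... proceed with Gemini spawn ...
}
Add a heartbeat loop to the spawner constructor or daemon main loop: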
// In daemon_v3.js main loop or GeminiSpawner constructor:
setInterval(async () => {
  const activeCount = geminiSpawner.activeAgents.size;
  await fetch('http://localhost:8001/api/agent/heartbeat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      agent_id: 'gemini-cli',
      device: process.env.DEVICE_ID || 'mac1',
      capabilities: ['long-context', 'research', 'creative'],
      model: 'gemini-3-pro-preview',
      status: activeCount > 0 ? 'busy' : 'available'
    })
  }).catch(() => {}); // Best-effort, don't crash on GK downtime
}, 30_000);
6.2 Codex CLI (`openclaw-agent.md` integration)
Add to `[home-path]`:
## Agent Coordination Protocol
Before executing any task from the Supabase queue:
1. Heartbeat: `curl -X POST localhost:8001/api/agent/heartbeat -H "Content-Type: application/json" -d '{"agent_id":"codex-cli","device":"mac3","capabilities":["structured-output","data-analysis","math"],"model":"gpt-5.3-codex","status":"available"}'`
2. Claim: `curl -X POST localhost:8001/api/agent/claim -H "Content-Type: application/json" -d '{"agent_id":"codex-cli","task_id":"TASK_UUID","device":"mac3"}'`
3. If `claimed:false` → skip the task, another agent has it.
4. After completion: post attestation to `/api/knowledge/attestation`
5. If rate-limited: heartbeat with `"status":"rate_limited"`
For the `codex-main` wrapper or `codex-bootstrap`, add a heartbeat on startup:
# In [home-path] after validation:
curl -s -X POST localhost:8001/api/agent/heartbeat \
  -H "Content-Type: application/json" \
  -d '{"agent_id":"codex-cli","device":"'"$(hostname -s)"'","capabilities":["structured-output","data-analysis","math"],"model":"gpt-5.3-codex","status":"available"}' \
  > /dev/null 2>&1 || true
6.3 Claude Code (already integrated)
Claude Code runs on Mac2 where the GK is local. The daemon (`daemon_v3.js`) should add claim calls before spawning Claude agents, and heartbeat in its main loop. This is handled separately since Claude Code has direct access.
---
7. Error Handling
| Scenario | What to do |
|---|---|
| GK is down (connection refused) | Continue operating without coordination. Heartbeats and claims are best-effort. Don't crash or block on GK errors. |
| Claim returns `claimed:false` | Skip the task. Do NOT execute it. Move to the next pending task in the queue. |
| Heartbeat fails | Retry on next 30s interval. After 60s without heartbeat, the GK evicts you from the roster — but you keep running. |
| GK restarts | In-memory roster and claims reset. Agents re-register on next heartbeat. Knowledge triples in PostgreSQL persist across restarts. |
| Two agents claim simultaneously | Both cannot win. The GK serializes claims behind a `parking_lot::RwLock` write lock, so one agent gets `claimed:true` and the other gets `claimed:false`. |
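The first two rows of the table can be folded into one wrapper that never throws and never blocks for long. This is a sketch under stated assumptions: `claimBestEffort` is a hypothetical name, the 2-second timeout is an arbitrary choice, and treating a non-2xx response the same as an unreachable GK is an interpretation of "don't crash or block on GK errors". `fetchFn` defaults to the global `fetch` (Node 18+) and is injectable so the function can be exercised offline.

```javascript
// Returns { proceed, coordinated, conflict }.
//   proceed     - whether the caller should execute the task
//   coordinated - false when the GK could not be consulted
async function claimBestEffort(baseUrl, body, fetchFn = fetch, timeoutMs = 2000) {
  try {
    const resp = await fetchFn(`${baseUrl}/api/agent/claim`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(body),
      signal: AbortSignal.timeout(timeoutMs), // assumption: give up after 2s
    });
    if (!resp.ok) throw new Error(`GK returned HTTP ${resp.status}`);
    const data = await resp.json();
    return { proceed: data.claimed === true, coordinated: true, conflict: data.conflict ?? null };
  } catch (err) {
    // GK down or erroring: continue without coordination, as the table says.
    return { proceed: true, coordinated: false, conflict: null };
  }
}
```

Returning `coordinated: false` lets the caller log that a task ran without a claim, which helps when auditing duplicate executions after a GK outage.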
---
8. Querying Agent Data via Knowledge Graph
All coordination data is also stored as knowledge triples. You can query them directly:
# All triples about claude-code
curl "localhost:8001/api/knowledge?subject=agent:claude-code"
# All triples about a specific task
curl "localhost:8001/api/knowledge?subject=task:a1b2c3d4-..."
# All claims (across all tasks)
curl "localhost:8001/api/knowledge?predicate=claimed_by"
# All status updates
curl "localhost:8001/api/knowledge?predicate=has_status"
---
9. Deprecation Notice
The following are deprecated and should no longer be used:
| Deprecated | Replacement |
|---|---|
| `[home-path]` | `POST /api/agent/heartbeat` + `GET /api/agent/roster` |
| `[home-path]` | Knowledge triples via `POST /api/knowledge` |
| `[home-path]` | GK HTTP endpoints (no Unix sockets needed) |
| `[home-path]` | `POST /api/agent/heartbeat` |
| `lock_registry` in handshake JSON | `POST /api/agent/claim` |
| Stderr-based rate limit detection | `POST /api/agent/heartbeat` with `status: "rate_limited"` |
The old files will not be deleted (they're historical reference), but agents should stop reading/writing them.
---
10. Verification Commands
Run these to confirm everything works:
# 1. Health check
curl -s localhost:8001/health | python3 -m json.tool
# 2. Register all three agents
curl -s -X POST localhost:8001/api/agent/heartbeat \
-H "Content-Type: application/json" \
-d '{"agent_id":"claude-code","device":"mac2","capabilities":["code","research","architecture"],"model":"opus-4","status":"available"}'
curl -s -X POST localhost:8001/api/agent/heartbeat \
-H "Content-Type: application/json" \
-d '{"agent_id":"gemini-cli","device":"mac1","capabilities":["long-context","research","creative"],"model":"gemini-3-pro-preview","status":"available"}'
curl -s -X POST localhost:8001/api/agent/heartbeat \
-H "Content-Type: application/json" \
-d '{"agent_id":"codex-cli","device":"mac3","capabilities":["structured-output","data-analysis","math"],"model":"gpt-5.3-codex","status":"available"}'
# 3. Verify roster shows all 3
curl -s localhost:8001/api/agent/roster | python3 -m json.tool
# 4. Claim a task (should succeed)
curl -s -X POST localhost:8001/api/agent/claim \
-H "Content-Type: application/json" \
-d '{"agent_id":"claude-code","task_id":"test-verify","device":"mac2"}'
# 5. Conflict test (should fail with conflict)
curl -s -X POST localhost:8001/api/agent/claim \
-H "Content-Type: application/json" \
-d '{"agent_id":"gemini-cli","task_id":"test-verify","device":"mac1"}'
# 6. Handoff history
curl -s localhost:8001/api/agent/handoff/test-verify | python3 -m json.tool
---
11. Summary
| What | Before (W1-W4) | After (W5) |
|---|---|---|
| Agent discovery | File-based handshake (one-sided) | `GET /api/agent/roster` (real-time, TTL-based) |
| Task ownership | None (race conditions possible) | `POST /api/agent/claim` (atomic, conflict detection) |
| Rate limit signaling | Stderr parsing in spawner | `status: "rate_limited"` in heartbeat |
| Agent presence | PID in JSON file | 30s heartbeat with auto-eviction |
| Handoff context | `pickup_instructions` JSONB only | `/api/agent/handoff/:task_id` + knowledge triples |
| Authority | Scattered (files + Supabase + spawner state) | Graph Kernel is single source of truth |
Source Anchor
Comp-Core/docs/AGENT_COORDINATION_PROTOCOL.md