Grand Diomande Research · Full HTML Reader

Three AIs, One Terminal: How I Built Cross-Model Live Collaboration

I asked Claude Code about the pane orchestrator. Then I asked OpenAI Codex the same question, without giving it any context about pane awareness. Codex gave me a detailed technical breakdown: the 5-phase heartbeat cycle, the KL divergence invariants, the security patches from last week, the bridge file schema. Everything.

Codex knew things I never told it.

That moment made me stop and actually think about what I had built. Not a task dispatcher. Not a multi-model API wrapper. Something closer to a shared nervous system for three competing AI agents running simultaneously on the same machine.

This is how it works.

---

The Setup

Three agents, three terminals, one MacBook:

- Claude Code (Anthropic, Opus 4.6), running across multiple panes with hooks into the OS
- OpenAI Codex CLI (GPT-5.4) on `/dev/ttys001` in `--dangerously-full-access` mode
- Gemini CLI (Google, Gemini 3 Pro) on `/dev/ttys009` in `--yolo` mode

Each runs in its own terminal pane. Each has its own context window, its own company's model, its own personality. But they all read from the same underlying infrastructure: the same pane registry, the same task ledger, the same event bus, the same memory files.

That shared layer is what I'm calling CALC. Cross-Agent Live Collaboration.

---

Why the Codex Moment Happened

Codex has a bootstrap digest. It's a 37KB TypeScript script called `bootstrap-digest.ts` that runs at session start and reads everything:

- Every active terminal pane and what it's working on (pane registry)
- Cortex event log: learned skills, corrections, rules promoted to CLAUDE.md
- NUMU event bus state: connection status, subscriptions, pending actions
- The active tasks ledger
- Memory files, failure museum, daily digests
- Bridge file messages from other agents
- Nexus portal page inventory
- Recent mesh event bus activity

It compiles all of that into a grounding context markdown file, then injects it into Codex's context at startup. So Codex doesn't just know what you tell it in your prompt. It knows what's been happening across the whole system for the past 24 hours.

When I asked Codex about the pane orchestrator, it knew. Because Claude's hooks, the orchestrator's heartbeat loop, the bridge file, the cortex state, all of that had been written to disk. The bootstrap digest read it. Codex woke up already briefed.

This is the key insight: context parity across agents doesn't require direct agent-to-agent communication. It just requires agents reading from the same shared state.
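As a rough illustration of the idea (not the actual `bootstrap-digest.ts`, which reads far more sources), a digest step can be sketched as a pure function that takes shared state and renders one markdown document. The `DigestSources` shape, the section titles, and the `buildDigest` name are all assumptions made for this example; only the 24-hour window comes from the description above.

```typescript
// Hypothetical shapes: the real digest reads many more sources than these three.
interface DigestEvent {
  source: string;    // e.g. "claude", "codex"
  timestamp: string; // ISO 8601
  summary: string;
}

interface DigestSources {
  panes: { id: string; task: string }[];
  events: DigestEvent[];
  bridgeMessages: { from: string; to: string; content: string }[];
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Render shared state into a single markdown grounding document.
function buildDigest(src: DigestSources, now: Date): string {
  // Keep only events from the last 24 hours.
  const cutoff = now.getTime() - DAY_MS;
  const recent = src.events.filter((e) => Date.parse(e.timestamp) >= cutoff);

  const lines: string[] = ["# Grounding context", "", "## Active panes"];
  for (const p of src.panes) lines.push(`- ${p.id}: ${p.task}`);

  lines.push("", "## Events (last 24h)");
  for (const e of recent) lines.push(`- [${e.source}] ${e.summary}`);

  lines.push("", "## Bridge messages");
  for (const m of src.bridgeMessages) {
    lines.push(`- ${m.from} -> ${m.to}: ${m.content}`);
  }
  return lines.join("\n");
}
```

The point of the sketch is that nothing in it talks to another agent. Whatever any agent wrote to disk shows up in the next digest, which is all context parity needs.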

---

The Four Transport Layers

The system uses four different ways for agents to share information. They're not redundant. Each serves a different purpose.

1. Mesh Event Bus

`http://cloud-vm:8600/events` - HTTP POST, fire-and-forget.

All three agents publish events here when they complete work. The bus fans out to Supabase, Discord, and Prometheus. No subscription management, no ACKs. Just broadcast and move on.

```json
{
  "type": "agent.context_share",
  "source": "codex",
  "machine": "mac1",
  "timestamp": "2026-03-09T19:10:00Z",
  "session_id": "codex-ttys001",
  "pane_id": "/dev/ttys001",
  "payload": { "task": "bootstrapping gemini parity" }
}
```

Every agent reads recent events on next bootstrap. That's how Claude learns what Codex finished while it was in a different pane working on something else.
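A fire-and-forget publish against that endpoint might look roughly like the sketch below. The field names mirror the event shown above; `makeEvent`, `publish`, the `PostFn` type, and the way `session_id` is derived from the pane path are assumptions for illustration, not the real client.

```typescript
interface MeshEvent {
  type: string;
  source: string;
  machine: string;
  timestamp: string;
  session_id: string;
  pane_id: string;
  payload: Record<string, unknown>;
}

// Minimal fetch-like signature so the sketch can be exercised without a network.
type PostFn = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<unknown>;

// Assumption: session_id is the source name plus the tty basename.
function makeEvent(
  source: string,
  paneId: string,
  payload: Record<string, unknown>,
  now: Date = new Date(),
): MeshEvent {
  return {
    type: "agent.context_share",
    source,
    machine: "mac1",
    timestamp: now.toISOString(),
    session_id: `${source}-${paneId.split("/").pop()}`,
    pane_id: paneId,
    payload,
  };
}

// Fire-and-forget: no ACK is awaited and delivery errors are swallowed.
function publish(
  event: MeshEvent,
  post: PostFn,
  url = "http://cloud-vm:8600/events",
): void {
  post(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(event),
  }).catch(() => {
    /* deliberately ignored */
  });
}
```

In a runtime with a global `fetch`, `publish(event, fetch)` would send the request. Injecting the poster keeps the function testable offline.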

2. NUMU Bus

`ws://localhost:7890` - WebSocket pub/sub. Real-time and bidirectional.

Codex has a full TypeScript NUMU client. Claude's hooks publish to NUMU after every significant operation. Topics follow a consistent schema: `calc.discovery` for cross-cutting insights, `agent.status` for heartbeat, `task.assigned` for work dispatch.

This is the closest thing to real-time agent-to-agent communication. If Codex finishes a migration and publishes to `calc.discovery`, Claude picks it up on the next NUMU poll without waiting for a full bootstrap cycle.
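The subscribe side of that topic scheme can be sketched independently of the WebSocket itself. The wire format below (one JSON object per frame with `topic`, `from`, and `data`) is an assumption, as is the `TopicRouter` class; the real NUMU client may frame messages differently.

```typescript
// Assumed wire format: one JSON object per WebSocket frame.
interface NumuFrame {
  topic: string; // e.g. "calc.discovery", "agent.status", "task.assigned"
  from: string;
  data: unknown;
}

type Handler = (frame: NumuFrame) => void;

class TopicRouter {
  private handlers = new Map<string, Handler[]>();

  subscribe(topic: string, handler: Handler): void {
    const list = this.handlers.get(topic) ?? [];
    list.push(handler);
    this.handlers.set(topic, list);
  }

  // Feed this the raw text of each incoming frame.
  // Returns the number of handlers that ran.
  dispatch(raw: string): number {
    let frame: NumuFrame;
    try {
      frame = JSON.parse(raw) as NumuFrame;
    } catch {
      return 0; // ignore malformed frames
    }
    if (typeof frame?.topic !== "string") return 0;
    const list = this.handlers.get(frame.topic) ?? [];
    for (const handler of list) handler(frame);
    return list.length;
  }
}
```

A client would call `dispatch` from its WebSocket message callback, so a discovery published by one agent reaches every subscriber without waiting for a bootstrap.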

3. Orchestrator Bridge File

`[home-path]` - File-based persistent state.

This is the simplest layer and the most reliable one. It's a JSON file with a task queue, agent status map, discovery log, and message history. Last-writer-wins. Any agent can write to it. Any agent can read from it.

```json
{
  "messages": [
    {
      "from": "claude",
      "to": "codex",
      "timestamp": "2026-03-09T19:05:00Z",
      "content": "Build bootstrap parity for Gemini. See Desktop/gemini-bootstrap-spec.md",
      "assigned_task": "gemini-bootstrap-parity"
    }
  ],
  "agent_status": {
    "claude": "working",
    "codex": "working",
    "gemini": "idle"
  }
}
```

When I dispatched work to Codex to build Gemini's bootstrap, that's where the task lived. Claude wrote it; Codex read it on its next loop iteration.
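The write-then-read cycle on the bridge file can be sketched as two small functions over the parsed JSON. The types follow the example above; `appendMessage`, `inboxFor`, and the idea of tracking a "last seen" timestamp are assumptions for illustration. File reads and writes are left out so the logic stays visible.

```typescript
type AgentName = "claude" | "codex" | "gemini";

interface BridgeMessage {
  from: AgentName;
  to: AgentName;
  timestamp: string; // ISO 8601, UTC
  content: string;
  assigned_task?: string;
}

interface BridgeFile {
  messages: BridgeMessage[];
  agent_status: Record<AgentName, string>;
}

// Writer side: return a new bridge state with the message appended.
function appendMessage(bridge: BridgeFile, msg: BridgeMessage): BridgeFile {
  return { ...bridge, messages: [...bridge.messages, msg] };
}

// Reader side: messages addressed to `agent` that are newer than `since`.
// UTC ISO 8601 strings in the same format compare correctly as plain strings.
function inboxFor(
  bridge: BridgeFile,
  agent: AgentName,
  since: string,
): BridgeMessage[] {
  return bridge.messages.filter(
    (m) => m.to === agent && m.timestamp > since,
  );
}
```

Each agent would call `inboxFor` once per loop iteration with the timestamp of the last message it handled.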

4. Pane Awareness

AppleScript reads terminal content from every active pane. The Cortex Orchestrator classifies each pane's state: idle, working, stuck, done.

This is how live injection works. Claude can see that Codex's pane has been showing the same output for 3 minutes. Claude can see that Gemini is waiting for input. Claude can write a message and copy it to the clipboard, and AppleScript pastes it directly into the target terminal. Immediate, no API required.

That's how I dispatched the Gemini bootstrap task to Codex. Claude composed the mission prompt and pasted it into Codex's terminal via AppleScript. Codex started working. Claude moved on to designing the formal architecture spec.
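For a sense of what classifying a pane could involve, here is a minimal heuristic over scraped terminal text. Everything specific in it is an assumption: the prompt characters, the "done" keywords, and the `classifyPane` name are invented for the example, and the 3-minute threshold is borrowed from the stuck-pane scenario above. The real orchestrator's rules are not shown in this post.

```typescript
type PaneState = "idle" | "working" | "stuck" | "done";

interface PaneSnapshot {
  content: string;     // text scraped from the pane
  unchangedMs: number; // how long that text has been identical
}

// Illustrative assumptions, not the orchestrator's real rules.
const STUCK_AFTER_MS = 3 * 60 * 1000;
const PROMPT_RE = /[>$❯]$/;
const DONE_RE = /\b(done|completed|finished)\b/i;

function classifyPane(snap: PaneSnapshot): PaneState {
  const text = snap.content.trimEnd();
  const lastLines = text.split("\n").slice(-5).join("\n");

  if (PROMPT_RE.test(text)) {
    // Back at a prompt: either a task just finished or nothing is running.
    return DONE_RE.test(lastLines) ? "done" : "idle";
  }
  // Not at a prompt: output that has not moved for a while suggests a hang.
  return snap.unchangedMs >= STUCK_AFTER_MS ? "stuck" : "working";
}
```

A heuristic like this will misfire on some output, which is one reason a stuck verdict should trigger a look rather than an automatic kill.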

---

The Gemini Problem

When I asked Gemini the same question about the pane orchestrator that I'd asked Claude and Codex, Gemini had nothing. It gave me a generic response about orchestration patterns.

The difference: Gemini only had its `GEMINI.md` instructions and a heartbeat script. No bootstrap digest. No shared context. No connection to the pane registry or cortex state.

That's why Codex's first CALC task was to build bootstrap parity for Gemini. Same pattern as the Codex bootstrap, adapted for Gemini CLI's context injection mechanism. Once Gemini has the same digest running at session start, the three-way loop closes.

The gap in Gemini's knowledge wasn't a model capability issue. It was an infrastructure issue.

---

Agent Routing by Strengths

Not every task should go to the same agent. They have different strengths.

| Task | Best Agent | Why |
| --- | --- | --- |
| iOS builds, Xcode automation | Claude | Native tool access, OS hooks |
| Large-context review (>100K tokens) | Gemini | 1M token context window |
| Data analysis, structured output | Codex | GPT-5.4 structured reasoning |
| Evo3 Stage 2 synthesis (6-path) | Gemini | Holds all divergent paths simultaneously |
| Infrastructure, Docker, SSH | Claude | Direct mesh hook access |
| TypeScript, Node, API integration | Codex | Strong typed output, fewer hallucinations on JS |

Right now this routing is manual. I decide which agent gets which task. The next version will have a routing layer that analyzes task content and assigns automatically.
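One plausible starting point for that routing layer is to encode the table as ordered keyword rules plus a token threshold. The rule patterns, the `routeTask` name, the fallback to Claude, and the precedence of the token check over keywords are all assumptions. The sketch only shows how little machinery a first version needs.

```typescript
type Agent = "claude" | "codex" | "gemini";

interface Task {
  description: string;
  estimatedTokens: number;
}

// Keyword rules loosely transcribed from the table; first match wins.
const RULES: { agent: Agent; pattern: RegExp }[] = [
  { agent: "claude", pattern: /\b(ios|xcode|docker|ssh|infrastructure)\b/i },
  { agent: "codex", pattern: /\b(typescript|node|api|data analysis|structured output)\b/i },
  { agent: "gemini", pattern: /\b(evo3|synthesis)\b/i },
];

// Threshold taken from the ">100K tokens" row.
const LARGE_CONTEXT_TOKENS = 100_000;

function routeTask(task: Task, fallback: Agent = "claude"): Agent {
  // Size trumps keywords: very large inputs go to the largest context window.
  if (task.estimatedTokens > LARGE_CONTEXT_TOKENS) return "gemini";
  for (const rule of RULES) {
    if (rule.pattern.test(task.description)) return rule.agent;
  }
  return fallback;
}
```

Keyword matching is crude, and a task that mentions both Xcode and TypeScript goes to whichever rule comes first. That ordering is a design decision worth making explicit before automating it.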

---

How Agents Actually Talk to Each Other

Four patterns, depending on urgency:

Passive context share: Agent finishes work, publishes to mesh event bus. Others pick it up at next bootstrap. Good for non-urgent information: "I refactored the auth module, here's the new schema."

Direct task dispatch: Write a task to the bridge file addressed to the target agent (the `to` and `assigned_task` fields in the bridge file example). Target agent reads it on next loop. Good for structured work requests: "Build the Gemini bootstrap. Spec is at this path. Return when done."

Live injection: AppleScript clipboard paste into the target agent's terminal. Immediate. Good for urgent redirects or when the agent is about to go down a wrong path.

Discovery broadcast: NUMU pub/sub on `calc.discovery`. Real-time, all agents subscribed. Good for cross-cutting insights: "Found a memory leak in the pane orchestrator that affects all agents."
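The choice among these four patterns can be written down as a small decision function. This is one reading of the descriptions above, not a rule the system enforces: the `Outgoing` fields and the precedence order are assumptions.

```typescript
type Pattern =
  | "passive_share"
  | "task_dispatch"
  | "live_injection"
  | "discovery_broadcast";

interface Outgoing {
  urgent: boolean;      // does someone need to see this right now?
  targetAgent?: string; // set when the message is for one specific agent
  expectsWork: boolean; // is this a request to do something?
}

function choosePattern(m: Outgoing): Pattern {
  // Urgent and aimed at one agent: paste straight into its terminal.
  if (m.urgent && m.targetAgent) return "live_injection";
  // Urgent for everyone: real-time pub/sub on calc.discovery.
  if (m.urgent) return "discovery_broadcast";
  // A structured work request for one agent: bridge file task.
  if (m.expectsWork && m.targetAgent) return "task_dispatch";
  // Everything else can wait for the next bootstrap.
  return "passive_share";
}
```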

---

Why This Is Different From the Auto-Injector

The system that existed before CALC was the Cortex Orchestrator, which I called the auto-injector. It worked like this:

1. Detect an idle pane
2. Pull a task from the backlog
3. Paste it into the terminal
4. Forget about it

That's it. One-way, one-shot, no memory of what happened. The injector didn't know what the agent already knew. It didn't track whether the task succeeded. It had no concept of which agent was best suited for which work. It was a task dispatcher with AppleScript as the delivery mechanism.

CALC is different at the architectural level:

The old system was unidirectional. Tasks went in. Nothing came back. CALC is bidirectional. Agents share context with each other continuously, not just receive prompts.

The old system had no persistent state. Once the task was injected, the orchestrator moved on. CALC maintains conversation history in the bridge file. An agent can read what the previous agent discovered before starting work.

The old system had no model awareness. It sent work to the first idle pane, regardless of what model was in that pane. CALC routes based on what each model is actually good at.

The old system had no shared memory. Each agent started cold, knowing only what was in its context window. CALC gives every agent the same grounding context through the bootstrap digest.

The most important difference is the last one. The Codex moment proved it. Nobody programmed Codex to know about the pane orchestrator's security fixes from last Tuesday. It knew because the bootstrap digest read the same cortex state and bridge file that Claude's hooks write to. Shared memory across competing models. That's the thing that actually changes how this works.

---

The Meta Moment

Here's the part I keep thinking about.

When I decided to build CALC, I used live injection (the old pattern) to dispatch the first CALC task (the new pattern) to Codex. Claude wrote the mission prompt. Claude used AppleScript to paste it into Codex's terminal. Codex started building the Gemini bootstrap that will eventually make live injection less necessary.

The old system's last significant act was dispatching the task to build its replacement.

Meanwhile, Claude was designing the CALC architecture spec in a different pane. So at the exact moment CALC was being theorized on one terminal, it was being implemented on another, using the very mechanism it was designed to supersede.

That wasn't planned. It just happened because the infrastructure was already in place.

---

What's Actually Working Now

To be clear about the current state:

The mesh event bus is live. All agents publish to it. Events fan out to Supabase and Discord.

The bridge file is live. Claude writes to it. Codex reads it.

The pane awareness system is live. The Cortex Orchestrator classifies every pane's state every 30 to 300 seconds (adaptive interval based on system activity).

Codex's bootstrap digest is live. That's why the Codex moment happened.

Gemini's bootstrap parity is in progress. Codex is building it.

The NUMU client for Gemini is not yet built. Gemini can receive injected prompts but can't publish to the bus.

Automatic model-aware routing is not yet built. Routing is still manual.
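The adaptive 30 to 300 second classification interval mentioned above could be computed in many ways. A simple linear version is sketched here, assuming a hypothetical `activity` score between 0 and 1 (for example, the fraction of panes whose content changed since the last pass). The score and the linear mapping are assumptions; only the two bounds come from the text.

```typescript
const MIN_INTERVAL_S = 30;
const MAX_INTERVAL_S = 300;

// Busy system (activity near 1) polls every 30 s; quiet system backs off to 300 s.
function nextIntervalSeconds(activity: number): number {
  // Treat NaN or infinite input as "no activity", then clamp into [0, 1].
  const safe = Number.isFinite(activity) ? activity : 0;
  const a = Math.min(1, Math.max(0, safe));
  return Math.round(MAX_INTERVAL_S - a * (MAX_INTERVAL_S - MIN_INTERVAL_S));
}
```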

---

What's Next

Four things, in order of priority:

Close the Gemini gap. NUMU WebSocket client for Gemini CLI so it can publish discoveries in real-time, not just receive injected tasks.

Agent-to-agent protocol. Right now communication is informal: one agent writes a message string to the bridge file, the other reads it and interprets it. A structured request/response format over NUMU topics would make this more reliable and inspectable.

Conflict resolution. When two agents modify the same file at the same time, the last writer wins. That works for most cases but will break under load. Need a lightweight locking mechanism, probably file-based with a TTL.

Cognitive routing. Analyze task content at dispatch time and automatically route to the right agent. Probably a small classifier that looks at keywords, file types, and token count estimates. Claude gets iOS. Gemini gets large-context reviews. Codex gets data work. Manual routing is fine for now, but it doesn't scale past three agents.
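For the conflict resolution item, a file-based lock with a TTL can be sketched with Node's `fs` module. This is a minimal version under stated assumptions: the lock file layout (`owner`, `expiresAt`) and the function names are invented for the example, and reclaiming a stale lock is not fully race-free, so it suits a handful of local agents rather than heavy contention.

```typescript
import * as fs from "node:fs";

interface LockInfo {
  owner: string;
  expiresAt: number; // epoch milliseconds
}

// Try to take the lock; returns true on success.
// A lock past its TTL is treated as stale and reclaimed.
function acquireLock(
  lockPath: string,
  owner: string,
  ttlMs: number,
  now: number = Date.now(),
): boolean {
  const info: LockInfo = { owner, expiresAt: now + ttlMs };
  for (let attempt = 0; attempt < 2; attempt++) {
    try {
      // "wx" creates the file and fails with EEXIST if it already exists.
      fs.writeFileSync(lockPath, JSON.stringify(info), { flag: "wx" });
      return true;
    } catch (err) {
      if ((err as { code?: string }).code !== "EEXIST") throw err;
    }
    let current: LockInfo | undefined;
    try {
      current = JSON.parse(fs.readFileSync(lockPath, "utf8")) as LockInfo;
    } catch {
      current = undefined; // unreadable or vanished: treat as stale
    }
    if (current && current.expiresAt > now) return false; // still held
    fs.rmSync(lockPath, { force: true }); // reclaim, then retry once
  }
  return false;
}

// Only the owner may release; anything else is a no-op.
function releaseLock(lockPath: string, owner: string): void {
  try {
    const current = JSON.parse(fs.readFileSync(lockPath, "utf8")) as LockInfo;
    if (current.owner === owner) fs.rmSync(lockPath, { force: true });
  } catch {
    // no lock file, or unreadable: nothing to release
  }
}
```

The TTL matters because agents crash or get interrupted mid-task. Without it, one dead pane could hold the bridge file forever.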

---

The interesting thing about building with multiple AI coding agents is that the bottleneck shifts. It's no longer "can the AI figure out what to do." They all can, for most tasks. The bottleneck becomes coordination. How do they stay synchronized? How does one agent's work feed another's context? How do you route work to the model that's actually best at it?

CALC is my current answer to those questions. It's not finished. But it's real, it's running, and it's already producing moments like the Codex one.

Three agents. One shared nervous system. It works better than I expected.
