Grand Diomande Research · Full HTML Reader

Track 3 Research: State of the Art for "Operate Everything From the Phone, Live a Life of Leisure"

> Date: 2026-05-13
> Author: Research Engine (Claw subagent)
> Goal: Inform the goal-prompt that conditions Claude Code's new goal-conditioning skill toward Mohamed's North Star of running his entire mesh and product surface from the iPhone with no laptop required.

0. TL;DR

The "phone as the operating console for everything" thesis is no longer speculative in 2026. The infrastructure shipped. What separates the few people actually living it from the many who tried and failed is not technology; it is surface-area discipline. Every successful operator runs a deliberately small product list, a deliberately small daily decision count, and a deliberately small set of agents that are trusted to act without supervision. Every failed ambient device (Humane, Rabbit, Friend) tried to invent a new substrate when the substrate that already won is "the phone you already have, talking to compute you already own, over a tailnet you already trust." Mohamed's Pebble + aura-gateway + cross-machine inject stack is closer to the winning architecture than any consumer device shipped in the last two years.

1. Prosumer Mesh Control Surfaces (the "remote into my compute from my phone" stack)

1.1 What the SOTA actually looks like in May 2026

The defining shift this year is that Anthropic shipped Claude Code Remote Control (Feb 25, 2026, research preview) and brought Claude Code natively into the iOS app. The promise: keep the heavy session running on your laptop, attach from the phone in five seconds. Nothing moves to the cloud beyond a relay. The pattern is significant because it is the first major agent vendor to publicly admit that the phone is the client, not the host.

Around that announcement a cluster of third-party clients matured:

Happy (`github.com/slopus/happy`): mobile + web client for both Codex and Claude Code, with realtime voice, e2e encryption, and a "kanban of running agent sessions" surface.
Tactic Remote (clauderc.com): pitched as "mobile control layer for Claude Code and Codex." Native voice command.
Nimbalyst: kanban mirror of desktop agent state on iOS.
AgentsRoom: remote-agent monitoring with notification escalation.

These all converge on the same primitives: persistent local agent, mobile attachable view, push notification when human input is needed, voice as the dominant input. They differ on encryption model and on whether voice is local (Whisper-on-device) or relayed.

1.2 The shell-on-the-phone primitives that already won

Below the agent layer there is a stable, boring stack that prosumers have used for years and that Mohamed's Pebble app implicitly extends:

| Tool | Role | What makes it durable |
| --- | --- | --- |
| Tailscale iOS | Identity + transport | Magic DNS, ACL, exit nodes, key rotation. Boring in a good way. |
| Blink Shell | SSH + Mosh client with hardware keyboard support and persistent sessions over mobile network handovers | Mosh's UDP-based session survival is the unsung hero. |
| Termius | Cross-device SSH with key sync, agent forwarding, snippets | Snippets become "phone-shaped commands." |
| ssh.app | Minimal SSH client | Single-purpose. |
| Mosh | UDP-based mobile shell with local echo and roaming | Survives airplane mode, subway tunnels, network handover. |
| Tailscale Funnel / Serve | Public ingress without port forwarding | Removes one entire class of router pain. |
| ntfy.sh, Pushover, APNs | Notification routing | Out-of-band human-in-the-loop signal. |
| iOS Shortcuts + App Intents | Deep-link contracts | The "URL scheme" of the personal OS. |
| Widgets (iOS 17+ interactive widgets) | Glanceable state + one-tap action without app launch | Mohamed already shipped this in Pebble V0.8 P5 Wave 2. |
| Share Sheet | Universal input surface | Any text, link, or file becomes a prompt. |

1.3 Productivity tools that crossed the mobile-parity threshold

Linear mobile: Cycles, triage, comments, mentions all work natively. Linear is the gold standard for "ticket actions from phone." Issue creation via Shortcut + email-to-ticket is widely used.
Notion mobile: Improved 2025 onward, but databases remain second-class. Most prosumers use it as read-mostly on phone.
Things 3: Calendar-as-life, quick entry via Siri / Shortcut, today view. Survives because it does not try to be more than a list.
Reflect / Bear / Obsidian Mobile: capture-first PKM, with sync delegated to iCloud / Dropbox / git.
Cursor mobile (2026): now in TestFlight for code review, not editing.
Notion Calendar (ex-Cron): meeting acceptance and reschedule from phone, with calendar holds.
Raycast iOS: command palette on phone, mostly a snippet + AI front end.

1.4 The honest gap

None of the above gives you "natural-language address my mesh" without writing the glue. Mohamed wrote it. Pebble + aura-gateway `/inject` with `machine` and `tmux_target` is exactly the missing piece between Tailscale and Claude Code Remote Control. The closest commercial peer is Happy, but Happy speaks only to the agent runtime, not to the host machine's tmux, audio interface, or capture daemons.
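As a rough illustration of that glue, the sketch below builds the kind of request a phone-side client might send to `/inject`. Only the endpoint path and the `machine` and `tmux_target` field names come from the text above; the gateway hostname, the port, the `prompt` field, and the JSON body shape are assumptions made for the example, not the actual aura-gateway contract.

```python
import json
import urllib.request

# Hypothetical tailnet address; the real gateway host and port are not given here.
GATEWAY_URL = "http://aura-gateway.example.ts.net:8080"


def build_inject_request(machine: str, tmux_target: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the gateway's /inject endpoint."""
    if not machine or not tmux_target:
        raise ValueError("machine and tmux_target are both required")
    body = json.dumps({
        "machine": machine,          # field name taken from the text
        "tmux_target": tmux_target,  # field name taken from the text
        "prompt": prompt,            # assumed field name
    }).encode("utf-8")
    return urllib.request.Request(
        url=f"{GATEWAY_URL}/inject",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_inject_request("mac4", "main:0.1", "summarise overnight failures")
# Sending would be urllib.request.urlopen(req, timeout=5) from a device on the tailnet.
```

Keeping request construction separate from sending makes the routing logic testable without a live gateway.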

2. Ambient and Autonomous Personal Assistants (the failure museum)

2.1 The graveyard

Humane AI Pin (2023 launch, discontinued 2025, IP sold to HP for $116M against $230M raised). $699 device, $24/month sub. Every query roundtripped through Humane's servers. Slow. Hot. Burned through batteries. Could not reliably set a timer. Returns exceeded sales by mid 2024. The product never crossed the latency threshold where ambient compute feels faster than just pulling out a phone.
Rabbit R1: shipped 2024 to viral demo, then collapsed under "large action model" not actually being trained. Became an Android app in a plastic shell.
Friend.com pendant: launched 2024, $99, always-on listening companion. Mixed reception; no clear post-mortem yet but adoption is thin.
Tab (Avi Schiffmann): pivoted multiple times, became Friend.
Limitless Pendant: was the best-reviewed of the wearables. Acquired by Meta in December 2025, hardware sales halted, EU/UK/Brazil cut off. Even the winner did not survive as a standalone product.
Open Interpreter "01": the open hardware lost momentum once Claude Code Remote Control shipped.

2.2 The failure pattern (one item per failure mode)

1. New hardware before new use case: every failed device tried to replace the phone instead of augmenting it. The phone is not the bottleneck. Friction in the loop between phone and compute is the bottleneck.
2. Round-trip latency is fatal: anything that needs to go to a server before responding loses to the phone's local action. Humane's lag is the canonical example.
3. Battery as the silent killer: 100-hour spec, 12-hour reality (Limitless). When a device is unreliable on power, it gets left at home, which is the death sentence.
4. Social cost of wearables: visible always-on capture is a social tax the wearer pays. Limitless reviews kept surfacing this. The phone in the pocket is socially invisible.
5. Data sludge: continuous ambient capture without an aggressive summariser produces unsearchable garbage. The promise of "recall any conversation" only works with an oracle layer good enough to retrieve. Without it you get a 12-hour audio file.
6. Walled to one ecosystem: the AI Pin had no way to read your Linear, your Tailscale, your tmux. A standalone device cannot plug into a sovereign builder's stack.

2.3 What worked, in the same period

AirPods + Siri Shortcut to Claude / ChatGPT: voice in, voice out, no new hardware. The killer move was AirPods Pro 2 "Press and hold" → custom Shortcut. This is the ambient assistant that shipped.
Apple Intelligence on-device (iOS 18, 2024 onward): notification summary, writing tools, photo cleanup. Boring, but it actually runs locally and never asks for a subscription.
Whisper-on-iPhone via Aiko, Whisper Memos, MacWhisper iOS: local dictation that beats Apple Dictation. The leverage is that the model runs on the device the user already owns.
Otter, Granola, Plaud Notepin (the wired and the small): structured meeting capture where the user controls "when to listen" beats always-on.

2.4 Implication for Mohamed

Do not buy a pendant. Do not build a pendant. The ambient computing thesis is correct, the form factor is not. The right shape is the phone as the always-with-you client, a Watch as the gesture surface, AirPods as the audio I/O, and the mesh as the compute. Mohamed already has this shape. The only missing pieces are voice-in (Pebble V0.2 priority) and a Watch complication.

3. The "CEO From Phone" Archetype (operating pattern of solo operators)

3.1 The exemplars

Pieter Levels (Nomad List, Remote OK, PhotoAI). Estimated $3M+ ARR. Solo. No employees, no investors, no meetings. Vanilla PHP, jQuery, single-file repos. Lives out of a backpack. Operates from cafes and plane Wi-Fi. The operating pattern is radical simplicity in the tech, radical concentration in the time.
Marc Lou: 20+ products, ShipFast template, $100K+ MRR. Single template, ships a product a month. Operates with checklists, not architecture. Phone-first launch checklist.
Naval Ravikant philosophy: leverage is code, capital, and audience. Code and audience scale without management overhead. The whole thesis is a permission slip to never have employees.
Daniel Vassallo: post-AWS solo, sells courses + small products, "many small bets" portfolio. Refuses any structure that requires a team.
Tony Dinh (DevUtils, Black Magic, BuyMeACoffee Pro): macOS solo dev, phone for ops, mac for builds.

3.2 The shared operating pattern

Across this group the pattern is the same with small variations:

1. One product surface or a tight cluster of related products — never branch out enough that context-switching cost is high.
2. One revenue channel that is observable from a phone — Stripe dashboard, Lemon Squeezy, Gumroad. The morning check is a number on a notification.
3. One support channel that batches — email or X DMs, replied in two daily batches, not interrupt-driven.
4. One operator-facing console that fits on the phone — admin dashboards built specifically for phone-width.
5. Aggressive deferral of "should I hire" until it is structurally impossible to avoid — and most never reach that point.
6. A laptop exists, but most days do not require it — Pieter is the strongest case here. He still uses a laptop for big features, but daily ops, support, and most edits happen on phone or quick laptop sessions in cafes.

3.3 What this archetype is not

It is not "I run everything from my phone with zero compute behind me." It is "I have a small amount of well-shaped compute behind me, and the phone is the keyboard." The leverage is the asymmetry: a mesh of always-on machines doing the work, a single human dispatching them.

Mohamed's stack already mirrors this. 5 Macs, a K11 Windows machine, a sub-$3K/month Tailscale net, agents on every machine. The architecture is correct. The leisure question is "how do I make the dispatching truly two-finger."

3.4 The Naval frame

Naval's most useful contribution is the idea, paraphrased here, that you do not get rich by selling your time; you get rich by owning equity in something that scales without you. Read that as a daily routing rule. Any task you do should be one of: (a) building the thing that scales, (b) maintaining the asymmetric leverage of an existing scaling thing, or (c) leisure. No fourth category survives this filter, and most meetings and most management fall outside all three.

The corollary that matters for a goal prompt: the agent should refuse to put Mohamed on the path of a sub-leverage task. If a task is grindable, agent-able, or sleepable-on, it should not consume an iPhone session.

4. Life of Leisure in 2026 for a Sovereign Software Builder

4.1 What it actually looks like (not the hype version)

Strip out the digital-nomad photography and the Bali pool and you get:

Morning glance: one notification, one number, one widget. Stripe MRR, agent health, overnight failures. Five seconds.
One block of deep work, 2-4 hours, on the highest-leverage thing. This usually requires a laptop. The day is structured around protecting this block, not around filling it.
Two or three tactical phone sessions: ship a fix, reply to support, approve a deploy, kick off an agent on the mesh. Each session is under five minutes.
Evening review: pulse on what shipped, what is queued, what tomorrow's deep block is. Two minutes.
The rest is leisure: training, cooking, reading, friends, walking. Leisure is a primary input to creative work, not a reward.

This is not "agents running 24/7 making me passive income while I sip a coconut." That is a marketing story. The truth is closer to: a small group of trusted agents handle the tail (alerts, summaries, deploys, routine fixes), and the human handles the head (decisions, new product surfaces, the next bet).

4.2 The agentic loop that makes leisure work

Mohamed already has the architectural primitives. Mapping them to the leisure pattern:

| Leisure requirement | Primitive |
| --- | --- |
| Trust that the mesh did not die overnight | aura-gateway + meshd + Numu pulse |
| Ability to act on the mesh from anywhere | Pebble + cross-machine `/inject` |
| Knowledge of what shipped while away | KARL trajectory log + daily memory files |
| Voice as input | Pebble V0.2 (in progress) + AirPods + Shortcut |
| Glanceable state | Pebble widget V0.8 P5 Wave 2 (shipped) |
| Notifications when human input is needed | APNs from gateway, ntfy as fallback |
| Sovereign over data | Local memory files, Supabase he owns, Tailscale he controls |
| EW invariants | Already shipped: no-absorbing-states, falls back to local heuristic if backend fails |

The EW invariants in Pebble are quietly the most important property in this whole document. A leisure system must degrade gracefully. If the captain bridge is down, Pebble still answers locally. If the widget can't read the projection, it shows a placeholder. The system never enters a state where the human has to drop out of leisure to fix it.
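The no-absorbing-states idea can be sketched as a call chain that always returns something usable. This is an illustration of the pattern, not Pebble's implementation: the function name `answer`, the three-tier order, and the placeholder text are assumptions.

```python
from typing import Callable


def answer(prompt: str,
           remote: Callable[[str], str],
           local: Callable[[str], str]) -> tuple[str, str]:
    """Return (source, reply). Always returns; never raises to the caller."""
    try:
        return ("remote", remote(prompt))
    except Exception:
        # Backend unreachable or failing: degrade instead of blocking the user.
        pass
    try:
        return ("local", local(prompt))
    except Exception:
        # Last resort mirrors the widget behaviour: show a placeholder.
        return ("placeholder", "Backend unavailable; try again later.")
```

The caller never has to handle an exception, so a dead bridge shows up as a degraded answer rather than as a stuck screen.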

4.3 Cognitive Twin and KARL as load-bearing infrastructure

Mohamed's Cognitive Twin and the KARL trajectory log are the difference between "I run my mesh from my phone" and "my mesh runs itself and I check in." The Twin holds context across sessions so phone-side prompts can be short. KARL captures what the human did and what worked so the agent can imitate the patterns. Both are required for the leisure state to be stable rather than performed.

The honest assessment: KARL has 410 trajectories and 74 SFT examples as of March. That is enough to start step-level imitation but not full-trajectory autonomy. The next bottleneck is collecting trajectory examples of "Mohamed using Pebble to do an op end-to-end," not more general code data.
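One way to collect those end-to-end examples without an extra logging step is to wrap each operation so that a record is written whenever it runs. The sketch below is a generic pattern under stated assumptions: the record fields (`op`, `args`, `started`, `ok`, `error`) and the JSON-lines format are hypothetical and do not describe KARL's actual schema.

```python
import functools
import io
import json
import time


def captured(log, op_name: str):
    """Decorator: append one JSON line to `log` for every call, success or failure."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            record = {"op": op_name, "args": [repr(a) for a in args], "started": time.time()}
            try:
                result = fn(*args, **kwargs)
                record["ok"] = True
                return result
            except Exception as exc:
                record["ok"] = False
                record["error"] = repr(exc)
                raise
            finally:
                # Written on both paths, so capture is a side effect of doing the work.
                log.write(json.dumps(record) + "\n")
        return inner
    return wrap


log = io.StringIO()  # stand-in for an append-only trajectory file


@captured(log, "restart_service")
def restart_service(machine: str) -> str:
    return f"restarted on {machine}"


restart_service("mac4")
```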

5. Concrete Primitives That Work (the toolbox)

5.1 Voice-first dictation

Local-first: Whisper-on-device via WhisperKit (Argmax) or whisper.cpp Core ML.
Hybrid: Aiko, MacWhisper iOS, Whisper Memos.
Apple Dictation upgraded with Apple Intelligence is usable as fallback.
The dominant pattern is press-and-hold AirPods → Whisper transcribe → drop into the active app via share sheet.

For Pebble specifically: voice-to-prompt with a single tap, Whisper local, no roundtrip. This is the V0.2 priority and it is the right call.

5.2 Ambient capture without sludge

Granola, Otter, Plaud Notepin model: explicit "I'm in a meeting now" toggle.
iOS Voice Memos + Apple Intelligence summary: shipped, free, on-device summary, no subscription, no pendant.
Avoid always-on capture without an aggressive summarisation pipeline that you control. The Limitless lesson.

5.3 Notification routing

APNs through your own push token: the right answer for an app you control.
ntfy.sh: dead simple, self-hostable, perfect for "ping me when this agent finishes." Free tier is plenty.
Pushover: pricier but rock-solid, has retries, sounds, priority levels.
iOS Focus Modes: route different agents to different focus filters.
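For the "ping me when this agent finishes" case, ntfy accepts a plain HTTP POST to a topic URL, with the message as the request body and optional `Title` and `Priority` headers. The sketch below only builds the request; the topic name and message are made up for the example.

```python
import urllib.request

VALID_PRIORITIES = {"min", "low", "default", "high", "urgent"}


def build_ntfy_request(topic: str, message: str, title: str = "",
                       priority: str = "default",
                       server: str = "https://ntfy.sh") -> urllib.request.Request:
    """Build (but do not send) an ntfy publish request."""
    if priority not in VALID_PRIORITIES:
        raise ValueError(f"unknown priority: {priority}")
    headers = {"Priority": priority}
    if title:
        headers["Title"] = title
    return urllib.request.Request(
        url=f"{server}/{topic}",
        data=message.encode("utf-8"),
        headers=headers,
        method="POST",
    )


# Hypothetical topic; on the public server, treat the topic name like a password.
note = build_ntfy_request("mesh-alerts-demo", "Agent on mac4 finished: 3 tests fixed",
                          title="Agent done", priority="high")
# Sending would be urllib.request.urlopen(note, timeout=5).
```

Pointing `server` at a self-hosted instance keeps the same call shape, which fits the sovereignty preference stated elsewhere in this document.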

5.4 Deep-link contracts

Shortcuts are the universal glue. Anything addressable by URL scheme can be a Shortcut step.
App Intents (iOS 16+) make your app's verbs available to Siri, Shortcuts, widgets, Spotlight, and the Action Button.
Mohamed's Pebble has `pebble://voice`, `pebble://chat/<id>`, `pebble://chat/<id>/voice` already. This is the right grammar. Extending it to `pebble://machine/<m>/inject?prompt=...` is a natural V0.9 move.
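The deep-link grammar above is small enough to pin down as a parser and a builder. The sketch below is written in Python purely to illustrate the grammar (the app itself would handle URLs natively on iOS); the route names in the returned dictionaries are assumptions, and the `machine/<m>/inject` form is the proposed extension, not a shipped route.

```python
from urllib.parse import parse_qs, quote, unquote, urlsplit


def build_inject_link(machine: str, prompt: str) -> str:
    """Build a pebble://machine/<m>/inject?prompt=... link with safe escaping."""
    return f"pebble://machine/{quote(machine, safe='')}/inject?prompt={quote(prompt, safe='')}"


def parse_pebble_link(url: str) -> dict:
    """Map a pebble:// URL onto a route dictionary, or raise ValueError."""
    parts = urlsplit(url)
    if parts.scheme != "pebble":
        raise ValueError("not a pebble:// link")
    # The first segment lands in netloc for custom schemes; the rest is the path.
    segments = [parts.netloc] + [unquote(s) for s in parts.path.split("/") if s]
    query = {k: v[0] for k, v in parse_qs(parts.query).items()}

    if segments == ["voice"]:
        return {"route": "voice"}
    if len(segments) == 2 and segments[0] == "chat":
        return {"route": "chat", "chat_id": segments[1]}
    if len(segments) == 3 and segments[0] == "chat" and segments[2] == "voice":
        return {"route": "chat_voice", "chat_id": segments[1]}
    if len(segments) == 3 and segments[0] == "machine" and segments[2] == "inject":
        if "prompt" not in query:
            raise ValueError("inject link needs a prompt")
        return {"route": "inject", "machine": segments[1], "prompt": query["prompt"]}
    raise ValueError(f"unrecognised pebble link: {url}")
```

Round-tripping a prompt through `build_inject_link` and `parse_pebble_link` is a cheap check that characters such as spaces and `&` survive escaping.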

5.5 Widgets as glanceable state

Static medium widget for mesh health: green/orange/red per machine + one number per agent.
Interactive widget for "send the most recent prompt to the most recent agent."
Stack widgets so the phone home screen is a dashboard.
Mohamed's V0.8 P5 Wave 2 widget is exactly right. The next move is a Live Activity for "agent running, here is its current step."
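The green/orange/red mapping for a mesh-health widget can be sketched as a pure function over per-machine readings. Every threshold and field name below is an assumption chosen for illustration; the real signals would come from whatever the mesh monitors already report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeStatus:
    name: str
    free_disk_gb: float
    seconds_since_heartbeat: float


def classify(node: NodeStatus, *,
             disk_warn_gb: float = 10.0,
             disk_crit_gb: float = 3.0,
             stale_after_s: float = 120.0,
             dead_after_s: float = 600.0) -> str:
    """Return 'green', 'orange', or 'red' for one machine (thresholds are illustrative)."""
    if node.seconds_since_heartbeat >= dead_after_s or node.free_disk_gb < disk_crit_gb:
        return "red"
    if node.seconds_since_heartbeat >= stale_after_s or node.free_disk_gb < disk_warn_gb:
        return "orange"
    return "green"


def widget_rows(nodes: list[NodeStatus]) -> list[tuple[str, str]]:
    """One (machine, colour) row per node, worst first so problems sit at the top."""
    order = {"red": 0, "orange": 1, "green": 2}
    rows = [(n.name, classify(n)) for n in nodes]
    return sorted(rows, key=lambda row: order[row[1]])
```

With these example thresholds, a machine with 2.5 GB free would show red even with a fresh heartbeat.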

5.6 Share sheet as universal input

Underrated. Any text from any app becomes a prompt with two taps. Build a Pebble share extension and the rest of iOS becomes input to Pebble's mesh. V0.8 P5 Wave 5 marks this as optional polish, but it is actually a major leverage move.

5.7 Watch as gesture surface

Not a notification-mirror. A dedicated three-action complication: dictate to default agent, ack the last alert, show mesh health. Anything more than three actions is a phone. Watch is for the case where pulling out a phone is socially expensive.

6. Skeptical Counter-Notes

Phone-only is not always faster. A 30-second tmux session on a laptop beats a 5-minute Pebble session for genuinely complex ops. The goal is not "no laptop"; the goal is "no laptop for the daily tail." Reserve the laptop for the deep block.
Agents that act without confirmation are the single largest risk. The right default is "agent proposes, human single-taps to confirm" via push notification with action buttons. Reserve full autonomy for reversible actions in a narrow domain, such as deploy-this-branch-to-staging.
Voice everywhere is socially expensive. Subway, cafe, in front of friends, voice loses to typing. A truly leisure-shaped system supports both equally.
The "5 Macs running 24/7" model has a power and reliability tail. Mac4 already hit 2.5GB free last week. The leisure system depends on each mesh node being boringly healthy. This is non-trivial ops and the agent should own it (Numu / pulse / disk monitors should be agentic, not human-run).
Vendor lock-in is the dual of leverage. If Claude Code Remote Control is the daily driver and Anthropic changes terms, the whole stack moves. Mitigation: Pebble already routes to Codex.app and Numu in addition to Claude, so the strategy ladder is already polyglot. Keep it that way.
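The "agent proposes, human single-taps to confirm" default can be sketched as a small policy function. The sketch assumes that autonomy is granted only to an explicit allowlist of narrow actions that can be undone, with a staging deploy as the one example; the action names and the allowlist are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical allowlist of narrow, undoable actions the agent may run unattended.
AUTONOMOUS_ACTIONS = {"deploy_branch_to_staging"}


@dataclass(frozen=True)
class ProposedAction:
    name: str
    reversible: bool


def dispatch_mode(action: ProposedAction) -> str:
    """Return 'auto' only for allowlisted reversible actions; otherwise 'confirm'."""
    if action.reversible and action.name in AUTONOMOUS_ACTIONS:
        return "auto"
    # Default path: send a push notification with approve / reject buttons.
    return "confirm"
```

Confirmation is the fall-through case, so any new action type needs a human tap until someone deliberately adds it to the allowlist.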

7. Synthesis: What the Goal Prompt Should Encode

Pulling the four threads together, a goal prompt for Claude Code aiming at Mohamed's North Star should bias the agent toward:

1. Phone-shaped outputs by default. Short responses, glanceable structure, link-friendly, share-sheet-friendly.
2. Asynchronous over synchronous. Long-running work goes to the mesh, returns via notification. Never block the phone.
3. Surface-area discipline. Refuse to spawn new tools, new dashboards, new daemons unless they collapse two existing ones.
4. EW invariants on every new feature. No absorbing states. Local fallback when remote is down. Graceful degradation by default.
5. Capture as a side-effect of work. Every Pebble session writes to KARL automatically. No "now go log this" extra step.
6. Leisure-aware scheduling. Heavy operations get queued for the deep block. Tail operations get done now. The agent should know the difference.
7. Sovereignty checks. Prefer local models, local memory, local sync. Treat third-party vendors as fungible.
8. Anti-laziness on its own behalf. When the agent could do a thing rather than ask Mohamed, it should do the thing. Discovery before asking, per the global CLAUDE.md.
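Items 2 and 6 imply a routing decision the agent makes for every task. A minimal sketch of that decision follows, assuming three destinations and a five-minute cap on phone sessions (the cap echoes the "under five minutes" figure in section 4.1; the field names and destination labels are hypothetical).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    title: str
    est_minutes: int
    agent_can_do: bool
    needs_laptop: bool


def route(task: Task, phone_session_limit_min: int = 5) -> str:
    """Pick where a task goes: 'mesh', 'deep_block', or 'phone_now'."""
    if task.agent_can_do:
        # Runs asynchronously on the mesh and reports back by notification.
        return "mesh"
    if task.needs_laptop or task.est_minutes > phone_session_limit_min:
        # Heavy work waits for the protected deep-work block.
        return "deep_block"
    # Short human-only task: handle it in the current phone session.
    return "phone_now"
```

The agent check comes first, so anything an agent can do never reaches the phone, whatever its size.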

Sources

[Claude Code Remote Control announcement, Anthropic docs](https://code.claude.com/docs/en/remote-control)
[Simon Willison on Claude Code Remote Control, Feb 25 2026](https://simonwillison.net/2026/Feb/25/claude-code-remote-control/)
[Best Mobile Apps for Claude Code in 2026, Nimbalyst](https://nimbalyst.com/blog/best-mobile-apps-for-claude-code-2026/)
[Happy, mobile + web client for Codex and Claude Code](https://github.com/slopus/happy)
[Tactic Remote](https://www.clauderc.com/)
[Humane AI Pin Failure analysis, TechResearchOnline](https://techresearchonline.com/blog/humane-ai-pin-failure/)
[Humane's AI Pin was never going to work, Fast Company](https://www.fastcompany.com/91107889/humane-ambient-computing-vision-dead-end)
[Wearable AI Wars 2026, UMEVO](https://www.umevo.ai/blogs/ume-all-posts/wearable-ai-wars-2026-limitless-pendant-vs-bee-pioneer-vs-plaud-notepin)
[Limitless Pendant alternatives, Omi AI](https://www.omi.me/blogs/ai-note-takers/limitless-pendant-alternatives)
[Pieter Levels, The Indie Hacker's Guide to AI Startups](https://thebootstrappedfounder.com/pieter-levels-the-indie-hackers-guide-to-ai-startups/)
[Pieter Levels Net Worth and Operating Style 2026](https://unnetworth.com/pieter-levels-net-worth/)
Mohamed's own memory: Pebble V0.8 P5 (Wave 2 widget, Wave 3 deep links, Wave 4 iPad), aura-gateway `/inject`, cross-machine routing, KARL trajectories, EW no-absorbing-states invariants.

Source Anchor

leisure-goal-synthesis/03-research.md
