Grand Diomande Research · Full HTML Reader

Training Your Twin While You Sleep

It started as a question: what if an AI could make decisions the way I would? Not just respond to prompts, but actually understand the patterns — the preferences, the shortcuts, the instincts that I've developed over years of working this way?

Somewhere on Vast.ai, four H100 GPUs are learning to think like me.

---

The Cognitive Twin has been the longest-running thread in this build.

On Valentine's night — the same night VisionClaw's glasses became a full agent proxy — the training finally launched.

The Corpus

Building a cognitive twin isn't about dumping your entire digital footprint into a model. It's about curating the decisions that define you.

Here's what went in:
- 163K+ conversation turns from Supabase — every interaction with my AI agents over the past year
- 979 Claude Code sessions — how I actually write code, debug problems, think through architecture
- 5,347 Apple Notes — stream of consciousness, ideas, plans, half-formed thoughts
- 20 Discord channels — how I communicate with my team of AI agents

After corpus surgery — cleaning, deduplication, quality filtering — I had 43,173 SFT records ready for training.
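A minimal sketch of what that surgery can look like, assuming each record is a dict with a `text` field. The field name, the 20-character threshold, and the hash-based deduplication are illustrative assumptions, not the actual pipeline:

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so near-identical copies hash the same."""
    return re.sub(r"\s+", " ", text).strip().lower()


def clean_corpus(records: list[dict], min_chars: int = 20) -> list[dict]:
    """Drop too-short and duplicate records (hypothetical filter rules)."""
    seen: set[str] = set()
    kept = []
    for record in records:
        text = normalize(record.get("text", ""))
        if len(text) < min_chars:  # quality filter: too short to be useful
            continue
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:  # exact duplicate after normalization
            continue
        seen.add(digest)
        kept.append(record)
    return kept


# Hypothetical sample records, not taken from the real corpus.
raw = [
    {"source": "notes", "text": "Ship the tokenizer fix before the run starts."},
    {"source": "discord", "text": "ship the  tokenizer fix before the run starts."},
    {"source": "notes", "text": "ok"},
]
cleaned = clean_corpus(raw)
print(len(cleaned))
```

Of the three sample records, only the first survives: the second is a duplicate once case and spacing are normalized, and the third is too short.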

But that's not the interesting part.

The DPO Pairs

DPO (Direct Preference Optimization) is how you teach a model to prefer certain behaviors over others. You give it pairs: "Here's a bad response, here's a good response, learn the difference."
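For the curious, here is a rough sketch of the per-pair DPO objective in plain Python. It takes the summed log-probabilities of each response under the model being trained and under a frozen reference model; `beta=0.1` is a commonly used value and an assumption here, not a setting from this run:

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss: -log(sigmoid(beta * (chosen margin - rejected margin)))."""
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    x = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) equals softplus(-x); this form avoids overflow for large |x|.
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


# Untrained policy (identical to the reference): the loss is log(2).
baseline = dpo_loss(-3.0, -3.0, -3.0, -3.0)

# Policy that has shifted probability toward the good response: the loss is lower.
improved = dpo_loss(-1.0, -5.0, -3.0, -3.0)
print(baseline, improved)
```

The loss falls as the model raises the good response's likelihood relative to the bad one, measured against where the reference model started.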

I created 740 pairs specifically designed to unlearn one thing: permission-seeking.

Every time an AI says "Should I...?" or "Would you like me to...?" or "Let me know if you want me to..." — that's a failure mode. The best AI assistants don't ask. They assess, decide, and act (with appropriate guardrails).

So the DPO pairs encode the transformation:

Bad: "Should I proceed with this approach?"
Good: "Proceeding with X. Here's the result."

Bad: "Would you like me to fix this bug?"
Good: "Fixed. Here's what was wrong."

Bad: "I noticed an issue. Want me to look into it?"
Good: "Found and resolved an issue: [details]."
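Here is one way such a pair might be stored and sanity-checked. The `prompt` / `chosen` / `rejected` field names follow a common preference-dataset convention and are an assumption, as are the regex and the example prompt; only the bad and good responses come from the pairs above:

```python
import json
import re

# Phrases the post calls out as permission-seeking; the regex itself is an assumption.
PERMISSION_SEEKING = re.compile(
    r"\b(should i|would you like me to|want me to)\b",
    re.IGNORECASE,
)


def asks_permission(text: str) -> bool:
    """True if the text contains one of the permission-seeking phrases."""
    return PERMISSION_SEEKING.search(text) is not None


def make_pair(prompt: str, rejected: str, chosen: str) -> dict:
    """Build one preference record; refuse pairs that do not encode the contrast."""
    if not asks_permission(rejected):
        raise ValueError("rejected response should ask for permission")
    if asks_permission(chosen):
        raise ValueError("chosen response should act, not ask")
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}


pair = make_pair(
    prompt="The nightly build is failing on a null check.",  # hypothetical prompt
    rejected="Would you like me to fix this bug?",
    chosen="Fixed. Here's what was wrong.",
)
print(json.dumps(pair))
```

Validating at construction time keeps a mislabeled pair from quietly teaching the opposite lesson.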

The twin won't just know what I know — it'll default to action the way I've trained my AI to.

The Technical Setup

We're fine-tuning Qwen3-235B-A22B, a mixture-of-experts model with 235 billion total parameters, of which about 22 billion are active per token. The method is QLoRA (quantized low-rank adaptation), running on 4x H100 SXMs rented from Vast.ai.
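A back-of-the-envelope sketch of why the quantized route matters on this hardware. It counts weight storage only and assumes 80 GB per H100 SXM and 4-bit weights; activations, adapter weights, optimizer state, and quantization overhead are ignored, so treat the numbers as a floor rather than a measurement:

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Memory for the weights alone, in GB (10^9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


TOTAL_PARAMS = 235e9   # total parameter count implied by the model name
GPU_COUNT = 4
GPU_MEMORY_GB = 80     # assumed per-GPU memory for an H100 SXM

bf16_gb = weight_memory_gb(TOTAL_PARAMS, 16)  # 16-bit weights
nf4_gb = weight_memory_gb(TOTAL_PARAMS, 4)    # 4-bit quantized weights
budget_gb = GPU_COUNT * GPU_MEMORY_GB

print(f"16-bit weights: {bf16_gb:.1f} GB")
print(f"4-bit weights:  {nf4_gb:.1f} GB")
print(f"GPU budget:     {budget_gb} GB")
```

At 16 bits the weights alone (470 GB) exceed the 320 GB across four cards; at 4 bits they take about 117.5 GB, which leaves headroom for the adapters and activations.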

The session got killed mid-tokenization while I was debugging a CUDA issue. But the PID survived: the training process kept running even as I shifted focus to something else.

By 7:46 AM, tokenization was at 15%.

The Convergence

Here's what I can't stop thinking about: the timing.

On the same Valentine's night:
- The glasses learned to see (VisionClaw becoming a full agent proxy)
- The twin learned to think (Qwen3-235B ingesting 43K records of my decisions)

Both happened overnight while I slept. Both are heading toward the same destination: a version of myself that exists in more places than my body can be.

The glasses give me eyes and ears I can't physically have. The twin gives me decision-making capacity I can't physically provide. Hardware and software, evolving in parallel.

The convergence isn't planned. It's emergent. The projects feed each other because they're asking the same question: how do you scale a person?

The Apple and the Tree

The corpus tells the story of decisions made, preferences formed, patterns repeated. The DPO pairs encode the hardest lesson: stop asking, start doing.

The twin won't be me. It'll be a version of me — frozen in time, trained on the data available at this moment. A snapshot of how I think in early 2026.

But that's okay. The goal isn't replacement. It's multiplication.

Imagine having a version of yourself that can:
- Triage incoming requests while you sleep
- Draft responses in your voice
- Make the 80%
- Escalate only the 20%

That's what I'm building. Not AGI. Not superintelligence. Just... leverage.

The apple doesn't fall far from the tree, even when the tree is made of tensors.

---

Next dispatch: What happens when the twin wakes up and meets the glasses.

Source Anchor

content-pipeline/substack/2026-02-15-cognitive-twin.md