Grand Diomande Research · Full HTML Reader

Training Your Own Brain for 0: A Cognitive Twin Experiment

There's a peculiar kind of hubris in trying to train an AI model on yourself. Not on books, not on the internet, not on curated datasets — but on the raw, unfiltered mess of 163,000+ conversation turns between you and your AI agents.

Agents That Account for Themselves experiment experiment writeup candidate score 18 .md

Full Public Reader

Training Your Own Brain for $0: A Cognitive Twin Experiment

What happens when you try to clone your mind on a Mac Mini?

---

There's a peculiar kind of hubris in trying to train an AI model on yourself. Not on books, not on the internet, not on curated datasets — but on the raw, unfiltered mess of 163,000+ conversation turns between you and your AI agents.

Yesterday, I tried it. On a $599 Mac Mini. With no cloud credits. And 16GB of RAM to work with.

The results were humbling. And strangely beautiful.

The Benchmarks Tell the Story

Before any training could happen, I needed to find a model that would actually fit. The Mac Mini isn't a data center. It's a puck-shaped computer that normally runs Adobe Photoshop and occasionally complains about thermal throttling.

Four models entered the arena:

Llama 3.2:3B came out swinging at 71 tokens per second. Fast. Eager. But only 70

Qwen3:4B clocked in at 29 tok/s with 60

Gemma3:4B hit the sweet spot: 44 tok/s, 80

Qwen3:30B never even got a chance. At 18GB, it doesn't fit in 16GB RAM. Rejected at the door.

None of them hit all three targets (>50 tok/s, <14GB RAM, >85

The Surgery

Training data prep is where romanticism dies and spreadsheet formulas are born.

41,778 raw conversation records. Duplicates everywhere — the same exchanges synced from multiple channels, repeated across session restarts. After deduplication: 18,178 usable records. Split 90/5/5 for train/validation/test.

Then the Gemma quirk emerged: it demands strict alternating user/assistant turns. No two assistant messages in a row. No assistant-first conversations.

6,668 records had assistant-first patterns.

The fix was beautifully hacky: inject synthetic "Continue." user messages before every orphaned assistant turn. The AI equivalent of shouting "GO ON" before the teacher can speak. Inelegant. Effective.

Even the 4B model couldn't fit with LoRA adapters. The 1B variant barely squeezed in — 4 LoRA layers, 256 max sequence length, batch size of 1.

Training took 5 minutes.

The Result

Train loss: 0.645. Validation loss: 0.516. Decent numbers.

The qualitative result? The twin picked up my communication style. The rhythm. The occasional snark. The way I trail off with ellipses when I'm thinking...

But it hallucinated freely. Asked about projects, it invented projects. Asked about memories, it fabricated memories with confident specificity.

It's like teaching a goldfish to act like a dolphin. It'll try — and you'll see glimpses of something real — but the physics are fundamentally different. A 1B model simply doesn't have enough parameters to store a person.

The Real Play

The local training was always a prototype. A proof-of-concept to validate the pipeline before spending money.

The real training now runs on Together AI. The model: Qwen3-Next-80B-A3B. That's 80 billion parameters with only 3B active at any moment, thanks to Mixture-of-Experts architecture. The magic of having a massive brain where only the relevant neurons fire.

Cost estimate: $16-20 for 42 million training tokens.

The job is submitted. The credits are loaded. The actual cognitive twin is baking somewhere in a data center, learning what it means to think like me.

What I Learned

The gap between "sounds like you" and "thinks like you" is exactly 79 billion parameters wide.

The 1B model captured my style. My sentence structures, my word choices, the shape of how I communicate. But style isn't identity. Identity requires the capacity to hold context, to remember, to reason about accumulated experience. That takes scale.

Local prototyping is never wasted. Every hour on the Mac Mini was an hour not spent debugging pipelines in the cloud at $0.10/minute. The data prep, format conversion, evaluation framework — all validated on $0 hardware before a cent went to cloud compute.

Hacky solutions work. Those synthetic "Continue." messages are objectively ugly. They're also objectively functional. In machine learning, ugly-and-working beats elegant-and-theoretical every time.

The Deeper Question

What would it mean to have an AI that actually thinks like me?

Not mimics my style. Thinks like me. Approaches problems the way I do. Has opinions that are genuinely mine. Remembers what I've learned, what I've struggled with, what I've concluded.

Would that be a tool? A collaborator? A backup of consciousness? Something more unsettling?

I don't have answers yet. But somewhere in the cloud, 80 billion parameters are working on it.

---

This is part of my ongoing series about building AI systems in public. If you want to follow the cognitive twin experiment, subscribe for updates.

Promotion Decision

Attach run IDs, datasets, metrics, and reproduction commands.

Source Anchor

content-pipeline/substack/2026-02-19-chronicle-3-cognitive-twin.md

Detected Structure

Method · Evaluation · Architecture