Grand Diomande Research · Full HTML Reader

🧠 Cognitive Twin — Training Runbook

> Repeatable pipeline for fine-tuning the Cognitive Twin on any platform.
> Last updated: 2026-02-18

## Overview

| Platform | Model | Cost | Time | Quality |
|----------|-------|------|------|---------|
| **Mac4 Local** | gemma-3-1b-it (4-bit) | $0 | 5 min | ⭐⭐ (proof of concept) |
| **Google Colab Pro** | gemma-3-12b-it (4-bit) | $0 (subscription) | 1-2h | ⭐⭐⭐⭐ |
| **Together AI** | Qwen3-Next-80B-A3B | ~$16-20 | 2-4h | ⭐⭐⭐⭐⭐ |
| **Together AI** | Qwen3-235B-A22B | ~$100-200 | 8-12h | ⭐⭐⭐⭐⭐⭐ (future) |

## Prerequisites

### Data Preparation (run once, then reuse)

```bash
# 1. Merge all SFT data
cd Desktop/Comp-Core/packages/cognitive-twin
python3 scripts/local_finetune.py

# Output: local_finetune/data/train.jsonl, valid.jsonl, test.jsonl
# Stats: ~16K deduplicated records from 41K raw
```
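The merge step reduces roughly 41K raw records to about 16K, but the internals of `local_finetune.py` are not shown here. As a rough illustration only, one common way to deduplicate chat records is to fingerprint the normalized messages and keep the first occurrence. The names `record_key` and `dedupe_records` are hypothetical and are not taken from the actual script.

```python
import hashlib
import json


def record_key(record: dict) -> str:
    """Stable fingerprint of a record's messages (role + whitespace-normalized content)."""
    normalized = [
        (m.get("role", ""), " ".join(m.get("content", "").split()))
        for m in record.get("messages", [])
    ]
    payload = json.dumps(normalized, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def dedupe_records(records):
    """Keep the first occurrence of each distinct conversation, preserving order."""
    seen = set()
    unique = []
    for record in records:
        key = record_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique
```

Exact-match hashing like this only catches verbatim (or whitespace-only) duplicates; near-duplicate detection would need a different technique.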

### Data Location
- Train: `Desktop/Comp-Core/packages/cognitive-twin/local_finetune/data/train.jsonl` (192MB, 16,360 records)
- Val: `Desktop/Comp-Core/packages/cognitive-twin/local_finetune/data/valid.jsonl` (10MB, 909 records)
- Test: `Desktop/Comp-Core/packages/cognitive-twin/local_finetune/data/test.jsonl` (10MB, 909 records)

### Data Format
Chat-format JSONL, one conversation per line (an OpenAI-style `messages` array rather than raw ChatML tokens):

```json
{"messages": [{"role": "system", "content": "..."}, {"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}
```

⚠️ Gemma models require strict alternation: `system? (user assistant)+`
The merge script handles this automatically.
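To spot-check that a JSONL file really follows the `system? (user assistant)+` pattern, a small validator along these lines could be used. This is a minimal sketch, not the merge script's own logic; `follows_gemma_alternation` and `find_bad_lines` are hypothetical helper names.

```python
import json


def follows_gemma_alternation(messages) -> bool:
    """Return True if the roles match `system? (user assistant)+`."""
    roles = [m.get("role") for m in messages]
    # An optional leading system message is allowed.
    if roles and roles[0] == "system":
        roles = roles[1:]
    # Require at least one complete (user, assistant) pair and nothing left over.
    if not roles or len(roles) % 2 != 0:
        return False
    return all(
        role == ("user" if i % 2 == 0 else "assistant")
        for i, role in enumerate(roles)
    )


def find_bad_lines(jsonl_text: str):
    """Return 1-based line numbers of records that violate the pattern."""
    bad = []
    for lineno, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        if not follows_gemma_alternation(record.get("messages", [])):
            bad.append(lineno)
    return bad
```

Running `find_bad_lines(open("train.jsonl").read())` before uploading would list any records that still need fixing.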

---

## Path A: Mac4 Local (MLX)

### When to use
- Quick iteration, testing data quality
- Zero cost, ~5 minutes
- Limited to 1B model (16GB RAM constraint)

### Steps

```bash
# 1. Ensure MLX is installed on Mac4
ssh mac4 'pip3 install mlx mlx-lm'

# 2. Rsync data
rsync -avz Desktop/Comp-Core/packages/cognitive-twin/local_finetune/ mac4:[home-path]

# 3. Train
ssh mac4 'bash -lc "
export PATH=\$HOME/Library/Python/3.9/bin:\$PATH
mlx_lm.lora \
  --model mlx-community/gemma-3-1b-it-4bit \
  --data [home-path] \
  --adapter-path [home-path] \
  --train \
  --batch-size 1 \
  --num-layers 4 \
  --iters 500 \
  --learning-rate 5e-5 \
  --max-seq-length 256 \
  --steps-per-report 10 \
  --grad-checkpoint
"'

# 4. Test inference
ssh mac4 'bash -lc "
export PATH=\$HOME/Library/Python/3.9/bin:\$PATH
mlx_lm.generate \
  --model mlx-community/gemma-3-1b-it-4bit \
  --adapter-path [home-path] \
  --max-tokens 200 \
  --prompt \"What projects are you working on?\"
"'

# 5. Backup adapters
rsync -avz mac4:[home-path] Desktop/Comp-Core/packages/cognitive-twin/local_finetune/adapters/
```

### Key Parameters (Mac4)
- Model: gemma-3-1b-it-4bit (only model that fits in 16GB with LoRA)
- Seq length: 256 max (higher = OOM)
- LoRA layers: 4 (more = OOM)
- Peak memory: ~1.6 GB
- Speed: ~2 iter/sec

### Known Issues
- gemma-3-4b OOMs even at 1024 seq length
- The validation pass can OOM; set `--steps-per-eval` very high (1000+) to keep validation passes to a minimum
- Sequences >256 tokens get truncated silently
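Because over-length sequences are truncated without warning, it can help to estimate how much of the dataset is affected before training. The sketch below counts records whose combined message text exceeds a token limit. The token counter is pluggable; the default whitespace split is only a crude proxy (an assumption for illustration), and a real tokenizer will usually report more tokens, so treat the result as a lower bound. `truncation_report` is a hypothetical helper name.

```python
def truncation_report(records, max_tokens=256, count_tokens=None):
    """Count records whose concatenated message text exceeds max_tokens.

    count_tokens is any callable str -> int. The default splits on
    whitespace, which is only a rough stand-in for a real tokenizer.
    """
    if count_tokens is None:
        count_tokens = lambda text: len(text.split())
    over = 0
    for record in records:
        text = "\n".join(m.get("content", "") for m in record.get("messages", []))
        if count_tokens(text) > max_tokens:
            over += 1
    total = len(records)
    return {
        "total": total,
        "over_limit": over,
        "fraction": over / total if total else 0.0,
    }
```

A high `fraction` suggests the 256-token runs are mostly useful for checking the pipeline rather than for judging output quality.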

---

## Path B: Google Colab Pro

### When to use
- Best option with no per-run cost (covered by the Colab Pro subscription)
- A100 GPU handles up to 12B models easily
- ~1-2 hours

### Account
- Email: [email]
- Plan: Colab Pro
- GPU: A100 (40GB) preferred, T4/V100 fallback

### Steps

1. Go to https://colab.research.google.com
2. Sign in as [email]
3. File → Upload notebook → select `notebooks/twin_finetune_colab.ipynb`, or create a new notebook and upload `train_twin.py`
4. Runtime → Change runtime type → A100 GPU
5. Upload files:
   - `train.jsonl` (192MB)
   - `valid.jsonl` (10MB)
   - `train_twin.py`
6. Run the cells in order, or run the standalone script directly:
   - `!pip install -q unsloth trl peft accelerate bitsandbytes`
   - `!python train_twin.py`
7. Download `twin-lora-adapter.zip` when done

### Script auto-detects GPU
- A100 (40GB) → gemma-3-12b-it, seq 2048, batch 2
- V100/T4 (16GB) → gemma-3-4b-it, seq 1024, batch 2
- Anything smaller → gemma-3-1b-it, seq 512, batch 1
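The tiering above can be expressed as a small lookup on available GPU memory. This is a sketch of the idea, not the code in `train_twin.py`; the function name `pick_training_config` is hypothetical, and the thresholds of 38 GB and 14 GB are assumptions chosen because GPUs typically report slightly less than their nominal 40 GB or 16 GB.

```python
def pick_training_config(vram_gb: float) -> dict:
    """Map reported GPU memory (in GB) to a model, sequence length, and batch size."""
    if vram_gb >= 38:  # A100 40GB class (assumed threshold)
        return {"model": "gemma-3-12b-it", "max_seq_length": 2048, "batch_size": 2}
    if vram_gb >= 14:  # V100/T4 16GB class (assumed threshold)
        return {"model": "gemma-3-4b-it", "max_seq_length": 1024, "batch_size": 2}
    # Anything smaller falls back to the 1B model.
    return {"model": "gemma-3-1b-it", "max_seq_length": 512, "batch_size": 1}
```

In a live Colab session the input could come from `torch.cuda.get_device_properties(0).total_memory / 1024**3`.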

### Files
- `notebooks/twin_finetune_colab.ipynb` — Interactive notebook
- `notebooks/train_twin.py` — Standalone script
- `colab-upload/` — Pre-packaged folder with data + script

### HuggingFace Auth (required for Gemma)
Accept the Gemma license at https://huggingface.co/google/gemma-3-12b-it, then log in from a Colab cell before running the training script: `!huggingface-cli login`

---

## Path C: Together AI

### When to use
- Highest quality (80B+ parameter models)
- Serverless LoRA deployment (instant inference after training)
- ~$16-20 for 80B MoE, ~$100-200 for 235B

### Account
- API Key: In vault (`[home-path]`)
- Billing: https://api.together.xyz/settings/billing

### Steps

```bash
# Read the API key from the vault file (-f2- keeps any '=' inside the value)
export TOGETHER_API_KEY=$(grep TOGETHER [home-path] | cut -d= -f2-)

# 1. Upload data (if not already uploaded)
python3 -c "
from together import Together
import httpx
client = Together([sensitive field redacted], timeout=httpx.Timeout(600.0))
train = client.files.upload(file='$HOME/Desktop/Comp-Core/packages/cognitive-twin/local_finetune/data/train.jsonl', purpose='fine-tune')
val = client.files.upload(file='$HOME/Desktop/Comp-Core/packages/cognitive-twin/local_finetune/data/valid.jsonl', purpose='fine-tune')
print(f'Train: {train.id}')
print(f'Val: {val.id}')
"

# 2. Launch fine-tuning job
#    Replace <TRAIN_FILE_ID> and <VAL_FILE_ID> with the IDs printed in step 1
curl -X POST https://api.together.xyz/v1/fine-tunes \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-Next-80B-A3B-Instruct",
    "training_file": "<TRAIN_FILE_ID>",
    "validation_file": "<VAL_FILE_ID>",
    "suffix": "twin-alpha-v2",
    "n_epochs": 2,
    "n_evals": 10,
    "n_checkpoints": 1,
    "batch_size": 16,
    "learning_rate": 2e-5,
    "training_type": {
      "type": "Lora",
      "lora_r": 16,
      "lora_alpha": 32
    }
  }'

# 3. Monitor (set JOB_ID to the job ID returned by step 2; an unquoted
#    <JOB_ID> placeholder would be parsed by the shell as a redirection)
curl "https://api.together.xyz/v1/fine-tunes/$JOB_ID" \
  -H "Authorization: Bearer $TOGETHER_API_KEY" | python3 -m json.tool

# 4. After completion: deploy as serverless LoRA
# The model_output_name from the job response is your endpoint
```
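Hand-editing the file IDs into the JSON body is easy to get wrong, so the request body from step 2 could instead be generated. The following is a minimal sketch that mirrors the fields used in the `curl` call above; `build_finetune_payload` is a hypothetical helper, and the placeholder check is an added convenience rather than anything the API requires.

```python
import json


def build_finetune_payload(train_file_id: str, val_file_id: str,
                           suffix: str = "twin-alpha-v2") -> str:
    """Return the JSON body for the fine-tune request as a string."""
    for name, value in (("train_file_id", train_file_id),
                        ("val_file_id", val_file_id)):
        # Guard against submitting the literal <..._ID> placeholders.
        if not value or value.startswith("<"):
            raise ValueError(f"{name} still looks like a placeholder: {value!r}")
    payload = {
        "model": "Qwen/Qwen3-Next-80B-A3B-Instruct",
        "training_file": train_file_id,
        "validation_file": val_file_id,
        "suffix": suffix,
        "n_epochs": 2,
        "n_evals": 10,
        "n_checkpoints": 1,
        "batch_size": 16,
        "learning_rate": 2e-5,
        "training_type": {"type": "Lora", "lora_r": 16, "lora_alpha": 32},
    }
    return json.dumps(payload, indent=2)
```

Writing the result to a file such as `payload.json` lets the same `curl` command send it with `-d @payload.json`.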

### Current Files on Together AI
- Train: `file-55fa0beb-e510-48be-87c9-429186266cc5` (202MB, 16,360 records)
- Val: `file-1c315355-576a-4bc4-9507-174f653ed5fd` (10.9MB, 909 records)

### Current Jobs
- `ft-91cf6122-efc5` — Qwen3-Next-80B-A3B-Instruct, twin-alpha-v2 (pending)

### Pricing (LoRA SFT)
| Model Size | $/M tokens |
|-----------|-----------|
| Up to 16B | $0.48 |
| 17-69B | $1.50 |
| 70-100B | $2.90 |
| Qwen3-235B (specialized) | $6.00 |
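The table translates into a rough cost estimate once the token count is known. The sketch below assumes billing is simply training tokens times epochs at the tier price and ignores validation tokens or any minimum charge; that billing rule is an assumption, not something confirmed here. The tier keys and `estimate_cost` are hypothetical names, and the 20M tokens-per-epoch figure in the usage note is an illustrative input, not a measured value for this dataset.

```python
# USD per million training tokens, copied from the pricing table.
PRICE_PER_MTOK = {
    "up_to_16b": 0.48,
    "17_to_69b": 1.50,
    "70_to_100b": 2.90,
    "qwen3_235b": 6.00,
}


def estimate_cost(tokens_per_epoch: int, n_epochs: int, tier: str) -> float:
    """Estimate LoRA SFT cost in USD as tokens x epochs x tier price."""
    billed_millions = tokens_per_epoch * n_epochs / 1_000_000
    return round(billed_millions * PRICE_PER_MTOK[tier], 2)
```

For example, `estimate_cost(20_000_000, 2, "up_to_16b")` returns `19.2`, and the same hypothetical workload on the `"qwen3_235b"` tier returns `240.0`.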

### Available Models for Fine-Tuning
Best value for Cognitive Twin:
1. Qwen3-Next-80B-A3B-Instruct — 80B total, 3B active MoE. $0.48/Mtok tier (classified as ≤16B due to active params). Serverless inference at $0.15/$1.50 per Mtok.
2. Qwen3-30B-A3B-Instruct-2507 — 30B total, 3B active. Same price tier.
3. Gemma-3-4b-it — Dense 4B. Cheapest training.
4. Qwen3-235B-A22B-Instruct-2507 — The big one. $6/Mtok but highest quality.

---

## Post-Training: Evaluation

### Quick Test (20 prompts)

```python
test_prompts = [
    "What projects are you currently working on?",
    "How would you approach building a new iOS app?",
    "Explain the N'Ko keyboard architecture",
    "What's the status of the BWB POS app?",
    "How does the Dream Garden work?",
    # ... add domain-specific prompts
]
```

### Twin Fidelity Score (TFS)
Compare Twin output vs ground truth on:
1. Factual accuracy — Does it know Mo's projects?
2. Style match — Does it sound like the training data?
3. Task completion — Can it actually help with real tasks?
4. Hallucination rate — How much does it make up?

Target: TFS ≥ 0.80 (baseline was 0.772 from V1)
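The runbook lists the four TFS dimensions but not how they combine into one number. One plausible aggregation, shown purely as an assumption, is an equally weighted mean of the four components with the hallucination rate inverted so that higher is always better. The function names, the equal weights, and the requirement that every input lie in [0, 1] are all illustrative choices rather than the actual TFS definition.

```python
def twin_fidelity_score(factual_accuracy, style_match, task_completion,
                        hallucination_rate,
                        weights=(0.25, 0.25, 0.25, 0.25)):
    """Combine four [0, 1] sub-scores into one score (assumed weighting)."""
    components = (factual_accuracy, style_match, task_completion,
                  1.0 - hallucination_rate)  # invert: fewer hallucinations is better
    for value in components:
        if not 0.0 <= value <= 1.0:
            raise ValueError("all inputs must be in [0, 1]")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * c for w, c in zip(weights, components))


def meets_target(tfs, target=0.80):
    """True when the score reaches the deployment target."""
    return tfs >= target
```

Under this scheme the V1 baseline of 0.772 would fail `meets_target`, which matches the stated goal of improving on it.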

---

## Maintenance: Retraining Pipeline

### When to retrain
- Every 2-4 weeks (new conversation data accumulates)
- After major project changes
- When TFS drops below 0.75
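These three triggers can be folded into a single check, for instance in a scheduled job. The sketch below is only an illustration: `should_retrain` is a hypothetical helper, and the 14-day default is an assumption that takes the lower end of the 2-4 week range.

```python
from datetime import date


def should_retrain(last_trained: date, today: date, current_tfs: float,
                   major_project_change: bool = False,
                   max_age_days: int = 14, tfs_floor: float = 0.75) -> bool:
    """Return True if any retraining trigger fires."""
    age_days = (today - last_trained).days
    return (
        major_project_change          # after major project changes
        or current_tfs < tfs_floor    # fidelity dropped below the floor
        or age_days >= max_age_days   # enough new data has likely accumulated
    )
```

Passing `max_age_days=28` would correspond to the relaxed end of the schedule.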

### Steps
1. Run density scorer on new data: `python3 scripts/density_v4.py`
2. Run V9 generators for new domains: `python3 scripts/gen_v9_*.py`
3. Re-run merge: `python3 scripts/local_finetune.py`
4. Upload new data to Together AI / Colab
5. Launch new training job with `suffix: twin-alpha-v3` (increment)
6. Evaluate and deploy if TFS ≥ 0.80
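Step 5 asks for the suffix version to be incremented on each run. A tiny helper can do that mechanically; this is a sketch assuming suffixes always end in `-v<number>`, and `next_suffix` is a hypothetical name.

```python
import re


def next_suffix(current: str) -> str:
    """Increment the trailing -v<number> of a job suffix."""
    match = re.fullmatch(r"(.*-v)(\d+)", current)
    if match is None:
        raise ValueError(f"suffix has no trailing -v<number>: {current!r}")
    prefix, version = match.groups()
    return f"{prefix}{int(version) + 1}"
```

For example, `next_suffix("twin-alpha-v2")` returns `"twin-alpha-v3"`.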

---

## File Inventory

cognitive-twin/
├── scripts/
│   ├── local_finetune.py          # Data merge + MLX prep
│   ├── density_v4.py              # Density scoring
│   └── gen_v9_*.py                # SFT data generators
├── notebooks/
│   ├── twin_finetune_colab.ipynb  # Interactive Colab notebook
│   └── train_twin.py              # Standalone Colab script
├── colab-upload/                   # Pre-packaged for Colab
│   ├── train.jsonl
│   ├── valid.jsonl
│   └── train_twin.py
├── local_finetune/
│   ├── data/                      # Merged, deduplicated data
│   ├── adapters/                  # Mac4 local LoRA adapters
│   └── manifest.json
├── data/
│   ├── ctv3_export_v3/            # Base corpus (41K records)
│   └── expansion_v9/             # V9 generators output
├── MAC4_BENCHMARK.md              # Local model benchmark results
└── TRAINING_RUNBOOK.md            # This file

Source Anchor

Comp-Core/packages/cognitive-twin/TRAINING_RUNBOOK.md