Grand Diomande Research · Full HTML Reader

# 🧠 Cognitive Twin: Training Runbook

> Repeatable pipeline for fine-tuning the Cognitive Twin on any platform.
> Last updated: 2026-02-18

## Overview

| Platform | Model | Cost | Time | Quality |
|----------|-------|------|------|---------|
| **Mac4 Local** | gemma-3-1b-it (4-bit) | $0 | 5 min | ⭐⭐ (proof of concept) |
| **Google Colab Pro** | gemma-3-12b-it (4-bit) | $0 (subscription) | 1-2h | ⭐⭐⭐⭐ |
| **Together AI** | Qwen3-Next-80B-A3B | ~$16-20 | 2-4h | ⭐⭐⭐⭐⭐ |
| **Together AI** | Qwen3-235B-A22B | ~$100-200 | 8-12h | ⭐⭐⭐⭐⭐⭐ (future) |

## Prerequisites

### Data Preparation (run once, then reuse)

```bash
# 1. Merge all SFT data
cd Desktop/Comp-Core/packages/cognitive-twin
python3 scripts/local_finetune.py

# Output: local_finetune/data/train.jsonl, valid.jsonl, test.jsonl
# Stats: ~16K deduplicated records from 41K raw
```
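The merge step reduces about 41K raw records to about 16K deduplicated ones. The runbook does not show how `scripts/local_finetune.py` does this, so the following is only a minimal sketch of one common approach: hash each conversation after whitespace normalization and keep the first occurrence. The names `record_key` and `deduplicate` are hypothetical and are not taken from the script.

```python
import hashlib
import json


def record_key(record: dict) -> str:
    """Stable hash of a record's messages (role plus whitespace-normalized content)."""
    normalized = [
        (m.get("role", ""), " ".join(m.get("content", "").split()))
        for m in record.get("messages", [])
    ]
    payload = json.dumps(normalized, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def deduplicate(records: list[dict]) -> list[dict]:
    """Keep the first occurrence of each distinct conversation, preserving order."""
    seen: set[str] = set()
    unique = []
    for record in records:
        key = record_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique
```

Exact-match hashing like this will not catch near-duplicates (paraphrases, trailing edits); whether the real script does more than this is not stated in the runbook.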

### Data Location
- Train: `Desktop/Comp-Core/packages/cognitive-twin/local_finetune/data/train.jsonl` (192MB, 16,360 records)
- Val: `Desktop/Comp-Core/packages/cognitive-twin/local_finetune/data/valid.jsonl` (10MB, 909 records)
- Test: `Desktop/Comp-Core/packages/cognitive-twin/local_finetune/data/test.jsonl` (10MB, 909 records)

### Data Format
Standard ChatML-style JSONL, one conversation per line:

```json
{"messages": [{"role": "system", "content": "..."}, {"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}
```

⚠️ Gemma models require strict role alternation, `system? (user assistant)+`: an optional system message followed by one or more user/assistant pairs.
The merge script handles this automatically.
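If you want to double-check a file before uploading it, the alternation rule `system? (user assistant)+` can be verified independently of the merge script. The sketch below is an assumption about how such a check could look; `roles_alternate` and `invalid_line_numbers` are hypothetical helper names, not part of the repository.

```python
import json


def roles_alternate(messages: list[dict]) -> bool:
    """True if the roles match `system? (user assistant)+`."""
    roles = [m.get("role") for m in messages]
    if roles and roles[0] == "system":
        roles = roles[1:]  # the optional leading system message
    if not roles or len(roles) % 2 != 0:
        return False
    expected = ["user", "assistant"] * (len(roles) // 2)
    return roles == expected


def invalid_line_numbers(jsonl_text: str) -> list[int]:
    """1-based line numbers of records that are malformed or break alternation."""
    bad = []
    for number, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
            ok = roles_alternate(record["messages"])
        except (json.JSONDecodeError, KeyError, TypeError, AttributeError):
            ok = False
        if not ok:
            bad.append(number)
    return bad
```

For a 192MB file, iterating over the open file handle line by line would be kinder to memory than reading the whole text; the string-based version is used here only to keep the example short.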

---

## Path A: Mac4 Local (MLX)

### When to use
- Quick iteration, testing data quality
- Zero cost, ~5 minutes
- Limited to 1B model (16GB RAM constraint)

### Steps

```bash
# 1. Ensure MLX is installed on Mac4
ssh mac4 'pip3 install mlx mlx-lm'

# 2. Rsync data
rsync -avz Desktop/Comp-Core/packages/cognitive-twin/local_finetune/ mac4:[home-path]

# 3. Train
ssh mac4 'bash -lc "
export PATH=\$HOME/Library/Python/3.9/bin:\$PATH
mlx_lm.lora \
  --model mlx-community/gemma-3-1b-it-4bit \
  --data [home-path] \
  --adapter-path [home-path] \
  --train \
  --batch-size 1 \
  --num-layers 4 \
  --iters 500 \
  --learning-rate 5e-5 \
  --max-seq-length 256 \
  --steps-per-report 10 \
  --grad-checkpoint
"'

# 4. Test inference
ssh mac4 'bash -lc "
export PATH=\$HOME/Library/Python/3.9/bin:\$PATH
mlx_lm.generate \
  --model mlx-community/gemma-3-1b-it-4bit \
  --adapter-path [home-path] \
  --max-tokens 200 \
  --prompt \"What projects are you working on?\"
"'

# 5. Backup adapters
rsync -avz mac4:[home-path] Desktop/Comp-Core/packages/cognitive-twin/local_finetune/adapters/
```

### Key Parameters (Mac4)
- Model: gemma-3-1b-it-4bit (only model that fits in 16GB with LoRA)
- Seq length: 256 max (higher = OOM)
- LoRA layers: 4 (more = OOM)
- Peak memory: ~1.6 GB
- Speed: ~2 iter/sec

### Known Issues
- gemma-3-4b OOMs even at 1024 seq length
- The validation pass can OOM; set the evaluation interval very high (1000+ steps, for example with `--steps-per-eval`)
- Sequences >256 tokens get truncated silently
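Because over-length sequences are truncated without warning, it can help to estimate how much of the dataset is affected before training. The sketch below counts records whose combined message length exceeds a limit. It uses a whitespace word count as a crude stand-in for the tokenizer (an assumption: real subword token counts are usually higher, so this undercounts), and `count_over_limit` is a hypothetical helper name. Pass the model's real tokenizer as `count_tokens` for accurate numbers.

```python
import json
from typing import Callable, Iterable


def whitespace_token_count(text: str) -> int:
    """Crude stand-in for a tokenizer; real subword counts are usually higher."""
    return len(text.split())


def count_over_limit(
    jsonl_lines: Iterable[str],
    max_tokens: int = 256,
    count_tokens: Callable[[str], int] = whitespace_token_count,
) -> tuple[int, int]:
    """Return (records over the limit, total records)."""
    over = total = 0
    for line in jsonl_lines:
        if not line.strip():
            continue  # ignore blank lines
        record = json.loads(line)
        length = sum(count_tokens(m["content"]) for m in record["messages"])
        total += 1
        if length > max_tokens:
            over += 1
    return over, total
```

A typical call would be `with open("train.jsonl") as f: print(count_over_limit(f))`. The count ignores chat-template tokens, which add a few tokens per turn.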

---

## Path B: Google Colab Pro

### When to use
- Best option for quality training at no extra cost (covered by the Colab Pro subscription)
- A100 GPU handles up to 12B models easily
- ~1-2 hours

### Account
- Email: [email]
- Plan: Colab Pro
- GPU: A100 (40GB) preferred, T4/V100 fallback

### Steps

1. Go to https://colab.research.google.com
2. Sign in as [email]
3. File → Upload notebook → select notebooks/twin_finetune_colab.ipynb
   OR create new notebook and upload train_twin.py
4. Runtime → Change runtime type → A100 GPU
5. Upload files:
   - train.jsonl (192MB)
   - valid.jsonl (10MB)
   - train_twin.py
6. Run cells in order, or just:
   !pip install -q unsloth trl peft accelerate bitsandbytes
   !python train_twin.py
7. Download twin-lora-adapter.zip when done

### Script auto-detects GPU
- A100 (40GB) → gemma-3-12b-it, seq 2048, batch 2
- V100/T4 (16GB) → gemma-3-4b-it, seq 1024, batch 2
- Anything smaller → gemma-3-1b-it, seq 512, batch 1
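The selection rule above can be sketched as a small pure function. This is not the code in `train_twin.py`, which the runbook does not show; `TrainConfig`, `pick_config`, and the threshold parameters are hypothetical. One caveat worth hedging: the memory a runtime reports is often slightly below the nominal card size, so the thresholds may need some slack in practice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    model: str
    max_seq_length: int
    batch_size: int


def pick_config(
    vram_gb: float,
    large_gb: float = 40.0,   # assumed cutoff for the A100 tier
    medium_gb: float = 16.0,  # assumed cutoff for the V100/T4 tier
) -> TrainConfig:
    """Map available GPU memory to the model tiers listed in the runbook."""
    if vram_gb >= large_gb:
        return TrainConfig("gemma-3-12b-it", 2048, 2)
    if vram_gb >= medium_gb:
        return TrainConfig("gemma-3-4b-it", 1024, 2)
    return TrainConfig("gemma-3-1b-it", 512, 1)
```

For example, `pick_config(15, medium_gb=14)` selects the 4B tier for a card that reports a little under 16GB.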

### Files
- `notebooks/twin_finetune_colab.ipynb`: Interactive notebook
- `notebooks/train_twin.py`: Standalone script
- `colab-upload/`: Pre-packaged folder with data + script

### HuggingFace Auth (required for Gemma)
Accept the Gemma license at https://huggingface.co/google/gemma-3-12b-it
Then log in from a Colab cell: `!huggingface-cli login`

---

## Path C: Together AI

### When to use
- Highest quality (80B+ parameter models)
- Serverless LoRA deployment (instant inference after training)
- ~$16-20 for 80B MoE, ~$100-200 for 235B

### Account
- API Key: In vault (`[home-path]`)
- Billing: https://api.together.xyz/settings/billing

### Steps

```bash
export TOGETHER_API_KEY=$(grep TOGETHER [home-path] | cut -d= -f2)

# 1. Upload data (if not already uploaded)
python3 -c "
from together import Together
import httpx
client = Together([sensitive field redacted], timeout=httpx.Timeout(600.0))
train = client.files.upload(file='$HOME/Desktop/Comp-Core/packages/cognitive-twin/local_finetune/data/train.jsonl', purpose='fine-tune')
val = client.files.upload(file='$HOME/Desktop/Comp-Core/packages/cognitive-twin/local_finetune/data/valid.jsonl', purpose='fine-tune')
print(f'Train: {train.id}')
print(f'Val: {val.id}')
"

# 2. Launch fine-tuning job
curl -X POST https://api.together.xyz/v1/fine-tunes \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-Next-80B-A3B-Instruct",
    "training_file": "<TRAIN_FILE_ID>",
    "validation_file": "<VAL_FILE_ID>",
    "suffix": "twin-alpha-v2",
    "n_epochs": 2,
    "n_evals": 10,
    "n_checkpoints": 1,
    "batch_size": 16,
    "learning_rate": 2e-5,
    "training_type": {
      "type": "Lora",
      "lora_r": 16,
      "lora_alpha": 32
    }
  }'

# 3. Monitor
curl https://api.together.xyz/v1/fine-tunes/<JOB_ID> \
  -H "Authorization: Bearer $TOGETHER_API_KEY" | python3 -m json.tool

# 4. After completion: deploy as serverless LoRA
# The model_output_name from the job response is your endpoint
```
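The launch request in step 2 still contains the `<TRAIN_FILE_ID>` and `<VAL_FILE_ID>` placeholders, which are easy to forget. As a minimal sketch, the same payload can be built in Python with a guard against unfilled placeholders; `build_finetune_payload` is a hypothetical helper, and the field values simply mirror the curl example above.

```python
import json


def build_finetune_payload(train_file_id: str, val_file_id: str, suffix: str) -> dict:
    """Build the fine-tune request body, refusing unfilled <PLACEHOLDER> IDs."""
    for name, value in (("train_file_id", train_file_id), ("val_file_id", val_file_id)):
        if not value or value.startswith("<"):
            raise ValueError(f"{name} is still a placeholder: {value!r}")
    return {
        "model": "Qwen/Qwen3-Next-80B-A3B-Instruct",
        "training_file": train_file_id,
        "validation_file": val_file_id,
        "suffix": suffix,
        "n_epochs": 2,
        "n_evals": 10,
        "n_checkpoints": 1,
        "batch_size": 16,
        "learning_rate": 2e-5,
        "training_type": {"type": "Lora", "lora_r": 16, "lora_alpha": 32},
    }
```

The result of `json.dumps(build_finetune_payload(...))` can be written to a file and passed to curl with `-d @payload.json`.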

### Current Files on Together AI
- Train: `file-55fa0beb-e510-48be-87c9-429186266cc5` (202MB, 16,360 records)
- Val: `file-1c315355-576a-4bc4-9507-174f653ed5fd` (10.9MB, 909 records)

### Current Jobs
- `ft-91cf6122-efc5` โ€” Qwen3-Next-80B-A3B-Instruct, twin-alpha-v2 (pending)
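A pending job has to be polled until it finishes. The sketch below shows one way to wrap the monitoring call in a loop. It makes two assumptions that should be checked against the Together AI API reference: that the job response carries a `status` field, and that `completed`, `error`, and `cancelled` are the terminal values. `fetch_status` is a hypothetical callable you supply, for example a wrapper around the monitoring request that returns that field.

```python
import time
from typing import Callable

# Assumed terminal states; verify against the provider's documentation.
TERMINAL_STATUSES = {"completed", "error", "cancelled"}


def wait_for_job(
    fetch_status: Callable[[], str],
    poll_seconds: float = 60.0,
    max_polls: int = 1000,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the job reaches a terminal status, then return that status."""
    for _ in range(max_polls):
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_seconds)
    raise TimeoutError(f"job still not finished after {max_polls} polls")
```

Injecting `sleep` keeps the loop testable without real waiting; in normal use the default `time.sleep` is fine.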

### Pricing (LoRA SFT)
| Model Size | $/M tokens |
|-----------|-----------|
| Up to 16B | $0.48 |
| 17-69B | $1.50 |
| 70-100B | $2.90 |
| Qwen3-235B (specialized) | $6.00 |
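To turn the per-token prices into a budget, a rough estimate is tokens per epoch times epochs times the tier price. The sketch below assumes billing is proportional to training tokens processed and ignores any charge for validation passes, so treat the result as a lower bound. The tier keys and `estimate_training_cost` are hypothetical names; the 10M-token figure in the usage note is illustrative and is not a measurement of this dataset.

```python
# Prices copied from the table above, in USD per million tokens.
LORA_SFT_PRICE_PER_MTOK = {
    "up_to_16b": 0.48,
    "17_to_69b": 1.50,
    "70_to_100b": 2.90,
    "qwen3_235b": 6.00,
}


def estimate_training_cost(
    tokens_per_epoch: int, n_epochs: int, usd_per_million_tokens: float
) -> float:
    """Approximate cost in USD, assuming billing scales with training tokens."""
    if tokens_per_epoch < 0 or n_epochs < 0 or usd_per_million_tokens < 0:
        raise ValueError("inputs must be non-negative")
    return tokens_per_epoch * n_epochs / 1_000_000 * usd_per_million_tokens
```

For instance, `estimate_training_cost(10_000_000, 2, LORA_SFT_PRICE_PER_MTOK["up_to_16b"])` prices two epochs over a hypothetical 10M-token dataset at the lowest tier.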

### Available Models for Fine-Tuning
Best value for Cognitive Twin:
1. Qwen3-Next-80B-A3B-Instruct: 80B total, 3B active MoE. $0.48/Mtok tier (classified as ≤16B due to active params). Serverless inference at $0.15/$1.50 per Mtok.
2. Qwen3-30B-A3B-Instruct-2507: 30B total, 3B active. Same price tier.
3. Gemma-3-4b-it: Dense 4B. Cheapest training.
4. Qwen3-235B-A22B-Instruct-2507: The big one. $6/Mtok but highest quality.

---

## Post-Training: Evaluation

### Quick Test (20 prompts)

```python
test_prompts = [
    "What projects are you currently working on?",
    "How would you approach building a new iOS app?",
    "Explain the N'Ko keyboard architecture",
    "What's the status of the BWB POS app?",
    "How does the Dream Garden work?",
    # ... add domain-specific prompts
]
```
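The prompt list still needs something to run it. A minimal sketch of a harness follows; it takes any `generate` callable (hypothetical: for example a wrapper around `mlx_lm.generate` locally, or around the deployed endpoint) and collects responses, recording failures instead of stopping at the first one. `run_quick_test` is an illustrative name, not part of the repository.

```python
from typing import Callable


def run_quick_test(prompts: list[str], generate: Callable[[str], str]) -> list[dict]:
    """Run each prompt through `generate` and collect responses or errors."""
    results = []
    for prompt in prompts:
        try:
            results.append({"prompt": prompt, "response": generate(prompt), "error": None})
        except Exception as exc:  # keep going so one failure does not hide the rest
            results.append({"prompt": prompt, "response": None, "error": str(exc)})
    return results
```

The returned list can be dumped to JSON and reviewed by hand, or fed into whatever scoring step you use for the fidelity score.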

### Twin Fidelity Score (TFS)
Compare Twin output vs ground truth on:
1. Factual accuracy โ€” Does it know Mo's projects?
2. Style match โ€” Does it sound like the training data?
3. Task completion โ€” Can it actually help with real tasks?
4. Hallucination rate โ€” How much does it make up?

Target: TFS ≥ 0.80 (the V1 baseline was 0.772)
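The runbook lists the four TFS dimensions but not how they are combined. Purely as an illustration, and loudly as an assumption, the sketch below averages the three "higher is better" scores with the complement of the hallucination rate, all on a 0 to 1 scale. The equal weighting and the function name `twin_fidelity_score` are hypothetical; the actual formula behind the 0.772 baseline may differ.

```python
TFS_TARGET = 0.80  # target stated in the runbook


def twin_fidelity_score(
    factual_accuracy: float,
    style_match: float,
    task_completion: float,
    hallucination_rate: float,
) -> float:
    """ASSUMED formula: equal-weight mean, with hallucination rate inverted."""
    inputs = (factual_accuracy, style_match, task_completion, hallucination_rate)
    if any(not 0.0 <= value <= 1.0 for value in inputs):
        raise ValueError("all component scores must lie in [0, 1]")
    parts = [factual_accuracy, style_match, task_completion, 1.0 - hallucination_rate]
    return sum(parts) / len(parts)
```

Whatever formula is used, keeping it fixed across versions is what makes the comparison against the baseline and the target meaningful.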

---

## Maintenance: Retraining Pipeline

### When to retrain
- Every 2-4 weeks (new conversation data accumulates)
- After major project changes
- When TFS drops below 0.75

### Steps
1. Run density scorer on new data: `python3 scripts/density_v4.py`
2. Run V9 generators for new domains: `python3 scripts/gen_v9_*.py`
3. Re-run merge: `python3 scripts/local_finetune.py`
4. Upload new data to Together AI / Colab
5. Launch new training job with `suffix: twin-alpha-v3` (increment)
6. Evaluate and deploy if TFS ≥ 0.80
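The retraining triggers and the suffix increment in step 5 are simple enough to encode. The sketch below is one possible reading: it treats the upper end of "every 2-4 weeks" (28 days) as the age trigger, which is an assumption, and both `should_retrain` and `next_suffix` are hypothetical helper names.

```python
import re


def should_retrain(
    days_since_last_run: int,
    current_tfs: float,
    major_project_change: bool = False,
    max_age_days: int = 28,   # assumed: upper end of the 2-4 week window
    tfs_floor: float = 0.75,  # retrain threshold from the runbook
) -> bool:
    """True if any of the runbook's three retraining triggers fires."""
    return (
        major_project_change
        or current_tfs < tfs_floor
        or days_since_last_run >= max_age_days
    )


def next_suffix(suffix: str) -> str:
    """Increment a trailing version number, e.g. twin-alpha-v2 -> twin-alpha-v3."""
    match = re.fullmatch(r"(.*-v)(\d+)", suffix)
    if match is None:
        raise ValueError(f"suffix has no trailing -v<number>: {suffix!r}")
    return f"{match.group(1)}{int(match.group(2)) + 1}"
```

Deriving the suffix from the previous job name avoids accidentally reusing a version label when launching the next run.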

---

## File Inventory

```
cognitive-twin/
├── scripts/
│   ├── local_finetune.py          # Data merge + MLX prep
│   ├── density_v4.py              # Density scoring
│   └── gen_v9_*.py                # SFT data generators
├── notebooks/
│   ├── twin_finetune_colab.ipynb  # Interactive Colab notebook
│   └── train_twin.py              # Standalone Colab script
├── colab-upload/                  # Pre-packaged for Colab
│   ├── train.jsonl
│   ├── valid.jsonl
│   └── train_twin.py
├── local_finetune/
│   ├── data/                      # Merged, deduplicated data
│   ├── adapters/                  # Mac4 local LoRA adapters
│   └── manifest.json
├── data/
│   ├── ctv3_export_v3/            # Base corpus (41K records)
│   └── expansion_v9/              # V9 generators output
├── MAC4_BENCHMARK.md              # Local model benchmark results
└── TRAINING_RUNBOOK.md            # This file
```

Promotion Decision

Attach run IDs, datasets, metrics, and reproduction commands.

Source Anchor

Comp-Core/packages/cognitive-twin/TRAINING_RUNBOOK.md

Detected Structure

Method · Evaluation · Code Anchors · Architecture