Grand Diomande Research · Full HTML Reader

Cognitive Twin V9 Dataset Audit

**Date:** 2026-02-18 **Previous version:** V8 (combined: 77,708 records — 43,173 V5 base + V6/V7/V8 expansions) **Last training:** Never submitted (blocked on billing) **Goal:** Catalog all new data sources since V8 (Feb 14), estimate record yield, prepare V9 expansion generation

Agents That Account for Themselves proposal experiment writeup candidate score 26 .md

Full Public Reader

Cognitive Twin V9 Dataset Audit

Date: 2026-02-18
Previous version: V8 (combined: 77,708 records — 43,173 V5 base + V6/V7/V8 expansions)
Last training: Never submitted (blocked on billing)
Goal: Catalog all new data sources since V8 (Feb 14), estimate record yield, prepare V9 expansion generation

---

Current Dataset Inventory

Version	Records	Source	Model Used	Date
V5 (base)	43,173	Conversations, Apple Notes, Discord, WORMS	Various	Jan 2026
V6	382	Evoflow/TIE evolution	Gemini 2.0 Flash	Feb 2026
V7	116	Meta-evolution (methods, processes)	Gemini 2.0 Flash	Feb 2026
V8	502	Deep convos, session mining, RLM-enhanced	Gemini 3 Pro Preview	Feb 14
Combined	77,708	V5+V6+V7+V8 merged (SFT + DPO)	—	Feb 14

Format: CTv3.1 JSONL — `{"messages": [...]}` for SFT, `{"input": {"messages": [...]}, "preferred_output": "...", "non_preferred_output": "..."}` for DPO

---

New Data Sources for V9 (Since Feb 14)

### Source 1: Architecture Specifications (32 CLAUDE.md files)
Location: `Desktop/*/CLAUDE.md`
Total: 32 files, ~5,600 lines
Key files (new/updated since V8):
- `clarity-agent-protocol/CLAUDE.md` (165 lines) — Smart contract governance for agents
- `SecuriClaw/CLAUDE.md` (99 lines) — Security benchmarking framework
- `PULSE-V1/CLAUDE.md` (46 lines) — Pulse protocol v1
- `compass/CLAUDE.md` (490 lines) — Daily planning app
- `AgentCommandCenter/CLAUDE.md` (91 lines) — Agent management UI
- `SecuriClaw-Claude/CLAUDE.md` (100 lines) — Claude-specific benchmarks
- `SecuriClaw-Codex/CLAUDE.md` (99 lines) — Codex benchmarks

Training value: HIGH — These define how we architect projects. Twin needs to replicate our design thinking.
Estimated yield: ~200-300 SFT pairs (architecture decisions, design patterns, tech stack choices)

### Source 2: Pulse Plans (23 enriched plans)
Location: `[home-path]`
Total: 23 plans covering all projects
Content: Problem statements, solution architectures, validation criteria, wave-gated task breakdowns

Training value: CRITICAL — This is how we decompose and execute work. The Twin must understand task planning, wave dependencies, and dispatch patterns.
Estimated yield: ~150-200 SFT pairs (task decomposition, prioritization, dependency reasoning)

### Source 3: Governance & Behavioral Protocols
Location: Core workspace files
| File | Lines | Content |
|------|-------|---------|
| AGENTS.md | 248 | Agent behavioral protocols, spawn rules, memory management |
| SOUL.md | 59 | Identity, personality, decision principles |
| HEARTBEAT.md | 220 | Autonomous check patterns, proactive behavior |
| USER.md | 85 | User context, communication style, preferences |
| PROTOCOLS.md | 231 | Task persistence, RTD verification, pre-flight checks |

Training value: CRITICAL — These ARE the Twin's personality. V5 had an older version; V9 needs the current evolved protocols.
Estimated yield: ~100-150 SFT pairs + 50-80 DPO pairs (correct autonomous behavior vs permission-seeking)

### Source 4: Skills Library (141 skills, 66,273 lines)
Location: `[home-path]`
Total: 141 skills covering every operational domain
Categories:
- Bot operations (pulse, plans, dream-weaver, comp-core, voice)
- Business (koatji, barista, insurance, sales)
- Creative (art, music, video, design)
- Language (N'Ko, French, ULL, linguistics)
- Technical (iOS deploy, expo, frontend, research)
- Cognitive (synthesis, evolution, evoflow, TIE)

Training value: HIGH — Skills define our operational vocabulary. Twin should know what skills exist and when to invoke them.
Estimated yield: ~300-500 SFT pairs (skill selection, skill invocation patterns, domain expertise)

### Source 5: Memory Files (35+ new since V8)
Location: `[home-path]`
Key files:
- `2026-02-14.md` through `2026-02-18.md` — Daily session logs
- `infra-architecture-v1.md` (240 lines) — Infrastructure decisions
- `mac4-architecture.md` (192 lines) — Mac4 standalone setup
- `visionclaw-data-layer-architecture.md` (243 lines) — VisionClaw data patterns
- `active-tasks.md` (216 lines) — Live task management
- `evocube-*.md` — Evolution session results
- Thread archives (homelab, spore, speak-flow, securiclaw, eternal-serenity, protocol-o)

Training value: HIGH — Raw decision-making patterns, architectural reasoning, project status tracking.
Estimated yield: ~200-300 SFT pairs (status reporting, decision narratives, context switching)

### Source 6: Kimi Memory Turns (30,704 messages)
Location: `[home-path]`
Density scoring: 9,155 turns scored (checkpoint at ID 249, latest file from Feb 16)
Remaining: ~21,500 unscored turns

Training value: MEDIUM-HIGH — Conversational patterns, but needs density filtering (CORE+ENRICHED only).
Estimated yield: ~500-1000 SFT pairs after density filtering (CORE 9-10 and ENRICHED 7-8 only)

### Source 7: RAG++ Embedded Corpus (76,687 documents)
Location: RAG++ service at :8000
Growth since V8: Significant (was ~60K at V5, now 76,687)

Training value: MEDIUM — Good for knowledge grounding but most is already captured in other sources.
Estimated yield: ~100-200 SFT pairs (knowledge retrieval patterns)

### Source 8: MiniMax M2.5 / Graph Kernel / Dream Weaver Patterns
New capabilities since V8:
- MiniMax M2.5 integration for local inference
- Graph Kernel with postgres + policy-based access
- Dream Weaver Evo³ multi-model evolution (Gemini + MiniMax + Kimi-K2)
- Cortex daemon for intelligent routing

Training value: HIGH — The Twin should understand our multi-model architecture.
Estimated yield: ~100-150 SFT pairs (model routing decisions, evolution patterns)

---

V9 Expansion Estimate

Source	SFT Pairs	DPO Pairs	Priority
Architecture specs (CLAUDE.md)	250	50	🔴 Critical
Pulse Plans	175	40	🔴 Critical
Governance protocols	125	80	🔴 Critical
Skills library	400	100	🟡 High
Memory files	250	60	🟡 High
Kimi memory (density-filtered)	750	200	🟡 High
Multi-model architecture	125	30	🟢 Medium
TOTAL V9 EXPANSION	~2,075	~560	—

Combined V9 dataset estimate: 77,708 + 2,635 = ~80,343 records

---

Generation Strategy

### Phase 1: Extract & Structure (automated)
- Parse all CLAUDE.md files → structured project specs
- Parse all pulse plans → task decomposition examples
- Extract governance docs → behavioral guidelines
- Extract high-density Kimi turns (complete density scoring run)

### Phase 2: Generate Training Pairs (Gemini 3 Pro)
- Architecture Q&A: "How would you design X?" → detailed response using our patterns
- Task planning: "Break down this feature" → wave-gated pulse plan
- Behavioral: "Should I ask permission for X?" → correct autonomous response (DPO)
- Skill routing: "User needs X" → correct skill selection + invocation

### Phase 3: Quality Gate (RLM scoring)
- Run RLM anticipation controller on all generated pairs
- Reject anything below quality threshold
- Ensure no permission-seeking behavior in SFT data
- Validate format compliance (CTv3.1)

### Phase 4: Merge & Validate
- Merge V5+V6+V7+V8+V9 → final combined dataset
- Deduplicate by content hash
- Split train/val (95/5)
- Validate format for Together AI / Vast.ai

---

Model Consideration: V10 / Beyond

With MiniMax M2.5 now in the stack, we should consider:
1. Kimi-K2-Instruct (current default) — Strong reasoning, good at code
2. Qwen3-235B — Massive parameter count, excellent multilingual
3. MiniMax M2.5 — Already in our fleet, understands our patterns from Cortex
4. DeepSeek-R1 — Strong reasoning chain

The Evo³ session should evaluate which base model gets the most benefit from our LoRA.

---

Deployment Architecture Candidates (for Evo³)

Arch	Name	Serving	Training Target	Monthly Cost	Mac4 Role
A	Cloud Twin	Together AI serverless LoRA	Qwen3-235B	$90-530	Utility only
B	Hybrid Local/Cloud	Mac4 Ollama + Together AI	Qwen3-235B + Llama 3.2-3B	$50-200	Local triage brain
C	Edge-First	Mac4 Ollama primary	Qwen2.5-7B or Llama 3.1-8B	$10-50	Primary brain
D	Full Reversal	Mac4 GGUF MoE	Qwen3-235B Q4 GGUF	~$0	Everything
E	Continuous Training	Mac4 trains + serves	Llama 3.2-3B (iterative)	~$0	Brain + trainer

Recommendation: Arch B (Hybrid) — Mac4 fine-tuned 3B for triage/routing, Qwen3-235B on Together AI for heavy reasoning. Evo³ should stress-test all five.

---

## Blockers
- ~~V8 stages 3/4 retrying~~ (resolved — 502 records generated)
- Billing limit on Together AI (was the original V8 blocker)
- Vast.ai instance not yet rented ($14-74 for A100 80GB)
- Density scoring needs to complete (21,500 unscored Kimi turns)

---

Ready for Evo³ session to define training strategy, then Pulse Plan for execution.

Promotion Decision

Attach run IDs, datasets, metrics, and reproduction commands.

Source Anchor

Comp-Core/packages/cognitive-twin/V9_AUDIT.md

Detected Structure

Method · Evaluation · References · Architecture