Grand Diomande Research · Full HTML Reader

Cognitive Twin V9 Dataset Audit

**Date:** 2026-02-18 **Previous version:** V8 (combined: 77,708 records — 43,173 V5 base + V6/V7/V8 expansions) **Last training:** Never submitted (blocked on billing) **Goal:** Catalog all new data sources since V8 (Feb 14), estimate record yield, prepare V9 expansion generation

Agents That Account for Themselves proposal experiment writeup candidate score 26 .md

Full Public Reader

Cognitive Twin V9 Dataset Audit

Date: 2026-02-18
Previous version: V8 (combined: 77,708 records — 43,173 V5 base + V6/V7/V8 expansions)
Last training: Never submitted (blocked on billing)
Goal: Catalog all new data sources since V8 (Feb 14), estimate record yield, prepare V9 expansion generation

---

Current Dataset Inventory

VersionRecordsSourceModel UsedDate
V5 (base)43,173Conversations, Apple Notes, Discord, WORMSVariousJan 2026
V6382Evoflow/TIE evolutionGemini 2.0 FlashFeb 2026
V7116Meta-evolution (methods, processes)Gemini 2.0 FlashFeb 2026
V8502Deep convos, session mining, RLM-enhancedGemini 3 Pro PreviewFeb 14
Combined77,708V5+V6+V7+V8 merged (SFT + DPO)Feb 14

Format: CTv3.1 JSONL — `{"messages": [...]}` for SFT, `{"input": {"messages": [...]}, "preferred_output": "...", "non_preferred_output": "..."}` for DPO

---

New Data Sources for V9 (Since Feb 14)

### Source 1: Architecture Specifications (32 CLAUDE.md files)
Location: `Desktop/*/CLAUDE.md`
Total: 32 files, ~5,600 lines
Key files (new/updated since V8):
- `clarity-agent-protocol/CLAUDE.md` (165 lines) — Smart contract governance for agents
- `SecuriClaw/CLAUDE.md` (99 lines) — Security benchmarking framework
- `PULSE-V1/CLAUDE.md` (46 lines) — Pulse protocol v1
- `compass/CLAUDE.md` (490 lines) — Daily planning app
- `AgentCommandCenter/CLAUDE.md` (91 lines) — Agent management UI
- `SecuriClaw-Claude/CLAUDE.md` (100 lines) — Claude-specific benchmarks
- `SecuriClaw-Codex/CLAUDE.md` (99 lines) — Codex benchmarks

Training value: HIGH — These define how we architect projects. Twin needs to replicate our design thinking.
Estimated yield: ~200-300 SFT pairs (architecture decisions, design patterns, tech stack choices)

### Source 2: Pulse Plans (23 enriched plans)
Location: `[home-path]`
Total: 23 plans covering all projects
Content: Problem statements, solution architectures, validation criteria, wave-gated task breakdowns

Training value: CRITICAL — This is how we decompose and execute work. The Twin must understand task planning, wave dependencies, and dispatch patterns.
Estimated yield: ~150-200 SFT pairs (task decomposition, prioritization, dependency reasoning)

### Source 3: Governance & Behavioral Protocols
Location: Core workspace files
| File | Lines | Content |
|------|-------|---------|
| AGENTS.md | 248 | Agent behavioral protocols, spawn rules, memory management |
| SOUL.md | 59 | Identity, personality, decision principles |
| HEARTBEAT.md | 220 | Autonomous check patterns, proactive behavior |
| USER.md | 85 | User context, communication style, preferences |
| PROTOCOLS.md | 231 | Task persistence, RTD verification, pre-flight checks |

Training value: CRITICAL — These ARE the Twin's personality. V5 had an older version; V9 needs the current evolved protocols.
Estimated yield: ~100-150 SFT pairs + 50-80 DPO pairs (correct autonomous behavior vs permission-seeking)

### Source 4: Skills Library (141 skills, 66,273 lines)
Location: `[home-path]`
Total: 141 skills covering every operational domain
Categories:
- Bot operations (pulse, plans, dream-weaver, comp-core, voice)
- Business (koatji, barista, insurance, sales)
- Creative (art, music, video, design)
- Language (N'Ko, French, ULL, linguistics)
- Technical (iOS deploy, expo, frontend, research)
- Cognitive (synthesis, evolution, evoflow, TIE)

Training value: HIGH — Skills define our operational vocabulary. Twin should know what skills exist and when to invoke them.
Estimated yield: ~300-500 SFT pairs (skill selection, skill invocation patterns, domain expertise)

### Source 5: Memory Files (35+ new since V8)
Location: `[home-path]`
Key files:
- `2026-02-14.md` through `2026-02-18.md` — Daily session logs
- `infra-architecture-v1.md` (240 lines) — Infrastructure decisions
- `mac4-architecture.md` (192 lines) — Mac4 standalone setup
- `visionclaw-data-layer-architecture.md` (243 lines) — VisionClaw data patterns
- `active-tasks.md` (216 lines) — Live task management
- `evocube-*.md` — Evolution session results
- Thread archives (homelab, spore, speak-flow, securiclaw, eternal-serenity, protocol-o)

Training value: HIGH — Raw decision-making patterns, architectural reasoning, project status tracking.
Estimated yield: ~200-300 SFT pairs (status reporting, decision narratives, context switching)

### Source 6: Kimi Memory Turns (30,704 messages)
Location: `[home-path]`
Density scoring: 9,155 turns scored (checkpoint at ID 249, latest file from Feb 16)
Remaining: ~21,500 unscored turns

Training value: MEDIUM-HIGH — Conversational patterns, but needs density filtering (CORE+ENRICHED only).
Estimated yield: ~500-1000 SFT pairs after density filtering (CORE 9-10 and ENRICHED 7-8 only)

### Source 7: RAG++ Embedded Corpus (76,687 documents)
Location: RAG++ service at :8000
Growth since V8: Significant (was ~60K at V5, now 76,687)

Training value: MEDIUM — Good for knowledge grounding but most is already captured in other sources.
Estimated yield: ~100-200 SFT pairs (knowledge retrieval patterns)

### Source 8: MiniMax M2.5 / Graph Kernel / Dream Weaver Patterns
New capabilities since V8:
- MiniMax M2.5 integration for local inference
- Graph Kernel with postgres + policy-based access
- Dream Weaver Evo³ multi-model evolution (Gemini + MiniMax + Kimi-K2)
- Cortex daemon for intelligent routing

Training value: HIGH — The Twin should understand our multi-model architecture.
Estimated yield: ~100-150 SFT pairs (model routing decisions, evolution patterns)

---

V9 Expansion Estimate

SourceSFT PairsDPO PairsPriority
Architecture specs (CLAUDE.md)25050🔴 Critical
Pulse Plans17540🔴 Critical
Governance protocols12580🔴 Critical
Skills library400100🟡 High
Memory files25060🟡 High
Kimi memory (density-filtered)750200🟡 High
Multi-model architecture12530🟢 Medium
TOTAL V9 EXPANSION~2,075~560

Combined V9 dataset estimate: 77,708 + 2,635 = ~80,343 records

---

Generation Strategy

### Phase 1: Extract & Structure (automated)
- Parse all CLAUDE.md files → structured project specs
- Parse all pulse plans → task decomposition examples
- Extract governance docs → behavioral guidelines
- Extract high-density Kimi turns (complete density scoring run)

### Phase 2: Generate Training Pairs (Gemini 3 Pro)
- Architecture Q&A: "How would you design X?" → detailed response using our patterns
- Task planning: "Break down this feature" → wave-gated pulse plan
- Behavioral: "Should I ask permission for X?" → correct autonomous response (DPO)
- Skill routing: "User needs X" → correct skill selection + invocation

### Phase 3: Quality Gate (RLM scoring)
- Run RLM anticipation controller on all generated pairs
- Reject anything below quality threshold
- Ensure no permission-seeking behavior in SFT data
- Validate format compliance (CTv3.1)

### Phase 4: Merge & Validate
- Merge V5+V6+V7+V8+V9 → final combined dataset
- Deduplicate by content hash
- Split train/val (95/5)
- Validate format for Together AI / Vast.ai

---

Model Consideration: V10 / Beyond

With MiniMax M2.5 now in the stack, we should consider:
1. Kimi-K2-Instruct (current default) — Strong reasoning, good at code
2. Qwen3-235B — Massive parameter count, excellent multilingual
3. MiniMax M2.5 — Already in our fleet, understands our patterns from Cortex
4. DeepSeek-R1 — Strong reasoning chain

The Evo³ session should evaluate which base model gets the most benefit from our LoRA.

---

Deployment Architecture Candidates (for Evo³)

ArchNameServingTraining TargetMonthly CostMac4 Role
ACloud TwinTogether AI serverless LoRAQwen3-235B$90-530Utility only
BHybrid Local/CloudMac4 Ollama + Together AIQwen3-235B + Llama 3.2-3B$50-200Local triage brain
CEdge-FirstMac4 Ollama primaryQwen2.5-7B or Llama 3.1-8B$10-50Primary brain
DFull ReversalMac4 GGUF MoEQwen3-235B Q4 GGUF~$0Everything
EContinuous TrainingMac4 trains + servesLlama 3.2-3B (iterative)~$0Brain + trainer

Recommendation: Arch B (Hybrid) — Mac4 fine-tuned 3B for triage/routing, Qwen3-235B on Together AI for heavy reasoning. Evo³ should stress-test all five.

---

## Blockers
- ~~V8 stages 3/4 retrying~~ (resolved — 502 records generated)
- Billing limit on Together AI (was the original V8 blocker)
- Vast.ai instance not yet rented ($14-74 for A100 80GB)
- Density scoring needs to complete (21,500 unscored Kimi turns)

---

Ready for Evo³ session to define training strategy, then Pulse Plan for execution.

Promotion Decision

Attach run IDs, datasets, metrics, and reproduction commands.

Source Anchor

Comp-Core/packages/cognitive-twin/V9_AUDIT.md

Detected Structure

Method · Evaluation · References · Architecture