Cognitive Twin V9 Dataset Audit
**Date:** 2026-02-18 **Previous version:** V8 (combined: 77,708 records — 43,173 V5 base + V6/V7/V8 expansions) **Last training:** Never submitted (blocked on billing) **Goal:** Catalog all new data sources since V8 (Feb 14), estimate record yield, prepare V9 expansion generation
Full Public Reader
Cognitive Twin V9 Dataset Audit
Date: 2026-02-18
Previous version: V8 (combined: 77,708 records — 43,173 V5 base + V6/V7/V8 expansions)
Last training: Never submitted (blocked on billing)
Goal: Catalog all new data sources since V8 (Feb 14), estimate record yield, prepare V9 expansion generation
---
Current Dataset Inventory
| Version | Records | Source | Model Used | Date |
|---|---|---|---|---|
| V5 (base) | 43,173 | Conversations, Apple Notes, Discord, WORMS | Various | Jan 2026 |
| V6 | 382 | Evoflow/TIE evolution | Gemini 2.0 Flash | Feb 2026 |
| V7 | 116 | Meta-evolution (methods, processes) | Gemini 2.0 Flash | Feb 2026 |
| V8 | 502 | Deep convos, session mining, RLM-enhanced | Gemini 3 Pro Preview | Feb 14 |
| Combined | 77,708 | V5+V6+V7+V8 merged (SFT + DPO) | — | Feb 14 |
Format: CTv3.1 JSONL — `{"messages": [...]}` for SFT, `{"input": {"messages": [...]}, "preferred_output": "...", "non_preferred_output": "..."}` for DPO
---
New Data Sources for V9 (Since Feb 14)
### Source 1: Architecture Specifications (32 CLAUDE.md files)
Location: `Desktop/*/CLAUDE.md`
Total: 32 files, ~5,600 lines
Key files (new/updated since V8):
- `clarity-agent-protocol/CLAUDE.md` (165 lines) — Smart contract governance for agents
- `SecuriClaw/CLAUDE.md` (99 lines) — Security benchmarking framework
- `PULSE-V1/CLAUDE.md` (46 lines) — Pulse protocol v1
- `compass/CLAUDE.md` (490 lines) — Daily planning app
- `AgentCommandCenter/CLAUDE.md` (91 lines) — Agent management UI
- `SecuriClaw-Claude/CLAUDE.md` (100 lines) — Claude-specific benchmarks
- `SecuriClaw-Codex/CLAUDE.md` (99 lines) — Codex benchmarks
Training value: HIGH — These define how we architect projects. Twin needs to replicate our design thinking.
Estimated yield: ~200-300 SFT pairs (architecture decisions, design patterns, tech stack choices)
### Source 2: Pulse Plans (23 enriched plans)
Location: `[home-path]`
Total: 23 plans covering all projects
Content: Problem statements, solution architectures, validation criteria, wave-gated task breakdowns
Training value: CRITICAL — This is how we decompose and execute work. The Twin must understand task planning, wave dependencies, and dispatch patterns.
Estimated yield: ~150-200 SFT pairs (task decomposition, prioritization, dependency reasoning)
### Source 3: Governance & Behavioral Protocols
Location: Core workspace files
| File | Lines | Content |
|------|-------|---------|
| AGENTS.md | 248 | Agent behavioral protocols, spawn rules, memory management |
| SOUL.md | 59 | Identity, personality, decision principles |
| HEARTBEAT.md | 220 | Autonomous check patterns, proactive behavior |
| USER.md | 85 | User context, communication style, preferences |
| PROTOCOLS.md | 231 | Task persistence, RTD verification, pre-flight checks |
Training value: CRITICAL — These ARE the Twin's personality. V5 had an older version; V9 needs the current evolved protocols.
Estimated yield: ~100-150 SFT pairs + 50-80 DPO pairs (correct autonomous behavior vs permission-seeking)
### Source 4: Skills Library (141 skills, 66,273 lines)
Location: `[home-path]`
Total: 141 skills covering every operational domain
Categories:
- Bot operations (pulse, plans, dream-weaver, comp-core, voice)
- Business (koatji, barista, insurance, sales)
- Creative (art, music, video, design)
- Language (N'Ko, French, ULL, linguistics)
- Technical (iOS deploy, expo, frontend, research)
- Cognitive (synthesis, evolution, evoflow, TIE)
Training value: HIGH — Skills define our operational vocabulary. Twin should know what skills exist and when to invoke them.
Estimated yield: ~300-500 SFT pairs (skill selection, skill invocation patterns, domain expertise)
### Source 5: Memory Files (35+ new since V8)
Location: `[home-path]`
Key files:
- `2026-02-14.md` through `2026-02-18.md` — Daily session logs
- `infra-architecture-v1.md` (240 lines) — Infrastructure decisions
- `mac4-architecture.md` (192 lines) — Mac4 standalone setup
- `visionclaw-data-layer-architecture.md` (243 lines) — VisionClaw data patterns
- `active-tasks.md` (216 lines) — Live task management
- `evocube-*.md` — Evolution session results
- Thread archives (homelab, spore, speak-flow, securiclaw, eternal-serenity, protocol-o)
Training value: HIGH — Raw decision-making patterns, architectural reasoning, project status tracking.
Estimated yield: ~200-300 SFT pairs (status reporting, decision narratives, context switching)
### Source 6: Kimi Memory Turns (30,704 messages)
Location: `[home-path]`
Density scoring: 9,155 turns scored (checkpoint at ID 249, latest file from Feb 16)
Remaining: ~21,500 unscored turns
Training value: MEDIUM-HIGH — Conversational patterns, but needs density filtering (CORE+ENRICHED only).
Estimated yield: ~500-1000 SFT pairs after density filtering (CORE 9-10 and ENRICHED 7-8 only)
### Source 7: RAG++ Embedded Corpus (76,687 documents)
Location: RAG++ service at :8000
Growth since V8: Significant (was ~60K at V5, now 76,687)
Training value: MEDIUM — Good for knowledge grounding but most is already captured in other sources.
Estimated yield: ~100-200 SFT pairs (knowledge retrieval patterns)
### Source 8: MiniMax M2.5 / Graph Kernel / Dream Weaver Patterns
New capabilities since V8:
- MiniMax M2.5 integration for local inference
- Graph Kernel with postgres + policy-based access
- Dream Weaver Evo³ multi-model evolution (Gemini + MiniMax + Kimi-K2)
- Cortex daemon for intelligent routing
Training value: HIGH — The Twin should understand our multi-model architecture.
Estimated yield: ~100-150 SFT pairs (model routing decisions, evolution patterns)
---
V9 Expansion Estimate
| Source | SFT Pairs | DPO Pairs | Priority |
|---|---|---|---|
| Architecture specs (CLAUDE.md) | 250 | 50 | 🔴 Critical |
| Pulse Plans | 175 | 40 | 🔴 Critical |
| Governance protocols | 125 | 80 | 🔴 Critical |
| Skills library | 400 | 100 | 🟡 High |
| Memory files | 250 | 60 | 🟡 High |
| Kimi memory (density-filtered) | 750 | 200 | 🟡 High |
| Multi-model architecture | 125 | 30 | 🟢 Medium |
| TOTAL V9 EXPANSION | ~2,075 | ~560 | — |
Combined V9 dataset estimate: 77,708 + 2,635 = ~80,343 records
---
Generation Strategy
### Phase 1: Extract & Structure (automated)
- Parse all CLAUDE.md files → structured project specs
- Parse all pulse plans → task decomposition examples
- Extract governance docs → behavioral guidelines
- Extract high-density Kimi turns (complete density scoring run)
### Phase 2: Generate Training Pairs (Gemini 3 Pro)
- Architecture Q&A: "How would you design X?" → detailed response using our patterns
- Task planning: "Break down this feature" → wave-gated pulse plan
- Behavioral: "Should I ask permission for X?" → correct autonomous response (DPO)
- Skill routing: "User needs X" → correct skill selection + invocation
### Phase 3: Quality Gate (RLM scoring)
- Run RLM anticipation controller on all generated pairs
- Reject anything below quality threshold
- Ensure no permission-seeking behavior in SFT data
- Validate format compliance (CTv3.1)
### Phase 4: Merge & Validate
- Merge V5+V6+V7+V8+V9 → final combined dataset
- Deduplicate by content hash
- Split train/val (95/5)
- Validate format for Together AI / Vast.ai
---
Model Consideration: V10 / Beyond
With MiniMax M2.5 now in the stack, we should consider:
1. Kimi-K2-Instruct (current default) — Strong reasoning, good at code
2. Qwen3-235B — Massive parameter count, excellent multilingual
3. MiniMax M2.5 — Already in our fleet, understands our patterns from Cortex
4. DeepSeek-R1 — Strong reasoning chain
The Evo³ session should evaluate which base model gets the most benefit from our LoRA.
---
Deployment Architecture Candidates (for Evo³)
| Arch | Name | Serving | Training Target | Monthly Cost | Mac4 Role |
|---|---|---|---|---|---|
| A | Cloud Twin | Together AI serverless LoRA | Qwen3-235B | $90-530 | Utility only |
| B | Hybrid Local/Cloud | Mac4 Ollama + Together AI | Qwen3-235B + Llama 3.2-3B | $50-200 | Local triage brain |
| C | Edge-First | Mac4 Ollama primary | Qwen2.5-7B or Llama 3.1-8B | $10-50 | Primary brain |
| D | Full Reversal | Mac4 GGUF MoE | Qwen3-235B Q4 GGUF | ~$0 | Everything |
| E | Continuous Training | Mac4 trains + serves | Llama 3.2-3B (iterative) | ~$0 | Brain + trainer |
Recommendation: Arch B (Hybrid) — Mac4 fine-tuned 3B for triage/routing, Qwen3-235B on Together AI for heavy reasoning. Evo³ should stress-test all five.
---
## Blockers
- ~~V8 stages 3/4 retrying~~ (resolved — 502 records generated)
- Billing limit on Together AI (was the original V8 blocker)
- Vast.ai instance not yet rented ($14-74 for A100 80GB)
- Density scoring needs to complete (21,500 unscored Kimi turns)
---
Ready for Evo³ session to define training strategy, then Pulse Plan for execution.
Promotion Decision
Attach run IDs, datasets, metrics, and reproduction commands.
Source Anchor
Comp-Core/packages/cognitive-twin/V9_AUDIT.md
Detected Structure
Method · Evaluation · References · Architecture