Cognitive Twin V9 Dataset Audit

Extracted abstract or opening context

**Date:** 2026-02-18
**Previous version:** V8 (combined: 77,708 records: 43,173 V5 base + V6/V7/V8 expansions)
**Last training:** Never submitted (blocked on billing)
**Goal:** Catalog all new data sources since V8 (Feb 14), estimate record yield, prepare V9 expansion generation

| Version | Records | Source | Model Used | Date |
|---------|---------|--------|-----------|------|
| V5 (base) | 43,173 | Conversations, Apple Notes, Discord, WORMS | Various | Jan 2026 |
| V6 | 382 | Evoflow/TIE evolution | Gemini 2.0 Flash | Feb 2026 |
| V7 | 116 | Meta-evolution (methods, processes) | Gemini 2.0 Flash | Feb 2026 |
| V8 | 502 | Deep convos, session mining, RLM-enhanced | Gemini 3 Pro Preview | Feb 14 |
| **Combined** | **77,708** | V5+V6+V7+V8 merged (SFT + DPO) | n/a | Feb 14 |

**Format:** CTv3.1 JSONL: `{"messages": [...]}` for SFT, `{"input": {"messages": [...]}, "preferred_output": "...", "non_preferred_output": "..."}` for DPO

### Source 1: Architecture Specifications (32 CLAUDE.md files)

**Location:** `Desktop/*/CLAUDE.md`
**Total:** 32 files, ~5,600 lines

**Key files (new/updated since V8):**

- `clarity-agent-protocol/CLAUDE.md` (165 lines): Smart contract governance for agents
- `SecuriClaw/CLAUDE.md` (99 lines): Security benchmarking framework
- `PULSE-V1/CLAUDE.md` (46 lines): Pulse protocol v1
- `compass/CLAUDE.md` (490 lines): Daily planning app
- `AgentCommandCenter/CLAUDE.md` (91 lines): Agent management UI
- `SecuriClaw-Claude/CLAUDE.md` (100 lines): Claude-specific benchmarks
- `SecuriClaw-Codex/CLAUDE.md` (99 lines): Codex benchmarks

**Training value:** HIGH. These define how we architect projects. Twin needs to replicate our design thinking.
**Estimated yield:** ~200-300 SFT pairs (architecture decisions, design patterns, tech stack choices)
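Before new SFT pairs from a source like this are merged, it may help to check that each line matches one of the two CTv3.1 record shapes listed under **Format**. The following is a minimal sketch, not the project's actual tooling: the function names `classify_record` and `count_records` are hypothetical, and it assumes `preferred_output` and `non_preferred_output` are plain strings, as the format line suggests.

```python
import json


def classify_record(line: str) -> str:
    """Return 'sft', 'dpo', or 'invalid' for one JSONL line."""
    try:
        rec = json.loads(line)
    except json.JSONDecodeError:
        return "invalid"
    if not isinstance(rec, dict):
        return "invalid"
    # SFT shape: {"messages": [...]}
    if isinstance(rec.get("messages"), list):
        return "sft"
    # DPO shape: {"input": {"messages": [...]},
    #             "preferred_output": "...", "non_preferred_output": "..."}
    # Assumption: both outputs are strings, as shown in the format line.
    inp = rec.get("input")
    if (
        isinstance(inp, dict)
        and isinstance(inp.get("messages"), list)
        and isinstance(rec.get("preferred_output"), str)
        and isinstance(rec.get("non_preferred_output"), str)
    ):
        return "dpo"
    return "invalid"


def count_records(lines):
    """Tally record types over an iterable of JSONL lines, skipping blanks."""
    counts = {"sft": 0, "dpo": 0, "invalid": 0}
    for line in lines:
        if line.strip():
            counts[classify_record(line)] += 1
    return counts
```

Passing an open file handle to `count_records` gives a per-type tally. A nonzero `invalid` count would point to lines worth inspecting before the merge.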

Promotion decision

What has to happen next

Attach run IDs, datasets, metrics, and reproduction commands.

Why this is not always a full paper yet

Corpus pages are public-safe readers for discovered workspace artifacts. They are not automatically final papers. A corpus item becomes a polished paper only after the editable source, evidence checkpoints, references, figures, render path, and release status are attached through the paper schema.