Back to corpus
proposalexperiment writeup candidatescore 26

Cognitive Twin V9 Dataset Audit

**Date:** 2026-02-18 **Previous version:** V8 (combined: 77,708 records — 43,173 V5 base + V6/V7/V8 expansions) **Last training:** Never submitted (blocked on billing) **Goal:** Catalog all new data sources since V8 (Feb 14), estimate record yield, prepare V9 expansion generation

Full HTML reader

Read the full artifact

Open in new tab

Extracted abstract or opening context

**Date:** 2026-02-18 **Previous version:** V8 (combined: 77,708 records — 43,173 V5 base + V6/V7/V8 expansions) **Last training:** Never submitted (blocked on billing) **Goal:** Catalog all new data sources since V8 (Feb 14), estimate record yield, prepare V9 expansion generation | Version | Records | Source | Model Used | Date | |---------|---------|--------|-----------|------| | V5 (base) | 43,173 | Conversations, Apple Notes, Discord, WORMS | Various | Jan 2026 | | V6 | 382 | Evoflow/TIE evolution | Gemini 2.0 Flash | Feb 2026 | | V7 | 116 | Meta-evolution (methods, processes) | Gemini 2.0 Flash | Feb 2026 | | V8 | 502 | Deep convos, session mining, RLM-enhanced | Gemini 3 Pro Preview | Feb 14 | | **Combined** | **77,708** | V5+V6+V7+V8 merged (SFT + DPO) | — | Feb 14 | **Format:** CTv3.1 JSONL — `{"messages": [...]}` for SFT, `{"input": {"messages": [...]}, "preferred_output": "...", "non_preferred_output": "..."}` for DPO ### Source 1: Architecture Specifications (32 CLAUDE.md files) **Location:** `Desktop/*/CLAUDE.md` **Total:** 32 files, ~5,600 lines **Key files (new/updated since V8):** - `clarity-agent-protocol/CLAUDE.md` (165 lines) — Smart contract governance for agents - `SecuriClaw/CLAUDE.md` (99 lines) — Security benchmarking framework - `PULSE-V1/CLAUDE.md` (46 lines) — Pulse protocol v1 - `compass/CLAUDE.md` (490 lines) — Daily planning app - `AgentCommandCenter/CLAUDE.md` (91 lines) — Agent management UI - `SecuriClaw-Claude/CLAUDE.md` (100 lines) — Claude-specific benchmarks - `SecuriClaw-Codex/CLAUDE.md` (99 lines) — Codex benchmarks **Training value:** HIGH — These define how we architect projects. Twin needs to replicate our design thinking. **Estimated yield:** ~200-300 SFT pairs (architecture decisions, design patterns, tech stack choices)

Promotion decision

What has to happen next

Attach run IDs, datasets, metrics, and reproduction commands.

Why this is not always a full paper yet

Corpus pages are public-safe readers for discovered workspace artifacts. They are not automatically final papers. A corpus item becomes a polished paper only after the editable source, evidence checkpoints, references, figures, render path, and release status are attached through the paper schema.