Grand Diomande Research · Full HTML Reader

T1 Complete — Auto-Extract Knowledge Graph

**Status:** ✅ COMPLETE
**Completed:** 2026-02-19T18:47 EST
**Duration:** ~5 minutes extraction time (196K messages × 150+ regex patterns)

Agents That Account for Themselves proposal experiment writeup candidate score 24 .md

Results

| Metric | Before (v1) | After (v2) | Growth |
|---|---|---|---|
| Nodes | 70 | 270 | 3.9× |
| Edges | 75 | 689 | 9.2× |
| Node types | 9 | 13 | +4 new types |
| Relationship types | ~15 | 30+ | |
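
The growth factors in the Results table are plain after/before ratios rounded to one decimal place. A minimal check, using only the node and edge counts reported above (the helper name `growth_factor` is illustrative and not part of the extraction script):

```python
def growth_factor(before: int, after: int) -> float:
    """Return the after/before ratio rounded to one decimal place."""
    return round(after / before, 1)

# Counts taken from the Results table (v1 -> v2).
node_growth = growth_factor(70, 270)
edge_growth = growth_factor(75, 689)

print(f"nodes: {node_growth}x, edges: {edge_growth}x")  # nodes: 3.9x, edges: 9.2x
```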

Node Type Breakdown

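| Type | Count |
|---|---|
| technology | 121 |
| project | 62 |
| concept | 28 |
| decision | 14 |
| service | 12 |
| channel | 10 |
| machine | 6 |
| location | 6 |
| app | 4 |
| person | 3 |
| storefront | 2 |
| agent | 1 |
| architecture | 1 |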

Top Relationship Types

| Relationship | Count |
|---|---|
| uses | 273 |
| co_used_with | 102 |
| created | 63 |
| related_to | 48 |
| relates_to | 21 |
| part_of | 15 |
| decided | 14 |
| built_with | 12 |
| evolved_from | 12 |
| contains | 11 |

Data Sources

1. Existing graph (`data/expanded_graph.json`) — 70 seed nodes, ground truth
2. Ledger DB (`[home-path]`) — 196,917 messages across 5,934 sessions
3. Knowledge Graph DB (`[home-path]`) — 7,595 entities cross-referenced (37 high-quality ones added)
4. Curated dictionaries — 150+ regex patterns for technologies, projects, concepts, locations, machines
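
Only a small fraction of the Knowledge Graph DB entities (37 of 7,595) made it into the graph, which implies a fairly strict filter. The sketch below shows one way such a filter might look. The table name `entities`, its columns `name` and `mentions`, and the `CLEAN_NAME` regex are all assumptions made for illustration; the real schema and the script's notion of a "clean" name are not shown in this write-up.

```python
import re
import sqlite3

MIN_MENTIONS = 5  # mention threshold; the real script may use a different rule

# Assumption: a "clean" name is short and uses only letters, digits,
# spaces, and a few punctuation characters (. _ + # -).
CLEAN_NAME = re.compile(r"[A-Za-z][A-Za-z0-9 ._+#-]{1,39}")

def select_candidates(conn: sqlite3.Connection) -> list[tuple[str, int]]:
    """Return (name, mentions) rows passing both filters, most-mentioned first."""
    rows = conn.execute(
        "SELECT name, mentions FROM entities WHERE mentions >= ? "
        "ORDER BY mentions DESC, name",
        (MIN_MENTIONS,),
    ).fetchall()
    return [(name, n) for name, n in rows if CLEAN_NAME.fullmatch(name)]

# Demo on an in-memory database with made-up rows (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entities (name TEXT, mentions INTEGER)")
conn.executemany(
    "INSERT INTO entities VALUES (?, ?)",
    [("FastAPI", 42), ("Docker", 17), ("http://x/y?z=1", 30), ("PyTorch", 3)],
)
candidates = select_candidates(conn)
print(candidates)  # [('FastAPI', 42), ('Docker', 17)]
```

In this demo the URL-like row is rejected by the name check and the 3-mention row by the threshold.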

Extraction Strategy

  • Pattern matching: 150+ curated regex patterns for known entities (technologies, projects, concepts, locations)
  • Co-occurrence analysis: Session-level entity co-occurrence used to build `uses`, `co_used_with`, `related_to` edges
  • Cross-reference: knowledge_graph.db entities with 5+ mentions and clean names merged in
  • Hardcoded domain knowledge: 100+ manually curated relationship edges for known architecture
  • Deduplication: Merged `compcore`→`comp_core`, `serenity_soother`→`serenity`
  • Orphan recovery: Technologies with 5+ mentions connected to Mo's ecosystem instead of being pruned
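
The first steps of the strategy above (pattern matching, alias deduplication, and session-level co-occurrence) can be sketched roughly as follows. This is an illustration under stated assumptions, not the contents of `scripts/extract_knowledge_graph.py`: the four patterns and the demo sessions are invented stand-ins for the 150+ curated patterns and the ledger data, while the two alias merges are the ones listed above. The sketch only counts co-occurring pairs; how the script decides between `uses`, `co_used_with`, and `related_to` for a given pair is not described here and is left out.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical stand-ins for the curated dictionary (the real one has 150+ patterns).
PATTERNS = {
    "docker": re.compile(r"\bdocker\b", re.IGNORECASE),
    "fastapi": re.compile(r"\bfastapi\b", re.IGNORECASE),
    "comp_core": re.compile(r"\bcomp[-_ ]core\b", re.IGNORECASE),
    "compcore": re.compile(r"\bcompcore\b", re.IGNORECASE),
}

# Alias merges reported in the write-up: variant name -> canonical name.
ALIASES = {"compcore": "comp_core", "serenity_soother": "serenity"}

def extract_entities(text: str) -> set[str]:
    """Return canonical names of all entities whose pattern matches the text."""
    return {ALIASES.get(name, name) for name, pat in PATTERNS.items() if pat.search(text)}

def cooccurrence(sessions: dict[str, list[str]]) -> Counter:
    """Count, per unordered entity pair, the number of sessions mentioning both."""
    pair_counts: Counter = Counter()
    for messages in sessions.values():
        seen: set[str] = set()
        for message in messages:
            seen |= extract_entities(message)
        # Sorting makes (a, b) and (b, a) the same key.
        for a, b in combinations(sorted(seen), 2):
            pair_counts[(a, b)] += 1
    return pair_counts

# Made-up sessions for illustration only.
sessions = {
    "s1": ["Deploy the FastAPI service with Docker", "compcore needs a rebuild"],
    "s2": ["Docker image for Comp-Core is ready"],
}
pair_counts = cooccurrence(sessions)
print(pair_counts[("comp_core", "docker")])  # 2
```

Counting each pair once per session, rather than once per message, keeps a single long session from dominating the edge weights.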

Validation

  • ✅ No duplicate node names
  • ✅ All edge endpoints exist in node set
  • ✅ Valid JSON (131 KB)
  • ✅ 270 nodes, above the 200-node target
  • ✅ Adjacency list generated for all 270 nodes
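
The structural checks above are cheap to automate. A minimal sketch, assuming each node is a dict with a `name` key and each edge a dict with `source` and `target` keys; those field names are assumptions, since the actual layout of `data/expanded_graph_v2.json` is not shown here:

```python
from collections import Counter

def validate_graph(nodes: list[dict], edges: list[dict]) -> dict[str, list[str]]:
    """Check name uniqueness and edge endpoints, then build an undirected adjacency list."""
    names = [n["name"] for n in nodes]
    duplicates = sorted(name for name, count in Counter(names).items() if count > 1)
    if duplicates:
        raise ValueError(f"duplicate node names: {duplicates}")

    known = set(names)
    dangling = [e for e in edges if e["source"] not in known or e["target"] not in known]
    if dangling:
        raise ValueError(f"{len(dangling)} edge(s) reference unknown nodes")

    # Every node gets an entry, including nodes with no edges.
    adjacency: dict[str, list[str]] = {name: [] for name in names}
    for e in edges:
        adjacency[e["source"]].append(e["target"])
        adjacency[e["target"]].append(e["source"])
    return adjacency

# Made-up three-node graph for illustration.
nodes = [{"name": "clawdbot"}, {"name": "docker"}, {"name": "nyc"}]
edges = [{"source": "clawdbot", "target": "docker", "type": "uses"}]
adjacency = validate_graph(nodes, edges)
print(adjacency)  # {'clawdbot': ['docker'], 'docker': ['clawdbot'], 'nyc': []}
```

Raising on the first failed check, instead of collecting warnings, suits a pipeline where a malformed graph should never be written to disk.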

Output Files

  • Graph: `data/expanded_graph_v2.json` (131 KB)
  • Script: `scripts/extract_knowledge_graph.py` (reproducible)

New Entity Categories (not in v1)

  • Discord channels (10): #quick, #bridge, #milkmen-delivery, etc.
  • Locations (6): NYC, SoHo, Brooklyn, Manhattan, West Africa, Guinea
  • HEF evolutions (9): Failure Museum, Serendipity Engine, Thought Mesh, etc.
  • Clawdbot subsystems (10): Context Ledger, Cortex, Vault, Agent Recovery, etc.
  • ML concepts (15+): RAG, RLM, embeddings, fine-tuning, zero-shot, etc.
  • Decisions (14): Extracted from ledger decisions table
  • Additional technologies (60+): TypeScript, Docker, FastAPI, PyTorch, Gemini, etc.

Note on kimi_memory.db

The original task referenced `[home-path]`, but that file is empty (0 bytes). The actual conversation data lives in `[home-path]` (1.1 GB, 196K messages), so the script uses that database as its primary data source.

Promotion Decision

Attach run IDs, datasets, metrics, and reproduction commands.

Source Anchor

Comp-Core/packages/cognitive-twin/T1-COMPLETE.md
