T1 Complete — Auto-Extract Knowledge Graph
**Status:** ✅ COMPLETE
**Completed:** 2026-02-19T18:47 EST
**Duration:** ~5 minutes extraction time (196K messages × 150+ regex patterns)
Results
| Metric | Before (v1) | After (v2) | Growth |
|---|---|---|---|
| Nodes | 70 | 270 | 3.9× |
| Edges | 75 | 689 | 9.2× |
| Node types | 9 | 13 | +4 new types |
| Relationship types | ~15 | 30+ | 2× |
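For nodes and edges, the Growth column matches the v2/v1 ratio rounded to one decimal place (270 / 70 ≈ 3.86, 689 / 75 ≈ 9.19). The node-type row reports a difference instead. A minimal check that uses only the figures in the table (`growth` is an illustrative helper, not part of the script):

```python
def growth(before: float, after: float) -> str:
    """Ratio of the v2 count to the v1 count, rounded to one decimal place."""
    return f"{after / before:.1f}×"


print(growth(70, 270))  # 3.9×  (nodes)
print(growth(75, 689))  # 9.2×  (edges)
```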
Node Type Breakdown
| Type | Count |
|---|---|
| technology | 121 |
| project | 62 |
| concept | 28 |
| decision | 14 |
| service | 12 |
| channel | 10 |
| machine | 6 |
| location | 6 |
| app | 4 |
| person | 3 |
| storefront | 2 |
| agent | 1 |
| architecture | 1 |
Top Relationship Types
| Relationship | Count |
|---|---|
| uses | 273 |
| co_used_with | 102 |
| created | 63 |
| related_to | 48 |
| relates_to | 21 |
| part_of | 15 |
| decided | 14 |
| built_with | 12 |
| evolved_from | 12 |
| contains | 11 |
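Both breakdown tables can be regenerated by counting node types and relationship labels in the output graph. The node-type counts above sum to 270, which matches the node total. The sketch below assumes that each node has a `type` field and each edge has a `relation` field. These names are guesses about the layout of `data/expanded_graph_v2.json`, not confirmed ones. The toy graph is illustrative only.

```python
import json
from collections import Counter


def load_graph(path: str) -> dict:
    """Read a graph file such as data/expanded_graph_v2.json."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def type_breakdowns(graph: dict) -> tuple[Counter, Counter]:
    """Count nodes per "type" and edges per "relation".

    The field names are assumptions about the JSON layout, not confirmed ones.
    """
    node_types = Counter(node["type"] for node in graph["nodes"])
    edge_types = Counter(edge["relation"] for edge in graph["edges"])
    return node_types, edge_types


# Toy graph in the assumed layout, standing in for the real 270-node file.
toy = {
    "nodes": [
        {"name": "docker", "type": "technology"},
        {"name": "fastapi", "type": "technology"},
        {"name": "comp_core", "type": "project"},
    ],
    "edges": [
        {"source": "comp_core", "target": "docker", "relation": "uses"},
        {"source": "comp_core", "target": "fastapi", "relation": "uses"},
        {"source": "docker", "target": "fastapi", "relation": "co_used_with"},
    ],
}
node_types, edge_types = type_breakdowns(toy)
print(node_types.most_common())    # [('technology', 2), ('project', 1)]
print(edge_types.most_common(10))  # [('uses', 2), ('co_used_with', 1)]
```

To run this against the real file, call `type_breakdowns(load_graph("data/expanded_graph_v2.json"))`. As a sanity check, `sum(node_types.values())` should equal the number of nodes.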
Data Sources
1. Existing graph (`data/expanded_graph.json`) — 70 seed nodes, ground truth
2. Ledger DB (`[home-path]`) — 196,917 messages across 5,934 sessions
3. Knowledge Graph DB (`[home-path]`) — 7,595 entities cross-referenced (37 high-quality ones added)
4. Curated dictionaries — 150+ regex patterns for technologies, projects, concepts, locations, machines
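Co-occurrence is computed per session, so the ledger's 196,917 messages must first be grouped into their 5,934 sessions. This sketch assumes a hypothetical `messages(session_id, content)` table, because the report does not describe the Ledger DB schema. `load_sessions` is likewise an illustrative name, and an in-memory database stands in for the real file.

```python
import sqlite3
from collections import defaultdict


def load_sessions(conn: sqlite3.Connection) -> dict[str, list[str]]:
    """Group message text by session id.

    Assumes a hypothetical table messages(session_id TEXT, content TEXT);
    the Ledger DB's real schema is not given in this report.
    """
    sessions: dict[str, list[str]] = defaultdict(list)
    rows = conn.execute("SELECT session_id, content FROM messages")
    for session_id, content in rows:
        sessions[session_id].append(content or "")  # treat NULL content as empty
    return dict(sessions)


# In-memory stand-in for the Ledger DB.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (session_id TEXT, content TEXT)")
conn.executemany(
    "INSERT INTO messages VALUES (?, ?)",
    [("s1", "Set up FastAPI"), ("s1", "Add Docker"), ("s2", None)],
)
sessions = load_sessions(conn)
print(len(sessions), sum(len(msgs) for msgs in sessions.values()))  # 2 3
```

On the full 1.1 GB database, this version holds every message in memory at once. Processing one session at a time, with `ORDER BY session_id`, would keep memory bounded.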
Extraction Strategy
- Pattern matching: 150+ curated regex patterns for known entities (technologies, projects, concepts, locations, machines)
- Co-occurrence analysis: Session-level entity co-occurrence used to build `uses`, `co_used_with`, `related_to` edges
- Cross-reference: `knowledge_graph.db` entities with 5+ mentions and clean names merged in
- Hardcoded domain knowledge: 100+ manually curated relationship edges for known architecture
- Deduplication: Merged `compcore`→`comp_core`, `serenity_soother`→`serenity`
- Orphan recovery: Technologies with 5+ mentions but no edges were connected to Mo's ecosystem instead of being pruned
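The pattern-matching, deduplication, co-occurrence and orphan-recovery steps can be sketched together. This illustrates the approach under stated assumptions and is not the code in `scripts/extract_knowledge_graph.py`:

- The five patterns are hypothetical stand-ins for the curated dictionary.
- Mentions are counted once per session.
- A pair must co-occur in at least two sessions to become a `co_used_with` edge (an assumed threshold).
- Orphans are attached to a hypothetical hub node, `mo_ecosystem`, with a `related_to` edge. The report does not say which relation is used.

Only `co_used_with` edges are produced here. Deriving `uses` and `related_to` from co-occurrence would need extra rules based on node types.

```python
import re
from collections import Counter
from itertools import combinations

# Illustrative excerpt of a curated dictionary: pattern key -> compiled regex.
# The report cites 150+ patterns; these five are hypothetical stand-ins.
PATTERNS = {
    "docker": re.compile(r"\bdocker\b", re.IGNORECASE),
    "fastapi": re.compile(r"\bfastapi\b", re.IGNORECASE),
    "pytorch": re.compile(r"\bpytorch\b", re.IGNORECASE),
    "comp_core": re.compile(r"\bcomp_core\b", re.IGNORECASE),
    "compcore": re.compile(r"\bcomp-?core\b", re.IGNORECASE),
}

# Alias merges named in the report; keys that are not listed map to themselves.
ALIASES = {"compcore": "comp_core", "serenity_soother": "serenity"}


def entities_in(text: str) -> set[str]:
    """Canonical names of every pattern that matches somewhere in the text."""
    return {ALIASES.get(key, key) for key, rx in PATTERNS.items() if rx.search(text)}


def cooccurrence(sessions: dict[str, list[str]]) -> tuple[Counter, Counter]:
    """Per-session mention counts and counts of entity pairs seen in one session."""
    mentions: Counter = Counter()
    pairs: Counter = Counter()
    for messages in sessions.values():
        found: set[str] = set()
        for text in messages:
            found |= entities_in(text)
        mentions.update(found)                        # each entity counted once per session
        pairs.update(combinations(sorted(found), 2))  # unordered pairs, sorted for stable keys
    return mentions, pairs


def co_used_edges(pairs: Counter, min_sessions: int = 2) -> list[dict]:
    """Pairs seen in at least `min_sessions` sessions become co_used_with edges."""
    return [
        {"source": a, "target": b, "relation": "co_used_with", "weight": n}
        for (a, b), n in sorted(pairs.items())
        if n >= min_sessions
    ]


def recover_orphans(mentions: Counter, edges: list[dict], hub: str,
                    min_mentions: int = 5) -> list[dict]:
    """Attach frequently mentioned but unconnected entities to a hub node."""
    connected = {e["source"] for e in edges} | {e["target"] for e in edges}
    return [
        {"source": hub, "target": name, "relation": "related_to"}
        for name, n in sorted(mentions.items())
        if n >= min_mentions and name not in connected and name != hub
    ]


sessions = {
    "s1": ["Containerize the FastAPI app with Docker", "compcore build is green"],
    "s2": ["Docker compose file for comp_core", "FastAPI route added"],
    "s3": ["Fine-tuning run with PyTorch"],
}
mentions, pairs = cooccurrence(sessions)
edges = co_used_edges(pairs)
orphans = recover_orphans(mentions, edges, hub="mo_ecosystem", min_mentions=1)
```

With the real dictionary, `min_mentions=5` would match the threshold the report gives for orphan recovery. The toy data uses 1 so that `pytorch`, which co-occurs with nothing, gets attached to the hub. Note also that `compcore` and `comp_core` end up as a single entity.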
Validation
- ✅ No duplicate node names
- ✅ All edge endpoints exist in node set
- ✅ Valid JSON (131 KB)
- ✅ 270 nodes > 200 target
- ✅ Adjacency list generated for all 270 nodes
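The validation checks above can be expressed as one function. It fails loudly on the first problem and, if everything passes, builds the adjacency list. This sketch assumes that nodes carry a `name` key and edges carry `source`/`target` keys. It also builds a directed list (source to target), which is an assumption: the report does not say whether its adjacency list is directed. `validate` is an illustrative name.

```python
import json


def validate(graph: dict, min_nodes: int = 200) -> dict[str, list[str]]:
    """Run the validation checks and return a directed adjacency list.

    Assumes nodes carry "name" and edges carry "source"/"target" naming nodes.
    Raises ValueError on the first failing check.
    """
    names = [node["name"] for node in graph["nodes"]]
    if len(names) != len(set(names)):
        raise ValueError("duplicate node names")
    known = set(names)
    dangling = [e for e in graph["edges"]
                if e["source"] not in known or e["target"] not in known]
    if dangling:
        raise ValueError(f"{len(dangling)} edge(s) reference missing nodes")
    # Strict JSON: TypeError for unserializable values, ValueError for NaN/inf.
    json.dumps(graph, allow_nan=False)
    if len(names) <= min_nodes:
        raise ValueError(f"{len(names)} nodes does not exceed the {min_nodes}-node target")
    adjacency: dict[str, list[str]] = {name: [] for name in names}  # every node gets an entry
    for edge in graph["edges"]:
        adjacency[edge["source"]].append(edge["target"])
    return adjacency


graph = {
    "nodes": [{"name": f"n{i}"} for i in range(201)],
    "edges": [{"source": "n0", "target": "n1"}],
}
adjacency = validate(graph)
print(len(adjacency), adjacency["n0"])  # 201 ['n1']
```

To validate the real file, pass the parsed JSON (for example, the result of `json.load`) to `validate`. Isolated nodes still get an empty entry, so the adjacency list always covers every node.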
Output Files
- Graph: `data/expanded_graph_v2.json` (131 KB)
- Script: `scripts/extract_knowledge_graph.py` (reproducible)
New Entity Categories (not in v1)
- Discord channels (10): #quick, #bridge, #milkmen-delivery, etc.
- Locations (6): NYC, SoHo, Brooklyn, Manhattan, West Africa, Guinea
- HEF evolutions (9): Failure Museum, Serendipity Engine, Thought Mesh, etc.
- Clawdbot subsystems (10): Context Ledger, Cortex, Vault, Agent Recovery, etc.
- ML concepts (15+): RAG, RLM, embeddings, fine-tuning, zero-shot, etc.
- Decisions (14): Extracted from ledger decisions table
- Additional technologies (60+): TypeScript, Docker, FastAPI, PyTorch, Gemini, etc.
Note on kimi_memory.db
The original task referenced `[home-path]`, but that file is empty (0 bytes). The actual conversation data lives in the Ledger DB at `[home-path]` (1.1 GB, 196K messages), which the script uses as its primary data source.
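In effect, the fallback described in this note skips empty files when choosing an input. A minimal sketch follows, assuming the candidates are tried in order. `pick_data_source` and the file name `ledger.db` are hypothetical, and the real paths are not shown here.

```python
import os
import tempfile


def pick_data_source(candidates: list[str]) -> str:
    """Return the first candidate that is an existing, non-empty file."""
    for path in candidates:
        if os.path.isfile(path) and os.path.getsize(path) > 0:
            return path
    raise FileNotFoundError("no non-empty data source in: " + ", ".join(candidates))


# Reproduce the situation from the note with throwaway files.
with tempfile.TemporaryDirectory() as tmp:
    empty = os.path.join(tmp, "kimi_memory.db")  # 0 bytes, like the original target
    ledger = os.path.join(tmp, "ledger.db")      # hypothetical name for the ledger file
    open(empty, "wb").close()
    with open(ledger, "wb") as f:
        f.write(b"SQLite format 3\x00")          # any non-empty content will do
    chosen = os.path.basename(pick_data_source([empty, ledger]))

print(chosen)  # ledger.db
```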
Source Anchor
Comp-Core/packages/cognitive-twin/T1-COMPLETE.md