Graph Kernel Comprehensive Evaluation Report

**OpenClaw CompCore — Technical Evaluation**
**Version:** 1.0.0 · **Date:** 2026-02-13
**Authors:** Mohamed Diomande, OpenClaw Research
**Classification:** Internal Technical Report

---

Executive Summary

The OpenClaw Graph Kernel (GK) is a deterministic context slicing engine implemented as a single Rust binary (Axum/Tokio) that serves a dual purpose: (1) constructing reproducible, policy-governed, HMAC-signed context windows for autonomous AI agents, and (2) operating as a lightweight knowledge graph triple store over a PostgreSQL backend.

We evaluated the Graph Kernel against three baseline retrieval methods (keyword search, BM25, and RAG++ vector similarity) across 27 queries spanning five categories. Additionally, we performed an extensive comparative analysis against nine industry-grade graph databases, knowledge graph frameworks, and RAG orchestrators: Neo4j, Amazon Neptune, Apache Jena/Fuseki, Dgraph, TypeDB, Weaviate, LangChain/LlamaIndex Knowledge Graphs, Microsoft GraphRAG, and Zep.

Key Findings

1. Context Slicing is Irreplaceable. No evaluated alternative provides deterministic, HMAC-signed, policy-governed context window construction. This is the Graph Kernel's unique value proposition; replicating it on a general-purpose graph database would require building a custom application layer (see Section 4).

2. Multi-hop Reasoning Achieves Perfect Relevance. The GK achieves 1.00 relevance on multi-hop traversal queries, returning structurally connected knowledge chains rather than keyword-coincidence result sets. This is qualitatively distinct from high relevance scores achieved by text-matching baselines.

3. Latency is Network-Dominated, Not Compute-Bound. At 291.7ms average response time, roughly 90% of the latency is network overhead to the remote database rather than computation.

4. Semantic Search is the Primary Gap. With 0.42 average relevance on fuzzy/semantic queries, the GK lacks embedding-based similarity. The planned RAG++ integration bridge addresses this by combining structural reasoning with vector similarity.

5. Entity Normalization Fragments Knowledge. 169 raw subjects collapse to 132 canonical entities with 123 identified duplicates (e.g., "Dream Weaver" ≠ "dream-weaver-engine"). This normalization gap suppresses relationship query relevance from a theoretical 1.00 to the measured 0.94.

Verdict: The Graph Kernel justifies its operational complexity as the provenance and context authority layer in the CompCore stack. It is not a general-purpose search engine and should not be evaluated as one.

---

1. Architecture Deep Dive

1.1 System Design

The Graph Kernel is a Rust binary (~15 KLOC) built on the Axum web framework with the Tokio async runtime. It compiles to a single statically linked binary that can be deployed as a local service, a Docker container, or a Google Cloud Run instance.

┌──────────────────────────────────────────────────────────────────┐
│                    GRAPH KERNEL SERVICE                           │
│                                                                  │
│  ┌─────────────┐  ┌─────────────────┐  ┌──────────────────────┐ │
│  │  API Layer   │  │   Core Engine   │  │   Storage Layer      │ │
│  │  (Axum)      │  │                 │  │                      │ │
│  │              │  │  ContextSlicer  │  │  PostgresGraphStore  │ │
│  │  /api/slice  │→│  PolicyRegistry │→│  (sqlx, pool=2..10) │ │
│  │  /api/verify │  │  TokenAuthority │  │                      │ │
│  │  /api/knowledge│ │  SnapshotHash  │  │  InMemoryGraphStore  │ │
│  │  /health/*   │  │                 │  │  (testing only)      │ │
│  └─────────────┘  └─────────────────┘  └──────────────────────┘ │
│                                                                  │
│  ┌──────────────────────────────────────────────────────────────┐│
│  │  Observability: Structured JSON logs, Cloud Trace, CORS      ││
│  └──────────────────────────────────────────────────────────────┘│
└──────────────────────────────────────────────────────────────────┘
                              │
                              ▼
                  ┌─────────────────────┐
                  │  PostgreSQL          │
                  │  (Supabase, remote) │
                  │  knowledge_graph     │
                  │  memory_turns        │
                  │  conversations       │
                  │  edges               │
                  └─────────────────────┘

1.2 Dual-Purpose Design

The Graph Kernel serves two distinct functions:

Primary: Deterministic Context Slicing
- BFS expansion from an anchor turn through the conversation DAG
- Phase-weighted priority scoring (Synthesis > Planning > Consolidation > Debugging > Exploration)
- Budget-bounded slice construction (max_nodes, max_radius)
- HMAC-SHA256 signed admissibility tokens for downstream trust verification
- xxHash64-based slice fingerprinting for reproducibility proofs

Secondary: Knowledge Graph Triple Store
- Subject–Predicate–Object triples with confidence scores and source provenance
- REST API for CRUD operations on knowledge triples
- Batch ingestion endpoint for pipeline-driven knowledge extraction
- Statistics and query endpoints for graph exploration

1.3 Security Model: HMAC-Signed Admissibility

The Graph Kernel implements a cryptographic trust boundary using HMAC-SHA256:

[sensitive field redacted], canonical_string)[0..16]

canonical_string = "{slice_id}|{anchor_turn_id}|{policy_id}|{policy_params_hash}|
                    {graph_snapshot_hash}|{schema_version}|admissibility_token_v2_hmac"

This creates a proof of authorization that cannot be forged without the HMAC secret. Downstream services (RAG++, Orbit) verify tokens via `POST /api/verify_token`, so they never need access to the secret themselves. The token binds six fields together: if any of them is tampered with, verification fails.
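
A minimal sketch of how such a token could be issued and checked, using only the Python standard library. The canonical string layout follows the definition above; everything else is an assumption made for illustration: the function names, the sample field values, the example key, and the reading of `[0..16]` as the first 16 hex characters of the digest.

```python
import hashlib
import hmac

TOKEN_TAG = "admissibility_token_v2_hmac"  # trailing tag from the canonical string


def canonical_string(slice_id, anchor_turn_id, policy_id,
                     policy_params_hash, graph_snapshot_hash, schema_version):
    """Join the six bound fields and the version tag with '|'."""
    return "|".join([slice_id, anchor_turn_id, policy_id, policy_params_hash,
                     graph_snapshot_hash, schema_version, TOKEN_TAG])


def issue_token(secret: bytes, canonical: str) -> str:
    """Assumption: token = first 16 hex chars of HMAC-SHA256(secret, canonical)."""
    digest = hmac.new(secret, canonical.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def verify_token(secret: bytes, canonical: str, token: str) -> bool:
    """Recompute the token and compare in constant time."""
    return hmac.compare_digest(issue_token(secret, canonical), token)


secret = b"example-secret-not-for-production"  # hypothetical key
fields = ["slice-001", "turn-042", "slice_policy_v1", "a1b2c3", "d4e5f6", "1.0.0"]
token = issue_token(secret, canonical_string(*fields))

tampered = list(fields)
tampered[1] = "turn-043"  # change only the anchor turn

print(verify_token(secret, canonical_string(*fields), token))    # valid fields
print(verify_token(secret, canonical_string(*tampered), token))  # tampered fields
```

Because every bound field feeds the same MAC, changing any one of them changes the expected token, which is the tamper-evidence property described above.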

Invariant (INV-GK-003): No Phantom Authority. Without a valid admissibility token, content is NOT admissible. The `AdmissibleEvidenceBundle` type enforces this at the Rust type level — it can only be constructed through the verification pathway.

1.4 Policy Governance

Context slicing is controlled by `SlicePolicyV1`, which parameterizes:

Parameter	Default	Purpose
`max_nodes`	256	Maximum turns in a slice (budget cap)
`max_radius`	10	Maximum graph hops from anchor
`phase_weights`	Synthesis=1.0, Planning=0.9, Consolidation=0.6, Debugging=0.5, Exploration=0.3	Phase importance scoring
`salience_weight`	0.3	How much turn salience affects priority
`distance_decay`	0.9	Priority loss per hop (10% per hop)
`include_siblings`	true	Whether to expand to sibling turns
`max_siblings_per_node`	5	Sibling expansion limit per parent
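
To make the interaction of these parameters concrete, here is a small sketch of a budget-bounded BFS slice. The way phase weight, salience, and distance decay are blended into one priority is an assumption (the report does not give the formula), sibling expansion is left out, and the graph and turn data are made up for illustration.

```python
from collections import deque

PHASE_WEIGHTS = {"Synthesis": 1.0, "Planning": 0.9, "Consolidation": 0.6,
                 "Debugging": 0.5, "Exploration": 0.3}


def priority(phase, salience, hops, salience_weight=0.3, distance_decay=0.9):
    """Assumed blend: mix phase weight and salience, then decay once per hop."""
    base = (1.0 - salience_weight) * PHASE_WEIGHTS[phase] + salience_weight * salience
    return base * distance_decay ** hops


def slice_context(graph, turns, anchor, max_nodes=256, max_radius=10):
    """BFS from the anchor, then keep the max_nodes highest-priority turns.

    graph: turn id -> list of neighbouring turn ids
    turns: turn id -> (phase, salience)
    """
    hops = {anchor: 0}
    queue = deque([anchor])
    while queue:
        node = queue.popleft()
        if hops[node] == max_radius:
            continue  # radius budget reached, do not expand further
        for nxt in sorted(graph.get(node, [])):  # sorted for determinism
            if nxt not in hops:
                hops[nxt] = hops[node] + 1
                queue.append(nxt)
    scored = [(-priority(*turns[t], hops[t]), t) for t in hops]
    scored.sort()  # ties are broken by turn id, so the output is reproducible
    return [t for _, t in scored[:max_nodes]]


# Hypothetical four-turn conversation DAG.
graph = {"t1": ["t2", "t3"], "t2": ["t4"], "t3": [], "t4": []}
turns = {"t1": ("Planning", 0.5), "t2": ("Synthesis", 0.8),
         "t3": ("Exploration", 0.2), "t4": ("Debugging", 0.9)}
print(slice_context(graph, turns, "t1", max_nodes=3, max_radius=2))
```

The sorted neighbour expansion and the id-based tie-break are what make the same inputs yield the same slice, which is the property the policy registry is meant to protect.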

Policies are registered in an immutable `PolicyRegistry` with hash-stable fingerprints. Policy parameter hashes use quantized floats (multiply by 10⁶, round to i64) to ensure cross-platform determinism between Rust and Python clients.
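
A sketch of the quantization idea on the Python side. The scale factor (10⁶) and the integer target come from the text; the rounding mode (half away from zero), the canonical JSON encoding, and the use of SHA-256 are assumptions chosen for illustration.

```python
import hashlib
import json
import math


def quantize(value: float) -> int:
    """Scale by 10^6 and round half away from zero (assumed rounding mode)."""
    scaled = value * 1_000_000
    magnitude = int(math.floor(abs(scaled) + 0.5))
    return magnitude if scaled >= 0 else -magnitude


def policy_params_hash(params: dict) -> str:
    """Hash a canonical JSON form in which every float is replaced by an integer."""
    def norm(v):
        if isinstance(v, bool):
            return v
        if isinstance(v, float):
            return quantize(v)
        if isinstance(v, dict):
            return {k: norm(x) for k, x in v.items()}
        return v

    canonical = json.dumps(norm(params), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = policy_params_hash({"max_nodes": 256, "salience_weight": 0.3, "distance_decay": 0.9})
b = policy_params_hash({"distance_decay": 0.9, "salience_weight": 0.1 + 0.2, "max_nodes": 256})
print(a == b)  # 0.1 + 0.2 and 0.3 quantize to the same integer
```

An explicit rounding rule is used instead of Python's built-in `round`, which rounds halves to even and could therefore disagree with a client that rounds halves away from zero.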

---

2. Benchmark Methodology

2.1 Test Design

We evaluated 27 queries across 5 categories against 4 retrieval methods:

Category	Queries	What It Tests
Factual Recall	6	Direct attribute lookups ("What does X use?")
Relationship	6	Dependency/integration mapping ("What depends on X?")
Multi-hop	5	2-hop graph traversal ("X → Y → Z")
Fuzzy/Semantic	5	Loose topic matching ("anything about skating")
Predicate-specific	5	Structured predicate filters ("likes", "should", "has_file")

2.2 Methods Under Test

Method	Corpus	Mechanism
Graph Kernel	2,681 structured triples	REST API queries to `/api/knowledge` with exact field filters
Keyword	Same 2,681 triples	In-memory substring matching on subject+predicate+object
BM25	Same 2,681 triples	Okapi BM25 (k₁=1.5, b=0.75) over triple corpus
RAG++	107K+ conversation turns	Vector similarity search over conversation embeddings
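
For reference, the BM25 baseline can be sketched in a few lines. This is not the benchmark's code: the whitespace tokenizer, the non-negative IDF variant, and the flattening of each triple into a "subject predicate object" string are assumptions; only k₁=1.5 and b=0.75 come from the table.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n_docs
    df = Counter(term for d in tokenized for term in set(d))  # document frequency
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Non-negative IDF variant (the "+1" form); an assumption here.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Triples flattened to strings; the values appear elsewhere in this report.
corpus = [
    "clawdbot uses Discord.js",
    "Comp-Core uses Supabase PostgreSQL",
    "Mohamed works_on clawdbot",
]
scores = bm25_scores("clawdbot uses", corpus)
print(max(range(len(corpus)), key=scores.__getitem__))  # index of the best match
```

Note that the scorer ranks by term overlap only: the third triple scores because it mentions "clawdbot", not because it answers what clawdbot uses.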

2.3 Metrics

Response Time (ms): Wall-clock latency including all network round-trips
Result Count: Number of results returned per query
Relevance Score (0–1): Fraction of expected terms found in results
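
The relevance metric is simple enough to state as code. The sketch below assumes case-insensitive substring matching over the concatenated result text; the expected terms and result strings are illustrative, not taken from the benchmark files.

```python
def relevance(expected_terms, results):
    """Fraction of expected terms that occur (case-insensitively) in any result."""
    if not expected_terms:
        return 0.0
    haystack = " ".join(results).lower()
    found = sum(1 for term in expected_terms if term.lower() in haystack)
    return found / len(expected_terms)


# Illustrative data: the alias "GCP" hides one of three expected terms.
results = [
    "graph-kernel deploys_to GCP",
    "graph-kernel deploys_to Cloud Run",
    "graph-kernel uses Supabase",
]
score = relevance(["Google Cloud Platform", "Cloud Run", "Supabase"], results)
print(round(score, 2))
```

Because the metric only checks for term presence, it cannot distinguish a structurally connected answer from a set of results that merely contain the right words, a limitation that matters for the multi-hop results later in the report.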

2.4 Important Caveats

Graph Kernel and Keyword/BM25 operate on the same triple corpus (2,681 structured triples extracted from conversations).
RAG++ operates on a fundamentally different corpus (107K+ raw conversation turns with embeddings). Direct comparison is informational, not apples-to-apples.
Multi-hop queries use sequential API calls for GK: each hop adds an HTTP round-trip (see Section 5.2). A server-side traversal endpoint would eliminate this latency multiplier.

---

3. Full Benchmark Results

3.1 Per-Category Results

Factual Recall

Method	Avg Latency	Avg Results	Avg Relevance
Graph Kernel	248.3 ms	3.7	1.00
Keyword	2.7 ms	20.0	1.00
BM25	9.0 ms	18.2	1.00
RAG++	421.9 ms	10.0	0.92

All triple-based methods achieve perfect relevance. GK returns precisely scoped results (3.7 avg) versus keyword's broad 20.0. Latency difference is entirely attributable to network RTT.

Relationship Queries

Method	Avg Latency	Avg Results	Avg Relevance
Graph Kernel	204.3 ms	9.5	0.94
Keyword	2.8 ms	19.3	1.00
BM25	8.7 ms	12.3	1.00
RAG++	336.4 ms	10.0	0.69

GK's drop to 0.94 relevance comes from a single entity normalization failure: "GCP" does not match "Google Cloud Platform" in `deploys_to` results. With normalization, the score would be 1.00.

Multi-hop Reasoning ⭐

Method	Avg Latency	Avg Results	Avg Relevance
Graph Kernel	586.6 ms	7.6	1.00
Keyword	3.3 ms	20.0	1.00
BM25	9.2 ms	18.8	1.00
RAG++	348.1 ms	10.0	0.40

This is the Graph Kernel's killer feature. While keyword/BM25 achieve identical 1.00 relevance scores, the nature of their results is fundamentally different:

GK returns 7.6 structurally connected results: Mohamed → works_on → clawdbot → uses → Gemini batch API. Each result is causally linked through verified graph edges.
Keyword returns 20 coincidence results: Documents happen to contain "Mohamed" and "clawdbot" but the system has no concept of why they co-occur.

The relevance metric masks this critical quality difference. In production, the GK's causal chain enables provenance-tracked reasoning; keyword coincidence does not.

Fuzzy/Semantic Search

Method	Avg Latency	Avg Results	Avg Relevance
Graph Kernel	215.2 ms	19.8	0.42
Keyword	2.0 ms	16.0	0.80
BM25	6.1 ms	7.6	0.53
RAG++	484.0 ms	10.0	0.65

GK's weakest category. No semantic understanding — searching for "music" won't find triples about "audio production." This is the primary motivation for the planned RAG++ integration bridge.

Predicate-Specific Queries

Method	Avg Latency	Avg Results	Avg Relevance
Graph Kernel	230.1 ms	16.0	0.80
Keyword	3.3 ms	20.0	1.00
BM25	9.5 ms	20.0	1.00
RAG++	460.8 ms	10.0	0.80

GK should excel here (exact predicate filters), but entity normalization failures suppress relevance. "Dream Weaver" files returned 0 results due to capitalization/alias mismatch.

3.2 Overall Averages

Method	Avg Latency	Avg Results	Avg Relevance	Latency Rank	Relevance Rank
Keyword	2.8 ms	19.1	0.96	🥇	🥇
BM25	8.5 ms	15.4	0.91	🥈	🥈
Graph Kernel	291.7 ms	11.0	0.84	🥉	🥉
RAG++	407.9 ms	10.0	0.70	4th	4th
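
The overall figures are query-weighted averages of the per-category numbers in Section 3.1. As a quick check, the Graph Kernel row can be reproduced as follows (query counts from Section 2.1):

```python
# (query count, GK latency in ms, GK relevance) per category, from Section 3.1
gk = {
    "Factual Recall":     (6, 248.3, 1.00),
    "Relationship":       (6, 204.3, 0.94),
    "Multi-hop":          (5, 586.6, 1.00),
    "Fuzzy/Semantic":     (5, 215.2, 0.42),
    "Predicate-specific": (5, 230.1, 0.80),
}

total = sum(n for n, _, _ in gk.values())
avg_latency = sum(n * lat for n, lat, _ in gk.values()) / total
avg_relevance = sum(n * rel for n, _, rel in gk.values()) / total
print(total, round(avg_latency, 1), round(avg_relevance, 2))  # 27 291.7 0.84
```

The multi-hop category alone contributes more than a third of the total latency budget, which is why the average sits well above the roughly 200 to 250 ms seen in the single-call categories.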

3.3 Latency Decomposition

Component	Contribution
Network RTT to Supabase PostgreSQL	~180–200 ms
PostgreSQL query execution	~5–20 ms
TCP connection overhead	~10–15 ms
Rust serialization + JSON	~1–2 ms
Multi-hop per additional hop	+200 ms
Projected with local SQLite	10–30 ms total

Critical insight: The Graph Kernel is compute-efficient. Its latency problem stems from an architecture choice (remote Supabase), not a fundamental limitation. Migrating to local SQLite with periodic Supabase sync is projected to bring queries under 30 ms.

---

4. Comparative Analysis: Industry Alternatives

4.1 Neo4j

Dimension	Graph Kernel	Neo4j
Architecture	Single Rust binary, SPO triple store	JVM-based, native property graph
Query Language	REST API with field filters	Cypher (full graph query language)
Multi-hop	Sequential HTTP calls (client-side)	Native MATCH traversal (server-side)
Latency	291ms (remote PG); 10–30ms (local)	1–50ms typical (local)
Context Slicing	Native (primary purpose)	❌ Must build custom
Admissibility Tokens	Native HMAC-signed	❌ No equivalent
Deployment	Single binary, ~20MB	JVM + heap (512MB–4GB+)
Cost	Free (self-hosted)	Community: free; Enterprise: $$$$
Scale	Thousands of triples	Billions of nodes/edges
Ecosystem	Purpose-built for agents	Drivers for 10+ languages, GraphQL, APOC

Where GK wins: Deterministic context slicing with cryptographic provenance in a 20MB binary. Neo4j would require a custom application layer to replicate this.

Where Neo4j wins: Query expressiveness (Cypher is vastly more powerful than REST filters), horizontal scaling (causal clustering), ecosystem maturity (15+ years), visualization tools (Bloom, Browser), and native server-side traversal.

4.2 Amazon Neptune

Dimension	Graph Kernel	Amazon Neptune
Architecture	Single Rust binary	Managed cloud service (AWS)
Query Language	REST API	SPARQL, Gremlin, openCypher
Storage Model	SPO triples + conversation DAG	Property graph or RDF, distributed
Latency	291ms / 10–30ms local	2–20ms (within VPC)
Context Slicing	Native	❌ No concept
Provenance	HMAC-signed bundles	IAM-based access control
Scale	Thousands	Billions (64TB storage)
Cost	$0 (self-hosted)	$0.10/hr+ (starts ~$75/mo)
Ops Burden	Zero (single binary)	Managed (but VPC config, IAM)

Where GK wins: Zero cost, zero cloud dependency, purpose-built context authority with cryptographic tokens. Neptune has no concept of context windows or policy-governed slicing.

Where Neptune wins: Massive scale, managed infrastructure, multi-model (SPARQL + Gremlin + openCypher), read replicas, point-in-time recovery, IAM integration.

4.3 Apache Jena / Fuseki

Dimension	Graph Kernel	Apache Jena/Fuseki
Architecture	Rust/Axum REST service	Java, SPARQL 1.1 endpoint
Standards Compliance	Custom SPO schema	Full W3C RDF/OWL/SPARQL
Reasoning	Graph traversal only	OWL inference, RDFS entailment
Context Slicing	Native	❌ No concept
Query Power	Field filters	SPARQL 1.1 (graph patterns, property paths, aggregates, subqueries)
Data Model	(subject, predicate, object, confidence)	Full RDF (URIs, blank nodes, literals, named graphs)

Where GK wins: Purpose-built context slicing, HMAC provenance, lightweight deployment. Jena/Fuseki would require a custom application layer for context windows.

Where Jena wins: Standards compliance (W3C RDF/OWL), semantic reasoning (OWL inference), SPARQL query expressiveness, federated queries (SERVICE keyword), extensive tooling (TDB2, SHACL validation).

4.4 Dgraph

Dimension	Graph Kernel	Dgraph
Architecture	Single binary	Distributed (Zero, Alpha, Ratel)
Query Language	REST API	GraphQL, DQL (formerly GraphQL+-)
Scale	Thousands	Billions (horizontally sharded)
Latency	291ms / 10–30ms	<10ms typical
Context Slicing	Native	❌ Build custom
Schema	Implicit (triple fields)	Explicit GraphQL-like schema

Where GK wins: Context slicing, provenance tracking, zero-configuration single binary. Dgraph's distributed architecture (Zero + Alpha nodes) is overkill for agent context management.

Where Dgraph wins: Horizontal scaling, GraphQL-native API, ACID transactions across shards, built-in full-text search (Bleve), authorization rules (@auth directives).

4.5 TypeDB

Dimension	Graph Kernel	TypeDB
Architecture	Rust binary, SPO triples	Java, hypergraph with type system
Data Model	Flat triples	Entities, relations, attributes with subtypes, roles, rules
Reasoning	Graph traversal	Native rule-based inference
Query Language	REST filters	TypeQL (pattern-matching)
Context Slicing	Native	❌ No concept

Where GK wins: Lightweight deployment, context-slicing-as-a-service, HMAC provenance. TypeDB's rich type system is unnecessary for the context authority use case.

Where TypeDB wins: Expressive data modeling (hyper-relations, type hierarchies, role-playing), native reasoning (if A teaches B, and B is a course, then A is a teacher), schema enforcement, pattern-matching queries.

4.6 Weaviate

Dimension	Graph Kernel	Weaviate
Architecture	Rust, deterministic graph	Go, vector-first database
Search	Exact field matching	Hybrid (vector + BM25 + filters)
Embeddings	None	Native (text2vec, multi2vec)
Context Slicing	Native	❌ No concept
Semantic Search	❌ (0.42 fuzzy relevance)	✅ Core competency
Multi-hop	✅ Structural traversal	Cross-references (limited)

Where GK wins: Deterministic context slicing, structural multi-hop reasoning, cryptographic provenance. Weaviate cannot provide reproducible, policy-governed context windows.

Where Weaviate wins: Semantic search (the GK's primary weakness), hybrid search combining vectors with BM25, built-in vectorization modules, generative search (RAG-native), multi-tenancy.

4.7 LangChain / LlamaIndex Knowledge Graphs

Dimension	Graph Kernel	LangChain/LlamaIndex KG
Architecture	Purpose-built Rust binary	Python orchestration over external stores
Implementation	Native graph engine	Wrapper around Neo4j/Nebula/etc.
Context Window	Deterministic policy-governed slicing	Prompt stuffing with retrieval results
Provenance	HMAC-signed, fingerprinted	❌ No formal provenance
Determinism	Guaranteed (same input → same output)	Non-deterministic (LLM-dependent extraction)
LLM Integration	Via downstream consumers	Native (chains, agents, tools)

Where GK wins: Determinism, provenance, security model. LangChain/LlamaIndex knowledge graphs are LLM-dependent for entity extraction and have no formal reproducibility guarantees. The GK provides a foundation that these tools could consume.

Where LC/LI wins: Rapid prototyping, LLM-native pipelines, rich ecosystem of document loaders/splitters/embedders, community support, flexibility to swap backends.

4.8 Microsoft GraphRAG

Dimension	Graph Kernel	Microsoft GraphRAG
Architecture	Rust binary, triple store + context slicer	Python, LLM-driven graph construction
Graph Construction	Offline triple extraction (Kimi-K2) plus topology ingestion; no LLM calls at query time	LLM-based entity/relationship extraction
Query Modes	Exact field filters + structural traversal	Local search (entity-centric) + Global search (community summaries)
Community Detection	❌ Not implemented	Leiden algorithm, hierarchical communities
Summarization	Raw triple retrieval	LLM-generated community summaries
Context Slicing	Native, policy-governed	❌ Uses prompt engineering
Provenance	HMAC-signed tokens	❌ No formal provenance

Where GK wins: Deterministic provenance, cryptographic trust, lightweight deployment, no LLM dependency for query execution. GraphRAG requires LLM calls for both indexing and querying.

Where GraphRAG wins: Holistic corpus understanding via community summaries, handles "what is the dataset about?" queries that GK cannot answer, sophisticated graph construction from unstructured text, hierarchical community-level reasoning.

4.9 Zep

Dimension	Graph Kernel	Zep
Architecture	Rust, context slicing + triple store	Go/Python, memory layer for LLM apps
Purpose	Deterministic context authority	Long-term memory for chatbots
Memory Model	Structured triples + conversation DAG	Session memory, facts, summaries
Entity Handling	SPO triples with confidence	Automatic entity extraction + graph
Context Window	Policy-governed, HMAC-signed	Automatic relevance-based selection
Provenance	Full cryptographic chain	❌ No formal provenance

Where GK wins: Deterministic reproducibility, cryptographic provenance, policy governance. Zep optimizes for developer experience at the cost of determinism guarantees.

Where Zep wins: Developer experience (drop-in memory for any LLM app), automatic entity extraction, temporal awareness (memory decay, summarization), user-level memory management, managed cloud offering.

4.10 Comparative Summary Matrix

System	Context Slicing	Provenance	Multi-hop	Semantic Search	Scale	Deployment	Cost
Graph Kernel	✅ Native	✅ HMAC	✅	❌	Small	Single binary	Free
Neo4j	❌ Custom	❌	✅✅	❌	Large	JVM	Free/$$$
Neptune	❌	❌	✅✅	❌	Massive	AWS managed	$$$
Jena/Fuseki	❌	Partial	✅	❌	Medium	JVM	Free
Dgraph	❌	❌	✅✅	✅ (Bleve)	Massive	Distributed	Free/$$
TypeDB	❌	❌	✅✅	❌	Large	JVM	Free
Weaviate	❌	❌	Limited	✅✅	Large	Go binary	Free/$$
LC/LI KG	❌	❌	✅	✅	Varies	Python	Free+LLM
GraphRAG	❌	❌	✅	✅✅	Medium	Python	Free+LLM
Zep	❌	❌	✅	✅	Medium	Go/Cloud	Free/$$

---

5. Multi-hop Reasoning: Deep Dive

5.1 Why Multi-hop Is the Killer Feature

Multi-hop reasoning follows actual relationship chains through the knowledge graph:

Query: "What tools does Mohamed use indirectly through his projects?"

Graph Kernel (structural):
  Mohamed → works_on → clawdbot
  clawdbot → uses → Gemini batch API
  clawdbot → uses → Discord.js
  Mohamed → works_on → Comp-Core
  Comp-Core → uses → Rust/Axum
  Comp-Core → uses → Supabase PostgreSQL

  Result: structurally connected triples (6 shown) forming 2-hop chains through two projects

Keyword (coincidence):
  "Mohamed" appears in 20 triples (likes, wants_to, needs_to, ...)
  "uses" appears in 50 triples (unrelated subjects)
  Intersection: 20 results containing "Mohamed" — many irrelevant

  Result: 20 results, some relevant by coincidence, no causal connection

The relevance metric assigns both methods 1.00 because the expected terms appear. But the Graph Kernel's results constitute a knowledge chain — a directed path through verified relationships. The keyword results are a coincidence pile — documents that happen to contain matching substrings.
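
The difference between a knowledge chain and a coincidence pile can be shown with the triples from the example above. This is an in-memory sketch, not the kernel's traversal code; the `likes` triple is an illustrative stand-in for the unrelated "Mohamed" triples in the corpus.

```python
from collections import defaultdict

TRIPLES = [
    ("Mohamed", "works_on", "clawdbot"),
    ("clawdbot", "uses", "Gemini batch API"),
    ("clawdbot", "uses", "Discord.js"),
    ("Mohamed", "works_on", "Comp-Core"),
    ("Comp-Core", "uses", "Rust/Axum"),
    ("Comp-Core", "uses", "Supabase PostgreSQL"),
    ("Mohamed", "likes", "skating"),  # illustrative off-path triple
]

index = defaultdict(list)
for s, p, o in TRIPLES:
    index[(s, p)].append(o)


def follow(start, predicates):
    """Return every path that starts at `start` and follows `predicates` in order."""
    paths = [[start]]
    for pred in predicates:
        paths = [path + [pred, obj]
                 for path in paths
                 for obj in index[(path[-1], pred)]]
    return paths


chains = follow("Mohamed", ["works_on", "uses"])
keyword_hits = [t for t in TRIPLES if "Mohamed" in " ".join(t)]

for chain in chains:
    print(" → ".join(chain))
```

Every element of `chains` is a directed path whose edges exist in the graph, whereas `keyword_hits` also returns the `likes` triple simply because it contains the search string.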

5.2 Latency Implications

Multi-hop queries currently require sequential HTTP round-trips:

2-hop query = 3 HTTP calls × ~200ms RTT = ~600ms
3-hop query = 4 HTTP calls × ~200ms RTT = ~800ms

With a server-side traversal endpoint (`POST /api/knowledge/traverse`), this collapses to a single HTTP call:

2-hop query = 1 HTTP call + server-side SQL joins = ~220ms (remote) or ~15ms (local)

This is the highest-priority improvement for multi-hop performance.
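
The arithmetic above amounts to a simple latency model. In the sketch below the call count (hops + 1) and the ~200 ms RTT follow the figures in this section; the ~20 ms server-side query time is an assumption based on the upper end of the PostgreSQL execution range in Section 3.3.

```python
def sequential_latency_ms(hops, rtt_ms=200.0):
    """Client-side traversal: (hops + 1) sequential HTTP calls, per the figures above."""
    return (hops + 1) * rtt_ms


def server_side_latency_ms(rtt_ms=200.0, query_ms=20.0):
    """Single call: one round-trip plus server-side query time (assumed ~20 ms)."""
    return rtt_ms + query_ms


for hops in (2, 3):
    print(hops, sequential_latency_ms(hops), server_side_latency_ms())
```

Under this model the sequential cost grows linearly with depth while the single-call cost stays roughly flat, so the benefit of a traversal endpoint increases with every additional hop.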

---

6. Entity Normalization Analysis

6.1 Current State

Metric	Value
Raw unique subjects	169
After normalization	132
Identified duplicates	123 triple pairs
Fragmentation rate	22%

6.2 Representative Alias Clusters

Canonical Entity	Known Aliases
`dream-weaver-engine`	Dream Weaver, dream weaver, DreamWeaver, Dream-weaver-engine
`clawdbot`	Clawdbot, ClawdBot, clawdbot-gateway
`mohamed-diomande`	Mohamed Diomande, Mohameddiomande, mohameddiomande, Mohamed
`rag-plusplus`	RAG++, RAG++ (Cloud), rag-plusplus, rag-plusplus-core
`comp-core`	Comp-Core, CompCore, comp core

6.3 Impact on Query Quality

Entity normalization failures directly suppress relevance:

Relationship queries: 0.94 → 1.00 (projected) after normalization
Predicate-specific: 0.80 → 0.95+ after normalization
Overall average: 0.84 → 0.92+ after normalization

The implemented entity normalizer (`scripts/entity-normalizer.py`) uses a canonical alias table with fuzzy matching to resolve these at query time and ingestion time.
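
A minimal sketch of the alias-table approach, using `difflib` from the standard library for the fuzzy fallback. It is not the contents of `scripts/entity-normalizer.py`: the key function, the 0.85 cutoff, and the pass-through behaviour for unknown names are assumptions. The alias clusters are copied from Section 6.2.

```python
import difflib

ALIASES = {
    "dream-weaver-engine": ["Dream Weaver", "dream weaver", "DreamWeaver", "Dream-weaver-engine"],
    "clawdbot": ["Clawdbot", "ClawdBot", "clawdbot-gateway"],
    "comp-core": ["Comp-Core", "CompCore", "comp core"],
}


def _key(name):
    """Lowercase and strip separators so 'Dream Weaver' and 'dream-weaver' collide."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


LOOKUP = {_key(canonical): canonical for canonical in ALIASES}
LOOKUP.update({_key(alias): canonical
               for canonical, aliases in ALIASES.items()
               for alias in aliases})


def normalize(name, cutoff=0.85):
    """Exact alias hit first, then a fuzzy fallback; unknown names pass through."""
    key = _key(name)
    if key in LOOKUP:
        return LOOKUP[key]
    close = difflib.get_close_matches(key, list(LOOKUP), n=1, cutoff=cutoff)
    return LOOKUP[close[0]] if close else name


for raw in ["Dream Weaver", "CompCore", "clawdbott", "Neo4j"]:
    print(raw, "->", normalize(raw))
```

String similarity only helps with spelling and casing variants. An abbreviation such as "GCP" shares too few characters with "Google Cloud Platform" to be matched this way and needs an explicit alias entry.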

---

7. RAG++ Integration Architecture

7.1 The Hybrid Model

The Graph Kernel and RAG++ are complementary, not competing:

┌──────────────────────────────────────────────────────────────┐
│                    HYBRID RETRIEVAL                           │
│                                                              │
│  User Query: "What AI tools does the system use?"            │
│                                                              │
│  ┌──────────────────┐      ┌──────────────────┐            │
│  │  RAG++ Path       │      │  Graph Kernel     │            │
│  │  (Semantic)       │      │  (Structural)     │            │
│  │                   │      │                   │            │
│  │  Vector embedding │      │  Subject: "comp-  │            │
│  │  → similarity     │      │   core"           │            │
│  │  → top-K turns    │      │  Predicate: "uses"│            │
│  │                   │      │  → exact triples  │            │
│  │  Finds: context   │      │  → follow edges   │            │
│  │  around AI tools  │      │  → causal chains  │            │
│  └────────┬─────────┘      └────────┬─────────┘            │
│           │                         │                        │
│           └────────────┬────────────┘                        │
│                        ▼                                     │
│              ┌──────────────────┐                            │
│              │  Merge & Rank    │                            │
│              │  - Deduplicate   │                            │
│              │  - Cross-enrich  │                            │
│              │  - Rank by       │                            │
│              │    structure +   │                            │
│              │    similarity    │                            │
│              └──────────────────┘                            │
│                        │                                     │
│                        ▼                                     │
│              Enriched Result Set                             │
│              (semantic + structural + provenance)            │
└──────────────────────────────────────────────────────────────┘

7.2 Integration Endpoints (Planned)

POST /api/enrich
{
  "rag_results": [...],          // From RAG++ /api/rag/search
  "max_hops": 2,
  "include_predicates": ["uses", "depends_on", "integrates_with"]
}

→ Returns enriched results with graph context:
  - Original RAG++ result (semantic match)
  - Entities found in result text
  - Graph relationships for those entities (1-2 hops)
  - Related entities discovered through traversal

This bridge transforms RAG++ from a flat similarity search into a structured reasoning system while preserving the Graph Kernel's provenance guarantees.
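
Since the endpoint is still planned, the following is only an in-memory sketch of what the enrichment step might do with the request fields shown above. The entity spotting (substring match against known subjects), the response shape, and the sample text are all assumptions.

```python
def enrich(rag_results, triples, max_hops=2,
           include_predicates=("uses", "depends_on", "integrates_with")):
    """Attach graph relationships to each RAG++ hit (stand-in for /api/enrich)."""
    subjects = {s for s, _, _ in triples}
    enriched = []
    for text in rag_results:
        # Naive entity spotting: any known subject mentioned in the text.
        entities = sorted(s for s in subjects if s.lower() in text.lower())
        frontier, seen, relations = set(entities), set(entities), []
        for _ in range(max_hops):
            step = [(s, p, o) for s, p, o in triples
                    if s in frontier and p in include_predicates]
            for triple in step:
                if triple not in relations:
                    relations.append(triple)
            frontier = {o for _, _, o in step} - seen  # expand only to new nodes
            seen |= frontier
        enriched.append({"text": text, "entities": entities, "relations": relations})
    return enriched


triples = [
    ("Mohamed", "works_on", "clawdbot"),
    ("clawdbot", "uses", "Gemini batch API"),
    ("clawdbot", "uses", "Discord.js"),
    ("Comp-Core", "uses", "Rust/Axum"),
]
out = enrich(["Mohamed asked which AI tools clawdbot calls"], triples)
print(out[0]["entities"], out[0]["relations"])
```

The predicate filter keeps the expansion focused: `works_on` is not in `include_predicates`, so the hit is enriched with what clawdbot uses rather than with everything connected to Mohamed.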

---

8. Corpus Statistics

8.1 Knowledge Graph State (Post-Topology Ingestion)

Metric	Value
Total triples	3,502
Unique subjects	221
Unique predicates	88
Data sources	kimi-k2-extraction, topology-ingester, unknown
Average confidence	0.73 (Kimi), 0.95 (topology)

8.2 Top 10 Predicates

Predicate	Count
`has_file`	810
`needs_to`	467
`has_path`	383
`should`	332
`likes`	224
`wants_to`	178
`is_a`	111
`uses`	107
`works_on`	87
`building`	81

8.3 Predicate Taxonomy

The predicates reveal a natural clustering:

Structural: `is_a`, `has_file`, `has_path`, `uses`, `depends_on`, `integrates_with` (40%)
Intentional: `needs_to`, `should`, `wants_to`, `building` (34%)
Relational: `works_on`, `likes`, `deployed_on` (12%)
Descriptive: `full_name`, `code`, `port` (14%)

---

9. Roadmap

### Phase 1: SQLite Migration (Immediate)
- Goal: Reduce query latency from 291ms to <30ms
- Approach: Local SQLite cache with periodic Supabase sync
- Status: ✅ Implemented (`scripts/sqlite-mirror.py`, `scripts/gk-proxy.py` on port 8002)
- Impact: Latency competitive with BM25

### Phase 2: Entity Normalization (Immediate)
- Goal: Increase relevance from 0.84 to 0.95+
- Approach: Canonical alias table with fuzzy matching at query and ingestion time
- Status: ✅ Implemented (`scripts/entity-normalizer.py`)
- Impact: Eliminates alias-driven query failures

### Phase 3: Server-Side Traversal API (Short-term)
- Goal: Collapse multi-hop from N×200ms to single call
- Approach: `POST /api/knowledge/traverse` with server-side BFS
- Status: 📋 Planned
- Impact: Multi-hop latency drops from 600ms to ~220ms (remote) or ~15ms (local)
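
A sketch of how a 2-hop traversal could run as a single query against a local SQLite store, combining the ideas of Phase 1 and Phase 3. The table name `knowledge_graph` appears in the architecture diagram; the column names, the index, and the query itself are assumptions, and the rows are the example triples from Section 5.1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE knowledge_graph (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO knowledge_graph VALUES (?, ?, ?)",
    [
        ("Mohamed", "works_on", "clawdbot"),
        ("clawdbot", "uses", "Gemini batch API"),
        ("clawdbot", "uses", "Discord.js"),
        ("Mohamed", "works_on", "Comp-Core"),
        ("Comp-Core", "uses", "Rust/Axum"),
        ("Comp-Core", "uses", "Supabase PostgreSQL"),
    ],
)
conn.execute("CREATE INDEX idx_subject_predicate ON knowledge_graph (subject, predicate)")

# Self-join: the object of the first hop becomes the subject of the second.
TWO_HOP = """
SELECT a.subject, a.object AS via, b.object AS target
FROM knowledge_graph AS a
JOIN knowledge_graph AS b ON b.subject = a.object
WHERE a.subject = ? AND a.predicate = ? AND b.predicate = ?
ORDER BY via, target
"""
rows = conn.execute(TWO_HOP, ("Mohamed", "works_on", "uses")).fetchall()
for row in rows:
    print(row)
```

A fixed-depth join like this covers the 2-hop case; arbitrary depth would need a recursive common table expression or the server-side BFS named in the approach.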

### Phase 4: RAG++ Integration Bridge (Medium-term)
- Goal: Combine structural reasoning with semantic search
- Approach: `/api/enrich` endpoint bridging GK and RAG++
- Status: 📋 Planned
- Impact: Addresses fuzzy/semantic weakness while preserving provenance

### Phase 5: Compass Visualization (Medium-term)
- Goal: Visual exploration of the knowledge graph
- Approach: Web-based graph visualization (D3/Cytoscape.js)
- Status: 📋 Planned
- Impact: Enables intuitive knowledge exploration

### Phase 6: Federated Graph (Long-term)
- Goal: Distributed knowledge across multiple agents
- Approach: Graph federation protocol with cross-kernel queries
- Status: 📋 Research phase
- Impact: Multi-agent knowledge sharing with provenance chains

---

10. Conclusions

The OpenClaw Graph Kernel occupies a unique position in the knowledge infrastructure landscape. It is not the fastest general-purpose graph database, nor the most expressive query engine, nor the most scalable distributed store. It is, however, the only system we evaluated that provides deterministic, policy-governed, cryptographically signed context windows purpose-built for autonomous AI agent systems.

When to Use the Graph Kernel

Use Case	Recommendation
Reproducible context windows	✅ Use GK — its primary purpose
Auditable provenance chains	✅ Use GK — HMAC-signed tokens
Multi-hop relationship reasoning	✅ Use GK — structurally connected results
Dependency analysis (X → uses → Y)	✅ Use GK — precise structural queries
Fuzzy/semantic search	❌ Use RAG++ (vector similarity)
Speed-critical autocomplete	❌ Use keyword/BM25 (in-memory, <10ms)
Billion-scale graph queries	❌ Use Neo4j/Neptune/Dgraph
Standards-compliant RDF/SPARQL	❌ Use Jena/Fuseki

Final Assessment

Criterion	Rating
As a general-purpose search engine	❌ Not competitive
As a knowledge graph query engine	⚠️ Adequate, improving
As a deterministic context slicer	✅ Irreplaceable
As part of the CompCore stack	✅ Essential
As a lightweight agent knowledge store	✅ Excellent value/complexity ratio

The Graph Kernel's value is not captured by standard information retrieval benchmarks. Its contribution is to the provenance infrastructure of autonomous agent systems: ensuring that every downstream decision can be traced to a specific, reproducible, verifiable context window. No general-purpose database provides this out of the box.

---

Appendix A: Test Environment

Component	Specification
Machine	MacBook Air (Apple Silicon M3, arm64)
OS	Darwin 24.6.0
Graph Kernel	Rust binary, cc-graph-kernel v0.1.0
RAG++	Python (FastAPI), v0.1.0
Database	Supabase PostgreSQL (remote, us-east-1)
Network	Home broadband (~200ms RTT to Supabase)
Benchmark Script	`benchmarks/run_benchmark.py`
Raw Results	`/tmp/benchmark_results.json`

Appendix B: Graph Kernel API Reference (Summary)

Endpoint	Method	Purpose
`POST /api/slice`	POST	Generate a context slice around an anchor turn
`POST /api/slice/batch`	POST	Generate multiple slices in batch
`POST /api/verify_token`	POST	Verify an admissibility token
`GET /api/policies`	GET	List registered slice policies
`POST /api/policies`	POST	Register a new slice policy
`GET /api/knowledge`	GET	Query knowledge triples
`POST /api/knowledge`	POST	Add a single knowledge triple
`POST /api/knowledge/batch`	POST	Add triples in batch
`GET /api/knowledge/stats`	GET	Get knowledge graph statistics
`GET /health`	GET	Detailed health check
`GET /health/live`	GET	Liveness probe
`GET /health/ready`	GET	Readiness probe
`GET /health/startup`	GET	Startup probe

---

Report generated 2026-02-13. Graph Kernel commit: HEAD of cc-graph-kernel, schema v1.0.0.

Source Anchor

Comp-Core/docs/GRAPH-KERNEL-EVALUATION-REPORT.md
