Graph Kernel Comprehensive Evaluation Report
**OpenClaw CompCore — Technical Evaluation** **Version:** 1.0.0 · **Date:** 2026-02-13 **Authors:** Mohamed Diomande, OpenClaw Research **Classification:** Internal Technical Report
---
Executive Summary
The OpenClaw Graph Kernel (GK) is a deterministic context slicing engine implemented as a single Rust binary (Axum/Tokio) that serves a dual purpose: (1) constructing reproducible, policy-governed, HMAC-signed context windows for autonomous AI agents, and (2) operating as a lightweight knowledge graph triple store over a PostgreSQL backend.
We evaluated the Graph Kernel against three baseline retrieval methods (keyword search, BM25, and RAG++ vector similarity) across 27 queries spanning five categories. Additionally, we performed an extensive comparative analysis against nine industry-grade graph databases, knowledge graph frameworks, and RAG orchestrators: Neo4j, Amazon Neptune, Apache Jena/Fuseki, Dgraph, TypeDB, Weaviate, LangChain/LlamaIndex Knowledge Graphs, Microsoft GraphRAG, and Zep.
Key Findings
1. Context Slicing is Irreplaceable. No evaluated alternative provides deterministic, HMAC-signed, policy-governed context window construction out of the box. This is the Graph Kernel's unique value proposition; replicating it on a general-purpose graph database would require a custom application layer (see Section 4).
2. Multi-hop Reasoning Achieves Perfect Relevance. The GK achieves 1.00 relevance on multi-hop traversal queries, returning structurally connected knowledge chains rather than keyword-coincidence result sets. This is qualitatively distinct from high relevance scores achieved by text-matching baselines.
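3. Latency is Network-Dominated, Not Compute-Bound. At 291.7ms average response time, about 90% of the latency is attributable to network round-trips to the remote PostgreSQL backend rather than to computation (see Section 3.3).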
4. Semantic Search is the Primary Gap. With 0.42 average relevance on fuzzy/semantic queries, the GK lacks embedding-based similarity. The planned RAG++ integration bridge addresses this by combining structural reasoning with vector similarity.
5. Entity Normalization Fragments Knowledge. 169 raw subjects collapse to 132 canonical entities with 123 identified duplicates (e.g., "Dream Weaver" ≠ "dream-weaver-engine"). This normalization gap suppresses relationship query relevance from a theoretical 1.00 to the measured 0.94.
Verdict: The Graph Kernel justifies its operational complexity as the provenance and context authority layer in the CompCore stack. It is not a general-purpose search engine and should not be evaluated as one.
---
1. Architecture Deep Dive
1.1 System Design
The Graph Kernel is a Rust binary (~15 KLOC) built on the Axum web framework with Tokio async runtime. It compiles to a single statically-linked binary that can be deployed as a local service, Docker container, or Google Cloud Run instance.
┌──────────────────────────────────────────────────────────────────┐
│ GRAPH KERNEL SERVICE │
│ │
│ ┌─────────────┐ ┌─────────────────┐ ┌──────────────────────┐ │
│ │ API Layer │ │ Core Engine │ │ Storage Layer │ │
│ │ (Axum) │ │ │ │ │ │
│ │ │ │ ContextSlicer │ │ PostgresGraphStore │ │
│ │ /api/slice │→│ PolicyRegistry │→│ (sqlx, pool=2..10) │ │
│ │ /api/verify │ │ TokenAuthority │ │ │ │
│ │ /api/knowledge│ │ SnapshotHash │ │ InMemoryGraphStore │ │
│ │ /health/* │ │ │ │ (testing only) │ │
│ └─────────────┘ └─────────────────┘ └──────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────────────────┐│
│ │ Observability: Structured JSON logs, Cloud Trace, CORS ││
│ └──────────────────────────────────────────────────────────────┘│
└──────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────┐
│ PostgreSQL │
│ (Supabase, remote) │
│ knowledge_graph │
│ memory_turns │
│ conversations │
│ edges │
└─────────────────────┘
1.2 Dual-Purpose Design
The Graph Kernel serves two distinct functions:
Primary: Deterministic Context Slicing
- BFS expansion from an anchor turn through the conversation DAG
- Phase-weighted priority scoring (Synthesis > Planning > Consolidation > Debugging > Exploration)
- Budget-bounded slice construction (max_nodes, max_radius)
- HMAC-SHA256 signed admissibility tokens for downstream trust verification
- xxHash64-based slice fingerprinting for reproducibility proofs
Secondary: Knowledge Graph Triple Store
- Subject–Predicate–Object triples with confidence scores and source provenance
- REST API for CRUD operations on knowledge triples
- Batch ingestion endpoint for pipeline-driven knowledge extraction
- Statistics and query endpoints for graph exploration
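The primary slicing function can be illustrated with a minimal sketch: a breadth-first expansion bounded by `max_radius`, followed by priority ranking and a `max_nodes` cut. The priority formula (phase weight plus weighted salience, decayed per hop), the adjacency-dict graph form, and the tie-breaking rule are assumptions made for illustration, not the Rust implementation; sibling expansion is omitted. Default values come from the `SlicePolicyV1` table in Section 1.4.

```python
from collections import deque

# Defaults copied from the SlicePolicyV1 table (Section 1.4).
PHASE_WEIGHTS = {"Synthesis": 1.0, "Planning": 0.9, "Consolidation": 0.6,
                 "Debugging": 0.5, "Exploration": 0.3}

def slice_context(graph, turns, anchor, max_nodes=256, max_radius=10,
                  salience_weight=0.3, distance_decay=0.9):
    """Return up to max_nodes turn ids around anchor, anchor first, then by priority.

    graph: dict turn_id -> list of neighbouring turn ids (hypothetical form)
    turns: dict turn_id -> {"phase": str, "salience": float}
    """
    distance = {anchor: 0}
    queue = deque([anchor])
    while queue:
        node = queue.popleft()
        if distance[node] == max_radius:
            continue  # radius budget reached; do not expand further
        for neighbour in sorted(graph.get(node, [])):  # sorted for a fixed visit order
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)

    def priority(turn_id):
        meta = turns[turn_id]
        base = PHASE_WEIGHTS[meta["phase"]] + salience_weight * meta["salience"]
        return base * distance_decay ** distance[turn_id]

    # Ties are broken by turn id so the same input always yields the same slice.
    others = sorted((t for t in distance if t != anchor),
                    key=lambda t: (-priority(t), t))
    return ([anchor] + others)[:max_nodes]
```

Sorting neighbours and breaking priority ties by id is what makes the output reproducible in this sketch; any other fixed ordering would serve equally well.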
1.3 Security Model: HMAC-Signed Admissibility
The Graph Kernel implements a cryptographic trust boundary using HMAC-SHA256:
[sensitive field redacted], canonical_string)[0..16]
canonical_string = "{slice_id}|{anchor_turn_id}|{policy_id}|{policy_params_hash}|
{graph_snapshot_hash}|{schema_version}|admissibility_token_v2_hmac"
This creates an unforgeable proof-of-authorization. Downstream services (RAG++, Orbit) can verify tokens via `POST /api/verify_token` without accessing the HMAC secret. The token binds six fields together: if any parameter is tampered with, verification fails.
Invariant (INV-GK-003): No Phantom Authority. Without a valid admissibility token, content is NOT admissible. The `AdmissibleEvidenceBundle` type enforces this at the Rust type level — it can only be constructed through the verification pathway.
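The token construction can be sketched with the Python standard library. This is an illustration under stated assumptions, not the kernel's code: the canonical string is treated as a single line (the line break in the listing above is assumed to be a wrap), the `[0..16]` truncation is assumed to apply to the hex encoding of the digest, and the function names `canonical_string`, `issue_token`, and `verify_token` are hypothetical.

```python
import hashlib
import hmac

TOKEN_DOMAIN = "admissibility_token_v2_hmac"  # constant suffix of the canonical string

def canonical_string(slice_id, anchor_turn_id, policy_id,
                     policy_params_hash, graph_snapshot_hash, schema_version):
    """Join the six bound fields and the domain constant with '|'."""
    return "|".join([slice_id, anchor_turn_id, policy_id, policy_params_hash,
                     graph_snapshot_hash, schema_version, TOKEN_DOMAIN])

def issue_token(secret: bytes, **fields) -> str:
    """HMAC-SHA256 over the canonical string, truncated to 16 hex characters."""
    digest = hmac.new(secret, canonical_string(**fields).encode("utf-8"),
                      hashlib.sha256).hexdigest()
    return digest[:16]  # assumption: truncation applies to the hex encoding

def verify_token(secret: bytes, token: str, **fields) -> bool:
    """Recompute the token and compare in constant time."""
    return hmac.compare_digest(issue_token(secret, **fields), token)
```

Because every field enters the MAC input, changing any one of them (for example `policy_id`) yields a different token, which is the binding property described above.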
1.4 Policy Governance
Context slicing is controlled by `SlicePolicyV1`, which parameterizes:
| Parameter | Default | Purpose |
|---|---|---|
| `max_nodes` | 256 | Maximum turns in a slice (budget cap) |
| `max_radius` | 10 | Maximum graph hops from anchor |
| `phase_weights` | Synthesis=1.0, Planning=0.9, Consolidation=0.6, Debugging=0.5, Exploration=0.3 | Phase importance scoring |
| `salience_weight` | 0.3 | How much turn salience affects priority |
| `distance_decay` | 0.9 | Priority loss per hop (10% per hop) |
| `include_siblings` | true | Whether to expand to sibling turns |
| `max_siblings_per_node` | 5 | Sibling expansion limit per parent |
Policies are registered in an immutable `PolicyRegistry` with hash-stable fingerprints. Policy parameter hashes use quantized floats (multiply by 10⁶, round to i64) to ensure cross-platform determinism between Rust and Python clients.
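A minimal sketch of quantized parameter hashing follows. The scale factor of 10⁶ is from the text; the serialization format (`key=value` pairs joined by `;`), the use of SHA-256, and the function names are assumptions for illustration. One detail matters for Rust/Python agreement: Rust's `f64::round` rounds half-way cases away from zero, while Python's built-in `round` rounds them to even, so the sketch rounds explicitly.

```python
import hashlib
import math

SCALE = 1_000_000  # 10^6, as described above

def quantize(value: float) -> int:
    """Scale by 10^6 and round half away from zero (Python's round() is half-to-even)."""
    scaled = abs(value) * SCALE
    return int(math.copysign(math.floor(scaled + 0.5), value))

def policy_params_hash(params: dict) -> str:
    """Hash parameters in sorted key order so dict ordering cannot change the result."""
    parts = []
    for key in sorted(params):
        value = params[key]
        if isinstance(value, bool):      # check bool first: bool is an int subclass
            encoded = "true" if value else "false"
        elif isinstance(value, float):
            encoded = str(quantize(value))
        else:
            encoded = str(value)
        parts.append(f"{key}={encoded}")
    return hashlib.sha256(";".join(parts).encode("utf-8")).hexdigest()
```

With this scheme, two floats that differ only below the sixth decimal place hash identically, which is the point of quantizing before hashing.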
---
2. Benchmark Methodology
2.1 Test Design
We evaluated 27 queries across 5 categories against 4 retrieval methods:
| Category | Queries | What It Tests |
|---|---|---|
| Factual Recall | 6 | Direct attribute lookups ("What does X use?") |
| Relationship | 6 | Dependency/integration mapping ("What depends on X?") |
| Multi-hop | 5 | 2-hop graph traversal ("X → Y → Z") |
| Fuzzy/Semantic | 5 | Loose topic matching ("anything about skating") |
| Predicate-specific | 5 | Structured predicate filters ("likes", "should", "has_file") |
2.2 Methods Under Test
| Method | Corpus | Mechanism |
|---|---|---|
| Graph Kernel | 2,681 structured triples | REST API queries to `/api/knowledge` with exact field filters |
| Keyword | Same 2,681 triples | In-memory substring matching on subject+predicate+object |
| BM25 | Same 2,681 triples | Okapi BM25 (k₁=1.5, b=0.75) over triple corpus |
| RAG++ | 107K+ conversation turns | Vector similarity search over conversation embeddings |
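For reference, the BM25 baseline in the table above can be sketched as follows, using the stated parameters k₁=1.5 and b=0.75. Treating each triple as one whitespace-tokenized document and using the non-negative IDF variant are assumptions; the benchmark script may tokenize or weight differently.

```python
import math
from collections import Counter

def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Score each document (a string) against the query with Okapi BM25."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(doc) for doc in tokenized) / n_docs
    doc_freq = Counter(term for doc in tokenized for term in set(doc))

    scores = []
    for doc in tokenized:
        term_freq = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            tf = term_freq[term]
            if tf == 0:
                continue
            # Non-negative IDF variant (the "+ 1" inside the log); an assumption here.
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            length_norm = k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / (tf + length_norm)
        scores.append(score)
    return scores
```

A triple that contains both query terms outscores one that contains only a single term, but nothing in the score reflects whether the matched triples are connected in the graph.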
2.3 Metrics
- Response Time (ms): Wall-clock latency including all network round-trips
- Result Count: Number of results returned per query
- Relevance Score (0–1): Fraction of expected terms found in results
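The relevance metric is simple enough to state as code. The sketch below assumes case-insensitive substring matching over the concatenated result text; the benchmark script's exact matching rules are not given in this report.

```python
def relevance_score(expected_terms, results):
    """Fraction of expected terms that occur (case-insensitively) in any result string."""
    if not expected_terms:
        return 0.0
    haystack = " ".join(results).lower()
    found = sum(1 for term in expected_terms if term.lower() in haystack)
    return found / len(expected_terms)
```

For example, `relevance_score(["GCP", "Rust"], ["comp-core deploys_to Google Cloud Platform", "comp-core uses Rust/Axum"])` gives 0.5: the metric only checks term presence, so an alias mismatch counts as a miss.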
2.4 Important Caveats
- Graph Kernel and Keyword/BM25 operate on the same triple corpus (2,681 structured triples extracted from conversations).
- RAG++ operates on a fundamentally different corpus (107K+ raw conversation turns with embeddings). Direct comparison is informational, not apples-to-apples.
- Multi-hop queries use sequential API calls for GK (N hops = N HTTP round-trips). A server-side traversal endpoint would eliminate this latency multiplier.
---
3. Full Benchmark Results
3.1 Per-Category Results
Factual Recall
| Method | Avg Latency | Avg Results | Avg Relevance |
|---|---|---|---|
| Graph Kernel | 248.3 ms | 3.7 | 1.00 |
| Keyword | 2.7 ms | 20.0 | 1.00 |
| BM25 | 9.0 ms | 18.2 | 1.00 |
| RAG++ | 421.9 ms | 10.0 | 0.92 |
All triple-based methods achieve perfect relevance. GK returns precisely scoped results (3.7 avg) versus keyword's broad 20.0. The latency difference is almost entirely attributable to network RTT to the remote database (Section 3.3); keyword and BM25 run in memory.
Relationship Queries
| Method | Avg Latency | Avg Results | Avg Relevance |
|---|---|---|---|
| Graph Kernel | 204.3 ms | 9.5 | 0.94 |
| Keyword | 2.8 ms | 19.3 | 1.00 |
| BM25 | 8.7 ms | 12.3 | 1.00 |
| RAG++ | 336.4 ms | 10.0 | 0.69 |
GK's drop to 0.94 relevance comes from a single entity normalization failure: "GCP" not matching "Google Cloud Platform" in `deploys_to` results. With normalization, this would be 1.00.
Multi-hop Reasoning ⭐
| Method | Avg Latency | Avg Results | Avg Relevance |
|---|---|---|---|
| Graph Kernel | 586.6 ms | 7.6 | 1.00 |
| Keyword | 3.3 ms | 20.0 | 1.00 |
| BM25 | 9.2 ms | 18.8 | 1.00 |
| RAG++ | 348.1 ms | 10.0 | 0.40 |
This is the Graph Kernel's killer feature. While keyword/BM25 achieve identical 1.00 relevance scores, the nature of their results is fundamentally different:
- GK returns 7.6 structurally connected results: Mohamed → works_on → clawdbot → uses → Gemini batch API. Each result is causally linked through verified graph edges.
- Keyword returns 20 coincidence results: Documents happen to contain "Mohamed" and "clawdbot" but the system has no concept of why they co-occur.
The relevance metric masks this critical quality difference. In production, the GK's causal chain enables provenance-tracked reasoning; keyword coincidence does not.
Fuzzy/Semantic Search
| Method | Avg Latency | Avg Results | Avg Relevance |
|---|---|---|---|
| Graph Kernel | 215.2 ms | 19.8 | 0.42 |
| Keyword | 2.0 ms | 16.0 | 0.80 |
| BM25 | 6.1 ms | 7.6 | 0.53 |
| RAG++ | 484.0 ms | 10.0 | 0.65 |
GK's weakest category. No semantic understanding — searching for "music" won't find triples about "audio production." This is the primary motivation for the planned RAG++ integration bridge.
Predicate-Specific Queries
| Method | Avg Latency | Avg Results | Avg Relevance |
|---|---|---|---|
| Graph Kernel | 230.1 ms | 16.0 | 0.80 |
| Keyword | 3.3 ms | 20.0 | 1.00 |
| BM25 | 9.5 ms | 20.0 | 1.00 |
| RAG++ | 460.8 ms | 10.0 | 0.80 |
GK should excel here (exact predicate filters), but entity normalization failures suppress relevance. "Dream Weaver" files returned 0 results due to capitalization/alias mismatch.
3.2 Overall Averages
| Method | Avg Latency | Avg Results | Avg Relevance | Latency Rank | Relevance Rank |
|---|---|---|---|---|---|
| Keyword | 2.8 ms | 19.1 | 0.96 | 🥇 | 🥇 |
| BM25 | 8.5 ms | 15.4 | 0.91 | 🥈 | 🥈 |
| Graph Kernel | 291.7 ms | 11.0 | 0.84 | 🥉 | 🥉 |
| RAG++ | 407.9 ms | 10.0 | 0.70 | 4th | 4th |
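The overall figures are consistent with a query-weighted mean of the per-category results. As a worked check for the Graph Kernel row, using the query counts from Section 2.1 and the values from Section 3.1:

```python
# (query count, avg latency in ms, avg relevance) per category for the Graph Kernel,
# copied from the per-category tables in Section 3.1.
GK_RESULTS = {
    "Factual Recall":     (6, 248.3, 1.00),
    "Relationship":       (6, 204.3, 0.94),
    "Multi-hop":          (5, 586.6, 1.00),
    "Fuzzy/Semantic":     (5, 215.2, 0.42),
    "Predicate-specific": (5, 230.1, 0.80),
}

def weighted_average(rows, column):
    """Average one column, weighting each category by its number of queries."""
    total_queries = sum(row[0] for row in rows)
    return sum(row[0] * row[column] for row in rows) / total_queries

rows = list(GK_RESULTS.values())
overall_latency = weighted_average(rows, 1)
overall_relevance = weighted_average(rows, 2)
print(f"{overall_latency:.1f} ms, relevance {overall_relevance:.2f}")
# prints: 291.7 ms, relevance 0.84
```

The multi-hop category (586.6 ms over 5 queries) is what pulls the overall latency above the roughly 200 to 250 ms seen in the single-call categories.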
3.3 Latency Decomposition
| Component | Contribution |
|---|---|
| Network RTT to Supabase PostgreSQL | ~180–200 ms |
| PostgreSQL query execution | ~5–20 ms |
| TCP connection overhead | ~10–15 ms |
| Rust serialization + JSON | ~1–2 ms |
| Multi-hop per additional hop | +200 ms |
| Projected with local SQLite | 10–30 ms total |
Critical insight: The Graph Kernel is compute-efficient. Its latency stems from an architecture choice (remote Supabase), not a fundamental limitation. Migrating to local SQLite with periodic Supabase sync is projected to bring queries under 30 ms.
---
4. Comparative Analysis: Industry Alternatives
4.1 Neo4j
| Dimension | Graph Kernel | Neo4j |
|---|---|---|
| Architecture | Single Rust binary, SPO triple store | JVM-based, native property graph |
| Query Language | REST API with field filters | Cypher (full graph query language) |
| Multi-hop | Sequential HTTP calls (client-side) | Native MATCH traversal (server-side) |
| Latency | 291ms (remote PG); 10–30ms (local) | 1–50ms typical (local) |
| Context Slicing | Native (primary purpose) | ❌ Must build custom |
| Admissibility Tokens | Native HMAC-signed | ❌ No equivalent |
| Deployment | Single binary, ~20MB | JVM + heap (512MB–4GB+) |
| Cost | Free (self-hosted) | Community: free; Enterprise: $$$$ |
| Scale | Thousands of triples | Billions of nodes/edges |
| Ecosystem | Purpose-built for agents | Drivers for 10+ languages, GraphQL, APOC |
Where GK wins: Deterministic context slicing with cryptographic provenance in a 20MB binary. Neo4j would require a custom application layer to replicate this.
Where Neo4j wins: Query expressiveness (Cypher is vastly more powerful than REST filters), horizontal scaling (causal clustering), ecosystem maturity (15+ years), visualization tools (Bloom, Browser), and native server-side traversal.
4.2 Amazon Neptune
| Dimension | Graph Kernel | Amazon Neptune |
|---|---|---|
| Architecture | Single Rust binary | Managed cloud service (AWS) |
| Query Language | REST API | SPARQL, Gremlin, openCypher |
| Storage Model | SPO triples + conversation DAG | Property graph or RDF, distributed |
| Latency | 291ms / 10–30ms local | 2–20ms (within VPC) |
| Context Slicing | Native | ❌ No concept |
| Provenance | HMAC-signed bundles | IAM-based access control |
| Scale | Thousands | Billions (64TB storage) |
| Cost | $0 (self-hosted) | $0.10/hr+ (starts ~$75/mo) |
| Ops Burden | Zero (single binary) | Managed (but VPC config, IAM) |
Where GK wins: Zero cost, zero cloud dependency, purpose-built context authority with cryptographic tokens. Neptune has no concept of context windows or policy-governed slicing.
Where Neptune wins: Massive scale, managed infrastructure, multi-model (SPARQL + Gremlin + openCypher), read replicas, point-in-time recovery, IAM integration.
4.3 Apache Jena / Fuseki
| Dimension | Graph Kernel | Apache Jena/Fuseki |
|---|---|---|
| Architecture | Rust/Axum REST service | Java, SPARQL 1.1 endpoint |
| Standards Compliance | Custom SPO schema | Full W3C RDF/OWL/SPARQL |
| Reasoning | Graph traversal only | OWL inference, RDFS entailment |
| Context Slicing | Native | ❌ No concept |
| Query Power | Field filters | SPARQL 1.1 (declarative graph pattern queries, property paths, aggregates) |
| Data Model | (subject, predicate, object, confidence) | Full RDF (URIs, blank nodes, literals, named graphs) |
Where GK wins: Purpose-built context slicing, HMAC provenance, lightweight deployment. Jena/Fuseki would require a custom application layer for context windows.
Where Jena wins: Standards compliance (W3C RDF/OWL), semantic reasoning (OWL inference), SPARQL query expressiveness, federated queries (SERVICE keyword), extensive tooling (TDB2, Shacl validation).
4.4 Dgraph
| Dimension | Graph Kernel | Dgraph |
|---|---|---|
| Architecture | Single binary | Distributed (Zero, Alpha, Ratel) |
| Query Language | REST API | GraphQL±, DQL |
| Scale | Thousands | Billions (horizontally sharded) |
| Latency | 291ms / 10–30ms | <10ms typical |
| Context Slicing | Native | ❌ Build custom |
| Schema | Implicit (triple fields) | Explicit GraphQL-like schema |
Where GK wins: Context slicing, provenance tracking, zero-configuration single binary. Dgraph's distributed architecture (Zero + Alpha nodes) is overkill for agent context management.
Where Dgraph wins: Horizontal scaling, GraphQL-native API, ACID transactions across shards, built-in full-text search (Bleve), authorization rules (@auth directives).
4.5 TypeDB
| Dimension | Graph Kernel | TypeDB |
|---|---|---|
| Architecture | Rust binary, SPO triples | Java, hypergraph with type system |
| Data Model | Flat triples | Entities, relations, attributes with subtypes, roles, rules |
| Reasoning | Graph traversal | Native rule-based inference |
| Query Language | REST filters | TypeQL (pattern-matching) |
| Context Slicing | Native | ❌ No concept |
Where GK wins: Lightweight deployment, context-slicing-as-a-service, HMAC provenance. TypeDB's rich type system is unnecessary for the context authority use case.
Where TypeDB wins: Expressive data modeling (hyper-relations, type hierarchies, role-playing), native reasoning (if A teaches B, and B is a course, then A is a teacher), schema enforcement, pattern-matching queries.
4.6 Weaviate
| Dimension | Graph Kernel | Weaviate |
|---|---|---|
| Architecture | Rust, deterministic graph | Go, vector-first database |
| Search | Exact field matching | Hybrid (vector + BM25 + filters) |
| Embeddings | None | Native (text2vec, multi2vec) |
| Context Slicing | Native | ❌ No concept |
| Semantic Search | ❌ (0.42 fuzzy relevance) | ✅ Core competency |
| Multi-hop | ✅ Structural traversal | Cross-references (limited) |
Where GK wins: Deterministic context slicing, structural multi-hop reasoning, cryptographic provenance. Weaviate cannot provide reproducible, policy-governed context windows.
Where Weaviate wins: Semantic search (the GK's primary weakness), hybrid search combining vectors with BM25, built-in vectorization modules, generative search (RAG-native), multi-tenancy.
4.7 LangChain / LlamaIndex Knowledge Graphs
| Dimension | Graph Kernel | LangChain/LlamaIndex KG |
|---|---|---|
| Architecture | Purpose-built Rust binary | Python orchestration over external stores |
| Implementation | Native graph engine | Wrapper around Neo4j/Nebula/etc. |
| Context Window | Deterministic policy-governed slicing | Prompt stuffing with retrieval results |
| Provenance | HMAC-signed, fingerprinted | ❌ No formal provenance |
| Determinism | Guaranteed (same input → same output) | Non-deterministic (LLM-dependent extraction) |
| LLM Integration | Via downstream consumers | Native (chains, agents, tools) |
Where GK wins: Determinism, provenance, security model. LangChain/LlamaIndex knowledge graphs are LLM-dependent for entity extraction and have no formal reproducibility guarantees. The GK provides a foundation that these tools could consume.
Where LC/LI wins: Rapid prototyping, LLM-native pipelines, rich ecosystem of document loaders/splitters/embedders, community support, flexibility to swap backends.
4.8 Microsoft GraphRAG
| Dimension | Graph Kernel | Microsoft GraphRAG |
|---|---|---|
| Architecture | Rust binary, triple store + context slicer | Python, LLM-driven graph construction |
| Graph Construction | Deterministic extraction (Kimi-K2) | LLM-based entity/relationship extraction |
| Query Modes | Exact field filters + structural traversal | Local search (entity-centric) + Global search (community summaries) |
| Community Detection | ❌ Not implemented | Leiden algorithm, hierarchical communities |
| Summarization | Raw triple retrieval | LLM-generated community summaries |
| Context Slicing | Native, policy-governed | ❌ Uses prompt engineering |
| Provenance | HMAC-signed tokens | ❌ No formal provenance |
Where GK wins: Deterministic provenance, cryptographic trust, lightweight deployment, no LLM dependency for query execution. GraphRAG requires LLM calls for both indexing and querying.
Where GraphRAG wins: Holistic corpus understanding via community summaries, handles "what is the dataset about?" queries that GK cannot answer, sophisticated graph construction from unstructured text, hierarchical community-level reasoning.
4.9 Zep
| Dimension | Graph Kernel | Zep |
|---|---|---|
| Architecture | Rust, context slicing + triple store | Go/Python, memory layer for LLM apps |
| Purpose | Deterministic context authority | Long-term memory for chatbots |
| Memory Model | Structured triples + conversation DAG | Session memory, facts, summaries |
| Entity Handling | SPO triples with confidence | Automatic entity extraction + graph |
| Context Window | Policy-governed, HMAC-signed | Automatic relevance-based selection |
| Provenance | Full cryptographic chain | ❌ No formal provenance |
Where GK wins: Deterministic reproducibility, cryptographic provenance, policy governance. Zep optimizes for developer experience at the cost of determinism guarantees.
Where Zep wins: Developer experience (drop-in memory for any LLM app), automatic entity extraction, temporal awareness (memory decay, summarization), user-level memory management, managed cloud offering.
4.10 Comparative Summary Matrix
| System | Context Slicing | Provenance | Multi-hop | Semantic Search | Scale | Deployment | Cost |
|---|---|---|---|---|---|---|---|
| Graph Kernel | ✅ Native | ✅ HMAC | ✅ | ❌ | Small | Single binary | Free |
| Neo4j | ❌ Custom | ❌ | ✅✅ | ❌ | Large | JVM | Free/$$$ |
| Neptune | ❌ | ❌ | ✅✅ | ❌ | Massive | AWS managed | $$$ |
| Jena/Fuseki | ❌ | Partial | ✅ | ❌ | Medium | JVM | Free |
| Dgraph | ❌ | ❌ | ✅✅ | ✅ (Bleve) | Massive | Distributed | Free/$$ |
| TypeDB | ❌ | ❌ | ✅✅ | ❌ | Large | JVM | Free |
| Weaviate | ❌ | ❌ | Limited | ✅✅ | Large | Go binary | Free/$$ |
| LC/LI KG | ❌ | ❌ | ✅ | ✅ | Varies | Python | Free+LLM |
| GraphRAG | ❌ | ❌ | ✅ | ✅✅ | Medium | Python | Free+LLM |
| Zep | ❌ | ❌ | ✅ | ✅ | Medium | Go/Cloud | Free/$$ |
---
5. Multi-hop Reasoning: Deep Dive
5.1 Why Multi-hop Is the Killer Feature
Multi-hop reasoning follows actual relationship chains through the knowledge graph:
Query: "What tools does Mohamed use indirectly through his projects?"
Graph Kernel (structural):
Mohamed → works_on → clawdbot
clawdbot → uses → Gemini batch API
clawdbot → uses → Discord.js
Mohamed → works_on → Comp-Core
Comp-Core → uses → Rust/Axum
Comp-Core → uses → Supabase PostgreSQL
Result: 7 structurally connected triples forming two 2-hop chains
Keyword (coincidence):
"Mohamed" appears in 20 triples (likes, wants_to, needs_to, ...)
"uses" appears in 50 triples (unrelated subjects)
Intersection: 20 results containing "Mohamed" — many irrelevant
Result: 20 results, some relevant by coincidence, no causal connection
The relevance metric assigns both methods 1.00 because the expected terms appear. But the Graph Kernel's results constitute a knowledge chain — a directed path through verified relationships. The keyword results are a coincidence pile — documents that happen to contain matching substrings.
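The distinction can be made concrete with a small in-memory sketch. The first six triples are those from the example above; the `likes` triple is a hypothetical addition standing in for the unrelated "Mohamed" matches, and both function names are illustrative.

```python
TRIPLES = [
    ("Mohamed", "works_on", "clawdbot"),
    ("clawdbot", "uses", "Gemini batch API"),
    ("clawdbot", "uses", "Discord.js"),
    ("Mohamed", "works_on", "Comp-Core"),
    ("Comp-Core", "uses", "Rust/Axum"),
    ("Comp-Core", "uses", "Supabase PostgreSQL"),
    ("Mohamed", "likes", "skating"),  # hypothetical extra triple, unrelated to tools
]

def two_hop(triples, start, first_predicate, second_predicate):
    """Follow start -[first]-> middle -[second]-> end and return each full chain."""
    chains = []
    for s1, p1, middle in triples:
        if s1 == start and p1 == first_predicate:
            for s2, p2, end in triples:
                if s2 == middle and p2 == second_predicate:
                    chains.append((start, middle, end))
    return chains

def keyword_match(triples, term):
    """Substring match over the concatenated triple text, as a keyword baseline would."""
    return [t for t in triples if term.lower() in " ".join(t).lower()]
```

`two_hop(TRIPLES, "Mohamed", "works_on", "uses")` returns only chains whose middle entity links both hops, whereas `keyword_match(TRIPLES, "Mohamed")` also returns the `likes` triple, which says nothing about tools.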
5.2 Latency Implications
Multi-hop queries currently require sequential HTTP round-trips:
2-hop query = 3 HTTP calls × ~200ms RTT = ~600ms
3-hop query = 4 HTTP calls × ~200ms RTT = ~800ms
With a server-side traversal endpoint (`POST /api/knowledge/traverse`), this collapses to a single HTTP call:
2-hop query = 1 HTTP call + server-side SQL joins = ~220ms (remote) or ~15ms (local)
This is the highest-priority improvement for multi-hop performance.
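What "server-side SQL joins" could look like is sketched below with an in-memory SQLite database. The table name `knowledge_graph` appears in the architecture diagram; the column names and the query itself are assumptions, since the traversal endpoint is still planned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE knowledge_graph (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO knowledge_graph VALUES (?, ?, ?)",
    [
        ("Mohamed", "works_on", "clawdbot"),
        ("clawdbot", "uses", "Gemini batch API"),
        ("clawdbot", "uses", "Discord.js"),
        ("Mohamed", "works_on", "Comp-Core"),
        ("Comp-Core", "uses", "Rust/Axum"),
        ("Comp-Core", "uses", "Supabase PostgreSQL"),
    ],
)

# One self-join resolves both hops inside the database, so the client pays for a
# single round-trip instead of one per hop.
TWO_HOP_SQL = """
    SELECT hop1.subject, hop1.object AS via, hop2.object AS target
    FROM knowledge_graph AS hop1
    JOIN knowledge_graph AS hop2 ON hop2.subject = hop1.object
    WHERE hop1.subject = ? AND hop1.predicate = ? AND hop2.predicate = ?
    ORDER BY via, target
"""
chains = conn.execute(TWO_HOP_SQL, ("Mohamed", "works_on", "uses")).fetchall()
```

Each additional hop adds one more self-join (or a recursive common table expression for variable depth), but the number of client round-trips stays at one.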
---
6. Entity Normalization Analysis
6.1 Current State
| Metric | Value |
|---|---|
| Raw unique subjects | 169 |
| After normalization | 132 |
| Identified duplicates | 123 triple pairs |
| Fragmentation rate | 22% ((169 − 132) / 169) |
6.2 Representative Alias Clusters
| Canonical Entity | Known Aliases |
|---|---|
| `dream-weaver-engine` | Dream Weaver, dream weaver, DreamWeaver, Dream-weaver-engine |
| `clawdbot` | Clawdbot, ClawdBot, clawdbot-gateway |
| `mohamed-diomande` | Mohamed Diomande, Mohameddiomande, mohameddiomande, Mohamed |
| `rag-plusplus` | RAG++, RAG++ (Cloud), rag-plusplus, rag-plusplus-core |
| `comp-core` | Comp-Core, CompCore, comp core |
6.3 Impact on Query Quality
Entity normalization failures directly suppress relevance:
- Relationship queries: 0.94 → 1.00 (projected) after normalization
- Predicate-specific: 0.80 → 0.95+ after normalization
- Overall average: 0.84 → 0.92+ after normalization
The implemented entity normalizer (`scripts/entity-normalizer.py`) uses a canonical alias table with fuzzy matching to resolve these at query time and ingestion time.
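The approach can be sketched as follows. This is not the contents of `scripts/entity-normalizer.py`; it is a minimal illustration that assumes a lookup keyed on lowercased, punctuation-stripped names with a `difflib` fallback, using the alias clusters from Section 6.2. The 0.85 cutoff is an arbitrary illustrative value.

```python
import difflib
import re

# Alias clusters copied from the table in Section 6.2.
ALIAS_CLUSTERS = {
    "dream-weaver-engine": ["Dream Weaver", "dream weaver", "DreamWeaver", "Dream-weaver-engine"],
    "clawdbot": ["Clawdbot", "ClawdBot", "clawdbot-gateway"],
    "mohamed-diomande": ["Mohamed Diomande", "Mohameddiomande", "mohameddiomande", "Mohamed"],
    "rag-plusplus": ["RAG++", "RAG++ (Cloud)", "rag-plusplus", "rag-plusplus-core"],
    "comp-core": ["Comp-Core", "CompCore", "comp core"],
}

def _key(name: str) -> str:
    """Lowercase and drop everything except letters, digits, and '+'."""
    return re.sub(r"[^a-z0-9+]", "", name.lower())

LOOKUP = {}
for canonical, aliases in ALIAS_CLUSTERS.items():
    for name in [canonical, *aliases]:
        LOOKUP[_key(name)] = canonical

def normalize_entity(name: str, cutoff: float = 0.85):
    """Return the canonical entity for name, or None if nothing matches closely enough."""
    key = _key(name)
    if key in LOOKUP:
        return LOOKUP[key]
    close = difflib.get_close_matches(key, list(LOOKUP), n=1, cutoff=cutoff)
    return LOOKUP[close[0]] if close else None
```

Fuzzy matching only catches spelling variants. An abbreviation such as "GCP" for "Google Cloud Platform" shares too few characters with its expansion and needs an explicit alias entry.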
---
7. RAG++ Integration Architecture
7.1 The Hybrid Model
The Graph Kernel and RAG++ are complementary, not competing:
┌──────────────────────────────────────────────────────────────┐
│ HYBRID RETRIEVAL │
│ │
│ User Query: "What AI tools does the system use?" │
│ │
│ ┌──────────────────┐ ┌──────────────────┐ │
│ │ RAG++ Path │ │ Graph Kernel │ │
│ │ (Semantic) │ │ (Structural) │ │
│ │ │ │ │ │
│ │ Vector embedding │ │ Subject: "comp- │ │
│ │ → similarity │ │ core" │ │
│ │ → top-K turns │ │ Predicate: "uses"│ │
│ │ │ │ → exact triples │ │
│ │ Finds: context │ │ → follow edges │ │
│ │ around AI tools │ │ → causal chains │ │
│ └────────┬─────────┘ └────────┬─────────┘ │
│ │ │ │
│ └────────────┬────────────┘ │
│ ▼ │
│ ┌──────────────────┐ │
│ │ Merge & Rank │ │
│ │ - Deduplicate │ │
│ │ - Cross-enrich │ │
│ │ - Rank by │ │
│ │ structure + │ │
│ │ similarity │ │
│ └──────────────────┘ │
│ │ │
│ ▼ │
│ Enriched Result Set │
│ (semantic + structural + provenance) │
└──────────────────────────────────────────────────────────────┘
7.2 Integration Endpoints (Planned)
POST /api/enrich
{
"rag_results": [...], // From RAG++ /api/rag/search
"max_hops": 2,
"include_predicates": ["uses", "depends_on", "integrates_with"]
}
→ Returns enriched results with graph context:
- Original RAG++ result (semantic match)
- Entities found in result text
- Graph relationships for those entities (1-2 hops)
- Related entities discovered through traversal
This bridge transforms RAG++ from a flat similarity search into a structured reasoning system while preserving the Graph Kernel's provenance guarantees.
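Since the endpoint is still planned, the following is only a sketch of what the enrichment step might do for one hop. The result fields (`text`, `similarity`), substring-based entity detection, and the additive scoring bonus are all hypothetical choices for illustration.

```python
def enrich(rag_results, triples, include_predicates):
    """Attach 1-hop graph context to each RAG++ hit and rank by similarity plus structure."""
    enriched = []
    for hit in rag_results:
        text = hit["text"].lower()
        # Entity detection by substring match is a placeholder for real entity linking.
        related = [
            t for t in triples
            if t[1] in include_predicates and t[0].lower() in text
        ]
        enriched.append({
            **hit,
            "graph_context": related,
            # Hypothetical scoring: each supporting triple adds a small bonus.
            "score": hit["similarity"] + 0.05 * len(related),
        })
    return sorted(enriched, key=lambda r: r["score"], reverse=True)
```

A hit with slightly lower similarity but graph-backed context can then outrank a hit with no structural support, which is the "rank by structure + similarity" step in the diagram.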
---
8. Corpus Statistics
8.1 Knowledge Graph State (Post-Topology Ingestion)
| Metric | Value |
|---|---|
| Total triples | 3,502 |
| Unique subjects | 221 |
| Unique predicates | 88 |
| Data sources | kimi-k2-extraction, topology-ingester, unknown |
| Average confidence | 0.73 (Kimi), 0.95 (topology) |
8.2 Top 10 Predicates
| Predicate | Count |
|---|---|
| `has_file` | 810 |
| `needs_to` | 467 |
| `has_path` | 383 |
| `should` | 332 |
| `likes` | 224 |
| `wants_to` | 178 |
| `is_a` | 111 |
| `uses` | 107 |
| `works_on` | 87 |
| `building` | 81 |
8.3 Predicate Taxonomy
The predicates reveal a natural clustering:
- Structural: `is_a`, `has_file`, `has_path`, `uses`, `depends_on`, `integrates_with` (40%)
- Intentional: `needs_to`, `should`, `wants_to`, `building` (34%)
- Relational: `works_on`, `likes`, `deployed_on` (12%)
- Descriptive: `full_name`, `code`, `port` (14%)
---
9. Roadmap
### Phase 1: SQLite Migration (Immediate)
- Goal: Reduce query latency from 291ms to <30ms
- Approach: Local SQLite cache with periodic Supabase sync
- Status: ✅ Implemented (`scripts/sqlite-mirror.py`, `scripts/gk-proxy.py` on port 8002)
- Impact: Latency competitive with BM25
### Phase 2: Entity Normalization (Immediate)
- Goal: Increase relevance from 0.84 to 0.95+
- Approach: Canonical alias table with fuzzy matching at query and ingestion time
- Status: ✅ Implemented (`scripts/entity-normalizer.py`)
- Impact: Eliminates alias-driven query failures
### Phase 3: Server-Side Traversal API (Short-term)
- Goal: Collapse multi-hop from N×200ms to single call
- Approach: `POST /api/knowledge/traverse` with server-side BFS
- Status: 📋 Planned
- Impact: Multi-hop latency drops from 600ms to ~220ms (remote) or ~15ms (local)
### Phase 4: RAG++ Integration Bridge (Medium-term)
- Goal: Combine structural reasoning with semantic search
- Approach: `/api/enrich` endpoint bridging GK and RAG++
- Status: 📋 Planned
- Impact: Addresses fuzzy/semantic weakness while preserving provenance
### Phase 5: Compass Visualization (Medium-term)
- Goal: Visual exploration of the knowledge graph
- Approach: Web-based graph visualization (D3/Cytoscape.js)
- Status: 📋 Planned
- Impact: Enables intuitive knowledge exploration
### Phase 6: Federated Graph (Long-term)
- Goal: Distributed knowledge across multiple agents
- Approach: Graph federation protocol with cross-kernel queries
- Status: 📋 Research phase
- Impact: Multi-agent knowledge sharing with provenance chains
---
10. Conclusions
The OpenClaw Graph Kernel occupies a unique position in the knowledge infrastructure landscape. It is not the fastest general-purpose graph database, nor the most expressive query engine, nor the most scalable distributed store. It is, however, the only system we evaluated that provides deterministic, policy-governed, cryptographically-signed context windows purpose-built for autonomous AI agent systems.
When to Use the Graph Kernel
| Use Case | Recommendation |
|---|---|
| Reproducible context windows | ✅ Use GK — its primary purpose |
| Auditable provenance chains | ✅ Use GK — HMAC-signed tokens |
| Multi-hop relationship reasoning | ✅ Use GK — structurally connected results |
| Dependency analysis (X → uses → Y) | ✅ Use GK — precise structural queries |
| Fuzzy/semantic search | ❌ Use RAG++ (vector similarity) |
| Speed-critical autocomplete | ❌ Use keyword/BM25 (in-memory, <10ms) |
| Billion-scale graph queries | ❌ Use Neo4j/Neptune/Dgraph |
| Standards-compliant RDF/SPARQL | ❌ Use Jena/Fuseki |
Final Assessment
| Criterion | Rating |
|---|---|
| As a general-purpose search engine | ❌ Not competitive |
| As a knowledge graph query engine | ⚠️ Adequate, improving |
| As a deterministic context slicer | ✅ Irreplaceable |
| As part of the CompCore stack | ✅ Essential |
| As a lightweight agent knowledge store | ✅ Excellent value/complexity ratio |
The Graph Kernel's value is not captured by standard information retrieval benchmarks. Its contribution is to the provenance infrastructure of autonomous agent systems: ensuring that every downstream decision can be traced to a specific, reproducible, verifiable context window. No general-purpose database provides this out of the box.
---
Appendix A: Test Environment
| Component | Specification |
|---|---|
| Machine | MacBook Air (Apple Silicon M3, arm64) |
| OS | Darwin 24.6.0 |
| Graph Kernel | Rust binary, cc-graph-kernel v0.1.0 |
| RAG++ | Python (FastAPI), v0.1.0 |
| Database | Supabase PostgreSQL (remote, us-east-1) |
| Network | Home broadband (~200ms RTT to Supabase) |
| Benchmark Script | `benchmarks/run_benchmark.py` |
| Raw Results | `/tmp/benchmark_results.json` |
Appendix B: Graph Kernel API Reference (Summary)
| Endpoint | Method | Purpose |
|---|---|---|
| `/api/slice` | POST | Generate a context slice around an anchor turn |
| `/api/slice/batch` | POST | Generate multiple slices in batch |
| `/api/verify_token` | POST | Verify an admissibility token |
| `/api/policies` | GET | List registered slice policies |
| `/api/policies` | POST | Register a new slice policy |
| `/api/knowledge` | GET | Query knowledge triples |
| `/api/knowledge` | POST | Add a single knowledge triple |
| `/api/knowledge/batch` | POST | Add triples in batch |
| `/api/knowledge/stats` | GET | Get knowledge graph statistics |
| `/health` | GET | Detailed health check |
| `/health/live` | GET | Liveness probe |
| `/health/ready` | GET | Readiness probe |
| `/health/startup` | GET | Startup probe |
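As a usage illustration for the knowledge query endpoint, the sketch below builds a request URL from exact-match field filters. The filter parameter names (`subject`, `predicate`, `object`) are assumptions, since this summary does not list query parameters; the base URL uses port 8002, the `gk-proxy.py` port mentioned in the roadmap.

```python
from urllib.parse import urlencode

def knowledge_query_url(base_url, **filters):
    """Build a GET /api/knowledge URL from exact-match field filters, skipping empty ones."""
    params = {name: value for name, value in sorted(filters.items()) if value is not None}
    query = urlencode(params)
    return f"{base_url}/api/knowledge" + (f"?{query}" if query else "")

url = knowledge_query_url("http://localhost:8002", subject="clawdbot", predicate="uses")
# url == "http://localhost:8002/api/knowledge?predicate=uses&subject=clawdbot"
```

Sorting the filters keeps the URL stable for identical queries, which helps when responses are cached or logged for reproduction.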
---
Report generated 2026-02-13. Graph Kernel commit: HEAD of cc-graph-kernel, schema v1.0.0.
Source Anchor
Comp-Core/docs/GRAPH-KERNEL-EVALUATION-REPORT.md