Graph + Vector Intelligence (Mega-Cube 17) -- Stage 1, Path A: The Edge Table
Add a single `memory_edges` table to the existing Postgres schema. Edges connect `memory_turns` rows by relationship type: temporal (same session), referential (shared file), causal (correction/follow-up), and categorical (shared inscription). Vector search stays exactly as-is. Graph queries are standard SQL joins. No new services, no new databases, no new dependencies. The graph is a Postgres table.
Full Public Reader
# Graph + Vector Intelligence (Mega-Cube 17) -- Stage 1, Path A: The Edge Table
Run: mega-cube-17-graph-vector-intel
Generated: 2026-04-04
Method: Evolution-Cube Stage 1 -- Divergent exploration (relational graph in Postgres)
Run Directory: Desktop/evo-cube-output/mega-cube-17-graph-vector-intel/
---
Core Idea
Add a single `memory_edges` table to the existing Postgres schema. Edges connect `memory_turns` rows by relationship type: temporal (same session), referential (shared file), causal (correction/follow-up), and categorical (shared inscription). Vector search stays exactly as-is. Graph queries are standard SQL joins. No new services, no new databases, no new dependencies. The graph is a Postgres table.
---
1. Schema Design
New Table: `memory_edges`
CREATE TABLE memory_edges (
id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
source_id TEXT NOT NULL REFERENCES memory_turns(id),
target_id TEXT NOT NULL REFERENCES memory_turns(id),
edge_type TEXT NOT NULL, -- 'temporal', 'referential', 'causal', 'categorical'
weight FLOAT DEFAULT 1.0, -- strength of connection (KARL reward, similarity, etc.)
metadata JSONB DEFAULT '{}', -- edge-specific data (file path, inscription type, etc.)
created_at TIMESTAMPTZ DEFAULT now(),
CONSTRAINT no_self_loops CHECK (source_id != target_id)
);
CREATE INDEX idx_edges_source ON memory_edges(source_id);
CREATE INDEX idx_edges_target ON memory_edges(target_id);
CREATE INDEX idx_edges_type ON memory_edges(edge_type);
CREATE INDEX idx_edges_source_type ON memory_edges(source_id, edge_type);Edge Types
| Type | Source | Target | Weight | Metadata |
|---|---|---|---|---|
| `temporal` | Turn N in session | Turn N+1 in session | 1.0 (decays with gap) | `{session_id, gap_seconds}` |
| `referential` | Turn that reads file X | Turn that wrote file X | cosine(embed_A, embed_B) | `{file_path, operation}` |
| `causal` | Correction turn | Original turn | KARL correction score | `{correction_type, overlap}` |
| `categorical` | Turn with inscription A | Turn with same inscription | inscription similarity | `{inscription, category}` |
| `trajectory` | Trajectory start | Trajectory end | KARL composite reward | `{skill, domain, reward}` |
Estimated Scale
- 332K turns, avg 5 turns/session = ~66K sessions
- Temporal edges: ~266K (N-1 per session of N turns)
- Referential edges: ~50K (file path overlap across sessions)
- Causal edges: ~5K (KARL corrections + redo detections)
- Categorical edges: ~20K (inscription-matched pairs, sampled)
- Trajectory edges: ~121 (KARL trajectory start-to-end)
- Total: ~341K edges. Fits in ~50MB with indexes.
---
2. Edge Construction Pipeline
Phase 1: Temporal Edges (Automated, ~2 hours)
# Extract session boundaries from memory_turns
# Group by session_id, order by created_at
# Connect consecutive turns within each session
def build_temporal_edges(conn):
query = """
WITH ordered_turns AS (
SELECT id, session_id, created_at,
LAG(id) OVER (PARTITION BY session_id ORDER BY created_at) as prev_id,
LAG(created_at) OVER (PARTITION BY session_id ORDER BY created_at) as prev_ts
FROM memory_turns
WHERE session_id IS NOT NULL
)
INSERT INTO memory_edges (source_id, target_id, edge_type, weight, metadata)
SELECT prev_id, id, 'temporal',
1.0 / (1.0 + EXTRACT(EPOCH FROM (created_at - prev_ts)) / 3600.0),
jsonb_build_object('session_id', session_id, 'gap_seconds',
EXTRACT(EPOCH FROM (created_at - prev_ts)))
FROM ordered_turns
WHERE prev_id IS NOT NULL
"""
conn.execute(query)Weight decays with temporal gap: 1.0 for consecutive turns, ~0.5 for 1-hour gap, ~0.1 for 10-hour gap.
Phase 2: Referential Edges (Batch, ~4 hours)
Scan `memory_turns.metadata` JSONB for file paths mentioned in tool calls. Cross-reference turns that share file paths. Weight by embedding similarity between the two turns.
def build_referential_edges(conn):
# Extract file paths from metadata JSONB
# For each file, find all turns that mention it
# Create edges between turns that share a file
# Weight = cosine similarity of their embeddings
query = """
WITH file_mentions AS (
SELECT id, embedding,
jsonb_array_elements_text(metadata->'files') as file_path
FROM memory_turns
WHERE metadata ? 'files'
)
SELECT a.id as source, b.id as target,
1.0 - (a.embedding <=> b.embedding) as similarity,
a.file_path
FROM file_mentions a
JOIN file_mentions b ON a.file_path = b.file_path
AND a.id < b.id -- avoid duplicates
WHERE 1.0 - (a.embedding <=> b.embedding) > 0.3 -- minimum similarity threshold
"""Phase 3: Causal Edges (From KARL, ~30 minutes)
Parse `[home-path]`. For each trajectory with corrections or redo detections, create causal edges.
def build_causal_edges(karl_path, conn):
for traj in load_jsonl(karl_path):
if traj.get('correction_of'):
insert_edge(
source=traj['correction_of'],
target=traj['session_id'],
edge_type='causal',
weight=traj.get('reward', 0.5),
metadata={'correction_type': 'redo', 'skill': traj.get('skill')}
)Phase 4: Categorical Edges (From Inscriptions, ~1 hour)
Group turns by inscription category. Within each category, connect high-similarity pairs.
Phase 5: Trajectory Edges (From KARL, ~10 minutes)
One edge per trajectory: first turn to last turn, weighted by composite reward.
---
3. Query Patterns
Pattern 1: Vector + Graph Rerank
The most common pattern. Do vector search first (fast, existing path), then rerank results by graph connectivity to the query context.
def graph_reranked_search(query_text, k=10, graph_boost=0.3):
# Step 1: Standard vector search (top 3*k candidates)
candidates = vector_search(query_text, limit=k*3)
# Step 2: For each candidate, compute graph score
for c in candidates:
# How many edges connect this candidate to other candidates?
graph_score = count_edges_to_set(c.id, [x.id for x in candidates])
# Boost candidates that are well-connected in the result set
c.final_score = (1 - graph_boost) * c.similarity + graph_boost * graph_score
# Step 3: Return top-k by final score
return sorted(candidates, key=lambda x: x.final_score, reverse=True)[:k]Pattern 2: Multi-Hop Traversal
Start from a turn, follow edges N hops, collect all reachable turns, rank by embedding similarity to query.
WITH RECURSIVE reachable AS (
-- Seed: start from a specific turn
SELECT target_id as id, 1 as depth, weight
FROM memory_edges
WHERE source_id = $start_turn AND edge_type = ANY($edge_types)
UNION ALL
-- Recurse: follow edges up to N hops
SELECT e.target_id, r.depth + 1, r.weight * e.weight
FROM reachable r
JOIN memory_edges e ON e.source_id = r.id
WHERE r.depth < $max_hops
)
SELECT DISTINCT mt.id, mt.content, r.weight as graph_weight,
1.0 - (mt.embedding <=> $query_embedding) as vector_sim
FROM reachable r
JOIN memory_turns mt ON mt.id = r.id
ORDER BY (r.weight * 0.4 + (1.0 - (mt.embedding <=> $query_embedding)) * 0.6) DESC
LIMIT $k;Pattern 3: Session Reconstruction
Given any turn, reconstruct its full session context by following temporal edges.
WITH RECURSIVE session_chain AS (
SELECT id, content, created_at, 0 as position
FROM memory_turns WHERE id = $seed_turn
UNION ALL
-- Follow temporal edges backward
SELECT mt.id, mt.content, mt.created_at, sc.position - 1
FROM session_chain sc
JOIN memory_edges e ON e.target_id = sc.id AND e.edge_type = 'temporal'
JOIN memory_turns mt ON mt.id = e.source_id
WHERE sc.position > -20 -- max 20 turns back
)
SELECT * FROM session_chain ORDER BY position;Pattern 4: Inscription-Filtered Search
Filter vector search by inscription category before similarity ranking.
def inscription_search(query_text, inscription_category, k=10):
# Use categorical edges to find turns in the target inscription cluster
cluster_ids = get_cluster_members(inscription_category, limit=1000)
# Vector search within that cluster only
return vector_search_within(query_text, cluster_ids, limit=k)---
4. Integration Points
MCP rag_search Enhancement
Modify `[home-path]`:
# Existing: mac3_pgvector_search() -> race all nodes
# New: add graph_rerank parameter
@mcp.tool()
async def rag_search(query: str, graph_rerank: bool = True,
max_hops: int = 0, inscription: str = None, ...):
# Phase 1: Vector search (existing)
results = await mac3_pgvector_search(query, limit=k*3 if graph_rerank else k)
# Phase 2: Graph rerank (new)
if graph_rerank and results:
results = await graph_rerank_results(results, query_embedding)
# Phase 3: Multi-hop expansion (new, optional)
if max_hops > 0 and results:
results = await expand_via_graph(results[0].id, max_hops, query_embedding)
return resultsRAG++ Container Update
Add graph query endpoints to the RAG++ FastAPI app:
@app.get("/api/graph/neighbors/{turn_id}")
async def get_neighbors(turn_id: str, edge_type: str = None, max_hops: int = 1):
...
@app.get("/api/graph/path/{source_id}/{target_id}")
async def find_path(source_id: str, target_id: str, max_hops: int = 5):
...
@app.post("/api/graph/search")
async def graph_augmented_search(query: str, graph_boost: float = 0.3, ...):
...KARL Integration
When KARL flushes a trajectory (`trajectory_tap.py` Tap C), also create edges:
def flush_session_with_edges(session_data, conn):
# Existing: write to trajectories.jsonl
flush_session(session_data)
# New: create trajectory edge
if session_data.get('turns'):
first_turn = session_data['turns'][0]['id']
last_turn = session_data['turns'][-1]['id']
create_edge(first_turn, last_turn, 'trajectory',
weight=session_data['reward'],
metadata={'skill': session_data.get('skill'),
'domain': session_data.get('domain')})---
5. Cost Analysis
| Item | Compute | Time | Dollar Cost |
|---|---|---|---|
| Temporal edge construction | Supabase SQL | ~2 hours | $0 (Supabase free tier) |
| Referential edge construction | Supabase SQL + embeddings | ~4 hours | $0 (embeddings already exist) |
| Causal edges from KARL | Python script | ~30 min | $0 |
| Categorical edges | Supabase SQL | ~1 hour | $0 |
| `memory_edges` table storage | ~50MB | n/a | $0 (within free tier) |
| Indexes | ~20MB | ~10 min | $0 |
| RAG++ container rebuild | Docker build | ~5 min | $0 |
| MCP server update | Code edit | ~1 hour | $0 |
Total: ~8 hours of compute time, $0 incremental cost.
---
6. Strengths
- Zero new services. Everything lives in existing Postgres.
- Incremental. Vector search works exactly as before. Graph is an optional overlay.
- Standard SQL. No graph query language to learn. Recursive CTEs handle multi-hop.
- The edge table is append-only. New edges never invalidate old ones.
- Mesh pgvector nodes can each hold a copy of the edges table (same replication model as turns).
7. Weaknesses
- Recursive CTEs in Postgres get slow past 4-5 hops on large graphs. 341K edges with 5-hop recursion could hit seconds.
- Edge construction is batch, not real-time. New turns don't get edges until the next batch run.
- Referential edges depend on metadata quality. If `metadata.files` is incomplete, edges are missing.
- No native graph algorithms (PageRank, community detection, shortest path). Would need application-level implementation or pg_graphql extension.
- IVFFlat + edge joins in the same query can confuse the Postgres query planner. May need to separate vector search and graph traversal into two round-trips.
---
8. What This Path Uniquely Enables
The Edge Table is the minimum viable graph. It answers the question: what is the smallest change that gives us graph-augmented retrieval? One table, five indexes, a batch construction script, and a rerank function. If this proves valuable, Paths B-F offer ways to make it faster, smarter, and more autonomous. But this path ships first.
Promotion Decision
Attach run IDs, datasets, metrics, and reproduction commands.
Source Anchor
evo-cube-output/mega-cube-17-graph-vector-intel/stage1-path-a.md
Detected Structure
Method · Evaluation · References · Code Anchors · Architecture · is Stage Research