Grand Diomande Research · Full HTML Reader

Graph + Vector Intelligence (Mega-Cube 17) -- Stage 1, Path A: The Edge Table

Add a single `memory_edges` table to the existing Postgres schema. Edges connect `memory_turns` rows by relationship type: temporal (same session), referential (shared file), causal (correction/follow-up), and categorical (shared inscription). Vector search stays exactly as-is. Graph queries are standard SQL joins. No new services, no new databases, no new dependencies. The graph is a Postgres table.

Agents That Account for Themselves research note experiment writeup candidate score 22 .md

Full Public Reader

# Graph + Vector Intelligence (Mega-Cube 17) -- Stage 1, Path A: The Edge Table
Run: mega-cube-17-graph-vector-intel
Generated: 2026-04-04
Method: Evolution-Cube Stage 1 -- Divergent exploration (relational graph in Postgres)
Run Directory: Desktop/evo-cube-output/mega-cube-17-graph-vector-intel/

---

Core Idea

---

1. Schema Design

New Table: `memory_edges`

sql

CREATE TABLE memory_edges (
    id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
    source_id TEXT NOT NULL REFERENCES memory_turns(id),
    target_id TEXT NOT NULL REFERENCES memory_turns(id),
    edge_type TEXT NOT NULL,        -- 'temporal', 'referential', 'causal', 'categorical'
    weight FLOAT DEFAULT 1.0,       -- strength of connection (KARL reward, similarity, etc.)
    metadata JSONB DEFAULT '{}',    -- edge-specific data (file path, inscription type, etc.)
    created_at TIMESTAMPTZ DEFAULT now(),

    CONSTRAINT no_self_loops CHECK (source_id != target_id)
);

CREATE INDEX idx_edges_source ON memory_edges(source_id);
CREATE INDEX idx_edges_target ON memory_edges(target_id);
CREATE INDEX idx_edges_type ON memory_edges(edge_type);
CREATE INDEX idx_edges_source_type ON memory_edges(source_id, edge_type);

Edge Types

Type	Source	Target	Weight	Metadata
`temporal`	Turn N in session	Turn N+1 in session	1.0 (decays with gap)	`{session_id, gap_seconds}`
`referential`	Turn that reads file X	Turn that wrote file X	cosine(embed_A, embed_B)	`{file_path, operation}`
`causal`	Correction turn	Original turn	KARL correction score	`{correction_type, overlap}`
`categorical`	Turn with inscription A	Turn with same inscription	inscription similarity	`{inscription, category}`
`trajectory`	Trajectory start	Trajectory end	KARL composite reward	`{skill, domain, reward}`

Estimated Scale

332K turns, avg 5 turns/session = ~66K sessions
Temporal edges: ~266K (N-1 per session of N turns)
Referential edges: ~50K (file path overlap across sessions)
Causal edges: ~5K (KARL corrections + redo detections)
Categorical edges: ~20K (inscription-matched pairs, sampled)
Trajectory edges: ~121 (KARL trajectory start-to-end)
Total: ~341K edges. Fits in ~50MB with indexes.

---

2. Edge Construction Pipeline

Phase 1: Temporal Edges (Automated, ~2 hours)

python

# Extract session boundaries from memory_turns
# Group by session_id, order by created_at
# Connect consecutive turns within each session
def build_temporal_edges(conn):
    query = """
    WITH ordered_turns AS (
        SELECT id, session_id, created_at,
               LAG(id) OVER (PARTITION BY session_id ORDER BY created_at) as prev_id,
               LAG(created_at) OVER (PARTITION BY session_id ORDER BY created_at) as prev_ts
        FROM memory_turns
        WHERE session_id IS NOT NULL
    )
    INSERT INTO memory_edges (source_id, target_id, edge_type, weight, metadata)
    SELECT prev_id, id, 'temporal',
           1.0 / (1.0 + EXTRACT(EPOCH FROM (created_at - prev_ts)) / 3600.0),
           jsonb_build_object('session_id', session_id, 'gap_seconds',
                              EXTRACT(EPOCH FROM (created_at - prev_ts)))
    FROM ordered_turns
    WHERE prev_id IS NOT NULL
    """
    conn.execute(query)

Weight decays with temporal gap: 1.0 for consecutive turns, ~0.5 for 1-hour gap, ~0.1 for 10-hour gap.

Phase 2: Referential Edges (Batch, ~4 hours)

Scan `memory_turns.metadata` JSONB for file paths mentioned in tool calls. Cross-reference turns that share file paths. Weight by embedding similarity between the two turns.

python

def build_referential_edges(conn):
    # Extract file paths from metadata JSONB
    # For each file, find all turns that mention it
    # Create edges between turns that share a file
    # Weight = cosine similarity of their embeddings
    query = """
    WITH file_mentions AS (
        SELECT id, embedding,
               jsonb_array_elements_text(metadata->'files') as file_path
        FROM memory_turns
        WHERE metadata ? 'files'
    )
    SELECT a.id as source, b.id as target,
           1.0 - (a.embedding <=> b.embedding) as similarity,
           a.file_path
    FROM file_mentions a
    JOIN file_mentions b ON a.file_path = b.file_path
         AND a.id < b.id  -- avoid duplicates
    WHERE 1.0 - (a.embedding <=> b.embedding) > 0.3  -- minimum similarity threshold
    """

Phase 3: Causal Edges (From KARL, ~30 minutes)

Parse `[home-path]`. For each trajectory with corrections or redo detections, create causal edges.

python

def build_causal_edges(karl_path, conn):
    for traj in load_jsonl(karl_path):
        if traj.get('correction_of'):
            insert_edge(
                source=traj['correction_of'],
                target=traj['session_id'],
                edge_type='causal',
                weight=traj.get('reward', 0.5),
                metadata={'correction_type': 'redo', 'skill': traj.get('skill')}
            )

Phase 4: Categorical Edges (From Inscriptions, ~1 hour)

Group turns by inscription category. Within each category, connect high-similarity pairs.

Phase 5: Trajectory Edges (From KARL, ~10 minutes)

One edge per trajectory: first turn to last turn, weighted by composite reward.

---

3. Query Patterns

Pattern 1: Vector + Graph Rerank

The most common pattern. Do vector search first (fast, existing path), then rerank results by graph connectivity to the query context.

python

def graph_reranked_search(query_text, k=10, graph_boost=0.3):
    # Step 1: Standard vector search (top 3*k candidates)
    candidates = vector_search(query_text, limit=k*3)

    # Step 2: For each candidate, compute graph score
    for c in candidates:
        # How many edges connect this candidate to other candidates?
        graph_score = count_edges_to_set(c.id, [x.id for x in candidates])
        # Boost candidates that are well-connected in the result set
        c.final_score = (1 - graph_boost) * c.similarity + graph_boost * graph_score

    # Step 3: Return top-k by final score
    return sorted(candidates, key=lambda x: x.final_score, reverse=True)[:k]

Pattern 2: Multi-Hop Traversal

Start from a turn, follow edges N hops, collect all reachable turns, rank by embedding similarity to query.

sql

WITH RECURSIVE reachable AS (
    -- Seed: start from a specific turn
    SELECT target_id as id, 1 as depth, weight
    FROM memory_edges
    WHERE source_id = $start_turn AND edge_type = ANY($edge_types)

    UNION ALL

    -- Recurse: follow edges up to N hops
    SELECT e.target_id, r.depth + 1, r.weight * e.weight
    FROM reachable r
    JOIN memory_edges e ON e.source_id = r.id
    WHERE r.depth < $max_hops
)
SELECT DISTINCT mt.id, mt.content, r.weight as graph_weight,
       1.0 - (mt.embedding <=> $query_embedding) as vector_sim
FROM reachable r
JOIN memory_turns mt ON mt.id = r.id
ORDER BY (r.weight * 0.4 + (1.0 - (mt.embedding <=> $query_embedding)) * 0.6) DESC
LIMIT $k;

Pattern 3: Session Reconstruction

Given any turn, reconstruct its full session context by following temporal edges.

sql

WITH RECURSIVE session_chain AS (
    SELECT id, content, created_at, 0 as position
    FROM memory_turns WHERE id = $seed_turn

    UNION ALL

    -- Follow temporal edges backward
    SELECT mt.id, mt.content, mt.created_at, sc.position - 1
    FROM session_chain sc
    JOIN memory_edges e ON e.target_id = sc.id AND e.edge_type = 'temporal'
    JOIN memory_turns mt ON mt.id = e.source_id
    WHERE sc.position > -20  -- max 20 turns back
)
SELECT * FROM session_chain ORDER BY position;

Pattern 4: Inscription-Filtered Search

Filter vector search by inscription category before similarity ranking.

python

def inscription_search(query_text, inscription_category, k=10):
    # Use categorical edges to find turns in the target inscription cluster
    cluster_ids = get_cluster_members(inscription_category, limit=1000)
    # Vector search within that cluster only
    return vector_search_within(query_text, cluster_ids, limit=k)

---

4. Integration Points

MCP rag_search Enhancement

Modify `[home-path]`:

python

# Existing: mac3_pgvector_search() -> race all nodes
# New: add graph_rerank parameter

@mcp.tool()
async def rag_search(query: str, graph_rerank: bool = True,
                     max_hops: int = 0, inscription: str = None, ...):
    # Phase 1: Vector search (existing)
    results = await mac3_pgvector_search(query, limit=k*3 if graph_rerank else k)

    # Phase 2: Graph rerank (new)
    if graph_rerank and results:
        results = await graph_rerank_results(results, query_embedding)

    # Phase 3: Multi-hop expansion (new, optional)
    if max_hops > 0 and results:
        results = await expand_via_graph(results[0].id, max_hops, query_embedding)

    return results

RAG++ Container Update

Add graph query endpoints to the RAG++ FastAPI app:

python

@app.get("/api/graph/neighbors/{turn_id}")
async def get_neighbors(turn_id: str, edge_type: str = None, max_hops: int = 1):
    ...

@app.get("/api/graph/path/{source_id}/{target_id}")
async def find_path(source_id: str, target_id: str, max_hops: int = 5):
    ...

@app.post("/api/graph/search")
async def graph_augmented_search(query: str, graph_boost: float = 0.3, ...):
    ...

KARL Integration

When KARL flushes a trajectory (`trajectory_tap.py` Tap C), also create edges:

python

def flush_session_with_edges(session_data, conn):
    # Existing: write to trajectories.jsonl
    flush_session(session_data)

    # New: create trajectory edge
    if session_data.get('turns'):
        first_turn = session_data['turns'][0]['id']
        last_turn = session_data['turns'][-1]['id']
        create_edge(first_turn, last_turn, 'trajectory',
                    weight=session_data['reward'],
                    metadata={'skill': session_data.get('skill'),
                              'domain': session_data.get('domain')})

---

5. Cost Analysis

Item	Compute	Time	Dollar Cost
Temporal edge construction	Supabase SQL	~2 hours	$0 (Supabase free tier)
Referential edge construction	Supabase SQL + embeddings	~4 hours	$0 (embeddings already exist)
Causal edges from KARL	Python script	~30 min	$0
Categorical edges	Supabase SQL	~1 hour	$0
`memory_edges` table storage	~50MB	n/a	$0 (within free tier)
Indexes	~20MB	~10 min	$0
RAG++ container rebuild	Docker build	~5 min	$0
MCP server update	Code edit	~1 hour	$0

Total: ~8 hours of compute time, $0 incremental cost.

---

6. Strengths

Zero new services. Everything lives in existing Postgres.
Incremental. Vector search works exactly as before. Graph is an optional overlay.
Standard SQL. No graph query language to learn. Recursive CTEs handle multi-hop.
The edge table is append-only. New edges never invalidate old ones.
Mesh pgvector nodes can each hold a copy of the edges table (same replication model as turns).

7. Weaknesses

Recursive CTEs in Postgres get slow past 4-5 hops on large graphs. 341K edges with 5-hop recursion could hit seconds.
Edge construction is batch, not real-time. New turns don't get edges until the next batch run.
Referential edges depend on metadata quality. If `metadata.files` is incomplete, edges are missing.
No native graph algorithms (PageRank, community detection, shortest path). Would need application-level implementation or pg_graphql extension.
IVFFlat + edge joins in the same query can confuse the Postgres query planner. May need to separate vector search and graph traversal into two round-trips.

---

8. What This Path Uniquely Enables

The Edge Table is the minimum viable graph. It answers the question: what is the smallest change that gives us graph-augmented retrieval? One table, five indexes, a batch construction script, and a rerank function. If this proves valuable, Paths B-F offer ways to make it faster, smarter, and more autonomous. But this path ships first.

Promotion Decision

Attach run IDs, datasets, metrics, and reproduction commands.

Source Anchor

evo-cube-output/mega-cube-17-graph-vector-intel/stage1-path-a.md

Detected Structure

Method · Evaluation · References · Code Anchors · Architecture · is Stage Research