Grand Diomande Research · Full HTML Reader

Graph + Vector Intelligence (Mega-Cube 17) -- Stage 1, Path A: The Edge Table

Add a single `memory_edges` table to the existing Postgres schema. Edges connect `memory_turns` rows by relationship type: temporal (same session), referential (shared file), causal (correction/follow-up), and categorical (shared inscription). Vector search stays exactly as-is. Graph queries are standard SQL joins. No new services, no new databases, no new dependencies. The graph is a Postgres table.

Agents That Account for Themselves research note experiment writeup candidate score 22 .md

Full Public Reader

# Graph + Vector Intelligence (Mega-Cube 17) -- Stage 1, Path A: The Edge Table
Run: mega-cube-17-graph-vector-intel
Generated: 2026-04-04
Method: Evolution-Cube Stage 1 -- Divergent exploration (relational graph in Postgres)
Run Directory: Desktop/evo-cube-output/mega-cube-17-graph-vector-intel/

---

Core Idea

Add a single `memory_edges` table to the existing Postgres schema. Edges connect `memory_turns` rows by relationship type: temporal (same session), referential (shared file), causal (correction/follow-up), and categorical (shared inscription). Vector search stays exactly as-is. Graph queries are standard SQL joins. No new services, no new databases, no new dependencies. The graph is a Postgres table.

---

1. Schema Design

New Table: `memory_edges`

sql
CREATE TABLE memory_edges (
    id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
    source_id TEXT NOT NULL REFERENCES memory_turns(id),
    target_id TEXT NOT NULL REFERENCES memory_turns(id),
    edge_type TEXT NOT NULL,        -- 'temporal', 'referential', 'causal', 'categorical'
    weight FLOAT DEFAULT 1.0,       -- strength of connection (KARL reward, similarity, etc.)
    metadata JSONB DEFAULT '{}',    -- edge-specific data (file path, inscription type, etc.)
    created_at TIMESTAMPTZ DEFAULT now(),

    CONSTRAINT no_self_loops CHECK (source_id != target_id)
);

CREATE INDEX idx_edges_source ON memory_edges(source_id);
CREATE INDEX idx_edges_target ON memory_edges(target_id);
CREATE INDEX idx_edges_type ON memory_edges(edge_type);
CREATE INDEX idx_edges_source_type ON memory_edges(source_id, edge_type);

Edge Types

TypeSourceTargetWeightMetadata
`temporal`Turn N in sessionTurn N+1 in session1.0 (decays with gap)`{session_id, gap_seconds}`
`referential`Turn that reads file XTurn that wrote file Xcosine(embed_A, embed_B)`{file_path, operation}`
`causal`Correction turnOriginal turnKARL correction score`{correction_type, overlap}`
`categorical`Turn with inscription ATurn with same inscriptioninscription similarity`{inscription, category}`
`trajectory`Trajectory startTrajectory endKARL composite reward`{skill, domain, reward}`

Estimated Scale

  • 332K turns, avg 5 turns/session = ~66K sessions
  • Temporal edges: ~266K (N-1 per session of N turns)
  • Referential edges: ~50K (file path overlap across sessions)
  • Causal edges: ~5K (KARL corrections + redo detections)
  • Categorical edges: ~20K (inscription-matched pairs, sampled)
  • Trajectory edges: ~121 (KARL trajectory start-to-end)
  • Total: ~341K edges. Fits in ~50MB with indexes.

---

2. Edge Construction Pipeline

Phase 1: Temporal Edges (Automated, ~2 hours)

python
# Extract session boundaries from memory_turns
# Group by session_id, order by created_at
# Connect consecutive turns within each session
def build_temporal_edges(conn):
    query = """
    WITH ordered_turns AS (
        SELECT id, session_id, created_at,
               LAG(id) OVER (PARTITION BY session_id ORDER BY created_at) as prev_id,
               LAG(created_at) OVER (PARTITION BY session_id ORDER BY created_at) as prev_ts
        FROM memory_turns
        WHERE session_id IS NOT NULL
    )
    INSERT INTO memory_edges (source_id, target_id, edge_type, weight, metadata)
    SELECT prev_id, id, 'temporal',
           1.0 / (1.0 + EXTRACT(EPOCH FROM (created_at - prev_ts)) / 3600.0),
           jsonb_build_object('session_id', session_id, 'gap_seconds',
                              EXTRACT(EPOCH FROM (created_at - prev_ts)))
    FROM ordered_turns
    WHERE prev_id IS NOT NULL
    """
    conn.execute(query)

Weight decays with temporal gap: 1.0 for consecutive turns, ~0.5 for 1-hour gap, ~0.1 for 10-hour gap.

Phase 2: Referential Edges (Batch, ~4 hours)

Scan `memory_turns.metadata` JSONB for file paths mentioned in tool calls. Cross-reference turns that share file paths. Weight by embedding similarity between the two turns.

python
def build_referential_edges(conn):
    # Extract file paths from metadata JSONB
    # For each file, find all turns that mention it
    # Create edges between turns that share a file
    # Weight = cosine similarity of their embeddings
    query = """
    WITH file_mentions AS (
        SELECT id, embedding,
               jsonb_array_elements_text(metadata->'files') as file_path
        FROM memory_turns
        WHERE metadata ? 'files'
    )
    SELECT a.id as source, b.id as target,
           1.0 - (a.embedding <=> b.embedding) as similarity,
           a.file_path
    FROM file_mentions a
    JOIN file_mentions b ON a.file_path = b.file_path
         AND a.id < b.id  -- avoid duplicates
    WHERE 1.0 - (a.embedding <=> b.embedding) > 0.3  -- minimum similarity threshold
    """

Phase 3: Causal Edges (From KARL, ~30 minutes)

Parse `[home-path]`. For each trajectory with corrections or redo detections, create causal edges.

python
def build_causal_edges(karl_path, conn):
    for traj in load_jsonl(karl_path):
        if traj.get('correction_of'):
            insert_edge(
                source=traj['correction_of'],
                target=traj['session_id'],
                edge_type='causal',
                weight=traj.get('reward', 0.5),
                metadata={'correction_type': 'redo', 'skill': traj.get('skill')}
            )

Phase 4: Categorical Edges (From Inscriptions, ~1 hour)

Group turns by inscription category. Within each category, connect high-similarity pairs.

Phase 5: Trajectory Edges (From KARL, ~10 minutes)

One edge per trajectory: first turn to last turn, weighted by composite reward.

---

3. Query Patterns

Pattern 1: Vector + Graph Rerank

The most common pattern. Do vector search first (fast, existing path), then rerank results by graph connectivity to the query context.

python
def graph_reranked_search(query_text, k=10, graph_boost=0.3):
    # Step 1: Standard vector search (top 3*k candidates)
    candidates = vector_search(query_text, limit=k*3)

    # Step 2: For each candidate, compute graph score
    for c in candidates:
        # How many edges connect this candidate to other candidates?
        graph_score = count_edges_to_set(c.id, [x.id for x in candidates])
        # Boost candidates that are well-connected in the result set
        c.final_score = (1 - graph_boost) * c.similarity + graph_boost * graph_score

    # Step 3: Return top-k by final score
    return sorted(candidates, key=lambda x: x.final_score, reverse=True)[:k]

Pattern 2: Multi-Hop Traversal

Start from a turn, follow edges N hops, collect all reachable turns, rank by embedding similarity to query.

sql
WITH RECURSIVE reachable AS (
    -- Seed: start from a specific turn
    SELECT target_id as id, 1 as depth, weight
    FROM memory_edges
    WHERE source_id = $start_turn AND edge_type = ANY($edge_types)

    UNION ALL

    -- Recurse: follow edges up to N hops
    SELECT e.target_id, r.depth + 1, r.weight * e.weight
    FROM reachable r
    JOIN memory_edges e ON e.source_id = r.id
    WHERE r.depth < $max_hops
)
SELECT DISTINCT mt.id, mt.content, r.weight as graph_weight,
       1.0 - (mt.embedding <=> $query_embedding) as vector_sim
FROM reachable r
JOIN memory_turns mt ON mt.id = r.id
ORDER BY (r.weight * 0.4 + (1.0 - (mt.embedding <=> $query_embedding)) * 0.6) DESC
LIMIT $k;

Pattern 3: Session Reconstruction

Given any turn, reconstruct its full session context by following temporal edges.

sql
WITH RECURSIVE session_chain AS (
    SELECT id, content, created_at, 0 as position
    FROM memory_turns WHERE id = $seed_turn

    UNION ALL

    -- Follow temporal edges backward
    SELECT mt.id, mt.content, mt.created_at, sc.position - 1
    FROM session_chain sc
    JOIN memory_edges e ON e.target_id = sc.id AND e.edge_type = 'temporal'
    JOIN memory_turns mt ON mt.id = e.source_id
    WHERE sc.position > -20  -- max 20 turns back
)
SELECT * FROM session_chain ORDER BY position;

Pattern 4: Inscription-Filtered Search

Filter vector search by inscription category before similarity ranking.

python
def inscription_search(query_text, inscription_category, k=10):
    # Use categorical edges to find turns in the target inscription cluster
    cluster_ids = get_cluster_members(inscription_category, limit=1000)
    # Vector search within that cluster only
    return vector_search_within(query_text, cluster_ids, limit=k)

---

4. Integration Points

MCP rag_search Enhancement

Modify `[home-path]`:

python
# Existing: mac3_pgvector_search() -> race all nodes
# New: add graph_rerank parameter

@mcp.tool()
async def rag_search(query: str, graph_rerank: bool = True,
                     max_hops: int = 0, inscription: str = None, ...):
    # Phase 1: Vector search (existing)
    results = await mac3_pgvector_search(query, limit=k*3 if graph_rerank else k)

    # Phase 2: Graph rerank (new)
    if graph_rerank and results:
        results = await graph_rerank_results(results, query_embedding)

    # Phase 3: Multi-hop expansion (new, optional)
    if max_hops > 0 and results:
        results = await expand_via_graph(results[0].id, max_hops, query_embedding)

    return results

RAG++ Container Update

Add graph query endpoints to the RAG++ FastAPI app:

python
@app.get("/api/graph/neighbors/{turn_id}")
async def get_neighbors(turn_id: str, edge_type: str = None, max_hops: int = 1):
    ...

@app.get("/api/graph/path/{source_id}/{target_id}")
async def find_path(source_id: str, target_id: str, max_hops: int = 5):
    ...

@app.post("/api/graph/search")
async def graph_augmented_search(query: str, graph_boost: float = 0.3, ...):
    ...

KARL Integration

When KARL flushes a trajectory (`trajectory_tap.py` Tap C), also create edges:

python
def flush_session_with_edges(session_data, conn):
    # Existing: write to trajectories.jsonl
    flush_session(session_data)

    # New: create trajectory edge
    if session_data.get('turns'):
        first_turn = session_data['turns'][0]['id']
        last_turn = session_data['turns'][-1]['id']
        create_edge(first_turn, last_turn, 'trajectory',
                    weight=session_data['reward'],
                    metadata={'skill': session_data.get('skill'),
                              'domain': session_data.get('domain')})

---

5. Cost Analysis

ItemComputeTimeDollar Cost
Temporal edge constructionSupabase SQL~2 hours$0 (Supabase free tier)
Referential edge constructionSupabase SQL + embeddings~4 hours$0 (embeddings already exist)
Causal edges from KARLPython script~30 min$0
Categorical edgesSupabase SQL~1 hour$0
`memory_edges` table storage~50MBn/a$0 (within free tier)
Indexes~20MB~10 min$0
RAG++ container rebuildDocker build~5 min$0
MCP server updateCode edit~1 hour$0

Total: ~8 hours of compute time, $0 incremental cost.

---

6. Strengths

  • Zero new services. Everything lives in existing Postgres.
  • Incremental. Vector search works exactly as before. Graph is an optional overlay.
  • Standard SQL. No graph query language to learn. Recursive CTEs handle multi-hop.
  • The edge table is append-only. New edges never invalidate old ones.
  • Mesh pgvector nodes can each hold a copy of the edges table (same replication model as turns).

7. Weaknesses

  • Recursive CTEs in Postgres get slow past 4-5 hops on large graphs. 341K edges with 5-hop recursion could hit seconds.
  • Edge construction is batch, not real-time. New turns don't get edges until the next batch run.
  • Referential edges depend on metadata quality. If `metadata.files` is incomplete, edges are missing.
  • No native graph algorithms (PageRank, community detection, shortest path). Would need application-level implementation or pg_graphql extension.
  • IVFFlat + edge joins in the same query can confuse the Postgres query planner. May need to separate vector search and graph traversal into two round-trips.

---

8. What This Path Uniquely Enables

The Edge Table is the minimum viable graph. It answers the question: what is the smallest change that gives us graph-augmented retrieval? One table, five indexes, a batch construction script, and a rerank function. If this proves valuable, Paths B-F offer ways to make it faster, smarter, and more autonomous. But this path ships first.

Promotion Decision

Attach run IDs, datasets, metrics, and reproduction commands.

Source Anchor

evo-cube-output/mega-cube-17-graph-vector-intel/stage1-path-a.md

Detected Structure

Method · Evaluation · References · Code Anchors · Architecture · is Stage Research