Grand Diomande Research · Full HTML Reader

RAG++ Specification

RAG++ is a high-performance retrieval engine that provides **statistical priors** from outcome-annotated trajectories. Unlike traditional RAG systems that retrieve text for language model context, RAG++ retrieves structured execution traces and computes distributional statistics for downstream policy conditioning. **Key Insight**: Past execution outcomes encode implicit knowledge about action feasibility, timing, and context-dependent success rates. RAG++ surfaces this knowledge as queryable priors.

Agents That Account for Themselves working paper preprint structure candidate score 80 .md

Full Public Reader

RAG++ Specification

Retrieval-Augmented Generation for Policy Memory

Version: 1.0.0

---

Abstract

RAG++ is a high-performance retrieval engine that provides statistical priors from outcome-annotated trajectories. Unlike traditional RAG systems that retrieve text for language model context, RAG++ retrieves structured execution traces and computes distributional statistics for downstream policy conditioning.

Key Insight: Past execution outcomes encode implicit knowledge about action feasibility, timing, and context-dependent success rates. RAG++ surfaces this knowledge as queryable priors.

---

1. Problem Statement

Given:
- A query context (embedding + metadata)
- A corpus of outcome-annotated trajectories

Produce:
- Statistical priors over relevant trajectory outcomes
- Ranked candidate trajectories for reference
- Confidence bounds on prior estimates

Non-Goals:
- RAG++ does NOT train models
- RAG++ does NOT generate actions
- RAG++ does NOT modify trajectories

---

2. Input Contract

2.1 Query Bundle

QueryBundle {
    embedding: Vector<f32>[D]      # Query embedding, dimension D
    k: u32                          # Number of candidates to retrieve
    metadata_filter: FilterExpr?    # Optional metadata constraints
    index_names: Vec<String>?       # Optional: specific indexes to search
}

Invariants:
- `embedding.len() == D` (dimension must match index)
- `k > 0`
- `embedding` values must be finite (no NaN/Inf)

2.2 Memory Record (Corpus Entry)

MemoryRecord {
    id: String                      # Unique identifier
    embedding: Vector<f32>[D]       # Trajectory embedding
    context: String                 # Human-readable context description
    outcome: f64                    # Primary outcome metric (e.g., success rate)
    metadata: Map<String, Value>    # Arbitrary key-value metadata
    created_at: u64                 # Unix timestamp (ms)
    stats: OutcomeStats             # Running statistics for this record
}

Invariants:
- `id` is unique within the corpus
- `embedding.len() == D` (consistent dimension)
- `outcome` is finite
- Records are immutable after creation (only `stats` may be updated)

2.3 Metadata Value Types

MetadataValue =
    | String(String)
    | Int(i64)
    | Float(f64)
    | Bool(bool)
    | StringList(Vec<String>)

---

3. Output Contract

3.1 Prior Bundle

PriorBundle {
    mean: f64                       # Mean outcome of retrieved trajectories
    variance: f64                   # Variance of outcomes
    confidence: f64                 # Confidence in estimate (0-1)
    count: u64                      # Number of samples contributing
    min: f64                        # Minimum observed outcome
    max: f64                        # Maximum observed outcome

    # Optional extended statistics
    percentiles: Map<u8, f64>?      # e.g., {50: 0.75, 95: 0.92}
    outcome_distribution: Histogram? # Bucketed distribution
}

Guarantees:
- `variance >= 0`
- `min <= mean <= max`
- `confidence` increases with `count`
- If `count == 0`, all statistics are undefined (Option::None)

3.2 Query Response

QueryResponse {
    prior: PriorBundle              # Computed priors from retrieved records
    candidates: Vec<RankedCandidate> # Top-k candidates with scores
    latency: Duration               # Query execution time
    indexes_searched: Vec<String>   # Which indexes were queried
}

RankedCandidate {
    record_id: String               # Reference to MemoryRecord
    score: f64                      # Retrieval score (higher = more relevant)
    distance: f64                   # Raw distance from query
    rank: u32                       # Position in result list (1-indexed)
}

---

4. Core Operations

4.1 Index Operations

Operation	Input	Output	Complexity
`add(record)`	MemoryRecord	RecordId	O(log n) amortized
`add_batch(records)`	Vec<MemoryRecord>	Vec<RecordId>	O(m log n)
`remove(id)`	RecordId	bool	O(log n)
`search(query, k)`	(Vector, k)	Vec<SearchResult>	O(n) flat, O(log n) HNSW

4.2 Query Operations

Operation	Input	Output	Description
`query(bundle)`	QueryBundle	QueryResponse	Full retrieval + prior computation
`search_only(embedding, k)`	(Vector, k)	Vec<SearchResult>	Raw vector search
`compute_priors(records)`	Vec<MemoryRecord>	PriorBundle	Statistics from records

4.3 Statistics Operations

Operation	Input	Output	Description
`update_stats(id, outcome)`	(RecordId, f64)	()	Add outcome observation
`get_stats(id)`	RecordId	OutcomeStats	Retrieve running stats
`merge_stats(a, b)`	(OutcomeStats, OutcomeStats)	OutcomeStats	Combine statistics

---

5. Algorithms

5.1 Welford's Online Algorithm (Statistics)

For numerically stable mean and variance computation:

on update(x):
    count += 1
    delta = x - mean
    mean += delta / count
    delta2 = x - mean
    M2 += delta * delta2

variance = M2 / count  (population)
variance = M2 / (count - 1)  (sample)

Guarantees:
- Numerically stable for large count
- Single-pass (O(1) per update)
- Parallelizable via merge operation

5.2 Score Fusion (Multi-Index)

When searching multiple indexes:

Reciprocal Rank Fusion (RRF):

RRF_score(d) = Σ 1 / (k + rank_i(d))

where k=60 (standard constant), rank_i(d) is document d's rank in index i.

CombSUM: `score(d) = Σ normalized_score_i(d)`

CombMNZ: `score(d) = |indexes containing d| × Σ normalized_score_i(d)`

5.3 Reranking Strategies

After initial retrieval, candidates may be reranked:

1. Outcome-Based: Boost by historical outcome `score × (1 + α × outcome)`
2. Recency-Based: Decay by age `score × exp(-λ × age_days)`
3. MMR Diversity: Penalize similarity to already-selected items
4. Composite: Weighted combination of above

---

6. Configuration

6.1 Index Configuration

IndexConfig {
    dimension: u32                  # Embedding dimension (required)
    distance_type: DistanceType     # L2 | Cosine | InnerProduct
    index_type: IndexType           # Flat | HNSW

    # HNSW-specific
    hnsw_m: u32 = 16                # Max connections per node
    hnsw_ef_construction: u32 = 200 # Construction search width
    hnsw_ef_search: u32 = 50        # Query search width
}

6.2 Query Engine Configuration

QueryEngineConfig {
    default_k: u32 = 10             # Default retrieval count
    fusion_strategy: FusionStrategy = RRF
    reranker: RerankerConfig?       # Optional reranking
    cache_config: CacheConfig?      # Optional query caching
    timeout_ms: u64 = 1000          # Query timeout
}

6.3 Cache Configuration

CacheConfig {
    max_entries: usize = 10000      # Maximum cached queries
    ttl_seconds: u64 = 300          # Time-to-live
    enabled: bool = true
}

---

7. Invariants

ID	Invariant	Enforcement
INV-001	Records are immutable after creation (except stats)	Type system
INV-002	Welford algorithm maintains numerical stability	Algorithm choice
INV-003	WAL entries written before buffer acknowledgment	Write ordering
INV-004	Index and store remain consistent	Transactional updates
INV-005	All embeddings have consistent dimension per index	Runtime check
INV-006	Record IDs are unique within a store	Insert validation

---

8. Error Taxonomy

Error =
    | DimensionMismatch { expected: u32, got: u32 }
    | RecordNotFound { id: String }
    | DuplicateRecord { id: String }
    | IndexNotFound { name: String }
    | InvalidFilter { reason: String }
    | QueryTimeout { elapsed_ms: u64 }
    | WalCorrupted { sequence: u64 }
    | ChecksumMismatch

---

9. Performance Characteristics

9.1 Time Complexity

Operation	Flat Index	HNSW Index
Add single	O(1)	O(log n)
Search k	O(n × D)	O(log n × D)
Remove	O(1)	O(M × log n)

9.2 Space Complexity

Component	Memory
Record storage	O(n × (D + metadata_size))
Flat index	O(n × D)
HNSW index	O(n × M × levels)
Query cache	O(cache_size × k)

9.3 Benchmarks (Reference)

On 100K records, D=512, k=10:

Metric	Flat	HNSW
Query latency (p50)	12ms	0.8ms
Query latency (p99)	18ms	2.1ms
Recall@10	1.00	0.95+
Memory	200MB	280MB

---

10. Integration Points

10.1 Embedding Provider (External)

RAG++ expects embeddings to be provided externally. Any embedding model works:

python

# Example: Sentence Transformers
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('all-MiniLM-L6-v2')
embedding = model.encode("trajectory context description")

10.2 Downstream Consumer

RAG++ outputs are consumed by policy/generation systems:

python

# Example: Using priors for action conditioning
response = rag.query(query_bundle)
prior = response.prior

# Condition policy on retrieved priors
if prior.confidence > 0.8:
    action_distribution = policy.forward(
        state,
        prior_mean=prior.mean,
        prior_variance=prior.variance
    )

10.3 Persistence Layer

WAL provides durability; external systems handle:
- Periodic snapshots
- Backup/restore
- Distributed replication

---

11. API Summary

Rust Core

rust

// Create index
let mut index = FlatIndex::new(512, DistanceType::Cosine);

// Add records
index.add("rec-1", &embedding)?;

// Search
let results = index.search(&query_embedding, 10)?;

// Query engine (full pipeline)
let engine = QueryEngine::new(config);
let response = engine.query(QueryRequest {
    embedding: query_vec,
    k: 10,
    filter: None,
    index_names: None,
})?;

// Access priors
println!("Prior mean: {}", response.prior.mean);

Python Bindings

python

from rag_plusplus import FlatIndex, IndexRegistry, DistanceType

# Create index
index = FlatIndex(dimension=512, distance_type=DistanceType.Cosine)

# Add vectors
index.add("rec-1", embedding)

# Search
results = index.search(query_embedding, k=10)

# Multi-index
registry = IndexRegistry()
registry.register("semantic", semantic_index)
registry.register("temporal", temporal_index)
results = registry.search_all(query, k=10)

---

12. Future Extensions

1. Distributed Sharding: Partition by embedding hash for horizontal scale
2. Incremental Index Updates: Avoid full rebuilds on add
3. Learned Index Structures: Replace HNSW with neural retriever
4. Streaming Updates: Real-time trajectory ingestion
5. Explanation Generation: Why these trajectories were retrieved

---

References

1. Welford, B.P. (1962). "Note on a method for calculating corrected sums of squares and products"
2. Malkov, Y., Yashunin, D. (2018). "Efficient and robust approximate nearest neighbor search using HNSW graphs"
3. Cormack, G., et al. (2009). "Reciprocal Rank Fusion outperforms Condorcet and individual Rank Learning Methods"

---

RAG++ is part of the Comp-Core motion generation system.

Promotion Decision

Convert into the standard paper schema, add citations, and render a draft PDF.

Source Anchor

Comp-Core/core/retrieval/cc-rag-plus-plus/SPECIFICATION.md

Detected Structure

Abstract · Method · Evaluation · References · Architecture