RAG++ Specification
RAG++ is a high-performance retrieval engine that provides **statistical priors** from outcome-annotated trajectories. Unlike traditional RAG systems that retrieve text for language model context, RAG++ retrieves structured execution traces and computes distributional statistics for downstream policy conditioning. **Key Insight**: Past execution outcomes encode implicit knowledge about action feasibility, timing, and context-dependent success rates. RAG++ surfaces this knowledge as queryable priors.
Full Public Reader
RAG++ Specification
Retrieval-Augmented Generation for Policy Memory
Version: 1.0.0
---
Abstract
RAG++ is a high-performance retrieval engine that provides statistical priors from outcome-annotated trajectories. Unlike traditional RAG systems that retrieve text for language model context, RAG++ retrieves structured execution traces and computes distributional statistics for downstream policy conditioning.
Key Insight: Past execution outcomes encode implicit knowledge about action feasibility, timing, and context-dependent success rates. RAG++ surfaces this knowledge as queryable priors.
---
1. Problem Statement
Given:
- A query context (embedding + metadata)
- A corpus of outcome-annotated trajectories
Produce:
- Statistical priors over relevant trajectory outcomes
- Ranked candidate trajectories for reference
- Confidence bounds on prior estimates
Non-Goals:
- RAG++ does NOT train models
- RAG++ does NOT generate actions
- RAG++ does NOT modify trajectories
---
2. Input Contract
2.1 Query Bundle
QueryBundle {
embedding: Vector<f32>[D] # Query embedding, dimension D
k: u32 # Number of candidates to retrieve
metadata_filter: FilterExpr? # Optional metadata constraints
index_names: Vec<String>? # Optional: specific indexes to search
}Invariants:
- `embedding.len() == D` (dimension must match index)
- `k > 0`
- `embedding` values must be finite (no NaN/Inf)
2.2 Memory Record (Corpus Entry)
MemoryRecord {
id: String # Unique identifier
embedding: Vector<f32>[D] # Trajectory embedding
context: String # Human-readable context description
outcome: f64 # Primary outcome metric (e.g., success rate)
metadata: Map<String, Value> # Arbitrary key-value metadata
created_at: u64 # Unix timestamp (ms)
stats: OutcomeStats # Running statistics for this record
}Invariants:
- `id` is unique within the corpus
- `embedding.len() == D` (consistent dimension)
- `outcome` is finite
- Records are immutable after creation (only `stats` may be updated)
2.3 Metadata Value Types
MetadataValue =
| String(String)
| Int(i64)
| Float(f64)
| Bool(bool)
| StringList(Vec<String>)---
3. Output Contract
3.1 Prior Bundle
PriorBundle {
mean: f64 # Mean outcome of retrieved trajectories
variance: f64 # Variance of outcomes
confidence: f64 # Confidence in estimate (0-1)
count: u64 # Number of samples contributing
min: f64 # Minimum observed outcome
max: f64 # Maximum observed outcome
# Optional extended statistics
percentiles: Map<u8, f64>? # e.g., {50: 0.75, 95: 0.92}
outcome_distribution: Histogram? # Bucketed distribution
}Guarantees:
- `variance >= 0`
- `min <= mean <= max`
- `confidence` increases with `count`
- If `count == 0`, all statistics are undefined (Option::None)
3.2 Query Response
QueryResponse {
prior: PriorBundle # Computed priors from retrieved records
candidates: Vec<RankedCandidate> # Top-k candidates with scores
latency: Duration # Query execution time
indexes_searched: Vec<String> # Which indexes were queried
}
RankedCandidate {
record_id: String # Reference to MemoryRecord
score: f64 # Retrieval score (higher = more relevant)
distance: f64 # Raw distance from query
rank: u32 # Position in result list (1-indexed)
}---
4. Core Operations
4.1 Index Operations
| Operation | Input | Output | Complexity |
|---|---|---|---|
| `add(record)` | MemoryRecord | RecordId | O(log n) amortized |
| `add_batch(records)` | Vec<MemoryRecord> | Vec<RecordId> | O(m log n) |
| `remove(id)` | RecordId | bool | O(log n) |
| `search(query, k)` | (Vector, k) | Vec<SearchResult> | O(n) flat, O(log n) HNSW |
4.2 Query Operations
| Operation | Input | Output | Description |
|---|---|---|---|
| `query(bundle)` | QueryBundle | QueryResponse | Full retrieval + prior computation |
| `search_only(embedding, k)` | (Vector, k) | Vec<SearchResult> | Raw vector search |
| `compute_priors(records)` | Vec<MemoryRecord> | PriorBundle | Statistics from records |
4.3 Statistics Operations
| Operation | Input | Output | Description |
|---|---|---|---|
| `update_stats(id, outcome)` | (RecordId, f64) | () | Add outcome observation |
| `get_stats(id)` | RecordId | OutcomeStats | Retrieve running stats |
| `merge_stats(a, b)` | (OutcomeStats, OutcomeStats) | OutcomeStats | Combine statistics |
---
5. Algorithms
5.1 Welford's Online Algorithm (Statistics)
For numerically stable mean and variance computation:
on update(x):
count += 1
delta = x - mean
mean += delta / count
delta2 = x - mean
M2 += delta * delta2
variance = M2 / count (population)
variance = M2 / (count - 1) (sample)Guarantees:
- Numerically stable for large count
- Single-pass (O(1) per update)
- Parallelizable via merge operation
5.2 Score Fusion (Multi-Index)
When searching multiple indexes:
Reciprocal Rank Fusion (RRF):
RRF_score(d) = Σ 1 / (k + rank_i(d))where k=60 (standard constant), rank_i(d) is document d's rank in index i.
CombSUM: `score(d) = Σ normalized_score_i(d)`
CombMNZ: `score(d) = |indexes containing d| × Σ normalized_score_i(d)`
5.3 Reranking Strategies
After initial retrieval, candidates may be reranked:
1. Outcome-Based: Boost by historical outcome `score × (1 + α × outcome)`
2. Recency-Based: Decay by age `score × exp(-λ × age_days)`
3. MMR Diversity: Penalize similarity to already-selected items
4. Composite: Weighted combination of above
---
6. Configuration
6.1 Index Configuration
IndexConfig {
dimension: u32 # Embedding dimension (required)
distance_type: DistanceType # L2 | Cosine | InnerProduct
index_type: IndexType # Flat | HNSW
# HNSW-specific
hnsw_m: u32 = 16 # Max connections per node
hnsw_ef_construction: u32 = 200 # Construction search width
hnsw_ef_search: u32 = 50 # Query search width
}6.2 Query Engine Configuration
QueryEngineConfig {
default_k: u32 = 10 # Default retrieval count
fusion_strategy: FusionStrategy = RRF
reranker: RerankerConfig? # Optional reranking
cache_config: CacheConfig? # Optional query caching
timeout_ms: u64 = 1000 # Query timeout
}6.3 Cache Configuration
CacheConfig {
max_entries: usize = 10000 # Maximum cached queries
ttl_seconds: u64 = 300 # Time-to-live
enabled: bool = true
}---
7. Invariants
| ID | Invariant | Enforcement |
|---|---|---|
| INV-001 | Records are immutable after creation (except stats) | Type system |
| INV-002 | Welford algorithm maintains numerical stability | Algorithm choice |
| INV-003 | WAL entries written before buffer acknowledgment | Write ordering |
| INV-004 | Index and store remain consistent | Transactional updates |
| INV-005 | All embeddings have consistent dimension per index | Runtime check |
| INV-006 | Record IDs are unique within a store | Insert validation |
---
8. Error Taxonomy
Error =
| DimensionMismatch { expected: u32, got: u32 }
| RecordNotFound { id: String }
| DuplicateRecord { id: String }
| IndexNotFound { name: String }
| InvalidFilter { reason: String }
| QueryTimeout { elapsed_ms: u64 }
| WalCorrupted { sequence: u64 }
| ChecksumMismatch---
9. Performance Characteristics
9.1 Time Complexity
| Operation | Flat Index | HNSW Index |
|---|---|---|
| Add single | O(1) | O(log n) |
| Search k | O(n × D) | O(log n × D) |
| Remove | O(1) | O(M × log n) |
9.2 Space Complexity
| Component | Memory |
|---|---|
| Record storage | O(n × (D + metadata_size)) |
| Flat index | O(n × D) |
| HNSW index | O(n × M × levels) |
| Query cache | O(cache_size × k) |
9.3 Benchmarks (Reference)
On 100K records, D=512, k=10:
| Metric | Flat | HNSW |
|---|---|---|
| Query latency (p50) | 12ms | 0.8ms |
| Query latency (p99) | 18ms | 2.1ms |
| Recall@10 | 1.00 | 0.95+ |
| Memory | 200MB | 280MB |
---
10. Integration Points
10.1 Embedding Provider (External)
RAG++ expects embeddings to be provided externally. Any embedding model works:
# Example: Sentence Transformers
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('all-MiniLM-L6-v2')
embedding = model.encode("trajectory context description")10.2 Downstream Consumer
RAG++ outputs are consumed by policy/generation systems:
# Example: Using priors for action conditioning
response = rag.query(query_bundle)
prior = response.prior
# Condition policy on retrieved priors
if prior.confidence > 0.8:
action_distribution = policy.forward(
state,
prior_mean=prior.mean,
prior_variance=prior.variance
)10.3 Persistence Layer
WAL provides durability; external systems handle:
- Periodic snapshots
- Backup/restore
- Distributed replication
---
11. API Summary
Rust Core
// Create index
let mut index = FlatIndex::new(512, DistanceType::Cosine);
// Add records
index.add("rec-1", &embedding)?;
// Search
let results = index.search(&query_embedding, 10)?;
// Query engine (full pipeline)
let engine = QueryEngine::new(config);
let response = engine.query(QueryRequest {
embedding: query_vec,
k: 10,
filter: None,
index_names: None,
})?;
// Access priors
println!("Prior mean: {}", response.prior.mean);Python Bindings
from rag_plusplus import FlatIndex, IndexRegistry, DistanceType
# Create index
index = FlatIndex(dimension=512, distance_type=DistanceType.Cosine)
# Add vectors
index.add("rec-1", embedding)
# Search
results = index.search(query_embedding, k=10)
# Multi-index
registry = IndexRegistry()
registry.register("semantic", semantic_index)
registry.register("temporal", temporal_index)
results = registry.search_all(query, k=10)---
12. Future Extensions
1. Distributed Sharding: Partition by embedding hash for horizontal scale
2. Incremental Index Updates: Avoid full rebuilds on add
3. Learned Index Structures: Replace HNSW with neural retriever
4. Streaming Updates: Real-time trajectory ingestion
5. Explanation Generation: Why these trajectories were retrieved
---
References
1. Welford, B.P. (1962). "Note on a method for calculating corrected sums of squares and products"
2. Malkov, Y., Yashunin, D. (2018). "Efficient and robust approximate nearest neighbor search using HNSW graphs"
3. Cormack, G., et al. (2009). "Reciprocal Rank Fusion outperforms Condorcet and individual Rank Learning Methods"
---
RAG++ is part of the Comp-Core motion generation system.
Promotion Decision
Convert into the standard paper schema, add citations, and render a draft PDF.
Source Anchor
Comp-Core/core/retrieval/cc-rag-plus-plus/SPECIFICATION.md
Detected Structure
Abstract · Method · Evaluation · References · Architecture