experiment · experiment writeup candidate · score 24
RAG++ Evaluation Framework
Extracted abstract or opening context
Complete evaluation suite for RAG++ v0 with action classification, recommendation quality, and state-awareness testing.
This creates 3 evaluation users with 60-90 days of realistic trajectory data each.
Runs all three evaluation components:
- Action Classification (30-100 labeled events)
- Recommendation Quality (5-15 states)
- State-Awareness (regime consistency, flag sensitivity)
**Measures**: Precision, Recall, and F1 for each action type:
- ReduceGravity
- ReduceMass
- IncreaseAlignment
- IncreaseThrust
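The per-action metrics listed above can be computed as in the following minimal sketch. It is not the framework's own code: the function name `per_action_metrics` and the plain-list input format are assumptions made for illustration, and only the four action type names come from the writeup. A label with no predictions or no true instances is given a score of 0.0 here, which is a convention chosen for the sketch.

```python
from collections import Counter

# Action types named in the writeup.
ACTION_TYPES = ["ReduceGravity", "ReduceMass", "IncreaseAlignment", "IncreaseThrust"]


def per_action_metrics(y_true, y_pred, labels=ACTION_TYPES):
    """Return precision, recall, and F1 per action type (hypothetical helper).

    y_true and y_pred are equal-length lists of action type names,
    one entry per labeled event.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label was wrong for this event
            fn[truth] += 1  # true label was missed for this event

    metrics = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


if __name__ == "__main__":
    # Toy labels for illustration only, not results from the evaluation.
    truth = ["ReduceGravity", "ReduceGravity", "ReduceMass", "IncreaseThrust"]
    preds = ["ReduceGravity", "ReduceMass", "ReduceMass", "IncreaseThrust"]
    for name, scores in per_action_metrics(truth, preds).items():
        print(name, scores)
```

With only 30-100 labeled events split across four action types, some classes may have very few examples, so reporting the per-class support alongside these scores would help readers judge how stable each number is.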
**Methods Tested**:
- Heuristic (keyword patterns)
- LLM (Anthropic API)
- Hybrid (heuristic + LLM fallback)
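One plausible shape for the hybrid method is sketched below: try keyword patterns first and call the LLM only when the heuristic cannot decide. Everything in it is an assumption made for illustration. The keyword patterns are placeholders rather than the framework's actual rules, `heuristic_classify` and `hybrid_classify` are hypothetical names, and the LLM is represented by an injected callable so that no particular Anthropic API call is implied.

```python
import re
from typing import Callable, Optional, Tuple

# Placeholder patterns (assumption): the real keyword rules are not given in the writeup.
KEYWORD_PATTERNS = {
    "ReduceGravity": re.compile(r"\bgravity\b", re.IGNORECASE),
    "ReduceMass": re.compile(r"\bmass\b", re.IGNORECASE),
    "IncreaseAlignment": re.compile(r"\balignment\b", re.IGNORECASE),
    "IncreaseThrust": re.compile(r"\bthrust\b", re.IGNORECASE),
}


def heuristic_classify(text: str) -> Optional[str]:
    """Return an action type only when exactly one pattern matches."""
    matches = [label for label, pattern in KEYWORD_PATTERNS.items() if pattern.search(text)]
    return matches[0] if len(matches) == 1 else None


def hybrid_classify(text: str, llm_fallback: Callable[[str], str]) -> Tuple[str, str]:
    """Return (action_type, source), where source is 'heuristic' or 'llm'.

    llm_fallback is any callable mapping event text to an action type,
    for example a thin wrapper around an LLM client (hypothetical).
    """
    label = heuristic_classify(text)
    if label is not None:
        return label, "heuristic"
    # No match or an ambiguous match: defer to the LLM.
    return llm_fallback(text), "llm"


if __name__ == "__main__":
    # Stub standing in for the LLM so the sketch runs offline.
    def stub_llm(text: str) -> str:
        return "IncreaseThrust"

    print(hybrid_classify("Lower the gravity term", stub_llm))
    print(hybrid_classify("No keyword appears here", stub_llm))
```

Returning the source alongside the label makes it possible to report how often the hybrid method falls back to the LLM, which is useful when comparing it against the heuristic-only and LLM-only methods.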
Promotion decision
What has to happen next
Attach run IDs, datasets, metrics, and reproduction commands.
Why this is not always a full paper yet
Corpus pages are public-safe readers for discovered workspace artifacts. They are not automatically final papers. A corpus item becomes a polished paper only after the editable source, evidence checkpoints, references, figures, render path, and release status are attached through the paper schema.