experiment · experiment writeup candidate · score 24

RAG++ Evaluation Framework

Extracted abstract or opening context

Complete evaluation suite for RAG++ v0 with action classification, recommendation quality, and state-awareness testing. This creates 3 evaluation users, each with 60-90 days of realistic trajectory data, and runs all three evaluation components:

- Action Classification (30-100 labeled events)
- Recommendation Quality (5-15 states)
- State-Awareness (regime consistency, flag sensitivity)

**Measures**: Precision, Recall, and F1 for each action type:

- ReduceGravity
- ReduceMass
- IncreaseAlignment
- IncreaseThrust

**Methods Tested**:

- Heuristic (keyword patterns)
- LLM (Anthropic API)
- Hybrid (heuristic + LLM fallback)
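The abstract does not say when the Hybrid method falls back to the LLM or how the per-action scores are computed. The sketch below is one possible reading, not the framework's code. It assumes the heuristic abstains when zero or more than one action's keywords match, that the LLM is any callable returning an action label (no Anthropic API call is shown), and that precision, recall, and F1 are computed one-vs-rest per action. The keyword lists, `heuristic_classify`, `hybrid_classify`, `per_class_prf`, and the toy events are all hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional

ACTIONS = ["ReduceGravity", "ReduceMass", "IncreaseAlignment", "IncreaseThrust"]

# Hypothetical keyword patterns, for illustration only; the framework's actual
# heuristic patterns are not given in this writeup.
KEYWORDS = {
    "ReduceGravity": ("stop", "quit"),
    "ReduceMass": ("declutter", "cancel"),
    "IncreaseAlignment": ("plan", "goal"),
    "IncreaseThrust": ("start", "practice"),
}


def heuristic_classify(text: str) -> Optional[str]:
    """Return the one action whose keywords appear; None if zero or several match."""
    lowered = text.lower()
    hits = [a for a, words in KEYWORDS.items() if any(w in lowered for w in words)]
    return hits[0] if len(hits) == 1 else None


def hybrid_classify(text: str, llm_classify: Callable[[str], str]) -> str:
    """Try the heuristic first and defer to the LLM only when it abstains."""
    label = heuristic_classify(text)
    return label if label is not None else llm_classify(text)


def per_class_prf(gold: List[str], pred: List[str]) -> Dict[str, Dict[str, float]]:
    """One-vs-rest precision, recall and F1 per action; 0.0 where undefined."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # the predicted label gets a false positive
            fn[g] += 1  # the true label gets a false negative
    scores = {}
    for a in ACTIONS:
        precision = tp[a] / (tp[a] + fp[a]) if tp[a] + fp[a] else 0.0
        recall = tp[a] / (tp[a] + fn[a]) if tp[a] + fn[a] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[a] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


def fake_llm(text: str) -> str:
    """Stand-in for a real LLM call; always answers IncreaseThrust."""
    return "IncreaseThrust"


# Toy labeled events, invented for this sketch (not the framework's data).
events = [
    ("I will stop doomscrolling", "ReduceGravity"),
    ("Cancel two subscriptions", "ReduceMass"),
    ("Write a plan for the quarter", "IncreaseAlignment"),
    ("Practice piano daily", "IncreaseThrust"),
    ("Feel more focused this week", "IncreaseAlignment"),  # no keyword: LLM path
    ("Start and stop new habits", "IncreaseThrust"),  # two actions match: LLM path
]

gold = [label for _, label in events]
pred = [hybrid_classify(text, fake_llm) for text, _ in events]
scores = per_class_prf(gold, pred)
for action, s in scores.items():
    print(f"{action:18s} P={s['precision']:.2f} R={s['recall']:.2f} F1={s['f1']:.2f}")
```

Passing the Heuristic, LLM, and Hybrid predictions through the same `per_class_prf` call yields directly comparable per-action tables. If a single summary number is wanted, macro-F1 is the mean of the four `f1` values.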

Promotion decision

What has to happen next

Attach run IDs, datasets, metrics, and reproduction commands.

Why this is not always a full paper yet

Corpus pages are public-safe readers for discovered workspace artifacts. They are not automatically final papers. A corpus item becomes a polished paper only after the editable source, evidence checkpoints, references, figures, render path, and release status are attached through the paper schema.