research note | experiment writeup candidate | score 24

KARL - Trajectory Memory Ledger

KARL is the reference implementation of the Trajectory Memory Ledger: a schema-normalized experience replay layer for AI coding agents. It records what an agent does during real work sessions, normalizes those recordings into a schema-v2 trajectory store, scores them with a six-signal reward engine, and uses the highest-scoring trajectories to improve future performance through LoRA fine-tuning and learned skill routing.


Extracted abstract or opening context

KARL is based on the Trajectory Memory Ledger paper ([arXiv 2603.05218](https://arxiv.org/abs/2603.05218)).

The trajectory store grows over time. The current normalized store contains 7,468 scored trajectories, 67,409 observed tool events, and 73,470 recovered tool steps. The best trajectories are exported as advantage-weighted SFT data and used to train a LoRA adapter via MLX; the current export contains 3,678 ChatML examples, split into 3,310 training rows and 368 validation rows.

| Signal | Weight | Measures |
|--------|--------|----------|
| **Outcome** | 25% | Was the user satisfied? No correction, no redo, build passed, session continued |
| **Process** | 22% | Did tools work? Success rate, bash exit codes, error density |
| **Efficiency** | 13% | Was it efficient? Tool diversity, duration efficiency, file touch rate |
| **Verification** | 13% | Did the agent check its own work? Tests, builds, read-after-write |
| **Consistency** | 13% | Was the tool order coherent? Low thrash, enough context before mutation |
| **Wasted motion** | 14% | Did the trajectory avoid repeated loops and unnecessary retries? |

The composite reward lies in `[0, 1]`. Advantage is the reward minus the domain baseline, and it is used for OAPL-Lite oversampling. On the normalized corpus, the mean reward is 0.6632 and rewards range from 0.4666 to 0.8165.
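To make the scoring arithmetic concrete, here is a minimal sketch of how the six signals could combine into a composite reward and an advantage. It assumes the composite is a plain weighted sum of per-signal scores that each lie in `[0, 1]`, and that the domain baseline is the mean reward of trajectories in the same domain; the source confirms neither. The names `WEIGHTS`, `composite_reward`, `domain_baseline`, and `advantage` are hypothetical and are not taken from the KARL codebase. The OAPL-Lite oversampling rule itself is not shown, because the text does not describe it.

```python
from statistics import mean

# Weights copied from the signal table; the key names are hypothetical.
WEIGHTS = {
    "outcome": 0.25,
    "process": 0.22,
    "efficiency": 0.13,
    "verification": 0.13,
    "consistency": 0.13,
    "wasted_motion": 0.14,
}


def composite_reward(signals: dict[str, float]) -> float:
    """Weighted sum of the six signals (assumed form of the composite)."""
    missing = WEIGHTS.keys() - signals.keys()
    if missing:
        raise KeyError(f"missing signals: {sorted(missing)}")
    # Clamp each signal to [0, 1] so the composite also stays in [0, 1].
    return sum(
        weight * min(1.0, max(0.0, signals[name]))
        for name, weight in WEIGHTS.items()
    )


def domain_baseline(rewards_in_domain: list[float]) -> float:
    """Assumption: the baseline is the mean reward within one domain."""
    return mean(rewards_in_domain)


def advantage(reward: float, baseline: float) -> float:
    """Advantage = reward - domain baseline, as stated in the text."""
    return reward - baseline


if __name__ == "__main__":
    example = {
        "outcome": 1.0,
        "process": 0.8,
        "efficiency": 0.5,
        "verification": 0.5,
        "consistency": 0.5,
        "wasted_motion": 1.0,
    }
    reward = composite_reward(example)
    # Hypothetical rewards for other trajectories in the same domain.
    baseline = domain_baseline([0.60, 0.70, 0.65])
    print(f"reward={reward:.4f} advantage={advantage(reward, baseline):+.4f}")
```

Because the weights sum to 100%, a trajectory that scores 1.0 on every signal reaches the top of the `[0, 1]` range under this assumed form. A positive advantage marks a trajectory that beats its domain baseline, which is the property the export step is described as oversampling on.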

Promotion decision

What has to happen next

Attach run IDs, datasets, metrics, and reproduction commands.

Why this is not always a full paper yet

Corpus pages are public-safe readers for discovered workspace artifacts. They are not automatically final papers. A corpus item becomes a polished paper only after the editable source, evidence checkpoints, references, figures, render path, and release status are attached through the paper schema.