KARL: Trajectory Reward Engine

Paper workspace

Live draft structure

live-draft

Artifacts

System and paper source

KARL is a running system. A public PDF render should be published only after the trajectory examples pass privacy review.

source-only

Editable source

A running system and the paper source both exist. Raw traces remain private; the public page should expose only method and proof summaries.

Source anchors

karl/paper/karl-paper.md

karl-research-paper.md

karl reward engine and trajectory ledger deployment

Method tags

trajectory reward, experience replay, advantage weighting, agent training

Ingest intersections

karl, trajectory, reward, agent-training, ledger

Status

Running; thousands of trajectories scored, retraining loop active.

Key claims

How an agent works can be more instructive than whether one final response looked correct.

Trajectory-level reward creates a continuous learning signal from real work.

Training data selection should be advantage-weighted, not random.
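The last claim can be illustrated with a minimal sketch of advantage-weighted sampling. The source does not state which weighting function KARL uses, so the softmax weighting, the `temperature` parameter, and the function names `advantage_weights` and `sample_training_batch` are all assumptions made for illustration.

```python
import math
import random


def advantage_weights(advantages, temperature=1.0):
    """Map per-trajectory advantages to sampling probabilities (softmax).

    Softmax is an assumed choice; any monotone weighting would fit the claim.
    """
    if not advantages:
        return []
    peak = max(advantages)
    # Subtract the max before exponentiating for numerical stability.
    exps = [math.exp((a - peak) / temperature) for a in advantages]
    total = sum(exps)
    return [e / total for e in exps]


def sample_training_batch(trajectory_ids, advantages, k, seed=0):
    """Draw k trajectory ids with replacement, favoring high advantage."""
    rng = random.Random(seed)
    weights = advantage_weights(advantages)
    return rng.choices(trajectory_ids, weights=weights, k=k)


# Hypothetical ids and advantages, not real KARL data.
ids = ["t1", "t2", "t3"]
adv = [-0.5, 0.0, 1.5]
probs = advantage_weights(adv)
batch = sample_training_batch(ids, adv, k=5)
```

Compared with uniform random selection, this gives every trajectory a nonzero chance of being drawn while concentrating the batch on sessions that outperformed the baseline. A lower `temperature` sharpens that preference.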

Public reading note

System summary public; raw session traces are private.

Standard skeleton

What this paper must keep proving

Schema

problem

Agent systems produce rich tool-use behavior but usually discard the process signal after the final answer.

method

Score complete trajectories with observable process/outcome signals and use high-advantage sessions for training data.

implementation

Trajectory taps, ledger daemon, reward engine, SFT exporter, and entity/skill performance bridge.

data

Private real agent sessions summarized into privacy-safe metrics and training examples after review.

evaluation

Reward ablations, downstream training lift, routing improvements, and benchmark head-to-heads.

references

Process reward models, agent trajectory learning, Databricks agent training, SWE-bench, RLHF.

openQuestions

Demonstrating downstream lift on official real-repository benchmarks remains the hardest gate for public proof.
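The method entry in the schema above (score complete trajectories from observable process and outcome signals, then keep high-advantage sessions) might look roughly like the sketch below. The `Trajectory` fields, the 0.4/0.6 weights, the tool-call success rate as the process signal, and the mean-reward baseline are all hypothetical; the source does not specify KARL's actual signals or reward formula.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Trajectory:
    session_id: str
    tool_calls: int          # total tool invocations in the session
    failed_tool_calls: int   # invocations that returned an error
    task_succeeded: bool     # observable outcome signal


def reward(t, process_weight=0.4, outcome_weight=0.6):
    """Blend a process signal (tool-call success rate) with the outcome.

    The weights are illustrative assumptions, not KARL's values.
    """
    if t.tool_calls > 0:
        process = 1.0 - t.failed_tool_calls / t.tool_calls
    else:
        process = 0.0
    outcome = 1.0 if t.task_succeeded else 0.0
    return process_weight * process + outcome_weight * outcome


def high_advantage(trajectories, threshold=0.0):
    """Return (session_id, advantage) pairs whose advantage exceeds threshold.

    Advantage is reward minus the mean reward of the batch (assumed baseline).
    """
    if not trajectories:
        return []
    rewards = [reward(t) for t in trajectories]
    baseline = mean(rewards)
    scored = [(t.session_id, r - baseline) for t, r in zip(trajectories, rewards)]
    return [(sid, adv) for sid, adv in scored if adv > threshold]


# Made-up sessions for illustration only.
sessions = [
    Trajectory("a", tool_calls=10, failed_tool_calls=0, task_succeeded=True),
    Trajectory("b", tool_calls=10, failed_tool_calls=5, task_succeeded=False),
    Trajectory("c", tool_calls=4, failed_tool_calls=1, task_succeeded=True),
]
selected = high_advantage(sessions)
```

With these made-up sessions the rewards are 1.0, 0.2, and 0.9, so the baseline is 0.7 and only sessions "a" and "c" have positive advantage. Those are the sessions an exporter would pass on as training candidates, subject to the privacy review described in the data entry.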

Checkpoints and references

Proof chain

implementation: proven

Running scored ledger

KARL trajectory store and ledger daemon

The system is operational; raw traces stay private.

experiment: partial

Downstream training lift

reward-selected adapter tests

Early positive signals exist; broader official benchmark proof remains open.