research note | experiment writeup candidate | score 40

Teaching My AI Agent to Learn From Its Own Mistakes

In early March, Databricks published KARL (Knowledge Agents via Reinforcement Learning), a system that trains enterprise search agents via reinforcement learning. They had 26 researchers, enterprise GPUs, and a proprietary base model. Their agent beats Claude Opus 4.6 and GPT 5.2 on enterprise search benchmarks at 33% lower cost.


Extracted abstract or opening context

## How I Built a Self-Improving Code Agent on a Mac Mini, Inspired by Databricks' KARL

In early March, Databricks published KARL (Knowledge Agents via Reinforcement Learning), a system that trains enterprise search agents via reinforcement learning. They had 26 researchers, enterprise GPUs, and a proprietary base model. Their agent beats Claude Opus 4.6 and GPT 5.2 on enterprise search benchmarks at 33% lower cost.

I read the paper and thought: what if I applied the same idea, not to enterprise search, but to the software engineering agent I use every day? And not on a GPU cluster, but on the Mac Minis in my living room?

Five days later, I had 485 recorded trajectories, a 5-signal reward engine, a trained LoRA adapter, and a system that automatically learns from every coding session I run. Here's how it works, how it differs from the original, and what I learned.

KARL's premise is simple: instead of hand-writing rules for how an AI agent should behave, you record what the agent actually does, score those recordings based on outcomes, and train on the best ones.
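To make the record, score, and select loop concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the implementation from the writeup or from KARL: the `Trajectory` class, the three signal names, the weights, and the `keep_fraction` cutoff are all hypothetical. This excerpt mentions a 5-signal reward engine without listing the signals, so the sketch stands in with three made-up ones.

```python
from dataclasses import dataclass, field


@dataclass
class Trajectory:
    """One recorded agent session: the steps taken plus outcome signals."""
    session_id: str
    steps: list[str]
    # Hypothetical outcome signals, each normalized to the range [0, 1].
    signals: dict[str, float] = field(default_factory=dict)


def score(traj: Trajectory, weights: dict[str, float]) -> float:
    """Weighted average of the signals; a missing signal counts as 0."""
    total = sum(weights.values())
    if total == 0:
        return 0.0
    weighted = sum(w * traj.signals.get(name, 0.0) for name, w in weights.items())
    return weighted / total


def select_for_training(trajs, weights, keep_fraction=0.25):
    """Rank trajectories by score and keep the top fraction (at least one)."""
    ranked = sorted(trajs, key=lambda t: score(t, weights), reverse=True)
    keep = max(1, int(len(ranked) * keep_fraction))
    return ranked[:keep]


# Illustrative signal names and weights, not taken from the article.
WEIGHTS = {"tests_passed": 0.5, "no_errors": 0.3, "few_steps": 0.2}

trajectories = [
    Trajectory("a", ["read", "edit", "test"],
               {"tests_passed": 1.0, "no_errors": 1.0, "few_steps": 0.8}),
    Trajectory("b", ["edit", "edit", "test"],
               {"tests_passed": 0.0, "no_errors": 0.5, "few_steps": 0.4}),
    Trajectory("c", ["read", "edit", "edit", "test"],
               {"tests_passed": 1.0, "no_errors": 0.5, "few_steps": 0.5}),
    Trajectory("d", ["read"],
               {"tests_passed": 0.0, "no_errors": 1.0, "few_steps": 1.0}),
]

# The selected trajectories would become the fine-tuning set.
best = select_for_training(trajectories, WEIGHTS)
print([t.session_id for t in best])
```

The selected sessions are the ones that would be passed on to fine-tuning (a LoRA adapter, in this writeup). Keeping a fixed top fraction is only one possible filter; an absolute score threshold would work just as well in this sketch.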

Promotion decision

What has to happen next

Attach run IDs, datasets, metrics, and reproduction commands.

Why this is not always a full paper yet

Corpus pages are public-safe readers for discovered workspace artifacts. They are not automatically final papers. A corpus item becomes a polished paper only after the editable source, evidence checkpoints, references, figures, render path, and release status are attached through the paper schema.