KARL-Edge: Multi-Signal Reinforcement Learning for Software Engineering Agents on Commodity Hardware

Extracted abstract or opening context

We present KARL-Edge, an adaptation of the Knowledge Agents via Reinforcement Learning (KARL) framework to multi-tool software engineering agents running on commodity Apple Silicon hardware. Where the original KARL system (Chang et al., 2026) trains enterprise search agents using full off-policy RL with binary reward signals, our system makes three architectural contributions: (1) a five-signal composite reward function that decomposes trajectory quality into outcome, process, efficiency, verification, and consistency dimensions; (2) a hook-wired, zero-overhead trajectory capture system that records production sessions without separate data collection infrastructure; and (3) a retroactive cross-turn correction signal that uses the user's natural behavior as an implicit reward label. We report preliminary results on 485 trajectories across 10 software engineering domains, with a mean composite reward of 0.583 and an 84.3% positive-advantage rate. We train a LoRA adapter on gemma-3-1b-it-4bit using advantage-weighted SFT (OAPL-Lite). We discuss two limitations: reward signals corrupted by a schema migration bug, and the absence of a controlled A/B evaluation. We argue that the architectural contributions, particularly the multi-signal reward decomposition and the hook-wired capture, are independently valuable and transferable to other agent training systems.
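
To make the reward decomposition concrete, the following is a minimal sketch of how five per-trajectory signals could be combined into one composite reward and turned into advantages. It is an illustration under stated assumptions, not the KARL-Edge implementation: the abstract names the five signals but not their weights, the baseline, or the exact OAPL-Lite weighting rule, so the equal weights, the caller-supplied baseline, and the clipped-advantage sample weight below are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

# ASSUMPTION: equal weights. The abstract names the five signals
# but does not state how they are weighted.
WEIGHTS: Dict[str, float] = {
    "outcome": 0.2,
    "process": 0.2,
    "efficiency": 0.2,
    "verification": 0.2,
    "consistency": 0.2,
}


@dataclass
class Trajectory:
    """One recorded session; each signal is a score in [0, 1]."""
    signals: Dict[str, float]


def composite_reward(traj: Trajectory, weights: Dict[str, float] = WEIGHTS) -> float:
    """Weighted average of the five signals, clamped to [0, 1]."""
    total = sum(weights.values())
    score = sum(w * traj.signals[name] for name, w in weights.items()) / total
    return min(1.0, max(0.0, score))


def advantages(rewards: List[float], baseline: float) -> List[float]:
    """Advantage of each trajectory relative to a baseline.

    ASSUMPTION: the baseline is supplied by the caller; the abstract
    does not say how it is chosen.
    """
    return [r - baseline for r in rewards]


def positive_advantage_rate(advs: List[float]) -> float:
    """Fraction of trajectories whose advantage is strictly positive."""
    return sum(a > 0 for a in advs) / len(advs)


def sft_sample_weights(advs: List[float]) -> List[float]:
    """One simple advantage-weighting rule: clip negatives to zero.

    ASSUMPTION: this is a generic choice for advantage-weighted SFT,
    not necessarily the rule OAPL-Lite uses.
    """
    return [max(0.0, a) for a in advs]


if __name__ == "__main__":
    batch = [
        Trajectory({name: 1.0 for name in WEIGHTS}),
        Trajectory({"outcome": 1.0, "process": 0.5, "efficiency": 0.5,
                    "verification": 0.0, "consistency": 1.0}),
        Trajectory({name: 0.0 for name in WEIGHTS}),
    ]
    rewards = [composite_reward(t) for t in batch]
    advs = advantages(rewards, baseline=0.5)
    print(rewards, advs, positive_advantage_rate(advs))
```

In a sketch like this, a trajectory with a failed verification signal still earns partial credit from the other four dimensions, which is the practical difference from a binary reward: the advantage-weighted loss can then favor trajectories that are better along several dimensions rather than only those that fully succeed.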

Promotion decision

What has to happen next

Convert into the standard paper schema, add citations, and render a draft PDF.

Why this is not always a full paper yet

Corpus pages are public-safe readers for discovered workspace artifacts. They are not automatically final papers. A corpus item becomes a polished paper only after the editable source, evidence checkpoints, references, figures, render path, and release status are attached through the paper schema.