working paper, preprint structure candidate, score 100

KARL-Edge: Multi-Signal Reinforcement Learning for Software Engineering Agents on Commodity Hardware

Extracted abstract or opening context

We present KARL-Edge, an adaptation of the Knowledge Agents via Reinforcement Learning (KARL) framework to multi-tool software engineering agents running on commodity Apple Silicon hardware. Where the original KARL system (Chang et al., 2026) trains enterprise search agents using full off-policy RL with binary reward signals, our system introduces three architectural contributions: (1) a 5-signal composite reward function that decomposes trajectory quality into outcome, process, efficiency, verification, and consistency dimensions; (2) a hook-wired, zero-overhead trajectory capture system that records production sessions without separate data collection infrastructure; and (3) a retroactive cross-turn correction signal that uses the user's natural behavior as an implicit reward label. We report preliminary results on 485 trajectories across 10 software engineering domains, with a mean composite reward of 0.583 and an 84.3% positive-advantage rate. We train a LoRA adapter on gemma-3-1b-it-4bit using advantage-weighted SFT (OAPL-Lite) and discuss limitations, including reward signals corrupted by a schema migration bug and the absence of controlled A/B evaluation. We argue that the architectural contributions, particularly the multi-signal reward decomposition and the hook-wired capture, are independently valuable and transferable to other agent training systems.
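To make the reward decomposition concrete, the following is a minimal sketch of how five per-trajectory signals might be combined into a composite reward, turned into advantages, and used as per-example weights for advantage-weighted SFT. The abstract does not state the signal weights, the baseline, or the weighting rule, so `WEIGHTS`, the mean-reward baseline, and the clip-at-zero rule below are all illustrative assumptions rather than the authors' method; every function and class name is hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional

# Hypothetical weights: the source names the five signals but not how they combine.
WEIGHTS = {
    "outcome": 0.40,
    "process": 0.20,
    "efficiency": 0.15,
    "verification": 0.15,
    "consistency": 0.10,
}


@dataclass
class Signals:
    """Per-trajectory scores, each assumed to lie in [0, 1]."""
    outcome: float
    process: float
    efficiency: float
    verification: float
    consistency: float


def composite_reward(s: Signals) -> float:
    """Weighted sum of the five signals (assumed linear combination)."""
    return sum(w * getattr(s, name) for name, w in WEIGHTS.items())


def advantages(rewards: List[float], baseline: Optional[float] = None) -> List[float]:
    """Reward minus a baseline; defaults to the batch mean (an assumption)."""
    b = mean(rewards) if baseline is None else baseline
    return [r - b for r in rewards]


def positive_advantage_rate(advs: List[float]) -> float:
    """Fraction of trajectories whose advantage is strictly positive."""
    return sum(a > 0 for a in advs) / len(advs)


def sft_weights(advs: List[float]) -> List[float]:
    """Per-example loss weights: negative advantages are clipped to zero,
    so only better-than-baseline trajectories contribute (assumed rule)."""
    return [max(a, 0.0) for a in advs]


if __name__ == "__main__":
    batch = [
        Signals(1.0, 0.5, 0.5, 1.0, 0.0),
        Signals(0.0, 0.5, 0.2, 0.0, 1.0),
        Signals(1.0, 1.0, 1.0, 1.0, 1.0),
    ]
    rewards = [composite_reward(s) for s in batch]
    advs = advantages(rewards)
    print(rewards, positive_advantage_rate(advs), sft_weights(advs))
```

In a training loop, each weight from `sft_weights` would multiply that trajectory's token-level cross-entropy loss. A softer scheme, such as exponentiated advantages, is an equally plausible reading of "advantage-weighted SFT".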

Promotion decision

What has to happen next

Convert into the standard paper schema, add citations, and render a draft PDF.

Why this is not always a full paper yet

Corpus pages are public-safe readers for discovered workspace artifacts. They are not automatically final papers. A corpus item becomes a polished paper only after the editable source, evidence checkpoints, references, figures, render path, and release status are attached through the paper schema.