Geometric Motifs for Selecting and Routing Coding-Agent Training Data

Extracted abstract or opening context

We present a method for compactly annotating coding-agent sessions with behavioral motifs and geometric features, and for conditioning training-data generation on these annotations. From 834 real multi-project coding sessions comprising 4,633 turn-level records across 50+ applications, we extract symbolic labels (inscriptions) drawn from 10 categories and 5 continuous geometric scalars. We show that: (1) transition pressure predicts session convergence with 71.8% accuracy (z = 2.72, p < 0.007); (2) advantage-weighted training using these annotations yields Cohen's d = 3.065 relative to random selection; and (3) geometry-conditioned routing produces higher-specificity training data for inscription-type sessions (Cohen's d = +1.02) but requires balanced quota enforcement: unconstrained routing concentrates sessions in low-specificity lenses and reduces overall quality (d = -0.60 when the corpus is residual-dominated). Motivated by recent work on conditional memory in transformers [1], we also test whether retrieval-conditioned supervision, in which the model is trained directly on annotated behavioral patterns, further improves downstream performance. We describe the annotation pipeline, routing mechanism, quality verification, and iterative reward loop, and report results from a deployment spanning 50+ applications across 5 machines.
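The third finding contrasts unconstrained routing with balanced quota enforcement. The following is a minimal sketch of that contrast, under assumptions the abstract does not state: each session carries a score for every lens, unconstrained routing simply takes the best-scoring lens, and quota enforcement greedily fills lenses up to ceil(n_sessions / n_lenses). The lens names ("broad" standing in for a low-specificity lens, "focused" for a high-specificity one), the scores, and the greedy rule are all hypothetical illustrations, not the paper's routing mechanism.

```python
import math
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical data model: one dict per session mapping lens name -> routing
# score (higher = better fit). Assumes every session scores every lens.
Scores = List[Dict[str, float]]


def route_unconstrained(scores: Scores) -> List[str]:
    """Send each session to its best-scoring lens, with no capacity limit."""
    return [max(s, key=s.get) for s in scores]


def route_with_quota(scores: Scores) -> List[Optional[str]]:
    """Greedy quota-balanced routing (an illustrative assumption).

    Each lens accepts at most ceil(n_sessions / n_lenses) sessions. Candidate
    (session, lens) pairs are visited from highest to lowest score; a pair is
    accepted when the session is still unassigned and the lens has capacity.
    """
    if not scores:
        return []
    lenses = sorted({lens for s in scores for lens in s})
    quota = math.ceil(len(scores) / len(lenses))
    candidates = sorted(
        ((score, i, lens) for i, s in enumerate(scores) for lens, score in s.items()),
        key=lambda c: (-c[0], c[1], c[2]),  # best score first; ties by index, then name
    )
    assignment: List[Optional[str]] = [None] * len(scores)
    load: Counter = Counter()
    for _score, i, lens in candidates:
        if assignment[i] is None and load[lens] < quota:
            assignment[i] = lens
            load[lens] += 1
    return assignment


if __name__ == "__main__":
    # Hypothetical scores in which the broad lens wins every session.
    sessions = [
        {"broad": 0.9, "focused": 0.6},
        {"broad": 0.8, "focused": 0.7},
        {"broad": 0.7, "focused": 0.2},
        {"broad": 0.6, "focused": 0.5},
    ]
    print(Counter(route_unconstrained(sessions)))  # Counter({'broad': 4})
    print(route_with_quota(sessions))  # ['broad', 'broad', 'focused', 'focused']
```

Unconstrained routing sends all four hypothetical sessions to the broad lens, while the quota caps it at two and pushes the remaining sessions to the focused lens. Greedy filling is the simplest way to cap lens loads but does not maximize total score; if that matters, the same capacity constraint can be posed as a min-cost assignment problem.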

Promotion decision

What has to happen next

Convert into the standard paper schema, add citations, and render a draft PDF.

Why this is not yet a full paper

Corpus pages are public-safe readers for discovered workspace artifacts. They are not automatically final papers. A corpus item becomes a polished paper only after the editable source, evidence checkpoints, references, figures, render path, and release status are attached through the paper schema.
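The promotion rule above amounts to a checklist: an item is promotable only when all six attachments are present. The following is a minimal sketch of such a check. The `PromotionRecord` class, its field names, and the convention that an empty value means "not attached" are hypothetical; the actual paper schema is not specified on this page.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class PromotionRecord:
    """Hypothetical record of what has been attached to a corpus item.

    Fields mirror the six requirements named in the promotion rule;
    None or an empty value is treated as "not yet attached".
    """
    editable_source: Optional[str] = None
    evidence_checkpoints: Optional[List[str]] = None
    references: Optional[List[str]] = None
    figures: Optional[List[str]] = None
    render_path: Optional[str] = None
    release_status: Optional[str] = None


def missing_requirements(record: PromotionRecord) -> List[str]:
    """Return the names of requirements that are still unattached, in field order."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


def is_promotable(record: PromotionRecord) -> bool:
    """An item becomes a paper only when nothing is missing."""
    return not missing_requirements(record)


if __name__ == "__main__":
    draft = PromotionRecord(editable_source="paper.tex", references=["[1]"])
    print(missing_requirements(draft))
    # ['evidence_checkpoints', 'figures', 'render_path', 'release_status']
```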