research note, experiment writeup candidate, score 24

Harness Skills Layer

The harness skills layer turns executable benchmark deltas into evidence-bound skill packages. It is the local implementation of the useful parts of SkillDAG, SkillOpt, and MUSE-style memory packaging without making an unsafe claim that a failed adapter should be routed automatically.

Extracted abstract or opening context

Inputs:

- public task prompts
- canonical task specs
- one baseline `executable-task-bench` report
- one comparison `executable-task-bench` report

Outputs:

- `trajectory-skills.jsonl`: one structured row per extracted skill family
- `skill-graph.json`: typed graph nodes and edges
- `router-index.json`: regression-gated routing index
- `skillgraph-evolution-report.json`: aggregate comparison report
- `packages/<skill_id>/SKILL.md`: human-readable activation boundary
- `packages/<skill_id>/MEMORY.md`: compact task evidence memory
- `packages/<skill_id>/tests.jsonl`: task-level evidence rows
- `packages/<skill_id>/failure_modes.json`: quarantine and diagnostic metadata
- `packages/<skill_id>/skill.json`: full structured package

| Edge | Meaning |
|---|---|
| `depends_on` | Skill evidence belongs to a specific task set |
| `specializes` | Skill applies to a task family such as `path`, `date`, or `parse` |
| `repairs` | Comparison passed a task that the baseline failed |
| `conflicts_with` | Comparison failed a task that the baseline passed |

This is deliberately a harness-side graph, not a model-side promise. The graph records where a trajectory delta helped, where it hurt, and which families need repair before routing.
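The `repairs` and `conflicts_with` edges, and the idea of a regression gate, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration, not the harness's actual schema or code: reports are reduced to `task_id -> passed` dictionaries, the `family_of` mapping and the `skill:<family>` identifiers are hypothetical, and the gate rule (at least one repair and no conflicts) is one plausible reading of "regression-gated".

```python
from collections import defaultdict

# Hypothetical, simplified stand-ins for the two executable-task-bench reports:
# each maps a task id to whether the run passed that task.
baseline = {"path-01": False, "path-02": True, "date-01": True, "parse-01": False}
comparison = {"path-01": True, "path-02": True, "date-01": False, "parse-01": False}

# Hypothetical mapping from task id to task family.
family_of = {"path-01": "path", "path-02": "path", "date-01": "date", "parse-01": "parse"}


def derive_edges(baseline, comparison, family_of):
    """Return (skill_id, edge_type, task_id) triples for tasks whose outcome changed."""
    edges = []
    # Only tasks present in both reports can be compared.
    for task_id in sorted(baseline.keys() & comparison.keys()):
        skill_id = f"skill:{family_of[task_id]}"
        base_ok, comp_ok = baseline[task_id], comparison[task_id]
        if comp_ok and not base_ok:
            # Comparison passed a task that the baseline failed.
            edges.append((skill_id, "repairs", task_id))
        elif base_ok and not comp_ok:
            # Comparison failed a task that the baseline passed.
            edges.append((skill_id, "conflicts_with", task_id))
    return edges


def build_router_index(edges):
    """Count edge types per skill and apply an assumed regression gate."""
    counts = defaultdict(lambda: {"repairs": 0, "conflicts_with": 0})
    for skill_id, edge_type, _task_id in edges:
        counts[skill_id][edge_type] += 1
    return {
        skill_id: {
            **c,
            # Assumed gate: route only when the skill helped and never hurt.
            "routable": c["repairs"] > 0 and c["conflicts_with"] == 0,
        }
        for skill_id, c in counts.items()
    }


edges = derive_edges(baseline, comparison, family_of)
router_index = build_router_index(edges)
```

With this toy data, `skill:path` gains one `repairs` edge and is marked routable, while `skill:date` gains one `conflicts_with` edge and stays gated. The `parse` family produces no edges because its task fails in both runs, so the sketch records no evidence for it either way.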

Promotion decision

What has to happen next

Attach run IDs, datasets, metrics, and reproduction commands.

Why this is not always a full paper yet

Corpus pages are public-safe readers for discovered workspace artifacts. They are not automatically final papers. A corpus item becomes a polished paper only after the editable source, evidence checkpoints, references, figures, render path, and release status are attached through the paper schema.