26. Research Execution Fabric
**Status**: Active architecture
**Scope**: Shared agent architecture for research-driven execution, remote training, evaluation, meta-review, and paper synthesis
**Audience**: Claude Code, Codex, Gemini, orchestration services, paper-writing pipelines
---
Purpose
This document defines a global architecture for how the AI stack turns a prompt like:
- "Read this paper and reproduce the experiment"
- "Train a model on this dataset"
- "Run the Vast.ai workflow and tell me the result"
- "Take the findings through evaluation, meta-review, and paper drafting"
into a deterministic multi-stage process.
This is not an ASR-specific system.
This is not a `cog-rlm` feature.
This is a shared execution fabric that sits above individual workloads and below human intent.
The ASR Paper 6 run is one profile inside this fabric, not the definition of the fabric.
---
Core Principle
The system should treat research execution as a first-class AI capability with two linked rails:
1. Research Rail
Takes an idea or source artifact through framing, hypothesis formation, dataset discovery, design, evaluation framing, and paper synthesis.
2. Execution Rail
Takes a concrete experiment plan through environment setup, remote execution, monitoring, recovery, verification, artifact collection, and result summarization.
The rails meet at a shared set of contracts:
- hypothesis
- data contract
- execution profile
- evaluation contract
- publication contract
---
High-Level Flow
    Prompt / Paper / Idea
            |
            v
    1. Intent Intake
            |
            v
    2. Divergent Rail
       research angles / failure expectations / workload classes
            |
            v
    3. Research Synthesis
       sources / prior logs / prompt history / prior experiments
            |
            v
    4. Hypothesis Contract
       what is being tested and how success is measured
            |
            v
    5. Data Contract
       sources / schemas / transforms / risks / provenance
            |
            v
    6. Execution Profile
       local / mesh / Vast.ai / benchmark / finetune / inference-only
            |
            +------------------------+
            |                        |
            v                        v
    7a. Remote Execution Rail    7b. Local / Mesh Execution Rail
        bootstrap                    bootstrap
        run                          run
        monitor                      monitor
        recover                      recover
        verify                       verify
            |                        |
            +-----------+------------+
                        |
                        v
    8. Evaluation
       metrics / baselines / regressions / artifact checks
            |
            v
    9. Meta-Review
       bug hunt / invalid assumptions / missing controls / paper audit
            |
            v
    10. Paper / Blog / Briefing Synthesis
            |
            v
    11. Memory + Registry Update

---
Execution Model
The Claude Code or Codex session is the default linear executor.
That matters because the session already has:
- shell access
- MCP tools
- prompt logs
- Orbit / context recovery
- mesh dispatch
- browser automation
- file system access
- paper-writing ability
The fabric assumes the active tool-rich session can do the whole chain end to end:
- read sources
- inspect prior failures
- compile a workload
- execute the workload
- validate outputs
- run meta-review
- write the paper draft
Sub-agents remain optional accelerators, not required architecture.
---
The Two Shared Subsystems
A. Research Workflow Layer
This layer is responsible for:
- reading prompt history and prior experiment logs
- recovering prior hypotheses and failed paths
- comparing possible experiment directions
- turning a source paper or idea into a testable contract
- defining what data is needed
- defining what result would support or falsify the hypothesis
- carrying the result into paper and blog synthesis
This is where Evoflow-style divergence belongs.
This is also where meta-review belongs.
B. Execution Workflow Layer
This layer is responsible for:
- choosing compute substrate
- compiling exact setup and run commands
- defining monitor and recovery behavior
- tracking expected artifacts and success markers
- incorporating failure patterns from prompt logs
- retrying only when verification fails
- preserving resumability across instance death, process death, and package drift
This is where Vast.ai belongs.
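The retry and artifact rules in this layer can be sketched as a small loop. This is a minimal illustration under stated assumptions, not the fabric's implementation: `missing_artifacts`, `run_with_verification`, and the `max_attempts` parameter are hypothetical names, and "verification" is reduced here to checking that each expected file exists and is non-empty.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def missing_artifacts(expected: Iterable[Path]) -> List[Path]:
    """Return the expected artifacts that are absent or empty."""
    return [p for p in expected if not p.is_file() or p.stat().st_size == 0]


def run_with_verification(
    run: Callable[[], None],
    expected: Iterable[Path],
    max_attempts: int = 3,  # hypothetical default, not taken from the document
) -> int:
    """Run a workload, retrying only while artifact verification fails.

    Returns the number of attempts used (0 if the artifacts were already
    present, which is what makes the call resumable). Raises RuntimeError
    if verification never passes.
    """
    expected = list(expected)
    if not missing_artifacts(expected):
        return 0  # a previous run already produced verified artifacts
    for attempt in range(1, max_attempts + 1):
        run()
        if not missing_artifacts(expected):
            return attempt  # verified: stop here, do not retry
    raise RuntimeError(
        f"artifacts still missing after {max_attempts} attempts: "
        f"{missing_artifacts(expected)}"
    )
```

The loop never inspects the exit status of `run`; completion is decided by the artifacts alone, which matches the rule that a run counts only once its outputs verify.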
---
Why Vast.ai Is Only One Profile
Vast.ai should be modeled as one execution profile under the fabric, not as the whole workflow.
Examples of execution profiles:
- `vastai.generic`
- `vastai.training`
- `vastai.paper_bundle`
- `mesh.parallel`
- `local.prototype`
- `local.benchmark`
- `remote.inference`
The `vastai.paper_bundle` profile is what the N'Ko Paper 6 run used:
- remote bootstrap
- dependency pinning
- extraction
- training bundle
- monitor + relaunch
- artifact verification
- results download
But the global system must also support:
- reading a paper and generating a reproduction plan
- collecting or transforming data first
- evaluating against a baseline
- drafting the hypothesis and results section
- running meta-review before claiming anything
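One way to keep profiles separate from the workflow is a small registry keyed by profile name. The sketch below is an assumption about shape only: `ExecutionProfile`, `PROFILES`, and `register` are hypothetical names, and only `vastai.paper_bundle` carries steps because it is the only profile whose steps this document lists.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class ExecutionProfile:
    name: str                      # dotted name, e.g. "vastai.paper_bundle"
    steps: Tuple[str, ...] = ()    # ordered step labels, empty if unspecified

    @property
    def substrate(self) -> str:
        """The part of the name before the first dot, e.g. "vastai"."""
        return self.name.split(".", 1)[0]


PROFILES: Dict[str, ExecutionProfile] = {}


def register(profile: ExecutionProfile) -> ExecutionProfile:
    """Add a profile to the registry, rejecting duplicate names."""
    if profile.name in PROFILES:
        raise ValueError(f"profile already registered: {profile.name}")
    PROFILES[profile.name] = profile
    return profile


# Steps copied from the description of the Paper 6 run in this document.
register(ExecutionProfile(
    "vastai.paper_bundle",
    (
        "remote bootstrap",
        "dependency pinning",
        "extraction",
        "training bundle",
        "monitor + relaunch",
        "artifact verification",
        "results download",
    ),
))

# The other listed profiles are registered without steps, because the
# document names them but does not define their contents.
for _name in ("vastai.generic", "vastai.training", "mesh.parallel",
              "local.prototype", "local.benchmark", "remote.inference"):
    register(ExecutionProfile(_name))
```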
---
Shared Contracts
Every workload should compile into the following contracts.
1. Hypothesis Contract
- experiment question
- claim under test
- expected directional outcome
- falsifiers
- baseline
- metric set
2. Data Contract
- source datasets
- schemas and field assumptions
- transforms
- noise warnings
- provenance
- volume requirements
3. Execution Contract
- substrate: local / mesh / Vast.ai
- bootstrap commands
- run commands
- monitor interval
- recovery rules
- retry rules
- artifact list
- success markers
4. Evaluation Contract
- metrics
- held-out split policy
- baselines
- sanity checks
- artifact verification
- regression checks
5. Publication Contract
- result summary
- caveats
- negative findings
- paper section updates
- blog / briefing outputs
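The five contracts above could be carried as plain typed records. The following is a sketch, not a settled schema: the class and field names are derived from the bullet lists in this section, while the field types, the `monitor_interval_s` unit, and the `unfilled` helper are assumptions made for illustration.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class HypothesisContract:
    experiment_question: str = ""
    claim_under_test: str = ""
    expected_direction: str = ""
    falsifiers: List[str] = field(default_factory=list)
    baseline: str = ""
    metrics: List[str] = field(default_factory=list)


@dataclass
class DataContract:
    sources: List[str] = field(default_factory=list)
    schemas: List[str] = field(default_factory=list)
    transforms: List[str] = field(default_factory=list)
    noise_warnings: List[str] = field(default_factory=list)
    provenance: str = ""
    volume_requirements: str = ""


@dataclass
class ExecutionContract:
    substrate: str = ""  # "local", "mesh", or "vastai"
    bootstrap_commands: List[str] = field(default_factory=list)
    run_commands: List[str] = field(default_factory=list)
    monitor_interval_s: Optional[float] = None  # seconds; unit is assumed
    recovery_rules: List[str] = field(default_factory=list)
    retry_rules: List[str] = field(default_factory=list)
    artifacts: List[str] = field(default_factory=list)
    success_markers: List[str] = field(default_factory=list)


@dataclass
class EvaluationContract:
    metrics: List[str] = field(default_factory=list)
    held_out_split_policy: str = ""
    baselines: List[str] = field(default_factory=list)
    sanity_checks: List[str] = field(default_factory=list)
    artifact_verification: List[str] = field(default_factory=list)
    regression_checks: List[str] = field(default_factory=list)


@dataclass
class PublicationContract:
    result_summary: str = ""
    caveats: List[str] = field(default_factory=list)
    negative_findings: List[str] = field(default_factory=list)
    paper_section_updates: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)  # blog / briefing


def unfilled(contract) -> List[str]:
    """Names of fields still empty ("", [], or None), in declaration order."""
    return [f.name for f in fields(contract)
            if getattr(contract, f.name) in ("", [], None)]
```

A compiler could refuse to start a long run while `unfilled(...)` is non-empty for the hypothesis, data, or execution contract.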
---
Incident-Aware Operation
The execution rail must be informed by prior incident logs.
Examples already recovered from the Vast.ai sessions:
- never destroy active instances before SSH verification and artifact merge
- do not rewrite scripts that are already producing output without a correctness reason
- pin drifting dependencies
- validate schema assumptions before long runs
- assert feature flags are actually wired into runtime behavior
- treat process death separately from instance death
- verify artifacts before counting a run as complete
- keep monitors portable across macOS and Linux
These incidents are not ASR-specific.
They are execution intelligence.
They belong in the shared fabric.
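Two of the incidents above translate directly into guards. The sketch below is illustrative only: `InstanceState`, `destroy_blockers`, `Failure`, and `classify_failure` are hypothetical names, and how SSH reachability or process liveness is actually probed is left out.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


@dataclass
class InstanceState:
    ssh_verified: bool       # SSH verification has passed
    artifacts_merged: bool   # remote artifacts have been merged locally


def destroy_blockers(state: InstanceState) -> List[str]:
    """Reasons an instance must not be destroyed yet (empty means safe)."""
    blockers = []
    if not state.ssh_verified:
        blockers.append("SSH verification has not passed")
    if not state.artifacts_merged:
        blockers.append("artifacts have not been merged")
    return blockers


class Failure(Enum):
    NONE = "none"
    PROCESS_DEATH = "process_death"    # instance is up, workload is not
    INSTANCE_DEATH = "instance_death"  # instance itself is unreachable


def classify_failure(instance_reachable: bool, process_alive: bool) -> Failure:
    """Keep process death and instance death as distinct outcomes."""
    if not instance_reachable:
        return Failure.INSTANCE_DEATH
    if not process_alive:
        return Failure.PROCESS_DEATH
    return Failure.NONE
```

Keeping the two failure kinds distinct lets a recovery rule relaunch a dead process on a live instance instead of rebuilding the instance.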
---
Relationship to Existing Systems
Evoflow
Evoflow belongs in the Research Workflow Layer.
It is a divergence and synthesis engine for shaping experiments before execution.
Meta-Review
Meta-review belongs after evaluation and before publication.
Its job is to attack assumptions, controls, methodology, missing tests, and overclaims.
Orbit / Context Recovery
Orbit belongs in the source-recovery stage.
It provides prior experiments, prompt logs, plans, and session context.
Vast.ai Workflow
Vast.ai belongs in the execution profile layer as one deterministic remote substrate.
Paper Pipeline
The paper pipeline belongs in the publication contract.
It should consume the verified experiment outputs, not raw optimistic notes.
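The ordering described here (evaluation, then meta-review, then publication) can be enforced with a simple stage gate. This is a sketch under assumptions: `LATE_STAGES` and `can_enter` are hypothetical names, and the stage labels are shorthand for the corresponding steps in the flow.

```python
from typing import Sequence, Set

# Tail of the flow, in the order this document gives it.
LATE_STAGES: Sequence[str] = ("execution", "evaluation", "meta_review", "publication")


def can_enter(stage: str, completed: Set[str]) -> bool:
    """True only if every stage before `stage` is already completed."""
    idx = LATE_STAGES.index(stage)  # raises ValueError for an unknown stage
    return all(earlier in completed for earlier in LATE_STAGES[:idx])
```

With this gate, a paper pipeline asking to enter `"publication"` is refused until both evaluation and meta-review are recorded as complete.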
---
Architectural Decision
The shared home for this architecture should be `Comp-Core`, not `cog-rlm`.
Reason:
- `Comp-Core` is the system-level repository for shared agent and orchestration architecture.
- `cog-rlm` can consume this architecture, but should not define it.
- Claude Code, Codex, Gemini, and future orchestrators need a neutral home that is not tied to one product.
`cog-rlm` is therefore a consumer implementation.
The global architecture lives here in `Comp-Core`.
---
Initial Shared Deliverables
The first shared implementation should include:
1. A workflow manifest describing the stages and contracts.
2. An incident registry extracted from prompt logs and failure docs.
3. A compiler that turns a prompt-level objective into a deterministic execution plan.
4. A profile system for substrates like Vast.ai.
5. A publication pipeline hook so execution can flow into hypothesis writeup, meta-review, and paper drafting.
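As a rough illustration of deliverables 1 and 3, a manifest can be a plain data structure and the compiler a pure function over it. Everything named here (`MANIFEST`, `compile_plan`, `plan_id`) is hypothetical, and the sketch only demonstrates the determinism property: it does not interpret the objective text.

```python
import hashlib
import json
from typing import Dict, List

# Stage and contract names follow the High-Level Flow and Shared Contracts
# sections; the identifiers themselves are illustrative.
MANIFEST: Dict[str, List[str]] = {
    "stages": [
        "intent_intake",
        "divergent_rail",
        "research_synthesis",
        "hypothesis_contract",
        "data_contract",
        "execution_profile",
        "execution",
        "evaluation",
        "meta_review",
        "synthesis",
        "memory_registry_update",
    ],
    "contracts": ["hypothesis", "data", "execution", "evaluation", "publication"],
}


def compile_plan(objective: str, profile: str) -> Dict[str, object]:
    """Build a plan skeleton; identical inputs always yield an identical plan."""
    plan: Dict[str, object] = {
        "objective": objective.strip(),
        "profile": profile,
        "stages": list(MANIFEST["stages"]),
        # Contracts start unfilled; later stages are expected to populate them.
        "contracts": {name: None for name in MANIFEST["contracts"]},
    }
    # A content hash over the canonical JSON form gives a stable plan id.
    canonical = json.dumps(plan, sort_keys=True)
    plan["plan_id"] = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]
    return plan
```

Because the id is derived from the plan's content, two sessions compiling the same objective against the same profile can recognize that they hold the same plan.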
---
Non-Goals
This architecture does not require:
- full autonomous operation without a tool-rich session
- one monolithic service for all execution substrates
- ASR-only abstractions
- paper writing without experiment verification
---
Immediate Consequence
Any implementation hidden inside one app repo should be treated as provisional until its logic is promoted into shared architecture and shared tooling.
That is the change made here:
- `Comp-Core` owns the architecture
- shared tooling owns the workflow manifest
- app repos consume the compiled plans
Promotion Decision
Promote into a technical note or architecture paper with implementation anchors.
Source Anchor
Comp-Core/docs/architecture/26-RESEARCH_EXECUTION_FABRIC.md