Grand Diomande Research · Full HTML Reader

26. Research Execution Fabric

Status: Active architecture
Scope: Shared agent architecture for research-driven execution, remote training, evaluation, meta-review, and paper synthesis
Audience: Claude Code, Codex, Gemini, orchestration services, paper-writing pipelines

---

Purpose

This document defines a global architecture for how the AI stack turns a prompt like:

  • "Read this paper and reproduce the experiment"
  • "Train a model on this dataset"
  • "Run the Vast.ai workflow and tell me the result"
  • "Take the findings through evaluation, meta-review, and paper drafting"

into a deterministic multi-stage process.

This is not an ASR-specific system.
This is not a `cog-rlm` feature.
This is a shared execution fabric that sits above individual workloads and below human intent.

The ASR Paper 6 run is one profile inside this fabric, not the definition of the fabric.

---

Core Principle

The system should treat research execution as a first-class AI capability with two linked rails:

1. Research Rail
Takes an idea or source artifact through framing, hypothesis formation, dataset discovery, design, evaluation framing, and paper synthesis.

2. Execution Rail
Takes a concrete experiment plan through environment setup, remote execution, monitoring, recovery, verification, artifact collection, and result summarization.

The rails meet at a shared contract:

  • hypothesis
  • data contract
  • execution contract (selected through an execution profile)
  • evaluation contract
  • publication contract

---

High-Level Flow

```text
Prompt / Paper / Idea
        |
        v
1. Intent Intake
        |
        v
2. Divergent Rail
   research angles / failure expectations / workload classes
        |
        v
3. Research Synthesis
   sources / prior logs / prompt history / prior experiments
        |
        v
4. Hypothesis Contract
   what is being tested and how success is measured
        |
        v
5. Data Contract
   sources / schemas / transforms / risks / provenance
        |
        v
6. Execution Profile
   local / mesh / Vast.ai / benchmark / finetune / inference-only
        |
        +------------------------+
        |                        |
        v                        v
7a. Remote Execution Rail    7b. Local / Mesh Execution Rail
    bootstrap                    bootstrap
    run                          run
    monitor                      monitor
    recover                      recover
    verify                       verify
        |                        |
        +-----------+------------+
                    |
                    v
8. Evaluation
   metrics / baselines / regressions / artifact checks
                    |
                    v
9. Meta-Review
   bug hunt / invalid assumptions / missing controls / paper audit
                    |
                    v
10. Paper / Blog / Briefing Synthesis
                    |
                    v
11. Memory + Registry Update
```
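
The flow above can be sketched as an ordered stage enumeration with a single branch at stage 7. This is a minimal illustration, not the fabric's implementation: the `Stage` names, the `execution_branch` helper, and the substrate labels are all assumptions introduced here.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    """Stages of the flow in execution order (names are illustrative)."""
    INTENT_INTAKE = 1
    DIVERGENT_RAIL = 2
    RESEARCH_SYNTHESIS = 3
    HYPOTHESIS_CONTRACT = 4
    DATA_CONTRACT = 5
    EXECUTION_PROFILE = 6
    EXECUTION = 7  # 7a remote or 7b local/mesh, decided by the profile
    EVALUATION = 8
    META_REVIEW = 9
    SYNTHESIS = 10
    MEMORY_REGISTRY_UPDATE = 11


def execution_branch(substrate: str) -> str:
    """Pick the stage-7 branch. The substrate labels are hypothetical."""
    return "7a-remote" if substrate in {"vastai", "remote"} else "7b-local-mesh"


def next_stage(current: Stage) -> Optional[Stage]:
    """Return the stage after `current`, or None once the flow is finished."""
    if current is Stage.MEMORY_REGISTRY_UPDATE:
        return None
    return Stage(current.value + 1)
```

Because the stages are strictly ordered, a session can persist only the current stage number and resume from it after an interruption.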

---

Execution Model

The active Claude Code or Codex session is the default linear executor: one session walks the stages in order.

That matters because the session already has:

  • shell access
  • MCP tools
  • prompt logs
  • Orbit / context recovery
  • mesh dispatch
  • browser automation
  • file system access
  • paper-writing ability

The fabric assumes the active tool-rich session can do the whole chain end to end:

  • read sources
  • inspect prior failures
  • compile a workload
  • execute the workload
  • validate outputs
  • run meta-review
  • write the paper draft

Sub-agents remain optional accelerators, not required architecture.

---

The Two Shared Subsystems

A. Research Workflow Layer

This layer is responsible for:

  • reading prompt history and prior experiment logs
  • recovering prior hypotheses and failed paths
  • comparing possible experiment directions
  • turning a source paper or idea into a testable contract
  • defining what data is needed
  • defining what result would support or falsify the hypothesis
  • carrying the result into paper and blog synthesis

This is where Evoflow-style divergence belongs.
This is also where meta-review belongs.

B. Execution Workflow Layer

This layer is responsible for:

  • choosing compute substrate
  • compiling exact setup and run commands
  • defining monitor and recovery behavior
  • tracking expected artifacts and success markers
  • incorporating failure patterns from prompt logs
  • retrying only when verification fails
  • preserving resumability across instance death, process death, and package drift

This is where Vast.ai belongs.
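
Two of the responsibilities listed for this layer, retrying only when verification fails and preserving resumability, can be sketched as a loop that checks artifacts before every run. This is a minimal sketch under stated assumptions: `run_until_verified`, `artifacts_present`, and the "non-empty file" verification rule are hypothetical, and a real profile would supply its own checks.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Sequence


@dataclass
class RunResult:
    attempts: int   # how many times the workload was actually launched
    verified: bool  # whether all expected artifacts passed verification


def artifacts_present(artifact_dir: Path, expected: Sequence[str]) -> bool:
    """Assumed verification rule: each expected artifact exists and is non-empty."""
    return all(
        (artifact_dir / name).is_file() and (artifact_dir / name).stat().st_size > 0
        for name in expected
    )


def run_until_verified(
    run_once: Callable[[int], None],
    artifact_dir: Path,
    expected: Sequence[str],
    max_attempts: int = 3,
) -> RunResult:
    """Launch the workload only while verification fails, up to max_attempts."""
    attempts = 0
    while not artifacts_present(artifact_dir, expected):
        if attempts >= max_attempts:
            return RunResult(attempts, False)
        attempts += 1
        run_once(attempts)  # the callable receives the 1-based attempt number
    return RunResult(attempts, True)
```

Checking before the first launch is what makes the loop resumable: if a previous session already produced verified artifacts, the function returns with zero attempts instead of rerunning the workload.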

---

Why Vast.ai Is Only One Profile

Vast.ai should be modeled as an execution profile under the fabric, not the whole workflow.

Examples of execution profiles:

  • `vastai.generic`
  • `vastai.training`
  • `vastai.paper_bundle`
  • `mesh.parallel`
  • `local.prototype`
  • `local.benchmark`
  • `remote.inference`

The `vastai.paper_bundle` profile is what the N'Ko Paper 6 run used:

  • remote bootstrap
  • dependency pinning
  • extraction
  • training bundle
  • monitor + relaunch
  • artifact verification
  • results download

But the global system must also support:

  • reading a paper and generating a reproduction plan
  • collecting or transforming data first
  • evaluating against a baseline
  • drafting the hypothesis and results section
  • running meta-review before claiming anything
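
The dotted profile names in this section suggest a small registry keyed by name, with the substrate as the prefix. The sketch below is one possible shape, not the existing profile system: `ExecutionProfile`, `register`, and `profiles_for` are hypothetical names, and the steps given for `local.prototype` are an assumption. The steps for `vastai.paper_bundle` are copied from the list above.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ExecutionProfile:
    name: str               # dotted name such as "vastai.training"
    steps: Tuple[str, ...]  # ordered step labels

    @property
    def substrate(self) -> str:
        """Prefix before the first dot, e.g. "vastai" or "local"."""
        return self.name.split(".", 1)[0]


PROFILES: Dict[str, ExecutionProfile] = {}


def register(profile: ExecutionProfile) -> ExecutionProfile:
    """Add a profile to the registry, rejecting duplicate names."""
    if profile.name in PROFILES:
        raise ValueError(f"duplicate profile: {profile.name}")
    PROFILES[profile.name] = profile
    return profile


register(ExecutionProfile(
    "vastai.paper_bundle",
    ("remote bootstrap", "dependency pinning", "extraction", "training bundle",
     "monitor + relaunch", "artifact verification", "results download"),
))
# Steps for this profile are assumed for illustration only.
register(ExecutionProfile("local.prototype", ("bootstrap", "run", "verify")))


def profiles_for(substrate: str) -> List[str]:
    """List registered profile names that run on the given substrate."""
    return sorted(p.name for p in PROFILES.values() if p.substrate == substrate)
```

Keeping Vast.ai as one key among several in the registry mirrors the point of this section: adding a new substrate means registering a profile, not changing the workflow.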

---

Shared Contracts

Every workload should compile into the following contracts.

1. Hypothesis Contract

  • experiment question
  • claim under test
  • expected directional outcome
  • falsifiers
  • baseline
  • metric set

2. Data Contract

  • source datasets
  • schemas and field assumptions
  • transforms
  • noise warnings
  • provenance
  • volume requirements

3. Execution Contract

  • substrate: local / mesh / Vast.ai
  • bootstrap commands
  • run commands
  • monitor interval
  • recovery rules
  • retry rules
  • artifact list
  • success markers

4. Evaluation Contract

  • metrics
  • held-out split policy
  • baselines
  • sanity checks
  • artifact verification
  • regression checks

5. Publication Contract

  • result summary
  • caveats
  • negative findings
  • paper section updates
  • blog / briefing outputs
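
The five contracts can be sketched as plain data classes whose fields follow the bullet lists above. The class names, field names, types, and defaults (such as the 60-second monitor interval) are assumptions for illustration, and `CompiledWorkload` is a hypothetical container, not an existing type.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class HypothesisContract:
    question: str
    claim: str
    expected_direction: str
    baseline: str
    falsifiers: List[str] = field(default_factory=list)
    metrics: List[str] = field(default_factory=list)


@dataclass
class DataContract:
    sources: List[str]
    schemas: List[str] = field(default_factory=list)
    transforms: List[str] = field(default_factory=list)
    noise_warnings: List[str] = field(default_factory=list)
    provenance: str = ""
    volume_requirements: str = ""


@dataclass
class ExecutionContract:
    substrate: str  # "local", "mesh", or "vastai" (labels assumed)
    bootstrap_commands: List[str] = field(default_factory=list)
    run_commands: List[str] = field(default_factory=list)
    monitor_interval_s: int = 60  # assumed default
    recovery_rules: List[str] = field(default_factory=list)
    retry_rules: List[str] = field(default_factory=list)
    artifacts: List[str] = field(default_factory=list)
    success_markers: List[str] = field(default_factory=list)


@dataclass
class EvaluationContract:
    metrics: List[str]
    split_policy: str = ""
    baselines: List[str] = field(default_factory=list)
    sanity_checks: List[str] = field(default_factory=list)
    artifact_verification: List[str] = field(default_factory=list)
    regression_checks: List[str] = field(default_factory=list)


@dataclass
class PublicationContract:
    result_summary: str = ""
    caveats: List[str] = field(default_factory=list)
    negative_findings: List[str] = field(default_factory=list)
    paper_section_updates: List[str] = field(default_factory=list)
    briefing_outputs: List[str] = field(default_factory=list)


@dataclass
class CompiledWorkload:
    hypothesis: Optional[HypothesisContract] = None
    data: Optional[DataContract] = None
    execution: Optional[ExecutionContract] = None
    evaluation: Optional[EvaluationContract] = None
    publication: Optional[PublicationContract] = None

    def missing_contracts(self) -> List[str]:
        """Names of contracts that have not been compiled yet."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]
```

A caller could refuse to start execution while `missing_contracts()` still reports `hypothesis`, `data`, or `execution`, which keeps long runs from launching against an undefined claim.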

---

Incident-Aware Operation

The execution rail must be informed by prior incident logs.

Examples already recovered from the Vast.ai sessions:

  • never destroy active instances before SSH verification and artifact merge
  • do not rewrite scripts that are already producing output without a correctness reason
  • pin drifting dependencies
  • validate schema assumptions before long runs
  • assert feature flags are actually wired into runtime behavior
  • treat process death separately from instance death
  • verify artifacts before counting a run as complete
  • keep monitors portable across macOS and Linux

These incidents are not ASR-specific.
They are execution intelligence.

They belong in the shared fabric.
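
One way to make these incidents actionable is to store each one with a check that runs before a plan is launched. The sketch below covers three of the rules listed above. The `Incident` type, the `preflight` function, and the plan keys (`dependencies`, `schema_validated`, `artifacts`) are hypothetical, and the "pinned means it contains `==`" test is a deliberately crude assumption.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Plan = Dict[str, Any]


@dataclass(frozen=True)
class Incident:
    rule: str                      # the lesson, phrased as in the incident log
    check: Callable[[Plan], bool]  # returns True when the plan respects the rule


INCIDENTS: List[Incident] = [
    Incident(
        "pin drifting dependencies",
        # Assumed convention: a pinned requirement contains "==".
        lambda plan: all("==" in dep for dep in plan.get("dependencies", [])),
    ),
    Incident(
        "validate schema assumptions before long runs",
        lambda plan: bool(plan.get("schema_validated", False)),
    ),
    Incident(
        "verify artifacts before counting a run as complete",
        # A plan with no expected artifacts cannot be verified at all.
        lambda plan: bool(plan.get("artifacts")),
    ),
]


def preflight(plan: Plan) -> List[str]:
    """Return the rules the plan violates, in registry order."""
    return [incident.rule for incident in INCIDENTS if not incident.check(plan)]
```

For example, a plan that lists an unpinned `numpy` dependency would come back with `"pin drifting dependencies"` as its only violation, provided its schema flag and artifact list are set.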

---

Relationship to Existing Systems

Evoflow

Evoflow belongs in the Research Workflow Layer.
It is a divergence and synthesis engine for shaping experiments before execution.

Meta-Review

Meta-review belongs after evaluation and before publication.
Its job is to attack assumptions, controls, methodology, missing tests, and overclaims.

Orbit / Context Recovery

Orbit belongs in the source-recovery stage, which feeds Research Synthesis.
It provides prior experiments, prompt logs, plans, and session context.

Vast.ai Workflow

Vast.ai belongs in the execution profile layer as one deterministic remote substrate.

Paper Pipeline

The paper pipeline belongs in the publication contract.
It should consume the verified experiment outputs, not raw optimistic notes.
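
The ordering described in this section (evaluation, then meta-review, then publication from verified outputs only) can be sketched as a gate in front of the paper pipeline. `RunRecord`, `PublicationBlocked`, and `require_publishable` are hypothetical names; the three flags are an assumed minimal record of a run's state.

```python
from dataclasses import dataclass


@dataclass
class RunRecord:
    artifacts_verified: bool
    evaluation_done: bool
    meta_review_passed: bool


class PublicationBlocked(Exception):
    """Raised when a run is not yet eligible for the paper pipeline."""


def require_publishable(record: RunRecord) -> None:
    """Raise PublicationBlocked unless every upstream gate has passed."""
    blockers = []
    if not record.artifacts_verified:
        blockers.append("artifacts not verified")
    if not record.evaluation_done:
        blockers.append("evaluation not run")
    if not record.meta_review_passed:
        blockers.append("meta-review not passed")
    if blockers:
        raise PublicationBlocked("; ".join(blockers))
```

Raising instead of returning a boolean makes it harder for a drafting step to ignore the result and proceed from unverified notes.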

---

Architectural Decision

The shared home for this architecture should be Comp-Core, not `cog-rlm`.

Reason:

  • `Comp-Core` is the system-level repository for shared agent and orchestration architecture.
  • `cog-rlm` can consume this architecture, but should not define it.
  • Claude Code, Codex, Gemini, and future orchestrators need a neutral home that is not tied to one product.

`cog-rlm` is therefore a consumer implementation.
The global architecture lives here in `Comp-Core`.

---

Initial Shared Deliverables

The first shared implementation should include:

1. A workflow manifest describing the stages and contracts.
2. An incident registry extracted from prompt logs and failure docs.
3. A compiler that turns a prompt-level objective into a deterministic execution plan.
4. A profile system for substrates like Vast.ai.
5. A publication pipeline hook so execution can flow into hypothesis writeup, meta-review, and paper drafting.
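
For the compiler deliverable, "deterministic" can be read as: the same objective and profile always compile to the same plan. The sketch below illustrates that property by deriving a plan identifier from a canonical JSON encoding. `compile_plan`, the `STAGES` labels, and the 12-character identifier are assumptions made for this example, not a description of an existing compiler.

```python
import hashlib
import json
from typing import Any, Dict

# Stage labels follow the high-level flow; the exact strings are illustrative.
STAGES = [
    "intent_intake", "divergent_rail", "research_synthesis",
    "hypothesis_contract", "data_contract", "execution_profile",
    "execution", "evaluation", "meta_review", "synthesis",
    "memory_registry_update",
]


def compile_plan(objective: str, profile: str) -> Dict[str, Any]:
    """Compile an objective and profile name into a plan with a stable id."""
    body = {
        "objective": objective.strip(),
        "profile": profile,
        "stages": STAGES,
    }
    # Sorted keys and fixed separators give a canonical byte string,
    # so equal inputs always hash to the same identifier.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    plan_id = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]
    return {"plan_id": plan_id, **body}
```

A stable identifier lets the registry detect that a prompt has already been compiled and executed, which supports the resumability goals of the execution layer.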

---

Non-Goals

This architecture does not require:

  • full autonomous operation without a tool-rich session
  • one monolithic service for all execution substrates
  • ASR-only abstractions
  • paper writing without experiment verification

---

Immediate Consequence

Any implementation hidden inside one app repo should be treated as provisional until its logic is promoted into shared architecture and shared tooling.

That is the change made here:

  • `Comp-Core` owns the architecture
  • shared tooling owns the workflow manifest
  • app repos consume the compiled plans

Promotion Decision

Promote into a technical note or architecture paper with implementation anchors.

Source Anchor

Comp-Core/docs/architecture/26-RESEARCH_EXECUTION_FABRIC.md
