Back to corpus
proposalexperiment writeup candidatescore 24

AGP Thunder-Train Stage 1 Plan

This is the stage-1 backbone plan for running AGP domain adaptation across `Mac4 + Mac5` over `Thunderbolt 5` using the existing `thunder-train` stack. The purpose of this stage is not to train the full AGP architecture end to end. The purpose is to make both Macs compute immediately on the first useful backbone problem: `Gemma 4 E2B` domain adaptation on the AGP high-signal corpus.

Full HTML reader

Read the full artifact

Open in new tab

Extracted abstract or opening context

This is the stage-1 backbone plan for running AGP domain adaptation across `Mac4 + Mac5` over `Thunderbolt 5` using the existing `thunder-train` stack. The purpose of this stage is not to train the full AGP architecture end to end. The purpose is to make both Macs compute immediately on the first useful backbone problem: `Gemma 4 E2B` domain adaptation on the AGP high-signal corpus. The core decision is now explicit. `Mac4 + Mac5 should both compute.` The prior single-host MLX path remains important, but only as the baseline and fallback control. The primary execution path for stage 1 is dual-host Thunder-Train over the `10.0.5.x` Thunderbolt link. First, the real Thunder entrypoint is `launch.sh`, not `distributed_launch.sh`. The repo marks `distributed_launch.sh` as legacy and the current launcher path is `launch.sh` + `mlx_launch.py`. Second, Thunder-Train expects ChatML-style `{"messages": [...]}` records. The current AGP MLX lane already proved out on plain `text` exports for `mlx_lm lora`, so the AGP data must be re-exported into Thunder format instead of pretending the same files can be reused unchanged. Third, the remote runtime assumption is currently broken. The Thunder launcher expects a consistent Python interpreter path on both Macs with `mlx` and `mlx_lm` installed. At status check time, neither host satisfied that assumption cleanly. Until that parity is fixed, no distributed launch should be called “live.”

Promotion decision

What has to happen next

Attach run IDs, datasets, metrics, and reproduction commands.

Why this is not always a full paper yet

Corpus pages are public-safe readers for discovered workspace artifacts. They are not automatically final papers. A corpus item becomes a polished paper only after the editable source, evidence checkpoints, references, figures, render path, and release status are attached through the paper schema.