AGP/N'Ko Thunder Train Status
Thunder Train is active again for MLX-based distributed adapter training across Mac4 and Mac5. It applies directly to the Gemma/AGP corrective language layer, including LoRA adapter training and tensor/data parallel experiments.
Full Public Reader
AGP/N'Ko Thunder Train Status
Date: 2026-04-21
What Thunder Train Applies To
Thunder Train is active again for MLX-based distributed adapter training across Mac4 and Mac5. It applies directly to the Gemma/AGP corrective language layer, including LoRA adapter training and tensor/data parallel experiments.
It does not automatically apply to the current Paper 4 N'Ko ASR checkpoint because that checkpoint is a PyTorch Whisper-large-v3 trajectory model. To use Thunder Train for that layer, the ASR model would need an MLX training/inference port or a separate MLX-compatible acoustic adapter design.
Verified Cluster State
- Mac4: `[ip]`, MLX `0.31.1`, MLX-LM `0.31.2`
- Mac5: `[ip]`, MLX `0.31.1`, MLX-LM `0.31.2`
- Thunderbolt bridge:
- Mac4 -> Mac5: about `0.52ms`
- Mac5 -> Mac4: about `0.56ms`
- MLX distributed ring smoke:
- rank 0 reported `size=2`
- rank 1 reported `size=2`
- all-sum check passed on both ranks
Smoke Training Run
Dataset:
- Builder: `scripts/build_agp_nko_correction_chatml.py`
- Manifest: `data/agp-nko-corrections/manifest.json`
- Train rows: `16`
- Validation rows: `4`
- Source reports:
- `policy_smoke_http_fewshot_rust_gate`
- `synthetic_http_fewshot_rust_gate`
- `eval_results_base_lowcer_http_rust_gate`
Command shape:
./launch.sh \
--model mlx-community/gemma-4-e2b-4bit \
--train-data [home]/projects/thunder-train/data/agp-nko-corrections/train.jsonl \
--valid-data [home]/projects/thunder-train/data/agp-nko-corrections/valid.jsonl \
--strategy data \
--num-iters 4 \
--batch-size 1 \
--num-layers 4 \
--lora-rank 8 \
--learning-rate 1e-6 \
--adapter-path [home]/projects/thunder-train/runs/agp-nko-correction-smoke-adapter \
--log-every 1 \
--eval-every 2 \
--save-every 2 \
--max-seq-len 512Result:
- world size: `2`
- strategy: `data`
- LoRA: rank `8`, last `4` layers
- trainable params: `15,877,411`
- validation loss:
- step 2: `3.7663`
- step 4/final: `3.7460`
- adapter artifact:
- Mac4 rank-0 path: `[home-path]`
- mirrored local path: `[home-path]`
- size: about `61M`
Architecture Boundary
Use Thunder Train for:
- AGP/Gemma N'Ko correction adapter training
- data-parallel throughput scaling on both Macs
- tensor-parallel experiments when a model is too large for one 16 GB node
- future N'Ko language-prior adapters trained from larger ASR error corpora
Do not treat Thunder Train as:
- a shared Apple Neural Engine
- a PyTorch distributed trainer
- a way to make the existing Whisper trajectory ASR checkpoint automatically fit
- proof that CER improved on Paper 4
The current CER-improving bridge remains:
1. PyTorch ASR checkpoint emits N'Ko text and trajectory context.
2. MLX/Gemma AGP lane proposes language-prior corrections.
3. Rust `cc-agp-bridge` accepts or rejects under partition and admissibility constraints.
4. RAG++/Graph Kernel provenance makes each decision replayable.
Thunder Train strengthens step 2. It does not replace steps 1, 3, or 4.
Promotion Decision
Attach run IDs, datasets, metrics, and reproduction commands.
Source Anchor
projects/thunder-train/docs/agp-nko-thunder-train-status.md
Detected Structure
Method · Evaluation · Code Anchors · Architecture