Grand Diomande Research · Full HTML Reader

AGP/N'Ko Thunder Train Status

Date: 2026-04-21

What Thunder Train Applies To

Thunder Train is active again for MLX-based distributed adapter training across Mac4 and Mac5. It applies directly to the Gemma/AGP corrective language layer, including LoRA adapter training and tensor/data parallel experiments.

It does not automatically apply to the current Paper 4 N'Ko ASR checkpoint because that checkpoint is a PyTorch Whisper-large-v3 trajectory model. To use Thunder Train for that layer, the ASR model would need an MLX training/inference port or a separate MLX-compatible acoustic adapter design.

Verified Cluster State

  • Mac4: `[ip]`, MLX `0.31.1`, MLX-LM `0.31.2`
  • Mac5: `[ip]`, MLX `0.31.1`, MLX-LM `0.31.2`
  • Thunderbolt bridge:
    • Mac4 -> Mac5: about `0.52ms`
    • Mac5 -> Mac4: about `0.56ms`
  • MLX distributed ring smoke:
    • rank 0 reported `size=2`
    • rank 1 reported `size=2`
    • all-sum check passed on both ranks
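
As a rough illustration of what an all-sum check asserts, the sketch below simulates the collective in plain Python: every rank contributes a vector, and after the all-sum every rank should hold the same elementwise total. It does not call MLX and is not the smoke script that was actually run. The function names and the choice of per-rank values are assumptions made for illustration.

```python
def simulated_all_sum(per_rank_values):
    """Return what each rank should hold after an all-sum.

    Pure-Python stand-in for a real collective: every rank ends up with
    the elementwise sum of all ranks' contributions.
    """
    total = [sum(column) for column in zip(*per_rank_values)]
    return [list(total) for _ in per_rank_values]


def all_sum_check(world_size, length=4):
    """Hypothetical check: rank r contributes a vector filled with r + 1."""
    contributions = [[float(rank + 1)] * length for rank in range(world_size)]
    results = simulated_all_sum(contributions)
    # Expected value per element is 1 + 2 + ... + world_size.
    expected = [world_size * (world_size + 1) / 2.0] * length
    return all(result == expected for result in results)


if __name__ == "__main__":
    # With two ranks, each element should sum to 1 + 2 = 3 on both ranks.
    print("all-sum check passed:", all_sum_check(world_size=2))
```

On the real cluster the same assertion would be made against the result of the MLX distributed collective on each rank, with the expected total derived from the reported world size.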

Smoke Training Run

Dataset:

  • Builder: `scripts/build_agp_nko_correction_chatml.py`
  • Manifest: `data/agp-nko-corrections/manifest.json`
  • Train rows: `16`
  • Validation rows: `4`
  • Source reports:
    • `policy_smoke_http_fewshot_rust_gate`
    • `synthetic_http_fewshot_rust_gate`
    • `eval_results_base_lowcer_http_rust_gate`
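
A quick sanity check on the split sizes above could look like the following sketch. It assumes only that the train and validation files are JSONL with one JSON object per line; the helper names are hypothetical, and the schema of `manifest.json` is not assumed or read here.

```python
import json
from pathlib import Path


def count_jsonl_rows(path):
    """Count non-empty lines in a JSONL file, failing on any invalid JSON."""
    rows = 0
    with Path(path).open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            try:
                json.loads(line)
            except json.JSONDecodeError as err:
                raise ValueError(f"{path}:{line_number}: invalid JSON") from err
            rows += 1
    return rows


def check_split(train_path, valid_path, expected_train=16, expected_valid=4):
    """Compare actual row counts with the counts reported for the smoke run."""
    counts = {
        "train": count_jsonl_rows(train_path),
        "valid": count_jsonl_rows(valid_path),
    }
    expected = {"train": expected_train, "valid": expected_valid}
    return counts == expected, counts
```

Calling `check_split("train.jsonl", "valid.jsonl")` from the dataset directory returns a pass/fail flag together with the observed counts, which makes a mismatch easy to report before launching a run.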

Command shape:

```bash
./launch.sh \
  --model mlx-community/gemma-4-e2b-4bit \
  --train-data [home]/projects/thunder-train/data/agp-nko-corrections/train.jsonl \
  --valid-data [home]/projects/thunder-train/data/agp-nko-corrections/valid.jsonl \
  --strategy data \
  --num-iters 4 \
  --batch-size 1 \
  --num-layers 4 \
  --lora-rank 8 \
  --learning-rate 1e-6 \
  --adapter-path [home]/projects/thunder-train/runs/agp-nko-correction-smoke-adapter \
  --log-every 1 \
  --eval-every 2 \
  --save-every 2 \
  --max-seq-len 512
```
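
For scripted or repeated runs, the same command shape can be assembled from a dictionary. The sketch below is one way to do that; `build_launch_command` is a hypothetical helper, not part of Thunder Train, and it uses only flags that appear in the command above. The three path flags are left out because their values are redacted in this note.

```python
import shlex


def build_launch_command(options, launcher="./launch.sh"):
    """Turn {flag: value} pairs into an argv list for the launcher script."""
    argv = [launcher]
    for flag, value in options.items():
        argv.extend([f"--{flag}", str(value)])
    return argv


# Smoke-run settings copied from the command shape above (paths omitted).
smoke_options = {
    "model": "mlx-community/gemma-4-e2b-4bit",
    "strategy": "data",
    "num-iters": 4,
    "batch-size": 1,
    "num-layers": 4,
    "lora-rank": 8,
    "learning-rate": "1e-6",  # kept as a string to preserve the original spelling
    "log-every": 1,
    "eval-every": 2,
    "save-every": 2,
    "max-seq-len": 512,
}

command = build_launch_command(smoke_options)
print(shlex.join(command))  # shell-safe rendering for logs or run records
```

Passing the list form to `subprocess.run(command)` avoids shell quoting issues, and the printed string can be stored alongside the run ID as the reproduction command.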

Result:

  • world size: `2`
  • strategy: `data`
  • LoRA: rank `8`, last `4` layers
  • trainable params: `15,877,411`
  • validation loss:
    • step 2: `3.7663`
    • step 4/final: `3.7460`
  • adapter artifact:
    • Mac4 rank-0 path: `[home-path]`
    • mirrored local path: `[home-path]`
    • size: about `61M`
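
The reported numbers can be cross-checked with a little arithmetic. The sketch below rests on two assumptions that this note does not state: that the validation loss is a mean per-token cross-entropy in nats (so `exp(loss)` is a perplexity), and that the adapter weights are stored as 4-byte floats.

```python
import math

TRAINABLE_PARAMS = 15_877_411
VAL_LOSS_STEP_2 = 3.7663
VAL_LOSS_FINAL = 3.7460

# Change in validation loss between the two evaluations.
loss_drop = VAL_LOSS_STEP_2 - VAL_LOSS_FINAL
relative_drop = loss_drop / VAL_LOSS_STEP_2

# ASSUMPTION: loss is mean per-token cross-entropy in nats.
ppl_step_2 = math.exp(VAL_LOSS_STEP_2)
ppl_final = math.exp(VAL_LOSS_FINAL)

# ASSUMPTION: adapter weights are saved as float32 (4 bytes per parameter).
BYTES_PER_PARAM = 4
adapter_bytes = TRAINABLE_PARAMS * BYTES_PER_PARAM
adapter_mib = adapter_bytes / 2**20

print(f"loss drop: {loss_drop:.4f} ({relative_drop:.2%})")
print(f"perplexity: {ppl_step_2:.2f} -> {ppl_final:.2f}")
print(f"adapter size at float32: {adapter_bytes} bytes = {adapter_mib:.1f} MiB")
```

Under the float32 assumption the parameter count works out to roughly 60.6 MiB, which is consistent with the reported artifact size of about `61M`. The loss moves by about 0.02 over four iterations at a learning rate of `1e-6`, which is what a smoke run is expected to show: the training loop executes end to end, not that the adapter has learned anything useful.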

Architecture Boundary

Use Thunder Train for:

  • AGP/Gemma N'Ko correction adapter training
  • data-parallel throughput scaling on both Macs
  • tensor-parallel experiments when a model is too large for one 16 GB node
  • future N'Ko language-prior adapters trained from larger ASR error corpora

Do not treat Thunder Train as:

  • a shared Apple Neural Engine
  • a PyTorch distributed trainer
  • a way to make the existing Whisper trajectory ASR checkpoint automatically fit
  • proof that CER improved on Paper 4

The current CER-improving bridge remains:

1. PyTorch ASR checkpoint emits N'Ko text and trajectory context.
2. MLX/Gemma AGP lane proposes language-prior corrections.
3. Rust `cc-agp-bridge` accepts or rejects under partition and admissibility constraints.
4. RAG++/Graph Kernel provenance makes each decision replayable.

Thunder Train strengthens step 2. It does not replace steps 1, 3, or 4.
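
The division of labor in the four steps can be sketched as a propose/gate/record loop. Everything below is a stand-in: the real gate is the Rust `cc-agp-bridge`, and its partition and admissibility constraints are not described in this note. The two checks used here (the proposal stays within the Unicode N'Ko block, U+07C0 to U+07FF, and is not much longer than the ASR output) are hypothetical placeholders, as are all names.

```python
from dataclasses import dataclass

NKO_START, NKO_END = 0x07C0, 0x07FF  # Unicode N'Ko block


def is_nko_text(text):
    """True if text is non-empty and holds only N'Ko-block characters or whitespace."""
    return bool(text.strip()) and all(
        ch.isspace() or NKO_START <= ord(ch) <= NKO_END for ch in text
    )


@dataclass
class Decision:
    asr_text: str   # step 1: text emitted by the ASR checkpoint
    proposal: str   # step 2: correction proposed by the language-prior lane
    accepted: bool  # step 3: gate outcome
    reason: str


def gate(asr_text, proposal, max_length_ratio=1.5):
    """Stand-in for step 3; these are NOT the real bridge constraints."""
    if not is_nko_text(proposal):
        return Decision(asr_text, proposal, False, "non-N'Ko characters in proposal")
    if len(proposal) > max_length_ratio * max(len(asr_text), 1):
        return Decision(asr_text, proposal, False, "proposal too long for ASR output")
    return Decision(asr_text, proposal, True, "passed stand-in checks")


def final_text(decision):
    """Rejected proposals fall back to the unmodified ASR text."""
    return decision.proposal if decision.accepted else decision.asr_text


provenance_log = []  # stand-in for step 4: keep every decision for replay


def correct(asr_text, proposal):
    decision = gate(asr_text, proposal)
    provenance_log.append(decision)
    return final_text(decision)
```

The point of the shape is that the proposer never writes output directly: a rejected proposal leaves the ASR text untouched, and every decision, accepted or not, is recorded so it can be replayed later.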

Promotion Decision

Attach run IDs, datasets, metrics, and reproduction commands.

Source Anchor

projects/thunder-train/docs/agp-nko-thunder-train-status.md
