AGP/N'Ko + Vast Training Handoff
You are continuing the N'Ko ASR training and AGP corrective-adapter program. The goal is not just to run more jobs; the goal is to finish the matched experimental bundle cleanly enough that the papers can make defensible claims about N'Ko script advantage, trajectory conditioning, TAR, and TTT.
Full Public Reader
AGP/N'Ko + Vast Training Handoff
Date: 2026-04-21
Mission
You are continuing the N'Ko ASR training and AGP corrective-adapter program. The goal is not just to run more jobs; the goal is to finish the matched experimental bundle cleanly enough that the papers can make defensible claims about N'Ko script advantage, trajectory conditioning, TAR, and TTT.
Treat this as two coupled but separate systems:
1. Acoustic ASR training on Vast/A100: PyTorch/Whisper trajectory model family. This is where Paper 4 CER numbers come from.
2. AGP corrective language layer on Mac4/Mac5: MLX/Gemma LoRA adapter that proposes N'Ko text repairs after ASR decoding. Rust/Graph Kernel decides whether those repairs are admissible.
Do not merge those into one imagined model unless you explicitly port the ASR stack to MLX. Right now they are separate training/runtime stacks.
Current Verified Anchor
Canonical Paper 4 checkpoint:
[home]/Desktop/nko-brain-scanner/results/paper4_reproduction_35205256/Known state:
- dataset: `290,596` pairs
- split: `232,476 / 29,060 / 29,060`
- script: N'Ko
- mode: trajectory
- `use_trajectory=true`
- `use_tar=false`
- `use_ttt=false`
- final/test CER: `20.57
- best validation: about `0.635887`
- best checkpoint epoch: `38`
- early stop: epoch `46`
Important: the filename `train_vastai_tar_ttt.py` is broader than the actual winning mode. The verified 20.57
Relevant Local Artifacts
Training scripts / launchers:
[home]/Desktop/nko-checkpoints-a100-mirror/emergency_1776430903/train_vastai_tar_ttt.py
[home]/Desktop/nko-checkpoints-a100-mirror/emergency_1776430903/run_final_a100.sh
[home]/Desktop/nko-checkpoints-a100-mirror/emergency_1776436244/run_final_a100_v2.shCanonical text source:
[home]/Desktop/nko-checkpoints-a100-mirror/emergency_1776436244/data/corrected_pairs_290k.jsonlThis has `290,596` rows and fields like:
{"feat_id":"bam_train_000000","nko":"...","latin":"..."}Current Paper 4 output bundle:
[home]/Desktop/nko-brain-scanner/results/paper4_reproduction_35205256/Older TAR result, not same snapshot:
[home]/Desktop/cog-rlm/results/tar_297k_clean/checkpoints/nko_tar_297k/nko_tar/Known result: N'Ko TAR around `29.95
Matched Vast/A100 Training Bundle Needed
The clean paper bundle should include, at minimum:
1. N'Ko baseline on the current `290,596` snapshot.
2. Latin baseline on the current `290,596` snapshot.
3. Latin trajectory on the current `290,596` snapshot.
4. N'Ko TAR on the current `290,596` snapshot.
5. N'Ko trajectory + TTT on the current `290,596` snapshot.
Optional but valuable:
6. Latin TAR.
7. Latin trajectory + TTT.
8. Multi-seed reruns for the top two contenders.
9. Same test split prediction dumps for AGP correction training.
TTT/TAR Context
The trainer has a TTT implementation:
- `InPlaceTTTAdapter`
- `--use-ttt`
- `--ttt-chunk-size`
- `--ttt-lr`
The trainer also has TAR-related pathways. Verify the exact flags in:
rg -n "use-ttt|ttt-|use-tar|tar|trajectory|baseline|latin|target" \
[home]/Desktop/nko-checkpoints-a100-mirror/emergency_1776430903/train_vastai_tar_ttt.pyExpected experimental meaning:
- baseline: Whisper features -> CTC N'Ko or Latin output, no trajectory conditioning.
- trajectory: current winning mode; acoustic hidden states are conditioned by anticipation/trajectory scalars before CTC.
- TAR: regularization/alignment family. Verify implementation details in code before naming it in paper language.
- TTT: test-time or chunk-local adaptation path. It should be evaluated as an ablation; do not assume improvement until measured.
AGP Architecture Boundary
The AGP architecture is not currently the acoustic model. It is the post-ASR correction layer:
audio
-> Whisper/PyTorch trajectory ASR checkpoint
-> raw N'Ko candidate + trajectory/partition metadata
-> AGP/Gemma proposal model
-> Rust cc-agp-bridge accept/reject gate
-> admissibility token + RAG++/Graph provenance
-> final N'Ko textCurrent bridge repo:
[home]/Desktop/Comp-Core/experiments/agp_mlx/asr_bridge/
[home]/Desktop/Comp-Core/core/semantic/cc-agp-bridge/Current AGP runtime:
- Mac5 corrective lane: `http://[ip]:9442/health`
- `/propose` exists and has been smoke-tested.
- Rust gate has accepted safe boundary/uncertain repairs and blocked novelty overreach.
Known bridge metrics:
- hand smoke: CER `0.142857 -> 0.047619`, accepted improved `2`, accepted worse `0`
- synthetic stress: CER `0.133333 -> 0.1`, accepted improved `3`, accepted neutral `5`, accepted worse `0`
- archived real eval low-CER slice: CER `0.7603686636 -> 0.7511520737`, accepted improved `1`, accepted worse `0`
Interpretation:
- AGP bridge has architectural promise.
- It is not yet a production CER claim on Paper 4.
- The missing ingredient is same-provenance Paper 4 ASR prediction/reference rows.
Thunder Train State
Repo:
[home]/projects/thunder-train/Cluster is working after patching:
- Mac4: `[ip]`
- Mac5: `[ip]`
- MLX: `0.31.1` on both
- MLX-LM: `0.31.2` on both
- Thunderbolt latency: about `0.5ms`
- MLX distributed ring smoke: `world_size=2`
Patches made:
- `launch.sh` now checks TB reachability from Mac4/Mac5, not Mac1.
- `thunder_status.py` now checks the project runtime and remote TB peer.
- `thunder_python.sh` falls back to `.venv-agp` if `.venv` is absent.
AGP adapter data and scripts:
[home]/projects/thunder-train/scripts/build_agp_nko_correction_chatml.py
[home]/projects/thunder-train/scripts/build_nko_synthetic_asr_correction_chatml.py
[home]/projects/thunder-train/scripts/eval_agp_nko_adapter.py
[home]/projects/thunder-train/data/agp-nko-corrections/
[home]/projects/thunder-train/data/agp-nko-synthetic-2k/Completed Thunder Train runs:
- `runs/agp-nko-correction-smoke-adapter`: 4-step infrastructure smoke, saved adapter.
- `runs/agp-nko-synthetic-2k-r2-safe`: 25-step safe run, saved adapter.
Failed/unstable shape:
- 100-step, batch `2`, LoRA layers `8`, max seq `512` hit Metal GPU timeout.
Safe shape:
./launch.sh \
--model mlx-community/gemma-4-e2b-4bit \
--strategy data \
--batch-size 1 \
--num-layers 4 \
--lora-rank 8 \
--learning-rate 1e-6 \
--max-seq-len 256Quality caveat:
- Early adapter outputs are not production-ready.
- The adapter lane currently proves distributed training and artifact generation.
- Do not claim AGP LoRA improved Paper 4 CER yet.
What To Hand Back To AGP From Vast
For every completed ASR run, preserve:
results.json
train.log
run.log
split.json
vocab.json
best.pt
test_predictions.jsonl
test_references.jsonl
test_metrics_by_partition.jsonIf prediction dumps do not already exist, add them. AGP needs row-level records:
{
"feat_id": "bam_train_...",
"audio_id": "...",
"split": "test",
"script": "nko",
"mode": "trajectory",
"asr_text": "...",
"reference_text": "...",
"cer_edits": 1,
"reference_chars": 20,
"trajectory_scalars": {...},
"partition": "stable|boundary|uncertain|novelty"
}These rows become the real training corpus for the AGP corrective adapter:
ASR text + partition + sigils + trajectory metadata -> corrected N'Ko textSynthetic perturbation data is only warmup. Real ASR error pairs matter more.
External Research Targets
Do external research before writing claims, especially around terms and novelty:
1. Whisper + CTC adaptation
- Search for Whisper encoder frozen-feature CTC ASR, especially low-resource scripts.
- Determine whether using Whisper-large-v3 features with a custom CTC head is standard enough to frame as adaptation rather than a new acoustic foundation model.
2. Test-Time Training for ASR
- Search for test-time training/adaptation in speech recognition and self-supervised ASR.
- Clarify whether our `InPlaceTTTAdapter` is closer to TTT, test-time adaptation, or chunk-local adaptation.
- Do not overclaim if the implementation is only a lightweight adapter.
3. TAR naming
- Verify what TAR means in our code and how it relates to published training regularizers.
- If the acronym is internal, define it internally in the paper rather than implying a standard method.
4. N'Ko / low-resource script ASR
- Search for N'Ko OCR/ASR/digital text corpora.
- Look for Manding/Bambara ASR papers, especially script conversion, Latin transliteration, and low-resource orthography.
- This helps position the N'Ko vs Latin comparison.
5. MLX distributed training
- Check current MLX distributed docs for ring backend, data parallelism, tensor sharding, and Apple Silicon constraints.
- Keep Thunder Train claims precise: data parallel increases throughput/effective batch, tensor parallel can shard model weights, neither is a shared ANE.
6. Apple Neural Engine
- Verify current Core ML / ANE training vs inference support.
- Current assumption: ANE is an inference deployment target after conversion, not a training accelerator for our PyTorch/MLX loops.
Vast/A100 Execution Notes
Before launching paid compute:
vastai show instances
vastai search offers 'gpu_name=A100 inet_down>500 disk_space>500 reliability>0.98'No instance should be left running after jobs complete. Destroy stopped/exited instances too if storage billing continues.
Preferred A100 over RTX 4090 for the canonical bundle because prior scripts assume A100-class memory/disk stability.
Budget orientation:
- `$100` is enough for a serious matched rerun bundle if A100 price is around `$0.56-$0.67/hr`.
- Always verify current prices with the Vast CLI; do not rely on old quote memory.
Minimum Success Criteria
The training lane is complete only when:
1. All canonical modes run on the same `290,596` snapshot.
2. Each run has immutable local and remote artifacts.
3. Each run emits row-level predictions and references.
4. Tables can be regenerated without manual copy/paste.
5. AGP can consume the prediction/reference rows for correction-adapter training.
6. Any claim about TAR/TTT is backed by same-snapshot measured CER.
Suggested First Actions
1. Verify `corrected_pairs_290k.jsonl` and feature availability on the target Vast image.
2. Confirm CLI flags for baseline, trajectory, TAR, and TTT from `train_vastai_tar_ttt.py`.
3. Write a run matrix JSON before launching.
4. Run one cheap sanity epoch or small subset on A100.
5. Launch the full matched bundle only after artifact upload/checkpoint sync is verified.
6. Add prediction-dump code before long runs, not after.
Promotion Decision
Attach run IDs, datasets, metrics, and reproduction commands.
Source Anchor
nko-brain-scanner/docs/handoffs/agp-nko-vast-training-handoff.md
Detected Structure
Method · Evaluation · References · Code Anchors · Architecture