Grand Diomande Research · Full HTML Reader

AGP/N'Ko + Vast Training Handoff

Date: 2026-04-21

Mission

You are continuing the N'Ko ASR training and AGP corrective-adapter program. The goal is not just to run more jobs; the goal is to finish the matched experimental bundle cleanly enough that the papers can make defensible claims about N'Ko script advantage, trajectory conditioning, TAR, and TTT.

Treat this as two coupled but separate systems:

1. Acoustic ASR training on Vast/A100: PyTorch/Whisper trajectory model family. This is where Paper 4 CER numbers come from.
2. AGP corrective language layer on Mac4/Mac5: MLX/Gemma LoRA adapter that proposes N'Ko text repairs after ASR decoding. Rust/Graph Kernel decides whether those repairs are admissible.

Do not merge those into one imagined model unless you explicitly port the ASR stack to MLX. Right now they are separate training/runtime stacks.

Current Verified Anchor

Canonical Paper 4 checkpoint:

```text
[home]/Desktop/nko-brain-scanner/results/paper4_reproduction_35205256/
```

Known state:

  • dataset: `290,596` pairs
  • split: `232,476 / 29,060 / 29,060`
  • script: N'Ko
  • mode: trajectory
  • `use_trajectory=true`
  • `use_tar=false`
  • `use_ttt=false`
  • final/test CER: `20.57%`
  • best validation: about `0.635887`
  • best checkpoint epoch: `38`
  • early stop: epoch `46`
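
The split counts above can be checked against the dataset size before any run is launched. This is a minimal sketch: the names `train`, `val`, and `test` assume the three counts are listed in that order, which the note does not state explicitly.

```python
# Counts copied from the "Known state" list.
TOTAL_PAIRS = 290_596
# Assumption: the split is listed as train / validation / test.
SPLIT = {"train": 232_476, "val": 29_060, "test": 29_060}


def check_split(total, split):
    """Return each partition's share of the total; raise if the counts do not add up."""
    covered = sum(split.values())
    if covered != total:
        raise ValueError(f"split covers {covered} rows, expected {total}")
    return {name: count / total for name, count in split.items()}


fractions = check_split(TOTAL_PAIRS, SPLIT)
for name, share in fractions.items():
    print(f"{name}: {share:.4f}")
```

The counts sum to `290,596` and correspond to roughly an 80/10/10 partition.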

Important: the filename `train_vastai_tar_ttt.py` is broader than the actual winning mode. The verified `20.57%` CER result comes from trajectory mode alone, with `use_tar=false` and `use_ttt=false`.

Relevant Local Artifacts

Training scripts / launchers:

```text
[home]/Desktop/nko-checkpoints-a100-mirror/emergency_1776430903/train_vastai_tar_ttt.py
[home]/Desktop/nko-checkpoints-a100-mirror/emergency_1776430903/run_final_a100.sh
[home]/Desktop/nko-checkpoints-a100-mirror/emergency_1776436244/run_final_a100_v2.sh
```

Canonical text source:

```text
[home]/Desktop/nko-checkpoints-a100-mirror/emergency_1776436244/data/corrected_pairs_290k.jsonl
```

This has `290,596` rows and fields like:

```json
{"feat_id":"bam_train_000000","nko":"...","latin":"..."}
```
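
A quick integrity pass over the text source could look like the sketch below. It assumes that `feat_id`, `nko`, and `latin` are required non-empty strings and that `feat_id` is unique; the note only says the file has "fields like" these, so both checks are assumptions to relax if the real schema differs. The function name `validate_pairs` is hypothetical.

```python
import json
from pathlib import Path

# Fields shown in the sample row; treating them as required is an assumption.
REQUIRED_FIELDS = ("feat_id", "nko", "latin")


def validate_pairs(path, expected_rows=None):
    """Count JSONL rows and collect problems instead of stopping at the first one."""
    rows, problems, seen = 0, [], set()
    with Path(path).open(encoding="utf-8") as handle:
        for line_no, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            rows += 1
            try:
                row = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append(f"line {line_no}: invalid JSON ({exc.msg})")
                continue
            if not isinstance(row, dict):
                problems.append(f"line {line_no}: expected a JSON object")
                continue
            for field in REQUIRED_FIELDS:
                value = row.get(field)
                if not isinstance(value, str) or not value:
                    problems.append(f"line {line_no}: missing or empty {field!r}")
            feat_id = row.get("feat_id")
            if isinstance(feat_id, str) and feat_id:
                if feat_id in seen:
                    problems.append(f"line {line_no}: duplicate feat_id {feat_id!r}")
                seen.add(feat_id)
    if expected_rows is not None and rows != expected_rows:
        problems.append(f"row count {rows} != expected {expected_rows}")
    return rows, problems
```

Calling `validate_pairs(path, expected_rows=290_596)` on the target Vast image and requiring an empty problem list is one way to confirm the snapshot before paying for compute.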

Current Paper 4 output bundle:

```text
[home]/Desktop/nko-brain-scanner/results/paper4_reproduction_35205256/
```

Older TAR result, not same snapshot:

```text
[home]/Desktop/cog-rlm/results/tar_297k_clean/checkpoints/nko_tar_297k/nko_tar/
```

Known result: N'Ko TAR around `29.95%` CER, measured on a different snapshot than the canonical checkpoint.

Matched Vast/A100 Training Bundle Needed

The clean paper bundle should include, at minimum:

1. N'Ko baseline on the current `290,596` snapshot.
2. Latin baseline on the current `290,596` snapshot.
3. Latin trajectory on the current `290,596` snapshot.
4. N'Ko TAR on the current `290,596` snapshot.
5. N'Ko trajectory + TTT on the current `290,596` snapshot.

Optional but valuable:

6. Latin TAR.
7. Latin trajectory + TTT.
8. Multi-seed reruns for the top two contenders.
9. Same test split prediction dumps for AGP correction training.
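
The five required runs can be written down as a run matrix before anything is launched. The sketch below is illustrative: the `run_id` format, the mode labels, and the JSON layout are hypothetical, and the boolean keys mirror the `use_trajectory` / `use_tar` / `use_ttt` fields from the known state rather than verified CLI flags. Whether the TAR run also enables trajectory conditioning is not stated in this note, so it is set to `False` as a placeholder to confirm against the trainer.

```python
import json

SNAPSHOT = "corrected_pairs_290k.jsonl"
SNAPSHOT_ROWS = 290_596

# (script, mode label, use_trajectory, use_tar, use_ttt) for required runs 1-5.
# The TAR row's use_trajectory value is a placeholder, not a verified setting.
REQUIRED_RUNS = [
    ("nko", "baseline", False, False, False),
    ("latin", "baseline", False, False, False),
    ("latin", "trajectory", True, False, False),
    ("nko", "tar", False, True, False),
    ("nko", "trajectory_ttt", True, False, True),
]


def build_run_matrix(seed=0):
    """Return a JSON-serializable matrix in which every run shares one snapshot."""
    runs = []
    for script, mode, use_trajectory, use_tar, use_ttt in REQUIRED_RUNS:
        runs.append({
            "run_id": f"{script}_{mode}_seed{seed}",  # hypothetical naming scheme
            "script": script,
            "mode": mode,
            "use_trajectory": use_trajectory,
            "use_tar": use_tar,
            "use_ttt": use_ttt,
            "seed": seed,
        })
    return {"snapshot": SNAPSHOT, "snapshot_rows": SNAPSHOT_ROWS, "runs": runs}


matrix = build_run_matrix()
print(json.dumps(matrix, indent=2))
```

Keeping the snapshot at the top level rather than per run makes it harder to mix snapshots inside one bundle by accident.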

TTT/TAR Context

The trainer has a TTT implementation:

  • `InPlaceTTTAdapter`
  • `--use-ttt`
  • `--ttt-chunk-size`
  • `--ttt-lr`

The trainer also has TAR-related pathways. Verify the exact flags in:

```bash
rg -n "use-ttt|ttt-|use-tar|tar|trajectory|baseline|latin|target" \
  [home]/Desktop/nko-checkpoints-a100-mirror/emergency_1776430903/train_vastai_tar_ttt.py
```

Expected experimental meaning:

  • baseline: Whisper features -> CTC N'Ko or Latin output, no trajectory conditioning.
  • trajectory: current winning mode; acoustic hidden states are conditioned by anticipation/trajectory scalars before CTC.
  • TAR: regularization/alignment family. Verify implementation details in code before naming it in paper language.
  • TTT: test-time or chunk-local adaptation path. It should be evaluated as an ablation; do not assume improvement until measured.

AGP Architecture Boundary

The AGP architecture is not currently the acoustic model. It is the post-ASR correction layer:

```text
audio
  -> Whisper/PyTorch trajectory ASR checkpoint
  -> raw N'Ko candidate + trajectory/partition metadata
  -> AGP/Gemma proposal model
  -> Rust cc-agp-bridge accept/reject gate
  -> admissibility token + RAG++/Graph provenance
  -> final N'Ko text
```

Current bridge repo:

```text
[home]/Desktop/Comp-Core/experiments/agp_mlx/asr_bridge/
[home]/Desktop/Comp-Core/core/semantic/cc-agp-bridge/
```

Current AGP runtime:

  • Mac5 corrective lane: `http://[ip]:9442/health`
  • `/propose` exists and has been smoke-tested.
  • Rust gate has accepted safe boundary/uncertain repairs and blocked novelty overreach.

Known bridge metrics:

  • hand smoke: CER `0.142857 -> 0.047619`, accepted improved `2`, accepted worse `0`
  • synthetic stress: CER `0.133333 -> 0.1`, accepted improved `3`, accepted neutral `5`, accepted worse `0`
  • archived real eval low-CER slice: CER `0.7603686636 -> 0.7511520737`, accepted improved `1`, accepted worse `0`
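
Metrics of this shape (CER before and after the gate, plus counts of accepted repairs by outcome) can be aggregated from per-row edit counts. The sketch below assumes corpus-level CER, meaning total edits divided by total reference characters, and assumes that a rejected proposal leaves the raw ASR text in place. The field names `edits_before`, `edits_after`, and `accepted` are hypothetical, and the rows are toy values, not the evaluation data reported above.

```python
def summarize_gate(rows):
    """Aggregate corpus-level CER before/after gating and classify accepted repairs."""
    total_chars = sum(row["reference_chars"] for row in rows)
    if total_chars == 0:
        raise ValueError("no reference characters to score against")

    edits_before = sum(row["edits_before"] for row in rows)
    # A rejected proposal leaves the raw ASR text in place, so its edits are unchanged.
    edits_after = sum(
        row["edits_after"] if row["accepted"] else row["edits_before"] for row in rows
    )

    accepted = {"improved": 0, "neutral": 0, "worse": 0}
    for row in rows:
        if not row["accepted"]:
            continue
        delta = row["edits_after"] - row["edits_before"]
        outcome = "improved" if delta < 0 else "worse" if delta > 0 else "neutral"
        accepted[outcome] += 1

    return {
        "cer_before": edits_before / total_chars,
        "cer_after": edits_after / total_chars,
        "accepted": accepted,
    }


# Toy rows for illustration only.
toy_rows = [
    {"reference_chars": 10, "edits_before": 2, "edits_after": 1, "accepted": True},
    {"reference_chars": 10, "edits_before": 1, "edits_after": 1, "accepted": True},
    {"reference_chars": 20, "edits_before": 3, "edits_after": 6, "accepted": False},
]
summary = summarize_gate(toy_rows)
print(summary)
```

In the toy data the third proposal would have made the row worse, but the gate rejected it, so it does not count toward "accepted worse" and does not raise the post-gate CER.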

Interpretation:

  • AGP bridge has architectural promise.
  • It is not yet a production CER claim on Paper 4.
  • The missing ingredient is same-provenance Paper 4 ASR prediction/reference rows.

Thunder Train State

Repo:

```text
[home]/projects/thunder-train/
```

Cluster is working after patching:

  • Mac4: `[ip]`
  • Mac5: `[ip]`
  • MLX: `0.31.1` on both
  • MLX-LM: `0.31.2` on both
  • Thunderbolt latency: about `0.5ms`
  • MLX distributed ring smoke: `world_size=2`

Patches made:

  • `launch.sh` now checks TB reachability from Mac4/Mac5, not Mac1.
  • `thunder_status.py` now checks the project runtime and remote TB peer.
  • `thunder_python.sh` falls back to `.venv-agp` if `.venv` is absent.

AGP adapter data and scripts:

```text
[home]/projects/thunder-train/scripts/build_agp_nko_correction_chatml.py
[home]/projects/thunder-train/scripts/build_nko_synthetic_asr_correction_chatml.py
[home]/projects/thunder-train/scripts/eval_agp_nko_adapter.py
[home]/projects/thunder-train/data/agp-nko-corrections/
[home]/projects/thunder-train/data/agp-nko-synthetic-2k/
```

Completed Thunder Train runs:

  • `runs/agp-nko-correction-smoke-adapter`: 4-step infrastructure smoke, saved adapter.
  • `runs/agp-nko-synthetic-2k-r2-safe`: 25-step safe run, saved adapter.

Failed/unstable shape:

  • 100-step run with batch `2`, LoRA layers `8`, max seq `512`: hit a Metal GPU timeout.

Safe shape:

```bash
./launch.sh \
  --model mlx-community/gemma-4-e2b-4bit \
  --strategy data \
  --batch-size 1 \
  --num-layers 4 \
  --lora-rank 8 \
  --learning-rate 1e-6 \
  --max-seq-len 256
```

Quality caveat:

  • Early adapter outputs are not production-ready.
  • The adapter lane currently proves distributed training and artifact generation.
  • Do not claim AGP LoRA improved Paper 4 CER yet.

What To Hand Back To AGP From Vast

For every completed ASR run, preserve:

```text
results.json
train.log
run.log
split.json
vocab.json
best.pt
test_predictions.jsonl
test_references.jsonl
test_metrics_by_partition.json
```
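
A run directory can be audited against this list, and hashed so that local and remote copies can be compared later. This is a sketch under the assumption that all nine files sit directly in one directory per run; the function names are hypothetical.

```python
import hashlib
from pathlib import Path

# File names copied from the preservation list.
REQUIRED_ARTIFACTS = (
    "results.json",
    "train.log",
    "run.log",
    "split.json",
    "vocab.json",
    "best.pt",
    "test_predictions.jsonl",
    "test_references.jsonl",
    "test_metrics_by_partition.json",
)


def sha256_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large checkpoints do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit_run_dir(run_dir):
    """Return (missing file names, {file name: sha256}) for one run directory."""
    run_dir = Path(run_dir)
    missing = [name for name in REQUIRED_ARTIFACTS if not (run_dir / name).is_file()]
    manifest = {
        name: sha256_file(run_dir / name)
        for name in REQUIRED_ARTIFACTS
        if name not in missing
    }
    return missing, manifest
```

Running the audit on both the Vast instance and the local mirror, then comparing manifests, gives a cheap check that the sync was complete before the instance is destroyed.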

If prediction dumps do not already exist, add them. AGP needs row-level records:

```json
{
  "feat_id": "bam_train_...",
  "audio_id": "...",
  "split": "test",
  "script": "nko",
  "mode": "trajectory",
  "asr_text": "...",
  "reference_text": "...",
  "cer_edits": 1,
  "reference_chars": 20,
  "trajectory_scalars": {...},
  "partition": "stable|boundary|uncertain|novelty"
}
```
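
If the trainer does not yet emit these rows, a dump step could build them from decoded text as sketched below. The helper names are hypothetical. Edits are counted as Levenshtein distance over Unicode code points with no normalization, which is an assumption; the trainer's own CER definition should take precedence if it differs, for example in how N'Ko combining marks are handled.

```python
import json


def edit_distance(hyp, ref):
    """Levenshtein distance between two strings, counted over code points."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        cur = [i]
        for j, r in enumerate(ref, start=1):
            cur.append(min(
                prev[j] + 1,             # delete h from the hypothesis
                cur[j - 1] + 1,          # insert r into the hypothesis
                prev[j - 1] + (h != r),  # substitute, or match at zero cost
            ))
        prev = cur
    return prev[-1]


def make_prediction_row(feat_id, audio_id, asr_text, reference_text,
                        script, mode, trajectory_scalars, partition):
    """Build one row-level record in the shape AGP needs."""
    return {
        "feat_id": feat_id,
        "audio_id": audio_id,
        "split": "test",
        "script": script,
        "mode": mode,
        "asr_text": asr_text,
        "reference_text": reference_text,
        "cer_edits": edit_distance(asr_text, reference_text),
        "reference_chars": len(reference_text),
        "trajectory_scalars": trajectory_scalars,
        "partition": partition,
    }


def write_rows(rows, path):
    """Write records as UTF-8 JSONL, keeping N'Ko characters unescaped."""
    with open(path, "w", encoding="utf-8") as handle:
        for row in rows:
            handle.write(json.dumps(row, ensure_ascii=False) + "\n")
```

Storing `cer_edits` and `reference_chars` per row, rather than a per-row ratio, lets corpus-level CER be recomputed for any subset or partition.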

These rows become the real training corpus for the AGP corrective adapter:

```text
ASR text + partition + sigils + trajectory metadata -> corrected N'Ko text
```

Synthetic perturbation data is only warmup. Real ASR error pairs matter more.

External Research Targets

Do external research before writing claims, especially around terms and novelty:

1. Whisper + CTC adaptation
- Search for Whisper encoder frozen-feature CTC ASR, especially low-resource scripts.
- Determine whether using Whisper-large-v3 features with a custom CTC head is standard enough to frame as adaptation rather than a new acoustic foundation model.

2. Test-Time Training for ASR
- Search for test-time training/adaptation in speech recognition and self-supervised ASR.
- Clarify whether our `InPlaceTTTAdapter` is closer to TTT, test-time adaptation, or chunk-local adaptation.
- Do not overclaim if the implementation is only a lightweight adapter.

3. TAR naming
- Verify what TAR means in our code and how it relates to published training regularizers.
- If the acronym is internal, define it internally in the paper rather than implying a standard method.

4. N'Ko / low-resource script ASR
- Search for N'Ko OCR/ASR/digital text corpora.
- Look for Manding/Bambara ASR papers, especially script conversion, Latin transliteration, and low-resource orthography.
- This helps position the N'Ko vs Latin comparison.

5. MLX distributed training
- Check current MLX distributed docs for ring backend, data parallelism, tensor sharding, and Apple Silicon constraints.
- Keep Thunder Train claims precise: data parallel increases throughput/effective batch, tensor parallel can shard model weights, neither is a shared ANE.

6. Apple Neural Engine
- Verify current Core ML / ANE training vs inference support.
- Current assumption: ANE is an inference deployment target after conversion, not a training accelerator for our PyTorch/MLX loops.

Vast/A100 Execution Notes

Before launching paid compute:

```bash
vastai show instances
vastai search offers 'gpu_name=A100 inet_down>500 disk_space>500 reliability>0.98'
```

No instance should be left running after jobs complete. Destroy stopped/exited instances too if storage billing continues.

Prefer A100 over RTX 4090 for the canonical bundle, because the prior scripts assume A100-class memory and disk stability.

Budget orientation:

  • `$100` is enough for a serious matched rerun bundle if A100 price is around `$0.56-$0.67/hr`.
  • Always verify current prices with the Vast CLI; do not rely on old quote memory.
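
The budget statement can be turned into GPU-hours with simple arithmetic. This sketch counts compute rental only and ignores storage and bandwidth charges; dividing evenly across the five required runs is an assumption, since runs will differ in length.

```python
def gpu_hours(budget_usd, price_per_hour):
    """GPU-hours purchasable at a flat hourly price (compute only)."""
    if price_per_hour <= 0:
        raise ValueError("price_per_hour must be positive")
    return budget_usd / price_per_hour


BUDGET_USD = 100.0
REQUIRED_RUNS = 5  # runs 1-5 of the matched bundle

for price in (0.56, 0.67):
    hours = gpu_hours(BUDGET_USD, price)
    print(f"${price:.2f}/hr: {hours:.1f} GPU-hours, "
          f"{hours / REQUIRED_RUNS:.1f} hours per run")
```

At the quoted prices, `$100` buys roughly 149 to 179 A100-hours, or about 30 to 36 hours per required run if split evenly.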

Minimum Success Criteria

The training lane is complete only when:

1. All canonical modes run on the same `290,596` snapshot.
2. Each run has immutable local and remote artifacts.
3. Each run emits row-level predictions and references.
4. Tables can be regenerated without manual copy/paste.
5. AGP can consume the prediction/reference rows for correction-adapter training.
6. Any claim about TAR/TTT is backed by same-snapshot measured CER.
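
Criterion 4 can be met by generating the results table directly from each run's `results.json`. The sketch below assumes one subdirectory per run and assumes `results.json` carries `script`, `mode`, and `test_cer` keys; those key names are hypothetical and should be matched to what the trainer actually writes.

```python
import json
from pathlib import Path


def load_results(root):
    """Read <root>/<run>/results.json for every run directory under root."""
    rows = []
    for path in sorted(Path(root).glob("*/results.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        rows.append({
            "run": path.parent.name,
            "script": data["script"],           # assumed key
            "mode": data["mode"],               # assumed key
            "test_cer": float(data["test_cer"]),  # assumed key
        })
    return rows


def render_table(rows):
    """Render a pipe-delimited table sorted by test CER, lowest first."""
    lines = ["| run | script | mode | test CER |", "| --- | --- | --- | --- |"]
    for row in sorted(rows, key=lambda r: r["test_cer"]):
        lines.append(
            f"| {row['run']} | {row['script']} | {row['mode']} | {row['test_cer']:.4f} |"
        )
    return "\n".join(lines)
```

Because the table is derived only from files in the run directories, rerunning `render_table(load_results(root))` after a new run lands updates the paper table with no hand-copied numbers.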

Suggested First Actions

1. Verify `corrected_pairs_290k.jsonl` and feature availability on the target Vast image.
2. Confirm CLI flags for baseline, trajectory, TAR, and TTT from `train_vastai_tar_ttt.py`.
3. Write a run matrix JSON before launching.
4. Run one cheap sanity epoch or small subset on A100.
5. Launch the full matched bundle only after artifact upload/checkpoint sync is verified.
6. Add prediction-dump code before long runs, not after.

Promotion Decision

Attach run IDs, datasets, metrics, and reproduction commands.

Source Anchor

projects/thunder-train/docs/agp-nko-vast-training-handoff.md
