AGP Execution Roadmap V1
Date: `2026-04-17`
Current Status
The AGP program is through the first control-stage milestone.
Completed:
- `Gemma 4 E2B` Thunder stage-one LoRA backbone
- full train and held-out route oracle artifacts
- route/vitality head `v1` on original conservative labels
- calibrated threshold sweep over saved oracle metrics
- recalibrated route/vitality head `v2`
- three-head controller with earliest-layer supervision
- corrected `transfer_v2` same-host adapter run
Current best control baseline:
- calibrated label regime: `kl=4.0`, `margin_delta=0.15`
- held-out route accuracy: `0.8444`
- held-out vitality accuracy: `1.0`
- exact held-out separation for:
  - `accept_local`
  - `revive_local`
  - `escalate`
- remaining route confusion is mainly:
  - `continue_local -> accept_local`
This means the backbone hidden states are already useful enough to support a learned controller. The remaining work is to convert that controller into a real distributed latent-transfer system.
Current transfer baseline:
- corrected dataset: `transfer_v2_k4_m015`
- best run: `transfer_adapter_v2_20260418_000114`
- best checkpoint: step `1400`
- held-out overall:
  - cosine `0.9226`
  - MSE `0.1097`
- held-out by route:
  - `escalate`: cosine `0.9310`, MSE `0.0792`
  - `continue_local`: cosine `0.9432`, MSE `0.0979`
  - `accept_local`: cosine `0.6189`, MSE `0.6541`
- held-out by layer:
  - layer `30`: cosine `0.9360`, MSE `0.0868`
  - layer `26`: cosine `0.6189`, MSE `0.6541`
Interpretation:
- late-boundary (layer `30`) transfer is already strong enough to justify the next routed-resume stage
- true early-layer (layer `26`) transfer is still weak
- the next phase should treat late-boundary and early-layer transfer as separate problems
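The cosine and MSE figures above can be reproduced from saved hidden-state pairs with a few lines of code. The following is a minimal sketch, not the project's evaluation script: the record layout (`pred`, `target`, plus a grouping field such as `route` or `layer`) is an assumption, and it averages per record, which may differ from how the report aggregates.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean_squared_error(a, b):
    """Element-wise mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def summarize_by_group(records, key):
    """Average cosine and MSE per group (for example per route or per layer).

    Each record is assumed to be a dict holding the transferred hidden state
    under "pred", the reference hidden state under "target", and the grouping
    value under `key`. These field names are hypothetical.
    """
    groups = {}
    for rec in records:
        groups.setdefault(rec[key], []).append(rec)
    summary = {}
    for name, recs in groups.items():
        cos = [cosine_similarity(r["pred"], r["target"]) for r in recs]
        mse = [mean_squared_error(r["pred"], r["target"]) for r in recs]
        summary[name] = {
            "cosine": sum(cos) / len(cos),
            "mse": sum(mse) / len(mse),
            "n": len(recs),
        }
    return summary
```

Grouping by `layer` as well as by `route` keeps the layer `26` weakness visible instead of letting it average into the overall number.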
Promoted same-host transfer baseline:
- canonical transfer adapter:
  - `transfer_adapter_logit_v2_20260418_011824`
- why it wins:
  - hidden-only transfer looked acceptable in latent space but failed badly in live next-token behavior
  - logit-aware transfer fixed the real runtime bottleneck
- live prompt-loop comparison on the same `8` held-out prompts:
  - route mix unchanged:
    - `7 continue_local`
    - `1 escalate`
  - hidden-only transfer:
    - mean live KL `5.5070`
    - top-1 teacher match `0.0`
  - logit-aware transfer:
    - mean live KL `0.6490`
    - top-1 teacher match `0.8571`
- interpretation:
  - same-host `continue_local` is now operational not only in latent similarity, but in next-token behavior
  - local transfer is no longer the blocker
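The two live metrics above compare next-token distributions rather than hidden states. Below is a minimal sketch of how they could be computed, assuming "teacher" means the logits of the untransferred baseline and that KL is taken in the direction KL(teacher || student). Both are assumptions; the roadmap does not state the direction or the exact teacher.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def kl_divergence(teacher_logits, student_logits):
    """KL(teacher || student) in nats for one next-token distribution."""
    t = log_softmax(teacher_logits)
    s = log_softmax(student_logits)
    return sum(math.exp(tl) * (tl - sl) for tl, sl in zip(t, s))


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


def top1_match(teacher_logits, student_logits):
    """True when both distributions pick the same next token."""
    return argmax(teacher_logits) == argmax(student_logits)


def live_report(pairs):
    """Mean KL and top-1 match rate over (teacher_logits, student_logits) pairs."""
    kls = [kl_divergence(t, s) for t, s in pairs]
    matches = [top1_match(t, s) for t, s in pairs]
    return {
        "mean_kl": sum(kls) / len(kls),
        "top1_match": sum(matches) / len(matches),
    }
```

A match rate of `0.8571` is what `6` matches out of `7` cases rounds to, which is consistent with the `7` `continue_local` prompts in the route mix.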
Packetized same-host baseline:
- tools:
  - `runtime/agp_packet.py`
  - `runtime/run_mock_packet_replay_v1.py`
- same `8` held-out prompts / `7` continue-local cases:
  - `fp16` request packets:
    - mean bytes `3832`
    - mean KL `0.6497`
    - top-1 match `0.8571`
  - `q8_0` request packets:
    - mean bytes `1870.3`
    - mean KL `0.7365`
    - top-1 match `0.8571`
- interpretation:
  - request-side packet compression is already viable
  - `q8_0` roughly halves request size (`3832` to `1870.3` mean bytes) without materially harming the promoted local continuation path: mean KL rises from `0.6497` to `0.7365` and top-1 match is unchanged
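For readers unfamiliar with the codec name, here is a sketch of 8-bit block quantization in the style of the GGML `q8_0` format: blocks of 32 values, each stored as one fp16 scale plus 32 signed bytes. It is an assumption that the `q8_0` codec in `runtime/agp_packet.py` follows this convention; the actual implementation is not shown in this roadmap.

```python
import struct

BLOCK = 32  # values per block, following the GGML q8_0 convention (assumed)


def q8_0_encode(values):
    """Quantize floats into blocks of: fp16 scale (2 bytes) + 32 int8 values."""
    if len(values) % BLOCK != 0:
        raise ValueError("length must be a multiple of 32")
    out = bytearray()
    for i in range(0, len(values), BLOCK):
        block = values[i:i + BLOCK]
        amax = max(abs(v) for v in block)
        # Round the scale to fp16 first so encoder and decoder agree on it.
        scale_bytes = struct.pack("<e", amax / 127.0)
        scale = struct.unpack("<e", scale_bytes)[0]
        inv = 1.0 / scale if scale else 0.0
        quants = [max(-127, min(127, round(v * inv))) for v in block]
        out += scale_bytes + struct.pack("<32b", *quants)
    return bytes(out)


def q8_0_decode(payload):
    """Invert q8_0_encode: multiply each int8 value by its block scale."""
    values = []
    for off in range(0, len(payload), 2 + BLOCK):
        scale = struct.unpack_from("<e", payload, off)[0]
        quants = struct.unpack_from("<32b", payload, off + 2)
        values.extend(scale * q for q in quants)
    return values
```

Under this layout each block costs 34 bytes against 64 bytes for raw fp16, about 53% of the original payload. The measured ratio above is about 49%, so the real packets likely differ from this sketch in some detail, such as headers or what else travels in the `fp16` request.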
First cross-host replay baseline:
- tools:
  - `runtime/agp_packet_resume_server_v1.py`
  - `runtime/run_cross_host_packet_replay_v1.py`
- remote host:
  - `mac5`
- request codec:
  - `q8_0`
- same `8` held-out prompts / `7` continue-local cases:
  - mean request bytes `1820`
  - mean response bytes `3608.9`
  - mean KL `0.7362`
  - top-1 match `0.8571`
  - mean margin delta `0.0963`
- median server timings from row report:
  - decode `~1.14ms`
  - infer `~6.42ms`
  - encode `~0.57ms`
- median network roundtrip `~1207ms`
- interpretation:
  - cross-host latent transport is now experimentally real
  - remote resume quality survives the network hop
  - the research bottleneck has moved from resume quality to transport path latency
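A quick back-of-the-envelope check makes the bottleneck claim concrete. The sketch below uses only the median timings listed above and assumes the roundtrip figure includes server time. Medians are not strictly additive, so treat the result as an order-of-magnitude estimate.

```python
# Median server-side timings from the row report, in milliseconds.
server_ms = {"decode": 1.14, "infer": 6.42, "encode": 0.57}
roundtrip_ms = 1207.0  # median network roundtrip

server_total_ms = sum(server_ms.values())
server_share = server_total_ms / roundtrip_ms
transport_ms = roundtrip_ms - server_total_ms

print(f"server work: {server_total_ms:.2f} ms")
print(f"server share of roundtrip: {server_share:.2%}")
print(f"time attributable to transport: {transport_ms:.2f} ms")
```

Server-side work is about `8.13ms`, well under 1% of the roundtrip, so nearly all of the `~1207ms` sits in the transport path.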
First artifact-driven routed-resume result:
- report:
  - `Desktop/Comp-Core/experiments/agp_mlx/runtime/reports/routed_resume_artifact_v1/routed_resume_artifact_report.json`
- on the `23` transfer-eligible held-out records:
  - controller predicted:
    - `12 escalate`
    - `6 continue_local`
    - `5 accept_local`
  - route accuracy on this subset: `0.8261`
  - boundary accuracy on this subset: `0.3043`
  - vitality accuracy on this subset: `0.9565`
- semantically separated result:
  - `continue_local` resume path:
    - `9` oracle continue cases
    - `6` controller-compatible continue cases
    - resumed hidden-state quality on those compatible cases:
      - cosine `0.9552`
      - MSE `0.0733`
- interpretation:
  - the continue-local path is strong enough to operationalize
  - the next failure to attack is boundary drift (`30 -> 26`) and route spill from `continue_local -> accept_local`
First true same-host runtime policy:
- runtime:
  - `Desktop/Comp-Core/experiments/agp_mlx/runtime/run_same_host_routed_runtime_v1.py`
- confidence sweep:
  - `Desktop/Comp-Core/experiments/agp_mlx/runtime/sweep_route_confidence_policy_v1.py`
- recommended local policy for this wave:
  - boundary = route-derived
    - `accept_local -> 26`
    - `continue_local -> 30`
    - otherwise `none`
  - local-action confidence gate = `0.55`
- held-out result under that policy:
  - action counts:
    - `13 escalate`
    - `7 continue_local`
    - `3 accept_local`
  - route accuracy: `0.9130`
  - continue-local matched quality:
    - cosine `0.9519`
    - MSE `0.0747`
- interpretation:
  - the local runtime can already support a trustworthy continue-local path
  - late-boundary resume is no longer the research bottleneck
  - next failure to attack is false local accept spill
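The recommended local policy can be sketched as a small decision function. This is an illustration, not the code in `run_same_host_routed_runtime_v1.py`. It assumes that confidence is the probability of the top route and that a local action falling below the gate is sent to `escalate`; the roadmap does not state what the gated fallback is.

```python
# Route-derived boundary table taken from the policy above.
ROUTE_TO_BOUNDARY = {"accept_local": 26, "continue_local": 30}
LOCAL_ACTIONS = frozenset(ROUTE_TO_BOUNDARY)
CONFIDENCE_GATE = 0.55


def decide(route_probs):
    """Pick an action and boundary from a dict of route probabilities.

    Assumption: local actions below the confidence gate fall back to
    "escalate". Non-local routes are never gated.
    """
    route = max(route_probs, key=route_probs.get)
    confidence = route_probs[route]
    if route in LOCAL_ACTIONS and confidence < CONFIDENCE_GATE:
        route = "escalate"
    return {
        "action": route,
        "boundary": ROUTE_TO_BOUNDARY.get(route),  # None means no local boundary
        "confidence": confidence,
    }
```

For example, `decide({"continue_local": 0.8, "accept_local": 0.1, "escalate": 0.1})` selects `continue_local` at boundary `30`, while a top `accept_local` probability of `0.5` is gated to `escalate`.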
Remaining Program
Stage 2.1 — Control Consolidation
Objective:
Turn the current calibrated route/vitality head into a trustworthy baseline controller.
Tasks:
- [ ] freeze the current calibrated checkpoint as the control baseline
- [ ] export per-class confusion, precision, recall, and macro-F1 for the held-out split
- [ ] add a direct `earliest acceptable layer` prediction head
- [ ] compare:
  - route-only prediction
  - route + vitality multitask
  - route + vitality + earliest-layer multitask
- [ ] decide whether `accept_local` and `continue_local` should remain separate actions or become a shared accept state with boundary refinement downstream
Validation:
- [ ] hold route macro-F1 above the majority baseline by a wide margin
- [ ] preserve exact or near-exact `escalate` detection
- [ ] preserve exact or near-exact `revive_local` detection
- [ ] reduce `continue_local -> accept_local` error if possible
Exit criteria:
- [ ] controller is stable enough to supervise transfer experiments
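The per-class export and the macro-F1 gate in this stage could be computed as follows. This is a minimal sketch that assumes held-out labels and predictions are available as parallel lists of route names. The function names are illustrative, and classes with no support count as F1 `0.0`, which is one common convention rather than something the roadmap specifies.

```python
from collections import Counter

ROUTES = ("accept_local", "continue_local", "revive_local", "escalate")


def per_class_report(y_true, y_pred, labels=ROUTES):
    """Return (confusion counts, per-class precision/recall/F1, macro-F1)."""
    confusion = Counter(zip(y_true, y_pred))  # keys are (true, predicted)
    report = {}
    for label in labels:
        tp = confusion[(label, label)]
        fp = sum(confusion[(other, label)] for other in labels if other != label)
        fn = sum(confusion[(label, other)] for other in labels if other != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    macro_f1 = sum(r["f1"] for r in report.values()) / len(labels)
    return confusion, report, macro_f1


def majority_baseline_macro_f1(y_true, labels=ROUTES):
    """Macro-F1 of always predicting the most frequent true class."""
    majority = Counter(y_true).most_common(1)[0][0]
    return per_class_report(y_true, [majority] * len(y_true), labels)[2]
```

The confusion entry `("continue_local", "accept_local")` is the spill this stage is meant to reduce, so it is worth tracking directly alongside macro-F1.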
Stage 2.2 — Confidence Calibration
Objective:
Replace raw logits with usable operating confidence.
Tasks:
- [ ] add confidence extraction for route and vitality heads
- [ ] measure expected calibration error on held-out data
- [ ] tune thresholds for:
  - accept
  - continue
  - revive
  - escalate
- [ ] build an abstain policy for low-confidence cases
Validation:
- [ ] confidence correlates with correctness
- [ ] abstain region improves safety without collapsing acceptance rate
Exit criteria:
- [ ] controller can expose a reliable decision + confidence packet to downstream transfer logic
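Expected calibration error, named in the tasks above, has a standard binned form. The sketch below assumes equal-width confidence bins and uses the top-class probability as the confidence. The bin count and the input layout are illustrative choices, not values from this roadmap.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted mean of |avg confidence - accuracy| per bin.

    `confidences` are top-class probabilities in [0, 1]; `correct` are
    booleans saying whether each prediction was right.
    """
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        ece += len(bucket) / total * abs(avg_conf - accuracy)
    return ece
```

With small held-out sets, most bins hold only a handful of records, so a coarse `n_bins` may give a steadier estimate than the default.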
Stage 3.0 — Transfer Dataset Construction
Objective:
Create the supervision substrate for hidden-state transfer.
Tasks:
- [ ] derive transfer supervision splits from calibrated oracle records
- [ ] group records by:
  - `accept_local`
  - `continue_local`
  - `escalate`
- [ ] materialize source latent tensors for candidate split layers
- [ ] materialize target continuation references for resumed decoding
- [ ] define the first compact transfer target:
  - hidden-state reconstruction
  - resumed-logit fidelity
  - continuation agreement
Validation:
- [ ] dataset covers both layer `26` and layer `30` dominated paths
- [ ] train/valid split preserves route-class proportions
Exit criteria:
- [ ] `transfer_v1` dataset exists and is reproducible
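The validation gate about preserving route-class proportions amounts to a stratified split. Below is a minimal reproducible sketch, assuming each record is a dict with a `route` field. The field name, the default fraction, and the seed are illustrative.

```python
import random
from collections import defaultdict


def stratified_split(records, valid_fraction=0.2, seed=0, key="route"):
    """Split records into train/valid while keeping per-route proportions.

    A fixed seed makes the split reproducible across runs.
    """
    rng = random.Random(seed)
    by_route = defaultdict(list)
    for rec in records:
        by_route[rec[key]].append(rec)
    train, valid = [], []
    for route in sorted(by_route):  # sorted for a deterministic iteration order
        group = by_route[route][:]
        rng.shuffle(group)
        # Keep at least one validation record per route when the group allows it.
        n_valid = max(1, round(len(group) * valid_fraction)) if len(group) > 1 else 0
        valid.extend(group[:n_valid])
        train.extend(group[n_valid:])
    return train, valid
```

Stratifying on a combined key such as route plus boundary layer would also address the coverage gate for layers `26` and `30`, at the cost of smaller groups.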
Stage 3.1 — Same-Host Transfer Adapter
Objective:
Teach a compact latent-transfer module before involving network transport.
Tasks:
- [ ] build a same-host transfer encoder/decoder in MLX
- [ ] train on calibrated transfer supervision
- [ ] evaluate:
  - hidden-state cosine
  - hidden-state MSE
  - resumed logit KL
  - continuation agreement
- [ ] sweep packet bottleneck widths
Validation:
- [x] same-host hidden-state reconstruction is strong on the dominant late-boundary path
- [x] same-host resumed continuation tracks no-transfer baseline closely enough to be useful
- [x] packet bottleneck is materially smaller than raw hidden-state transfer
- [ ] early-layer transfer improves beyond the current layer `26` weakness
Exit criteria:
- [x] one transfer configuration is good enough to test in the local routed-resume loop
Stage 3.2 — Same-Host Routed Inference
Objective:
Close the loop locally before crossing machines.
Tasks:
- [ ] build a local inference harness that:
  - runs the controller
  - chooses route action
  - if accepted, exits locally
  - if continue, resumes from calibrated late layer
  - if escalate, runs the deeper correction path
- [ ] compare routed local inference versus full-depth baseline
Validation:
- [ ] routed local inference reduces average active depth
- [ ] quality stays within acceptable bounds
- [ ] failure cases are clearly attributable
Exit criteria:
- [x] local AGP loop works on one machine
Immediate subplan for Stage 3.2:
- [ ] build a routed-resume harness that loads:
  - the Thunder stage-one Gemma adapter
  - the best three-head controller
  - the best corrected transfer adapter
- [ ] run held-out prompts through:
  - full baseline
  - continue-local resume path
  - early-accept resume path
  - forced escalate path
- [ ] report:
  - resumed logit KL
  - continuation agreement
  - output-level acceptance / mismatch counts
  - per-layer failure breakdown
- [x] establish the first artifact-backed local runtime policy
- [x] turn the runtime from artifact replay into a live same-host prompt loop
- [x] reduce false `accept_local` spill under the live local policy
- [ ] if layer `26` remains weak, spin a focused transfer-improvement pass before any cross-host runtime work
Stage 4.0 — Cross-Host Resume Over Thunderbolt
Objective:
Turn same-host transfer into real `Mac4 -> Mac5` continuation.
Tasks:
- [x] define the first production-shaped `AGP-PTP` packet
- [x] send compressed latent packets to a remote host
- [x] reconstruct on the receiving host
- [x] resume the promoted transfer path on the remote host
- [x] compare cross-host continuation to same-host packetized continuation
- [ ] optimize the transport path itself
- [ ] move from hidden-return resume to deeper corrective continuation
Validation:
- [ ] packet transport overhead is lower than rerunning the entire path
- [x] resumed continuation fidelity stays close to same-host transfer
Exit criteria:
- [x] first real cross-host AGP continuation demo exists
Updated immediate subplan for Stage 4.0:
- [x] define the first latent packet around the promoted logit-aware transfer adapter
- [x] instrument bytes per packet, encode/decode latency, and boundary metadata
- [x] replay the same held-out local runtime cases across a mock packet boundary first
- [x] move the packet across the real `mac1 -> mac5` path
- [ ] rerun over the intended Thunderbolt-first/direct path with lower network RTT
- [ ] add connection reuse, batching, or response compression if the path remains the limiter
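To illustrate what a packet carrying boundary metadata and instrumented byte counts might look like, here is a hypothetical framing sketch. The header layout, the magic value, and the codec ids are invented for illustration and are not the `AGP-PTP` format defined in `runtime/agp_packet.py`.

```python
import struct
import time

# Hypothetical header: magic, version, codec id, boundary layer, payload length.
HEADER = struct.Struct("<4sBBHI")
MAGIC = b"AGPP"  # illustrative magic value, not from the project
CODECS = {"fp16": 0, "q8_0": 1}
CODEC_NAMES = {v: k for k, v in CODECS.items()}


def pack_packet(codec, boundary_layer, payload):
    """Prefix an encoded latent payload with a fixed-size header."""
    header = HEADER.pack(MAGIC, 1, CODECS[codec], boundary_layer, len(payload))
    return header + payload


def unpack_packet(packet):
    """Parse the header and return metadata plus the raw payload."""
    magic, version, codec_id, boundary_layer, length = HEADER.unpack_from(packet)
    if magic != MAGIC:
        raise ValueError("not an AGP packet")
    payload = packet[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    return {
        "version": version,
        "codec": CODEC_NAMES[codec_id],
        "boundary_layer": boundary_layer,
        "payload": payload,
    }


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, (time.perf_counter() - start) * 1000.0
```

Wrapping `pack_packet` and `unpack_packet` in `timed` yields per-packet encode and decode latency, and `len(packet)` yields the byte count, which covers the three instrumented quantities in the subplan.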
Stage 4.1 — Two-Host Routed Runtime
Objective:
Make `Mac4` the reflex path and `Mac5` the corrective path.
Tasks:
- [ ] integrate controller + transfer adapter + cross-host resume into one runtime loop
- [ ] add telemetry for:
  - route counts
  - bytes transferred
  - resumed latency
  - accepted vs escalated outputs
- [ ] run the held-out prompt pack end to end
Validation:
- [ ] median latency improves on easy cases or compute drops materially
- [ ] hard-case quality remains competitive with full-depth baseline
Exit criteria:
- [ ] two-host routed runtime is operational
Stage 5.0 — Semantic Projection Head
Objective:
Attach the typed semantic layer only after the control path is operational.
Tasks:
- [ ] build primitive/invariant supervision from the semantic kernel
- [ ] train the first semantic projection head
- [ ] test whether semantic confidence improves:
  - dead-state detection
  - route confidence
  - transfer acceptance
Validation:
- [ ] semantic layer is operationally useful, not just interpretable
Exit criteria:
- [ ] semantic packet becomes part of the AGP control plane
Stage 6.0 — ANE Sidecar
Objective:
Move the cheap, frequent, shallow modules onto the Apple engine hierarchy.
Tasks:
- [ ] identify exportable shallow modules:
  - route head
  - vitality head
  - semantic head
  - compact transfer encoder
- [ ] benchmark MLX/GPU versus Core ML/ANE paths
- [ ] test whether ANE offload reduces GPU contention or energy per token
Validation:
- [ ] ANE sidecar improves energy or frees useful GPU time
- [ ] no major quality or latency regression from export path
Exit criteria:
- [ ] Apple-engine partition is proven with a real module, not just theory
Long-Running Loop
The remaining work should run in this order:
1. `control consolidation`
2. `confidence calibration`
3. `transfer dataset construction`
4. `same-host transfer adapter`
5. `same-host routed runtime`
6. `cross-host resume`
7. `two-host routed runtime`
8. `semantic head`
9. `ANE sidecar`
Each stage should only advance when its validation gate is written down and passed. If a stage fails, the loop is:
1. inspect metrics
2. identify whether the failure is:
   - label problem
   - model problem
   - architecture problem
   - systems problem
3. patch the smallest thing that explains the failure
4. rerun the same stage before advancing
Immediate Next Actions
- [x] optimize cross-host transport path latency
- [x] test direct-path / Thunderbolt-first addressing instead of the current ~1s route
- [x] add connection reuse on the cross-host replay lane
- [x] move from hidden-return remote resume into summary-only corrective continuation
- [x] promote a real margin-based stop/escalate policy for multi-token continuation
- [x] keep hidden-return as the fast path and reserve summary/deeper correction for escalation
- [ ] evaluate whether deeper remote continuation needs a new model/head rather than repeating the current latent-resume path
- [ ] integrate the promoted hybrid runtime into the main live prompt loop
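One plausible shape for the margin-based stop/escalate policy listed above is sketched here. It assumes "margin" means the gap between the top two next-token probabilities and that a low margin at any step triggers escalation. The threshold and token budget are placeholders, not the promoted values, and the project's actual policy may define margin differently.

```python
import math


def top2_margin(logits):
    """Gap between the two largest softmax probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = sorted((e / z for e in exps), reverse=True)
    return probs[0] - probs[1]


def continue_or_escalate(step_logits, min_margin=0.2, max_tokens=8):
    """Walk per-step logits of a local continuation.

    Escalates at the first step whose margin drops below `min_margin`;
    otherwise stops cleanly once the token budget or the input is exhausted.
    """
    for step, logits in enumerate(step_logits[:max_tokens]):
        if top2_margin(logits) < min_margin:
            return {"action": "escalate", "tokens_kept": step}
    return {"action": "stop", "tokens_kept": min(len(step_logits), max_tokens)}
```

Keeping the tokens generated before the low-margin step matches the idea of hidden-return as the fast path, with the deeper correction reserved for the point where local confidence breaks down.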
Definition Of “Working As Intended”
The program is only working as intended if all of these become true:
- controller can distinguish accept, continue, revive, and escalate with useful confidence
- latent transfer reproduces downstream continuation closely enough to matter
- cross-host resume beats or justifies the transport cost
- semantic layer improves the controller rather than decorating it
- ANE sidecar takes real work off the GPU
Right now, the controller stage is working. The full AGP system is not finished yet.
Promotion Decision
Attach run IDs, datasets, metrics, and reproduction commands.
Source Anchor
Comp-Core/docs/research/agp-execution-roadmap-v1.md