Grand Diomande Research · Full HTML Reader

AGP Execution Roadmap V1


Date: `2026-04-17`

Current Status

The AGP program has passed the first control-stage milestone.

Completed:

- `Gemma 4 E2B` Thunder stage-one LoRA backbone
- full train and held-out route oracle artifacts
- route/vitality head `v1` on original conservative labels
- calibrated threshold sweep over saved oracle metrics
- recalibrated route/vitality head `v2`
- three-head controller with earliest-layer supervision
- corrected `transfer_v2` same-host adapter run

Current best control baseline:

- calibrated label regime: `kl=4.0`, `margin_delta=0.15`
- held-out route accuracy: `0.8444`
- held-out vitality accuracy: `1.0`
- exact held-out separation for:
  - `accept_local`
  - `revive_local`
  - `escalate`
- remaining route confusion is mainly:
  - `continue_local -> accept_local`

This means the backbone hidden states are already useful enough to support a learned controller. The remaining work is to convert that controller into a real distributed latent-transfer system.

Current transfer baseline:

- corrected dataset: `transfer_v2_k4_m015`
- best run: `transfer_adapter_v2_20260418_000114`
- best checkpoint: step `1400`
- held-out overall:
  - cosine `0.9226`
  - MSE `0.1097`
- held-out by route:
  - `escalate`: cosine `0.9310`, MSE `0.0792`
  - `continue_local`: cosine `0.9432`, MSE `0.0979`
  - `accept_local`: cosine `0.6189`, MSE `0.6541`
- held-out by layer:
  - layer `30`: cosine `0.9360`, MSE `0.0868`
  - layer `26`: cosine `0.6189`, MSE `0.6541`

Interpretation:

- late-boundary (layer `30`) transfer is already strong enough to justify the next routed-resume stage
- true early-layer (layer `26`) transfer is still weak
- the next phase should treat these as separate problems rather than one

Promoted same-host transfer baseline:

- canonical transfer adapter: `transfer_adapter_logit_v2_20260418_011824`
- why it wins:
  - hidden-only transfer looked acceptable in latent space but failed badly in live next-token behavior
  - logit-aware transfer fixed the real runtime bottleneck
- live prompt-loop comparison on the same `8` held-out prompts:
  - route mix unchanged: `7 continue_local`, `1 escalate`
  - hidden-only transfer: mean live KL `5.5070`, top-1 teacher match `0.0`
  - logit-aware transfer: mean live KL `0.6490`, top-1 teacher match `0.8571`
- interpretation:
  - same-host `continue_local` is now operational not only in latent similarity but also in next-token behavior
  - local transfer is no longer the blocker
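The live comparison is reported as mean KL and top-1 teacher match. A minimal sketch of how such numbers can be computed from paired next-token logits, where the teacher is the full path and the student is the resumed path. It assumes KL is taken as KL(teacher || student) per prompt and averaged over the continue-local prompts; the KL direction, the averaging, and all function names are assumptions, not the runtime's code.

```python
import math


def log_softmax(logits: list[float]) -> list[float]:
    """Numerically stable log-softmax over one next-token logit vector."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def kl_teacher_student(teacher: list[float], student: list[float]) -> float:
    """KL(teacher || student) between two next-token distributions (assumed direction)."""
    lt, ls = log_softmax(teacher), log_softmax(student)
    return sum(math.exp(a) * (a - b) for a, b in zip(lt, ls))


def argmax(xs: list[float]) -> int:
    return max(range(len(xs)), key=xs.__getitem__)


def live_metrics(pairs: list[tuple[list[float], list[float]]]) -> tuple[float, float]:
    """Mean KL and top-1 teacher match over (teacher_logits, student_logits) pairs."""
    kls = [kl_teacher_student(t, s) for t, s in pairs]
    matches = [argmax(t) == argmax(s) for t, s in pairs]
    return sum(kls) / len(kls), sum(matches) / len(matches)
```

With `7` continue-local prompts, a top-1 match of `0.8571` corresponds to `6` of `7` prompts agreeing (6/7 ≈ 0.8571).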

Packetized same-host baseline:

- tools:
  - `runtime/agp_packet.py`
  - `runtime/run_mock_packet_replay_v1.py`
- same `8` held-out prompts / `7` continue-local cases:
  - `fp16` request packets: mean bytes `3832`, mean KL `0.6497`, top-1 match `0.8571`
  - `q8_0` request packets: mean bytes `1870.3`, mean KL `0.7365`, top-1 match `0.8571`
- interpretation:
  - request-side packet compression is already viable
  - `q8_0` roughly halves request size (`1870.3 / 3832 ≈ 0.49`) without materially harming the promoted local continuation path
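For intuition about where the size reduction comes from, here is a sketch of `q8_0`-style block quantization. It assumes the codec follows the GGML `q8_0` block layout (32 values per block, one fp16 scale, 32 signed 8-bit codes); the roadmap does not confirm the exact layout, and `quantize_q8_0` / `dequantize_q8_0` are hypothetical names.

```python
import struct

QK8_0 = 32  # values per block, assumed to match GGML's q8_0 layout
BLOCK_BYTES = 2 + QK8_0  # fp16 scale + 32 int8 codes


def quantize_q8_0(values: list[float]) -> bytes:
    """Pack floats into q8_0-style blocks: one fp16 scale followed by 32 int8 codes."""
    if len(values) % QK8_0:
        raise ValueError("length must be a multiple of 32 in this sketch")
    out = bytearray()
    for start in range(0, len(values), QK8_0):
        block = values[start:start + QK8_0]
        amax = max(abs(v) for v in block)
        scale = amax / 127.0 if amax > 0 else 0.0
        inv = 1.0 / scale if scale else 0.0
        codes = [max(-127, min(127, round(v * inv))) for v in block]
        out += struct.pack("<e", scale)  # half-precision scale; overflows above 65504
        out += struct.pack(f"<{QK8_0}b", *codes)
    return bytes(out)


def dequantize_q8_0(payload: bytes) -> list[float]:
    """Rebuild floats as code * scale, reading each block's stored fp16 scale."""
    values: list[float] = []
    for off in range(0, len(payload), BLOCK_BYTES):
        (scale,) = struct.unpack_from("<e", payload, off)
        codes = struct.unpack_from(f"<{QK8_0}b", payload, off + 2)
        values.extend(scale * c for c in codes)
    return values
```

Per block this layout stores 34 bytes against 64 bytes for fp16, a payload ratio of about 0.53. The measured request sizes above depend on the real packet format, which this sketch does not reproduce.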

First cross-host replay baseline:

- tools:
  - `runtime/agp_packet_resume_server_v1.py`
  - `runtime/run_cross_host_packet_replay_v1.py`
- remote host: `mac5`
- request codec: `q8_0`
- same `8` held-out prompts / `7` continue-local cases:
  - mean request bytes `1820`
  - mean response bytes `3608.9`
  - mean KL `0.7362`
  - top-1 match `0.8571`
  - mean margin delta `0.0963`
- median server timings from the row report:
  - decode `~1.14ms`
  - infer `~6.42ms`
  - encode `~0.57ms`
- median network roundtrip `~1207ms`
- interpretation:
  - cross-host latent transport is now experimentally real
  - remote resume quality survives the network hop
  - the research bottleneck has moved from resume quality to transport path latency: the median server stages add up to roughly `8ms`, against a median roundtrip of `~1207ms`
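To attribute the roundtrip between server work and everything else, a sketch that splits each row before taking medians (summing per-stage medians is not the same as the median of per-row sums). The row keys `roundtrip_ms`, `decode_ms`, `infer_ms`, and `encode_ms` are hypothetical names, not the report's actual schema, and "non-server" time lumps network transfer together with any client-side overhead.

```python
from statistics import median


def split_roundtrip(rows: list[dict[str, float]]) -> dict[str, float]:
    """Median server time, median non-server time, and median non-server share."""
    server = [r["decode_ms"] + r["infer_ms"] + r["encode_ms"] for r in rows]
    non_server = [r["roundtrip_ms"] - s for r, s in zip(rows, server)]
    share = [n / r["roundtrip_ms"] for n, r in zip(non_server, rows)]
    return {
        "server_ms": median(server),
        "non_server_ms": median(non_server),
        "non_server_share": median(share),
    }


# Illustrative single row built from the reported medians.
summary = split_roundtrip(
    [{"roundtrip_ms": 1207.0, "decode_ms": 1.14, "infer_ms": 6.42, "encode_ms": 0.57}]
)
```

On those illustrative numbers, server work is about `8.13ms` and more than 99% of the roundtrip is spent outside the server.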

First artifact-driven routed-resume result:

- report: `Desktop/Comp-Core/experiments/agp_mlx/runtime/reports/routed_resume_artifact_v1/routed_resume_artifact_report.json`
- on the `23` transfer-eligible held-out records:
  - controller predicted: `12 escalate`, `6 continue_local`, `5 accept_local`
  - route accuracy on this subset: `0.8261`
  - boundary accuracy on this subset: `0.3043`
  - vitality accuracy on this subset: `0.9565`
- semantically separated result:
  - `continue_local` resume path:
    - `9` oracle continue cases
    - `6` controller-compatible continue cases
    - resumed hidden-state quality on those compatible cases: cosine `0.9552`, MSE `0.0733`
- interpretation:
  - the continue-local path is strong enough to operationalize
  - the next failures to attack are boundary drift (`30 -> 26`) and route spill from `continue_local -> accept_local`

First true same-host runtime policy:

- runtime: `Desktop/Comp-Core/experiments/agp_mlx/runtime/run_same_host_routed_runtime_v1.py`
- confidence sweep: `Desktop/Comp-Core/experiments/agp_mlx/runtime/sweep_route_confidence_policy_v1.py`
- recommended local policy for this wave:
  - boundary = route-derived:
    - `accept_local -> 26`
    - `continue_local -> 30`
    - otherwise `none`
  - local-action confidence gate = `0.55`
- held-out result under that policy:
  - action counts: `13 escalate`, `7 continue_local`, `3 accept_local`
  - route accuracy: `0.9130`
  - continue-local matched quality: cosine `0.9519`, MSE `0.0747`
- interpretation:
  - the local runtime can already support a trustworthy continue-local path
  - late-boundary resume is no longer the research bottleneck
  - the next failure to attack is false local-accept spill
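The recommended local policy fits in a small decision function. This is a sketch: `Decision` and `decide` are hypothetical names, and it assumes that a local action whose probability falls below the `0.55` gate is demoted to `escalate`, which the roadmap does not state explicitly.

```python
from dataclasses import dataclass
from typing import Optional

# Route-derived resume boundary from the recommended local policy.
BOUNDARY_BY_ROUTE = {"accept_local": 26, "continue_local": 30}
LOCAL_ACTIONS = frozenset(BOUNDARY_BY_ROUTE)
CONFIDENCE_GATE = 0.55


@dataclass(frozen=True)
class Decision:
    action: str
    boundary: Optional[int]  # None for any route without a local boundary


def decide(route_probs: dict[str, float], gate: float = CONFIDENCE_GATE) -> Decision:
    """Pick the top route, gate local actions on confidence, then derive the boundary."""
    route = max(route_probs, key=route_probs.__getitem__)
    if route in LOCAL_ACTIONS and route_probs[route] < gate:
        route = "escalate"  # assumption: low-confidence local actions escalate
    return Decision(route, BOUNDARY_BY_ROUTE.get(route))
```

Raising the gate trades local acceptances for escalations, which is the lever the confidence sweep tunes against false local-accept spill.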

Remaining Program

Stage 2.1 — Control Consolidation

Objective:

Turn the current calibrated route/vitality head into a trustworthy baseline controller.

Tasks:

- [ ] freeze the current calibrated checkpoint as the control baseline
- [ ] export per-class confusion, precision, recall, and macro-F1 for the held-out split
- [ ] add a direct `earliest acceptable layer` prediction head
- [ ] compare:
  - route-only prediction
  - route + vitality multitask
  - route + vitality + earliest-layer multitask
- [ ] decide whether `accept_local` and `continue_local` should remain separate actions or become a shared accept state with boundary refinement downstream
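A minimal pure-Python sketch of the per-class export, plus the majority-class macro-F1 that the route-F1 validation gate compares against. The function names are hypothetical; a prediction outside `labels` only counts as a miss for its true class.

```python
from collections import Counter


def per_class_report(y_true: list[str], y_pred: list[str], labels: list[str]):
    """Confusion counts, per-class precision/recall/F1, and macro-F1."""
    confusion = Counter(zip(y_true, y_pred))  # (true, predicted) -> count
    report = {}
    for c in labels:
        tp = confusion[(c, c)]
        fp = sum(confusion[(t, c)] for t in labels if t != c)
        fn = sum(n for (t, p), n in confusion.items() if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[c] = {"precision": precision, "recall": recall, "f1": f1}
    macro_f1 = sum(r["f1"] for r in report.values()) / len(labels)
    return confusion, report, macro_f1


def majority_baseline_macro_f1(y_true: list[str], labels: list[str]) -> float:
    """Macro-F1 of always predicting the most frequent true class."""
    majority = Counter(y_true).most_common(1)[0][0]
    return per_class_report(y_true, [majority] * len(y_true), labels)[2]
```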

Validation:

- [ ] hold route macro-F1 above the majority baseline by a wide margin
- [ ] preserve exact or near-exact `escalate` detection
- [ ] preserve exact or near-exact `revive_local` detection
- [ ] reduce `continue_local -> accept_local` error if possible

Exit criteria:

- [ ] controller is stable enough to supervise transfer experiments

Stage 2.2 — Confidence Calibration

Objective:

Replace raw logits with usable operating confidence.

Tasks:

- [ ] add confidence extraction for route and vitality heads
- [ ] measure expected calibration error on held-out data
- [ ] tune thresholds for:
  - accept
  - continue
  - revive
  - escalate
- [ ] build an abstain policy for low-confidence cases
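One common choice is to take each head's maximum softmax probability as its confidence and score it with equal-width-bin expected calibration error, ECE = sum over bins b of (|B_b| / n) * |acc(B_b) - conf(B_b)|. The sketch assumes that confidence definition and 10 bins; the roadmap fixes neither, and the function names are hypothetical.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def confidence(logits: list[float]) -> tuple[int, float]:
    """Predicted class index and its softmax probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best]


def expected_calibration_error(confidences: list[float], correct: list[bool], n_bins: int = 10) -> float:
    """Equal-width-bin ECE: weighted gap between mean confidence and accuracy per bin."""
    n = len(confidences)
    bins: list[list[tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 lands in the top bin
        bins[idx].append((conf, ok))
    ece = 0.0
    for b in bins:
        if b:
            mean_conf = sum(c for c, _ in b) / len(b)
            accuracy = sum(1 for _, ok in b if ok) / len(b)
            ece += len(b) / n * abs(accuracy - mean_conf)
    return ece
```

An abstain policy can then be a threshold on the same confidence value, tuned per action on held-out data.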

Validation:

- [ ] confidence correlates with correctness
- [ ] abstain region improves safety without collapsing the acceptance rate

Exit criteria:

- [ ] controller can expose a reliable decision + confidence packet to downstream transfer logic

Stage 3.0 — Transfer Dataset Construction

Objective:

Create the supervision substrate for hidden-state transfer.

Tasks:

- [ ] derive transfer supervision splits from calibrated oracle records
- [ ] group records by:
  - `accept_local`
  - `continue_local`
  - `escalate`
- [ ] materialize source latent tensors for candidate split layers
- [ ] materialize target continuation references for resumed decoding
- [ ] define the first compact transfer target:
  - hidden-state reconstruction
  - resumed-logit fidelity
  - continuation agreement

Validation:

- [ ] dataset covers both the layer-`26`-dominated and the layer-`30`-dominated paths
- [ ] train/valid split preserves route-class proportions
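Preserving route-class proportions amounts to a stratified split: split each route group separately, and fix the seed so the split is reproducible. A sketch with a hypothetical `stratified_split` helper and a hypothetical record layout:

```python
import random
from collections import defaultdict
from typing import Callable


def stratified_split(records: list[dict], key: Callable[[dict], str],
                     valid_fraction: float = 0.2, seed: int = 0) -> tuple[list[dict], list[dict]]:
    """Split each class separately so train and valid keep the same class proportions."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for rec in records:
        groups[key(rec)].append(rec)
    rng = random.Random(seed)  # fixed seed -> reproducible split
    train, valid = [], []
    for label in sorted(groups):  # deterministic group order
        items = groups[label][:]
        rng.shuffle(items)
        n_valid = round(len(items) * valid_fraction)
        valid.extend(items[:n_valid])
        train.extend(items[n_valid:])
    return train, valid
```

Very small classes can round to zero validation records; such classes would need a minimum count if they must appear in both splits.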

Exit criteria:

- [ ] `transfer_v1` dataset exists and is reproducible

Stage 3.1 — Same-Host Transfer Adapter

Objective:

Teach a compact latent-transfer module before involving network transport.

Tasks:

- [ ] build a same-host transfer encoder/decoder in MLX
- [ ] train on calibrated transfer supervision
- [ ] evaluate:
  - hidden-state cosine
  - hidden-state MSE
  - resumed logit KL
  - continuation agreement
- [ ] sweep packet bottleneck widths
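The transfer baselines are reported as cosine and MSE broken down by route and by layer. A sketch of that aggregation, assuming both metrics are computed per example (MSE averaged over hidden dimensions) and then averaged within each group; the reduction behind the reported numbers is not stated, and the helper names are hypothetical.

```python
import math
from collections import defaultdict


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0  # zero vectors scored as 0.0


def mse(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def metrics_by_group(rows: list[tuple[str, list[float], list[float]]]) -> dict[str, dict[str, float]]:
    """rows: (group, reconstructed_hidden, target_hidden); group may be a route or a layer."""
    totals = defaultdict(lambda: [0.0, 0.0, 0])  # cosine sum, MSE sum, count
    for group, pred, target in rows:
        acc = totals[group]
        acc[0] += cosine(pred, target)
        acc[1] += mse(pred, target)
        acc[2] += 1
    return {g: {"cosine": c / n, "mse": m / n} for g, (c, m, n) in totals.items()}
```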

Validation:

- [x] same-host hidden-state reconstruction is strong on the dominant late-boundary path
- [x] same-host resumed continuation tracks the no-transfer baseline closely enough to be useful
- [x] packet bottleneck is materially smaller than raw hidden-state transfer
- [ ] early-layer transfer improves beyond the current layer `26` weakness

Exit criteria:

- [x] one transfer configuration is good enough to test in the local routed-resume loop

Stage 3.2 — Same-Host Routed Inference

Objective:

Close the loop locally before crossing machines.

Tasks:

- [ ] build a local inference harness that:
  - runs the controller
  - chooses the route action
  - if accepted, exits locally
  - if continue, resumes from the calibrated late layer
  - if escalate, runs the deeper correction path
- [ ] compare routed local inference versus the full-depth baseline
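The harness can be sketched as controller-then-dispatch, with each path reporting how many layers it actually ran so that average active depth can be compared with the full-depth baseline. `StepResult`, the handler table, and the toy depths are hypothetical placeholders, not the real Gemma layer counts or the project's harness.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class StepResult:
    action: str
    output: str
    active_layers: int  # layers actually executed for this prompt


Handler = Callable[[str], StepResult]


def routed_inference(prompts: list[str], controller: Callable[[str], str],
                     handlers: dict[str, Handler]) -> list[StepResult]:
    """Run the controller on each prompt, then dispatch to the handler for its action."""
    return [handlers[controller(p)](p) for p in prompts]


def depth_summary(results: list[StepResult], full_depth: int) -> tuple[float, float]:
    """Mean active depth and fractional saving versus always running full depth."""
    mean_depth = sum(r.active_layers for r in results) / len(results)
    return mean_depth, 1.0 - mean_depth / full_depth


# Toy usage: depths are illustrative only.
FULL = 40
handlers = {
    "accept_local": lambda p: StepResult("accept_local", f"local:{p}", 26),
    "escalate": lambda p: StepResult("escalate", f"deep:{p}", FULL),
}
results = routed_inference(["easy", "hard"], lambda p: "accept_local" if p == "easy" else "escalate", handlers)
mean_depth, saving = depth_summary(results, FULL)
```

Keeping the action on every `StepResult` also makes failure cases attributable to a specific path.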

Validation:

- [ ] routed local inference reduces average active depth
- [ ] quality stays within acceptable bounds
- [ ] failure cases are clearly attributable

Exit criteria:

- [x] local AGP loop works on one machine

Immediate subplan for Stage 3.2:

- [ ] build a routed-resume harness that loads:
  - the Thunder stage-one Gemma adapter
  - the best three-head controller
  - the best corrected transfer adapter
- [ ] run held-out prompts through:
  - full baseline
  - continue-local resume path
  - early-accept resume path
  - forced escalate path
- [ ] report:
  - resumed logit KL
  - continuation agreement
  - output-level acceptance / mismatch counts
  - per-layer failure breakdown
- [x] establish the first artifact-backed local runtime policy
- [x] turn the runtime from artifact replay into a live same-host prompt loop
- [x] reduce false `accept_local` spill under the live local policy
- [ ] if layer `26` remains weak, run a focused transfer-improvement pass before any cross-host runtime work

Stage 4.0 — Cross-Host Resume Over Thunderbolt

Objective:

Turn same-host transfer into real `Mac4 -> Mac5` continuation.

Tasks:

- [x] define the first production-shaped `AGP-PTP` packet
- [x] send compressed latent packets to a remote host
- [x] reconstruct on the receiving host
- [x] resume the promoted transfer path on the remote host
- [x] compare cross-host continuation to same-host packetized continuation
- [ ] optimize the transport path itself
- [ ] move from hidden-return resume to deeper corrective continuation

Validation:

- [ ] packet transport overhead is lower than rerunning the entire path
- [x] resumed continuation fidelity stays close to same-host transfer

Exit criteria:

- [x] first real cross-host AGP continuation demo exists

Updated immediate subplan for Stage 4.0:

- [x] define the first latent packet around the promoted logit-aware transfer adapter
- [x] instrument bytes per packet, encode/decode latency, and boundary metadata
- [x] replay the same held-out local runtime cases across a mock packet boundary first
- [x] move the packet across the real `mac1 -> mac5` path
- [ ] rerun over the intended Thunderbolt-first/direct path with lower network RTT
- [ ] add connection reuse, batching, or response compression if the path remains the limiter
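To make bytes per packet and boundary metadata explicit, a framing sketch with a fixed little-endian header in front of the codec payload. The field set, field widths, magic bytes, and codec ids are all hypothetical; the real `AGP-PTP` layout in the project's packet tooling is not reproduced here.

```python
import struct
from dataclasses import dataclass

# Hypothetical header: magic, version, codec id, boundary layer, token count, payload length.
_HEADER = struct.Struct("<4sBBHHI")  # 14 bytes
MAGIC = b"AGPP"
VERSION = 1
CODECS = {"fp16": 0, "q8_0": 1}
_CODEC_NAMES = {v: k for k, v in CODECS.items()}


@dataclass(frozen=True)
class Packet:
    codec: str
    boundary_layer: int
    num_tokens: int
    payload: bytes


def encode_packet(pkt: Packet) -> bytes:
    header = _HEADER.pack(MAGIC, VERSION, CODECS[pkt.codec], pkt.boundary_layer,
                          pkt.num_tokens, len(pkt.payload))
    return header + pkt.payload


def decode_packet(data: bytes) -> Packet:
    if len(data) < _HEADER.size:
        raise ValueError("truncated header")
    magic, version, codec_id, layer, tokens, length = _HEADER.unpack_from(data, 0)
    if magic != MAGIC or version != VERSION or codec_id not in _CODEC_NAMES:
        raise ValueError("unrecognized packet")
    payload = data[_HEADER.size:_HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    return Packet(_CODEC_NAMES[codec_id], layer, tokens, payload)
```

A fixed header keeps per-packet overhead constant, so bytes-per-packet comparisons between codecs reflect the payload.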

Stage 4.1 — Two-Host Routed Runtime

Objective:

Make `Mac4` the reflex path and `Mac5` the corrective path.

Tasks:

- [ ] integrate controller + transfer adapter + cross-host resume into one runtime loop
- [ ] add telemetry for:
  - route counts
  - bytes transferred
  - resumed latency
  - accepted vs escalated outputs
- [ ] run the held-out prompt pack end to end
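A sketch of a telemetry accumulator for the four signals listed above. The class and field names are hypothetical, and it treats every route other than `escalate` as served locally, which is an assumption about how "accepted vs escalated" would be counted.

```python
from collections import Counter
from dataclasses import dataclass, field
from statistics import median
from typing import Optional


@dataclass
class RuntimeTelemetry:
    route_counts: Counter = field(default_factory=Counter)
    bytes_sent: int = 0
    bytes_received: int = 0
    latencies_ms: list[float] = field(default_factory=list)

    def record(self, route: str, request_bytes: int, response_bytes: int, latency_ms: float) -> None:
        """Log one prompt; local-only routes can pass 0 bytes."""
        self.route_counts[route] += 1
        self.bytes_sent += request_bytes
        self.bytes_received += response_bytes
        self.latencies_ms.append(latency_ms)

    def summary(self) -> dict:
        total = sum(self.route_counts.values())
        escalated = self.route_counts["escalate"]
        median_latency: Optional[float] = median(self.latencies_ms) if self.latencies_ms else None
        return {
            "routes": dict(self.route_counts),
            "bytes_transferred": self.bytes_sent + self.bytes_received,
            "median_latency_ms": median_latency,
            "served_locally": total - escalated,
            "escalated": escalated,
        }
```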

Validation:

- [ ] median latency improves on easy cases or compute drops materially
- [ ] hard-case quality remains competitive with the full-depth baseline

Exit criteria:

- [ ] two-host routed runtime is operational

Stage 5.0 — Semantic Projection Head

Objective:

Attach the typed semantic layer only after the control path is operational.

Tasks:

- [ ] build primitive/invariant supervision from the semantic kernel
- [ ] train the first semantic projection head
- [ ] test whether semantic confidence improves:
  - dead-state detection
  - route confidence
  - transfer acceptance

Validation:

- [ ] semantic layer is operationally useful, not just interpretable

Exit criteria:

- [ ] semantic packet becomes part of the AGP control plane

Stage 6.0 — ANE Sidecar

Objective:

Move the cheap, frequent, shallow modules onto the Apple engine hierarchy.

Tasks:

- [ ] identify exportable shallow modules:
  - route head
  - vitality head
  - semantic head
  - compact transfer encoder
- [ ] benchmark MLX/GPU versus Core ML/ANE paths
- [ ] test whether ANE offload reduces GPU contention or energy per token

Validation:

- [ ] ANE sidecar improves energy or frees useful GPU time
- [ ] no major quality or latency regression from the export path

Exit criteria:

- [ ] Apple-engine partition is proven with a real module, not just theory

Long-Running Loop

The remaining work should run in this order:

1. `control consolidation`
2. `confidence calibration`
3. `transfer dataset construction`
4. `same-host transfer adapter`
5. `same-host routed runtime`
6. `cross-host resume`
7. `two-host routed runtime`
8. `semantic head`
9. `ANE sidecar`

Each stage should only advance when its validation gate is written down and passed. If a stage fails, the loop is:

1. inspect metrics
2. identify whether the failure is:
- label problem
- model problem
- architecture problem
- systems problem
3. patch the smallest thing that explains the failure
4. rerun the same stage before advancing

Immediate Next Actions

- [x] optimize cross-host transport path latency
- [x] test direct-path / Thunderbolt-first addressing instead of the current ~1s route
- [x] add connection reuse on the cross-host replay lane
- [x] move from hidden-return remote resume into summary-only corrective continuation
- [x] promote a real margin-based stop/escalate policy for multi-token continuation
- [x] keep hidden-return as the fast path and reserve summary/deeper correction for escalation
- [ ] evaluate whether deeper remote continuation needs a new model/head rather than repeating the current latent-resume path
- [ ] integrate the promoted hybrid runtime into the main live prompt loop

Definition of “Working as Intended”

The program is only working as intended if all of these become true:

- controller can distinguish accept, continue, revive, and escalate with useful confidence
- latent transfer reproduces downstream continuation closely enough to matter
- cross-host resume beats or justifies the transport cost
- semantic layer improves the controller rather than decorating it
- ANE sidecar takes real work off the GPU

Right now, the controller stage is working. The full AGP system is not finished yet.

Promotion Decision

Attach run IDs, datasets, metrics, and reproduction commands.

Source Anchor

Comp-Core/docs/research/agp-execution-roadmap-v1.md
