WP0 Execution Checklist
WP0 exists to produce the first trustworthy baseline for `Gemma 4 E2B` on a single Apple host.
The current machine has already been partially validated:
The host is `Apple M4` with `16 GB` of unified memory. The installed `mlx` version is `0.31.1`. The Hugging Face model id `google/gemma-4-E2B` is visible and not gated. The immediate gap is that `mlx_lm` is not importable from the default Python path.
The next execution sequence should be:
First, identify and validate the `mlx_lm` environment WP0 will use. If a project-local or tool-local environment with `mlx_lm` already exists, use it instead of modifying the global Python path. If no valid environment exists, create the smallest viable isolated environment for WP0 and install exactly the packages needed for local Gemma 4 inference and hidden-state instrumentation.
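Below is a minimal sketch of the first step. It assumes the script is run with the candidate environment's interpreter. It records host facts and reports whether `mlx` and `mlx_lm` are importable, and it neither installs nor modifies anything. The function names and output layout are hypothetical, and only standard-library calls are used.

```python
import importlib.metadata
import importlib.util
import json
import platform
import sys


def package_status(import_name: str, dist_name: str) -> dict:
    """Report whether a module is importable here and, if installed, its distribution version."""
    spec = importlib.util.find_spec(import_name)
    try:
        version = importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        version = None
    return {"importable": spec is not None, "version": version}


def probe_environment(packages=(("mlx", "mlx"), ("mlx_lm", "mlx-lm"))) -> dict:
    """Collect host facts and package status for this interpreter (hypothetical record layout)."""
    return {
        "python": sys.executable,
        "python_version": platform.python_version(),
        "machine": platform.machine(),           # "arm64" on Apple silicon
        "macos": platform.mac_ver()[0] or None,  # empty string when not on macOS
        "packages": {imp: package_status(imp, dist) for imp, dist in packages},
    }


if __name__ == "__main__":
    print(json.dumps(probe_environment(), indent=2))
```

You can run it once with the default interpreter and once with the candidate environment's interpreter. Comparing the two outputs makes the gap explicit before anything is installed.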
Second, run a smoke inference on `Gemma 4 E2B` with the simplest possible prompt and capture:
the successful model load path
the effective quantization
startup latency
first-token latency
steady-state tokens per second
peak memory estimate
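The timing quantities above can be derived from a few clock readings. The minimal sketch below makes two assumptions. First, a `load` callable returns whatever the generator needs. Second, a `generate` callable yields one item per decoded token as it is produced. A function that returns a finished list would make the first-token figure meaningless. `measure_smoke_run` and `SmokeMetrics` are hypothetical names. Steady-state throughput excludes the first token, so prompt processing is not folded into the decode rate. Peak memory, the load path, and quantization are not covered here and would be recorded separately.

```python
import math
import time
from dataclasses import dataclass
from typing import Any, Callable, Iterable


@dataclass
class SmokeMetrics:
    startup_s: float      # wall-clock time spent inside load()
    first_token_s: float  # from the end of load() to the first yielded token
    steady_tps: float     # tokens/s after the first token; NaN if undefined
    n_tokens: int


def measure_smoke_run(
    load: Callable[[], Any],
    generate: Callable[[Any], Iterable[Any]],
    clock: Callable[[], float] = time.perf_counter,
) -> SmokeMetrics:
    """Time one model load plus one streamed generation (caller supplies load/generate)."""
    t_start = clock()
    handle = load()
    t_loaded = clock()

    first_at = last_at = None
    n = 0
    for _token in generate(handle):
        now = clock()
        if first_at is None:
            first_at = now
        last_at = now
        n += 1

    if n == 0:
        raise RuntimeError("generation yielded no tokens")

    # Decode rate over the (n - 1) intervals that follow the first token.
    span = last_at - first_at
    steady = (n - 1) / span if n > 1 and span > 0 else math.nan
    return SmokeMetrics(
        startup_s=t_loaded - t_start,
        first_token_s=first_at - t_loaded,
        steady_tps=steady,
        n_tokens=n,
    )
```

In recent `mlx_lm` releases, `load` returns a `(model, tokenizer)` pair and `stream_generate` yields one response per generated token. Together they are a plausible fit for `load` and `generate`, but check the installed version's API before wiring them in.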
Third, inspect the model wrapper and identify the smallest safe set of candidate layers for instrumentation. WP0 does not need exhaustive layer capture; it needs only enough coverage to map representation quality across early, middle, and late layers.
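One way to pick a small, evenly spread probe set is to sample fixed fractions of decoder depth. In the minimal sketch below, the layer count is read from the loaded model and is not stated here. It also treats 10%, 50%, and 90% depth as stand-ins for early, middle, and late. The fractions and the name `select_probe_layers` are hypothetical choices, not WP0 decisions.

```python
from __future__ import annotations

# Hypothetical default: relative depth per band (0.0 = first layer, 1.0 = last layer).
DEFAULT_BANDS = {"early": 0.1, "middle": 0.5, "late": 0.9}


def select_probe_layers(num_layers: int, bands: dict[str, float] | None = None) -> dict[str, int]:
    """Map each band name to a 0-based layer index at roughly that relative depth.

    On very shallow models several bands may map to the same index.
    """
    if num_layers < 1:
        raise ValueError("num_layers must be positive")
    bands = DEFAULT_BANDS if bands is None else bands
    picks = {}
    for name, frac in bands.items():
        if not 0.0 <= frac <= 1.0:
            raise ValueError(f"band {name!r}: fraction {frac} outside [0, 1]")
        # Round half up explicitly; Python's round() uses banker's rounding.
        picks[name] = int(frac * (num_layers - 1) + 0.5)
    return picks
```

The returned indices tell you which decoder blocks' output hidden states to capture. How to attach that capture depends on the model wrapper inspected in this step.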
Fourth, assemble the first prompt pack from real local prompt distributions. The initial categories remain conversational continuity, coding, memory-grounded prompts, and semantic-layer prompts. The point is not benchmark vanity. The point is to measure the architecture on the workload it is supposed to help.
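Below is a minimal sketch of assembling a stratified prompt pack. It assumes prompts have already been exported from local sources as records with a `category` field. The category labels, the record shape, and `build_prompt_pack` are all hypothetical. The fixed seed keeps the pack identical across baseline reruns.

```python
from __future__ import annotations

import random
from collections import defaultdict

# Hypothetical labels for the four initial categories.
CATEGORIES = ("conversational_continuity", "coding", "memory_grounded", "semantic_layer")


def build_prompt_pack(records: list[dict], per_category: int, seed: int = 0) -> list[dict]:
    """Sample `per_category` records from each category using a seeded RNG.

    Records whose category is not in CATEGORIES are ignored.
    """
    by_category = defaultdict(list)
    for rec in records:
        by_category[rec["category"]].append(rec)

    rng = random.Random(seed)
    pack = []
    for category in CATEGORIES:
        pool = by_category.get(category, [])
        if len(pool) < per_category:
            raise ValueError(
                f"category {category!r} has {len(pool)} prompts; need {per_category}"
            )
        pack.extend(rng.sample(pool, per_category))
    return pack
```

Failing loudly on an under-filled category keeps a thin local sample from quietly skewing the pack toward whichever category has the most data.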
Fifth, run the baseline harness and write its outputs to the `results/` directory using the canonical schema.
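The canonical schema is not reproduced in this checklist. The sketch below therefore uses a hypothetical record whose fields mirror what the promotion decision asks for: run ID, dataset, metrics, and reproduction command. `write_result` and every field name are assumptions. The sketch writes one JSON file per run under `results/`. It writes to a temporary file and then renames it, so a crashed run does not leave a half-written record.

```python
from __future__ import annotations

import json
import os
import uuid
from datetime import datetime, timezone
from pathlib import Path


def write_result(
    results_dir: str | os.PathLike,
    model_id: str,
    dataset: str,
    metrics: dict,
    command: str,
    run_id: str | None = None,
) -> Path:
    """Write one run record as <results_dir>/<run_id>.json (hypothetical schema)."""
    run_id = run_id or uuid.uuid4().hex[:12]
    record = {
        "run_id": run_id,
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "model_id": model_id,
        "dataset": dataset,
        "metrics": metrics,
        "reproduction_command": command,
    }
    out_dir = Path(results_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    final = out_dir / f"{run_id}.json"
    tmp = out_dir / f"{run_id}.json.tmp"
    tmp.write_text(json.dumps(record, indent=2, sort_keys=True))
    tmp.replace(final)  # rename within one directory, so the final file appears atomically
    return final
```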
WP0 should stop after the baseline report and split-layer map are written. It should not silently slide into ANE experiments, cross-host transport, or routing implementation.
Promotion Decision
Attach run IDs, datasets, metrics, and reproduction commands.
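Before you promote a baseline, you can check the attachments above mechanically. The minimal sketch below assumes each run is stored as a JSON record with the hypothetical field names shown. It only reports what is missing or empty. It does not judge whether the metrics are good enough to promote.

```python
from __future__ import annotations

# Hypothetical field names for the four required attachments.
REQUIRED_FIELDS = ("run_id", "dataset", "metrics", "reproduction_command")


def _is_empty(value) -> bool:
    """Treat None and empty strings, lists, or dicts as missing."""
    return value is None or (isinstance(value, (str, list, dict)) and len(value) == 0)


def missing_promotion_fields(record: dict) -> list[str]:
    """Return required fields that are absent or empty, in REQUIRED_FIELDS order."""
    return [name for name in REQUIRED_FIELDS if _is_empty(record.get(name))]
```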
Source Anchor
Comp-Core/benchmarks/agp-mlx-ane-wp0/wp0-execution-checklist.md