Mohamed Diomande

Extracted abstract or opening context

WP0 exists to produce the first trustworthy baseline for `Gemma 4 E2B` on a single Apple host. The host is `Apple M4` with `16 GB` memory. The installed `mlx` version is `0.31.1`. The Hugging Face model id `google/gemma-4-E2B` is visible and not gated. The immediate missing piece is `mlx_lm` in the default Python path.

First, validate the preferred `mlx_lm` environment. If there is already a project-local or tool-local environment with `mlx_lm`, use that instead of modifying the global Python path. If no valid environment exists, create the smallest viable isolated environment for WP0 and install the exact packages needed for local Gemma 4 inference and hidden-state instrumentation.

Second, run a smoke inference on `Gemma 4 E2B` with the simplest possible prompt and capture:

- the successful model load path
- the effective quantization
- startup latency
- first-token latency
- steady-state tokens per second
- peak memory estimate
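The two steps above can be sketched as a small harness. This is a minimal illustration under stated assumptions, not the WP0 implementation: `module_available`, `in_isolated_env`, `SmokeMetrics`, and `run_smoke` are hypothetical names, and the model loader and token stream are passed in as plain callables so the timing logic does not depend on any particular `mlx_lm` API. It covers only the timing and memory metrics; the model load path and effective quantization would have to be recorded separately.

```python
import importlib.util
import sys
import time
from dataclasses import dataclass
from typing import Any, Callable, Iterable, Optional


def module_available(name: str) -> bool:
    """Return True if a top-level module is importable from the current Python path."""
    return importlib.util.find_spec(name) is not None


def in_isolated_env() -> bool:
    """Return True inside a virtual environment (prefix differs from the base interpreter)."""
    return sys.prefix != sys.base_prefix


@dataclass
class SmokeMetrics:
    startup_s: float                  # time spent loading the model
    first_token_s: float              # time from end of load to first token
    steady_tokens_per_s: float        # decode rate measured after the first token
    n_tokens: int
    peak_memory_bytes: Optional[int]  # None if no memory probe was supplied


def run_smoke(
    load_fn: Callable[[], Any],
    stream_fn: Callable[[Any], Iterable[Any]],
    peak_memory_fn: Optional[Callable[[], int]] = None,
    clock: Callable[[], float] = time.perf_counter,
) -> SmokeMetrics:
    """Time a model load followed by one streamed generation."""
    t_start = clock()
    model = load_fn()
    t_loaded = clock()

    t_first = t_last = None
    n_tokens = 0
    for _ in stream_fn(model):
        t_last = clock()
        if t_first is None:
            t_first = t_last
        n_tokens += 1

    if t_first is None:
        raise RuntimeError("stream produced no tokens")

    # Exclude the first token so prompt processing does not skew the decode rate.
    decode_s = t_last - t_first
    steady = (n_tokens - 1) / decode_s if n_tokens > 1 and decode_s > 0 else 0.0
    peak = peak_memory_fn() if peak_memory_fn is not None else None

    return SmokeMetrics(
        startup_s=t_loaded - t_start,
        first_token_s=t_first - t_loaded,
        steady_tokens_per_s=steady,
        n_tokens=n_tokens,
        peak_memory_bytes=peak,
    )


if __name__ == "__main__":
    print("mlx_lm importable:", module_available("mlx_lm"))
    print("isolated environment:", in_isolated_env())
```

In a real run, `load_fn` and `stream_fn` would presumably wrap the loader and streaming generator of the installed `mlx_lm`, and `peak_memory_fn` would wrap whatever peak-memory probe the installed `mlx` exposes. Those bindings are assumptions to verify against the installed versions before any numbers are reported.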

Promotion decision

What has to happen next

Attach run IDs, datasets, metrics, and reproduction commands.

Why this is not always a full paper yet

Corpus pages are public-safe readers for discovered workspace artifacts. They are not automatically final papers. A corpus item becomes a polished paper only after the editable source, evidence checkpoints, references, figures, render path, and release status are attached through the paper schema.