Grand Diomande Research · Full HTML Reader

Whisper Encoder / Feature Path Audit

**Date:** 2026-06-02 **Scope:** Determine whether the current workspace already contains a reusable CoreML Whisper encoder / ANE feature extraction path for the clean anchor ASR serving stack.

Finding

No reusable Whisper encoder CoreML artifact was found locally for the clean
anchor path, so a new exporter was added and validated.

Targeted searches under `[home-path]` and `/Volumes/HD1` found unrelated CoreML
models such as MotionMix `ConditioningEncoder.mlpackage`, but no
`Whisper.mlmodel`, `Whisper.mlmodelc`, `Whisper*.mlpackage`, or equivalent
ASR encoder package.

The closest historical path is:

```text
/Volumes/HD1/Mac4-Offload/Desktop/ane-training/
```

That folder contains:

  • `ane_ctc_train.py`
  • `features/*.pt`
  • `pairs.jsonl`
  • MLX CTC-head checkpoints under `checkpoints/`

It does not contain `.mlmodel`, `.mlmodelc`, or `.mlpackage` files.

What the Historical ANE Folder Actually Is

Despite the script title ("ANE+MLX CTC Training — Frozen Whisper encoder on ANE,
CTC head on MLX GPU"), `ane_ctc_train.py` is a trainer over already-extracted
feature tensors. It does not export or run a Whisper encoder. The training loop
loads `.pt` tensors from `features/` and trains a small MLX CTC head.

Observed local feature shape:

| path | shape | dtype |
| --- | --- | --- |
| `features/bam_train_000000.pt` | `(375, 1280)` | `torch.float16` |
| `features/bam_train_000001.pt` | `(375, 1280)` | `torch.float16` |
| `features/bam_train_000002.pt` | `(375, 1280)` | `torch.float16` |

Count: `1,381` feature files.

Checkpoint metadata:

```json
{"epoch": 17, "best_val_loss": 81.29580624898274, "global_step": 799}
```

Why This Does Not Close the Clean Anchor Path

The validated clean anchor CTC head consumes native HF Whisper-large-v3 encoder
features shaped:

```text
[1, 1500, 1280] -> anchor temporal_ds stride-4 -> [1, 375, 66]
```

The historical ANE features are already downsampled to:

```text
[375, 1280]
```

Feeding those into the clean anchor would double-downsample or otherwise shift
the train/serve feature contract. This is the same class of mismatch that
previously produced all-blank anchor output.
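A guard along the following lines could reject the historical features before they reach the anchor head. This is a minimal sketch: the helper name `check_anchor_input` and the constant names are illustrative assumptions, not code from the workspace; only the shapes and the stride come from the contract above.

```python
# Clean-anchor input contract: native Whisper-large-v3 encoder features.
EXPECTED_FRAMES = 1500  # encoder output frames per 30 s window
EXPECTED_DIM = 1280     # encoder feature width
ANCHOR_STRIDE = 4       # anchor temporal_ds stride


def check_anchor_input(shape):
    """Return the CTC-head frame count, or raise if the contract is violated.

    `shape` is a (batch, frames, dim) tuple. Hypothetical helper for illustration.
    """
    if len(shape) != 3:
        raise ValueError(f"expected [batch, frames, dim], got {tuple(shape)}")
    _, frames, dim = shape
    if dim != EXPECTED_DIM:
        raise ValueError(f"expected feature dim {EXPECTED_DIM}, got {dim}")
    if frames != EXPECTED_FRAMES:
        # 375-frame inputs are already downsampled; applying stride 4 again
        # would leave 375 // 4 = 93 frames instead of 375.
        raise ValueError(f"expected {EXPECTED_FRAMES} frames, got {frames}")
    return frames // ANCHOR_STRIDE
```

For example, `check_anchor_input((1, 1500, 1280))` returns `375`, while both `(375, 1280)` and `(1, 375, 1280)` raise `ValueError`.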

Export Result

`export_whisper_encoder_coreml.py` now exports the actual clean-anchor encoder
boundary from `openai/whisper-large-v3`.

Authoritative model config:

| field | value |
| --- | --- |
| `num_mel_bins` | `128` |
| `max_source_positions` | `1500` |
| `d_model` | `1280` |

Correct feature contract:

```text
mel [1, 128, 3000] -> Whisper-large-v3 encoder -> [1, 1500, 1280]
```

This corrects the older 80-mel assumption. Whisper-large-v3 uses 128 mel bins;
the first smoke attempt failed with `expected ... to have 128 channels, but got
80 channels`.

Validated artifacts:

| artifact | value |
| --- | --- |
| exporter | `experiments/acoustic_gate/export_whisper_encoder_coreml.py` |
| report | `experiments/acoustic_gate/whisper_encoder_coreml_export_report.json` |
| package | `/Volumes/NKOCoreMLWork/whisper_large_v3_encoder_fp32.mlpackage` |
| compiled model | `/Volumes/HD1/nko_coreml/device_harness_resources/whisper_large_v3_encoder_fp32.mlmodelc` |
| package size | `2,548,206,683` bytes |
| compiled size | `2,548,272,142` bytes |
| torch smoke | OK, output shape `[1, 1500, 1280]` |
| trace | OK |
| `coremlcompiler compile` | OK |
| device harness report | `experiments/acoustic_gate/whisper_encoder_device_harness_report.json` |

Packaging caveat: direct CoreML conversion on `/Volumes/HD1` failed at final
package creation because the external volume produced AppleDouble sidecars such
as `._weight.bin`, which made CoreMLTools' copy step fail with `File exists`.
The successful path uses an APFS sparse image stored at:

```text
/Volumes/HD1/nko_coreml/NKOCoreMLWork.sparseimage
```

mounted at:

```text
/Volumes/NKOCoreMLWork
```

The exporter sets `COPYFILE_DISABLE=1` by default and writes progress checkpoints
after smoke, trace, and conversion start so that future crashes leave a useful report.
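Sidecar cleanup of staged resources can be sketched as below. This is an illustrative helper, not the exporter's own code: the function name `remove_appledouble_sidecars` and its `dry_run` parameter are assumptions. It only looks for files whose names start with `._`.

```python
import os


def remove_appledouble_sidecars(root, dry_run=True):
    """List AppleDouble `._*` files under `root`; delete them unless dry_run.

    Hypothetical helper. Returns the sorted list of matching paths.
    """
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.startswith("._"):
                path = os.path.join(dirpath, name)
                found.append(path)
                if not dry_run:
                    os.remove(path)
    return sorted(found)
```

Running it first with `dry_run=True` shows which files (for example `._weight.bin`) would be removed before anything is deleted.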

Device Harness Status

The external harness at:

```text
/Volumes/HD1/nko_coreml/AnchorHeadDeviceHarness
```

has been staged with:

- the compiled encoder model under
`/Volumes/HD1/nko_coreml/device_harness_resources/`
- deterministic mel, expected encoder-output, and expected CTC-head fixtures
under the same stable resource directory
- `testWhisperEncoderCoreMLParityOnDevice`
- `testWhisperEncoderToHeadToRankerPipelineOnDevice`
- `testRealAudioWhisperEncoderToHeadToRankerPipelineOnDevice`

`xcodebuild build-for-testing` passed for both the encoder-only parity harness
and the chained pipeline harness:

```text
mel -> CoreML Whisper encoder -> CoreML anchor CTC head -> greedy argmax
    -> Swift ranker fixture
```

The random seed-1234 mel fixture decodes to all blank frames, so that early
test was a runtime/parity/shape smoke, not a semantic ASR sample. The later
real-audio split harness supersedes that smoke: physical-iPhone runtime and
same-proof Instruments placement evidence are now captured in
`/Volumes/HD1/tmp/nko_real_audio_device_watch_active_20260603/20260603_075734_iPhone7_proof_attempt1`.
ANE acceleration is still not proven.

The Swift serving package has since been upgraded from ranker-only to a full
deterministic correction package. `export_ranker_for_serving.py` now emits
`NKOCorrectionEngineV1.swift`, which embeds the frozen confusion maps, generates
bounded COPY/SUB/DEL/INS candidates, scores them with a small CTC forward
algorithm, and selects with `NKOCandidateRankerV1`. `swift test` passes 5 tests,
and the external Xcode harness compiles with the correction engine included.
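The candidate-scoring step can be illustrated with the standard CTC forward recursion. The sketch below is Python rather than Swift and is not the contents of `NKOCorrectionEngineV1.swift`; it assumes blank index `0`, per-frame log-probabilities as nested lists, and a hypothetical function name `ctc_log_likelihood`.

```python
import math

NEG_INF = float("-inf")


def _logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of log-values."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def ctc_log_likelihood(log_probs, labels, blank=0):
    """Log P(labels | log_probs), summed over all CTC alignments.

    log_probs: T rows of per-symbol log-probabilities (T >= 1).
    labels: candidate label ids without blanks.
    """
    # Interleave blanks: blank, l1, blank, l2, ..., blank.
    ext = [blank]
    for label in labels:
        ext += [label, blank]
    alpha = [NEG_INF] * len(ext)
    alpha[0] = log_probs[0][blank]
    if len(ext) > 1:
        alpha[1] = log_probs[0][ext[1]]
    for t in range(1, len(log_probs)):
        nxt = [NEG_INF] * len(ext)
        for s, sym in enumerate(ext):
            terms = [alpha[s]]
            if s >= 1:
                terms.append(alpha[s - 1])
            # Skipping the blank is allowed only between different labels.
            if s >= 2 and sym != blank and sym != ext[s - 2]:
                terms.append(alpha[s - 2])
            nxt[s] = _logsumexp(terms) + log_probs[t][sym]
        alpha = nxt
    # Valid alignments end on the last label or the trailing blank.
    return _logsumexp(alpha[-2:]) if len(ext) > 1 else alpha[0]
```

With two frames of uniform probability over `{blank, 1}`, the candidate `[1]` has three alignments (`1 1`, `blank 1`, `1 blank`), so its likelihood is `0.75`. A ranker would compare such scores across the bounded candidate set.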

The chained harness has also been upgraded beyond random input. A real local
Djoko WAV fixture,
`/Volumes/HD1/nko-brain-scanner/djoko_audio/IAfxh3pI1R4.wav`, is decoded through
ffmpeg into deterministic 30s mono 16k audio, converted with the Whisper-large-v3
128-mel feature extractor, passed through PyTorch Whisper encoder + anchor CTC
head, and packaged as real-audio-derived expected fixtures:

```text
audio [480000] -> mel [1,128,3000] -> encoder [1,1500,1280]
              -> logits [1,375,66] -> greedy text "ߌ         "
```
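The lengths in this chain follow from simple integer arithmetic. The sketch below reproduces them; the 160-sample hop and the stride-2 convolutional front end are the standard Whisper values and are assumptions here, since the source states only the resulting shapes. The function name `pipeline_lengths` is hypothetical.

```python
SAMPLE_RATE = 16_000  # Hz, from the 16k mono fixture
CLIP_SECONDS = 30     # fixture duration
HOP_LENGTH = 160      # assumed Whisper feature-extractor hop, in samples
CONV_STRIDE = 2       # assumed total stride of the encoder conv front end
ANCHOR_STRIDE = 4     # anchor temporal_ds stride


def pipeline_lengths():
    """Derive the time-axis length at each stage of the fixture chain."""
    samples = SAMPLE_RATE * CLIP_SECONDS
    mel_frames = samples // HOP_LENGTH
    encoder_frames = mel_frames // CONV_STRIDE
    ctc_frames = encoder_frames // ANCHOR_STRIDE
    return {
        "samples": samples,
        "mel_frames": mel_frames,
        "encoder_frames": encoder_frames,
        "ctc_frames": ctc_frames,
    }
```

Under these assumptions the function yields 480000 samples, 3000 mel frames, 1500 encoder frames, and 375 CTC frames, matching the fixture shapes.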

This is still not a product-quality N'Ko utterance sample: the source WAV has a
corrupt trailing packet and the greedy decode is sparse (`16` nonblank frames).
But it is materially stronger than the random-mel smoke because the entire model
chain now has a real audio-derived fixture. The real-audio test now calls the
Swift correction engine after greedy decode, so the build-ready chain is:

```text
audio -> CoreML Whisper encoder -> CoreML anchor CTC head -> greedy decode
      -> deterministic candidates -> CTC candidate scoring -> Swift ranker
      -> corrected N'Ko
```

`xcodebuild build-for-testing` passes for this real-audio test when DerivedData
is placed on HD1:

```text
/Volumes/HD1/tmp/NKORealAudioPipelineHarnessDD
```

The first root-backed build failed while copying the 2.55GB encoder package with
`No space left on device`; this was a storage-layout issue, not a Swift/CoreML
failure. After moving rebuildable Xcode simulator/device caches, root space was
restored and the final correction-engine build passed.
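A pre-build free-space check could catch this class of failure before the copy starts. The following is a minimal sketch using the standard library; the helper name `has_room_for` and the `safety_factor` default are assumptions, not part of the existing runner.

```python
import shutil


def has_room_for(path, required_bytes, safety_factor=2.0):
    """Return True if the volume holding `path` has enough free space.

    `safety_factor` is an assumed margin for intermediate copies.
    """
    free = shutil.disk_usage(path).free
    return free >= required_bytes * safety_factor
```

For instance, `has_room_for("/Volumes/HD1/tmp", 2_548_206_683)` would check the DerivedData volume against the encoder package size before `xcodebuild` runs.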

Follow-up device-prep hardening on 2026-06-02:

- Xcode's rebuildable `[home-path] DeviceSupport` cache was
moved behind an HD1-backed symlink to avoid the root-volume space failure that
previously interrupted device preparation.
- `xcrun devicectl manage ddis update` confirmed the host Developer Disk Image
registry is already current for the selected Xcode/CoreDevice install.
- A focused iPhone 16 Plus run briefly reached XCTest but executed `0` tests and
logged `Error locating DeviceSupport directory`; this is explicitly invalid
as runtime evidence.
- The trace runner now fails closed if Xcode cannot see the device, Instruments
cannot see the device online, XCTest executes zero tests, DeviceSupport lookup
fails, or the real-audio benchmark line is missing.
- `xcodebuild -prepareDeviceSupport -destination 'platform=iOS,id=00008140-001818491A88801C'`
completed for iPhone (7), producing finalized DeviceSupport for
`iPhone17,2 26.5 (23F77)` under the HD1-backed symlink.
- The focused test selector was corrected to the full XCTest path:
`AnchorHeadDeviceHarnessTests/AnchorHeadDeviceHarnessTests/testRealAudioWhisperEncoderToHeadToRankerPipelineOnDevice`.
- A clean rebuild after removing AppleDouble `._*` sidecars from the staged
CoreML resources passed.
- The real-audio XCTest now truly executes one test on iPhone (7), but fails
while CoreML builds the Whisper encoder execution plan on device: BNNS reports
`Storage Reader expects file format version 2`, Espresso reports
`MIL->EIR translation error ... model.mil:511:12: index out of bounds`, and
CoreML returns error code `-14`.
- Instruments/CoreML recording still is not usable: `xctrace` reports
`An unknown problem is preventing this device from recording.` The saved trace
bundle is not evidence of ANE/GPU/CPU placement.
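The fail-closed rule for the trace runner can be sketched as a gate that treats any missing or falsy evidence as a failure. The field names in `evidence` (`xcode_sees_device`, `instruments_device_online`, `tests_executed`, `device_support_found`, `benchmark_line_present`) and the function name are hypothetical; the real runner's report format is not shown in this document.

```python
def trace_run_failures(evidence):
    """Return the reasons to reject a device run; an empty list means pass.

    Fails closed: a missing key is treated the same as a failed check.
    """
    failures = []
    if evidence.get("xcode_sees_device") is not True:
        failures.append("Xcode cannot see the device")
    if evidence.get("instruments_device_online") is not True:
        failures.append("Instruments cannot see the device online")
    tests = evidence.get("tests_executed")
    if isinstance(tests, bool) or not isinstance(tests, int) or tests < 1:
        failures.append("XCTest executed zero tests")
    if evidence.get("device_support_found") is not True:
        failures.append("DeviceSupport lookup failed")
    if evidence.get("benchmark_line_present") is not True:
        failures.append("real-audio benchmark line is missing")
    return failures
```

Under this rule the iPhone 16 Plus run described above (zero tests executed, DeviceSupport error) would be rejected with two reasons rather than silently accepted as runtime evidence.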

Follow-up CPU-only probe:

- A reproducible MIL patch changed only the positional embedding constant shape
from `[1500,1280]` to `[1,1500,1280]`, leaving the weight bytes unchanged:
`patch_whisper_encoder_mlmodelc_explicitpos.py`.
- The staged source and signed test bundle were verified clean: no in-package
`model.mil.orig`, and both the positional constant and line-511 add now carry
`[1,1500,1280]`.
- `testWhisperEncoderCoreMLParityOnDevice` was rerun on iPhone (7) with
`MLComputeUnits.cpuOnly`. It executed one XCTest and failed with the same
device compiler error: BNNS `Storage Reader expects file format version 2`,
Espresso `model.mil:511:12` index-out-of-bounds, CoreML `-14`.

Current blocker: the harness and test selection now genuinely run on device, but the
FP32 Whisper-large-v3 CoreML MLProgram is not yet device-executable on the physical
iPhone. The failure is not isolated to Neural Engine scheduling because
CPU-only fails before prediction. The next path is a real re-export or split
graph, starting with conv+explicit-position and then encoder-layer chunks, so we
can find the first device-compatible boundary.

Follow-up split-graph probes:

- Added `export_whisper_split_coreml.py`, which exports smaller Whisper
boundaries with fixture generation and `coremlcompiler` output:
`convpos` and single encoder `layer`.
- `convpos` exports:
`mel [1,128,3000] -> conv1/conv2 + explicit positions -> [1,1500,1280]`.
macOS CoreML parity passed (`mse=4.28e-13`, max abs `2.77e-05`), and the
iPhone (7) XCTest passed: CPU-only `42.454ms`, CPU+NE requested `31.174ms`,
`mse=3.39e-14`, max abs `4.77e-06`.
- `layer00` exports one Whisper encoder transformer layer over the convpos
fixture. macOS CoreML parity passed (`mse=1.23e-12`, max abs `7.51e-06`), and
the iPhone (7) XCTest passed: CPU-only `110.054ms`, CPU+NE requested
`98.289ms`, `mse=3.27e-13`, max abs `8.82e-06`.
- Multi-layer prefix chunks further narrow the device compiler behavior. The
two-layer prefix `layers00_01` passes on iPhone (7): CPU-only `168.673ms`,
CPU+NE requested `165.390ms`, `mse=3.71e-13`, max abs `8.94e-06`.
The three-layer prefix `layers00_02` also passes: CPU-only `247.909ms`,
CPU+NE requested `234.544ms`, `mse=4.06e-13`, max abs `8.94e-06`.
- The four-layer prefix `layers00_03` compiles and executes but fails physical
device parity despite tight macOS CoreML parity: CPU-only `331.875ms`,
CPU+NE requested `324.517ms`, `mse=0.03256`, max abs `1.466`.
- Standalone layer 3, when fed the passing `layers00_02` fixture, passes on
iPhone (7): CPU-only `85.231ms`, CPU+NE requested `82.393ms`,
`mse=5.49e-14`, max abs `5.41e-06`. That makes the bad artifact the
four-layer CoreML graph composition/device lowering, not layer 3 itself.
- Later three-layer chunks complete the staged encoder artifact path through
final layer norm. `layers04_06`, `layers07_09`, `layers10_12`,
`layers13_15`, `layers16_18`, `layers19_21`, `layers22_24`,
`layers25_27`, `layers28_30`, standalone `layer31`, and `finalnorm` all
export, run macOS CoreML prediction, compile with `coremlcompiler`, and bundle
into the external iOS XCTest harness. Later macOS CoreML parity remains tight:
`layers13_15` `mse=1.88e-12`, `layers16_18` `8.92e-13`,
`layers19_21` `1.97e-13`, `layers22_24` `1.01e-13`,
`layers25_27` `1.23e-13`, `layers28_30` `1.83e-13`, `layer31`
`7.51e-14`, and `finalnorm` `7.89e-15`.
- The external harness now has full-split seed and real-audio pipeline tests.
`testWhisperSplitFullEncoderCoreMLParityOnDevice` runs the 14-stage split
encoder through finalnorm. `testRealAudioWhisperSplitEncoderToHeadToRankerPipelineOnDevice`
runs real audio/Whisper mel -> full split encoder -> CoreML CTC head ->
greedy decode -> deterministic Swift correction/ranker, and emits
`WHISPER_SPLIT_REAL_AUDIO_PIPELINE_DEVICE_BENCH` when XCTest reaches the body.
Generic iOS `xcodebuild build-for-testing` passes with the 3.9GB host app and
2.7GB XCTest plug-in; the built test bundle contains the real-audio Djoko
fixtures plus `whisper_large_v3_layers28_30_fp32.mlmodelc`,
`whisper_large_v3_layer31_fp32.mlmodelc`, and
`whisper_large_v3_finalnorm_fp32.mlmodelc`.
- Historical preflight failures are preserved in
`whisper_split_device_probe_report.json`: early attempts stopped on locked
devices, install-delta state, or malformed all-process traces before the final
process-attached proof succeeded. Those failures shaped the current runner:
strict launch preflight, output-disk guard, runtime-marker analysis, traced
runtime-marker analysis, CoreML trace analysis, and proof-summary gating.
- The completed proof is
`/Volumes/HD1/tmp/nko_real_audio_device_watch_active_20260603/20260603_075734_iPhone7_proof_attempt1`.
`pipeline-runtime-analysis.json` and `pipeline-runtime-analysis-traced.json`
both report `runtime_marker_requirements_passed`. `coreml-trace-analysis.json`
reports `placement_markers_found`. `proof-summary.json` reports
`proof_complete`, and `audit_ondevice_asr_goal.py` reports
`completion_ready=true`.
- The trace contains paired CoreML CPU/GPU markers and no paired CoreML-ANE
marker. Requested `cpuAndNeuralEngine` timings are therefore not an ANE claim;
they are only requested compute-unit timings. The completed claim remains
on-device CoreML ASR with CPU/GPU placement/fallback evidence.
- Disk note: `/Volumes/HD1/tmp/root_cache_archive_20260602` was identified as a
2.3GB archived CoreSimulator cache and moved to HD1 Trash. Because Trash lives
on the same volume, `df` still reports roughly `3.1GiB` free until that Trash
item is emptied or moved to a writable external target. The active harness
DerivedData remains `/Volumes/HD1/tmp/NKORealAudioPipelineHarnessDD` at 4.4GB.
- Requested compute-unit timings alone are not ANE placement proof. The final
process-attached Instruments trace supplies the placement evidence we do have:
paired CoreML CPU/GPU markers, with no paired CoreML-ANE marker.
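The parity figures quoted in these probes (`mse`, max abs) can be computed as below. This is a sketch over flat Python lists; the function names and the tolerance defaults are assumptions, since the document reports the metrics but not the pass/fail thresholds the harness uses.

```python
def parity_metrics(expected, actual):
    """Return (mse, max_abs) between two equal-length flat sequences."""
    if len(expected) != len(actual) or len(expected) == 0:
        raise ValueError("sequences must be non-empty and the same length")
    diffs = [a - e for e, a in zip(expected, actual)]
    mse = sum(d * d for d in diffs) / len(diffs)
    max_abs = max(abs(d) for d in diffs)
    return mse, max_abs


def parity_ok(expected, actual, mse_tol=1e-6, max_abs_tol=1e-3):
    """Apply assumed tolerances; the real harness thresholds may differ."""
    mse, max_abs = parity_metrics(expected, actual)
    return mse <= mse_tol and max_abs <= max_abs_tol
```

With these assumed tolerances, the passing chunks (`mse` around `1e-13`, max abs around `9e-06`) sit far inside the gate, while the four-layer prefix (`mse=0.03256`, max abs `1.466`) fails both conditions.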

Updated deployment path: the front of Whisper, single layers, and at least a
three-layer prefix chunk are device-compatible. A four-layer prefix package is
not numerically faithful on physical iPhone, so the deployable path is the
sequential chunked CoreML encoder with small chunks. The full staged chain is
exported, bundled, and proven in the real-audio physical-iPhone pipeline:

```text
convpos -> layers00_02 -> layer03 -> layers04_06 -> layers07_09
        -> layers10_12 -> layers13_15 -> layers16_18 -> layers19_21
        -> layers22_24 -> layers25_27 -> layers28_30 -> layer31
        -> finalnorm -> CTC head
```
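The stage order above can be generated and applied sequentially, as in this sketch. The function names are hypothetical, and each stage is modeled as a plain callable standing in for a compiled CoreML chunk; the actual harness runs the chain in Swift.

```python
def split_stage_names():
    """Build the 14 encoder stage names in execution order."""
    stages = ["convpos", "layers00_02", "layer03"]
    # Three-layer chunks covering layers 4 through 30.
    for start in range(4, 31, 3):
        stages.append(f"layers{start:02d}_{start + 2:02d}")
    stages += ["layer31", "finalnorm"]
    return stages


def run_sequential(stage_fns, features):
    """Feed the output of each stage into the next.

    `stage_fns` maps stage name to a callable; a placeholder for model calls.
    """
    for name in split_stage_names():
        features = stage_fns[name](features)
    return features
```

`split_stage_names()` yields 14 entries, matching the 14-stage split encoder exercised by `testWhisperSplitFullEncoderCoreMLParityOnDevice`; the CTC head then consumes the `finalnorm` output.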

The next probe is not another whole-encoder rerun. Follow-up work should be
optimization or iPad-specific replication; the active physical-iPhone runtime
and same-proof CoreML trace gate is complete.

Decision

Treat the old `ane-training` folder as evidence of an earlier feature extraction
experiment, not as a deployable clean-anchor encoder path.

Use the newly exported FP32 Whisper-large-v3 CoreML encoder as a reproducible
export artifact and negative device-compatibility baseline, not as a proven
runtime component. The target full on-device path remains:

```text
audio -> CoreML Whisper encoder -> CoreML anchor CTC head -> greedy decode
      -> deterministic candidates -> Swift ranker -> corrected N'Ko
```

Status

- CoreML anchor CTC head: validated on physical iPad.
- Swift ranker: validated on physical iPad.
- Swift deterministic correction engine: exported and tested in the `NKORanker`
Swift package; included in the external real-audio iOS build.
- TurboQuant around feature tensors: CER-safe locally.
- CoreML Whisper encoder: exported and compiled as FP32 ML Program on APFS work
volume; whole-encoder device execution fails, including CPU-only, even after
the explicit-position shape patch.
- CoreML split encoder probes: conv+position, standalone layer 0, `layers00_01`,
`layers00_02`, and standalone layer 3 after `layers00_02` pass on physical
iPhone with tight parity. The four-layer prefix `layers00_03` executes but is
numerically wrong on-device. The sequential small-chunk path is exported,
compiled, bundled, build-ready through `finalnorm`, and runtime-proven in the
full real-audio pipeline.
- Real-audio split pipeline harness: builds for generic iOS and bundles audio,
mel, expected encoder, expected logits, expected argmax, and manifest fixtures
for `djoko_IAfxh3pI1R4`. The completed proof is
`/Volumes/HD1/tmp/nko_real_audio_device_watch_active_20260603/20260603_075734_iPhone7_proof_attempt1`.
It passed runtime and traced-runtime analysis with
`WHISPER_SPLIT_REAL_AUDIO_PIPELINE_DEVICE_BENCH`: `480000` audio samples,
14 split encoder stages in both compute modes, CoreML CTC head parity, nonblank
greedy decode, 120 bounded candidates, and Swift ranker acceptance.
- Goal-completion audit: `ondevice_asr_goal_audit.json` reports
`completion_ready=true`. The full on-device ASR serving objective is complete
for the physical-iPhone artifact above.
- Runtime / true ANE acceleration: runtime and same-proof CoreML trace evidence
are captured; true ANE acceleration is not proven. The process-attached
Instruments trace exported successfully and `proof-summary.json` is
`proof_complete`. The trace contains paired CoreML CPU/GPU markers
(`paired_coreml_compute_marker_totals={ane:0,cpu:33,gpu:3,coreml:652}`), but
no paired CoreML-ANE marker. The defensible claim is "full on-device CoreML
N'Ko ASR with CPU/GPU placement/fallback evidence," not "ANE accelerated."
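The rule for turning paired marker totals into a claim can be made explicit. The sketch below assumes a dictionary shaped like the `paired_coreml_compute_marker_totals` value quoted above; the function name and the returned strings are illustrative and not taken from `audit_ondevice_asr_goal.py`.

```python
def placement_claim(totals):
    """Map paired CoreML marker counts to the strongest defensible claim."""
    ane = totals.get("ane", 0)
    cpu = totals.get("cpu", 0)
    gpu = totals.get("gpu", 0)
    if ane > 0:
        return "ANE placement observed"
    if cpu > 0 or gpu > 0:
        # Requested compute units do not count; only paired markers do.
        return "CPU/GPU placement only"
    return "no placement evidence"
```

Applied to `{"ane": 0, "cpu": 33, "gpu": 3, "coreml": 652}`, this returns `"CPU/GPU placement only"`, which is consistent with the claim stated in the status above.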

Promotion Decision

Attach run IDs, datasets, metrics, and reproduction commands.

Source Anchor

nko-brain-scanner/experiments/acoustic_gate/WHISPER-ENCODER-PATH-AUDIT.md
