Grand Diomande Research · Full HTML Reader

N'Ko Speech Stack — System Architecture Ledger

This file prevents the program from looking like a sequence of overwritten ideas. The work has not been reset; it has compressed into layers. Each experiment became a system component, a constraint, or a publishable negative result.



Status: living ledger, created 2026-06-02.


One-Sentence Architecture

N'Ko speech becomes infrastructure when a clean script-native ASR anchor emits N'Ko,
a deterministic edit/ranker layer performs bounded harm-free correction, and the
whole path is served offline on iOS through CoreML, with ANE/TurboQuant reserved for
the heavy encoder/compression layer.

Anti-Overwrite Contract

Do not treat the newest benchmark as replacing the older work. Treat it as assigning
older work to a clearer layer. A result can end in four valid states:

1. Component: it remains in the serving/research path.
2. Constraint: it becomes a rule that prevents a repeat failure.
3. Negative result: it becomes publishable evidence about what not to build.
4. Parked branch: it remains outside the first phone proof, but feeds the broader
speech system later.

The current phone stack is only the narrow serving spine. The larger architecture also
keeps the representation papers, metric argument, uncertainty packet, provenance
corpus, speaker atlas, search/TTS branch, tone/phonemic-substrate work, and meaning
flywheel.

Research Lineage Map

Workstream / Paper	Original question	What it became architecturally	Current handling
Paper 1, Dead Circuits	Do general LLMs internally support N'Ko?	Motivation for building script-native infrastructure instead of relying on generic models.	Flagship representation paper; independent of ASR runtime proof.
Paper 3, cross-model invisibility	Is N'Ko invisibility model-specific?	Generality check for the infrastructure thesis.	Likely merge/extension of Dead Circuits.
Phonemic substrate / script advantage / Against WER	Is N'Ko only cultural output, or a better measurement target?	Foundation layer: N'Ko is the label space; CER becomes closer to phonemic accuracy than Latin WER/CER.	Preserve as the metric and representation argument.
Paper 2, Living Speech	Can direct audio-to-N'Ko ASR be built at all?	Historical build path: Whisper features, CTC heads, bridge, FSM, practical data constraints.	Technical report lineage, not final production claim.
Paper 4 / canonical 20 CER anchor	Which acoustic model is the retained ASR base?	Clean script-native anchor and cautionary evidence about reproduction/provenance.	Archived 20.57
Trajectory, TAR, TTT	Does dynamic acoustic state help recognition/adaptation?	Trajectory survives in the anchor and AGP row signals; TTT stays a speaker-adaptation branch.	Do not imply AGP or TTT caused the 20.57
ADR-001 feature provenance	Why did a cleanly loaded model decode blank?	Law: feature extractor, frame layout, and head are inseparable.	Hard constraint for every CoreML/ANE deployment.
AGP / acoustic verifier	Can an LLM correction loop safely improve ASR?	Governance layer and preservation/data-selection signal.	Free-form correction is negative; bounded correction survives.
Edit-op SFT experiments	Can a model emit constrained repairs?	Proved schema/latency pressure and COPY collapse; motivated deterministic candidates.	Negative result that shaped the ranker path.
Deterministic candidate/ranker	Can bounded local repairs reduce CER without harm?	Live correction component: candidate generator + CTC features + tiny ranker.	First positive correction layer; exported to Swift.
Featural edit objective	How should local N'Ko mistakes be priced?	Scoring geometry for local phonetic repairs rather than whole-string rewrites.	Keep as ranker feature family and paper argument.
CoreML / iOS harness	Can the complete acoustic path run on a phone?	Deployment spine: mel -> encoder -> head -> greedy -> correction.	Full physical-iPhone proof complete.
ANE	Can the heavy encoder run efficiently offline?	Hardware acceleration target, not a correctness source.	Not proven in final trace; claim CPU/GPU CoreML placement/fallback evidence only.
TurboQuant	Can serving tensors/weights be compressed without CER drift?	Compression/bit-budget layer around encoder/features/head.	Serving component; not a corrector.
Uncertainty packet / row contract	How do ASR, correction, search, and TTS share evidence?	Canonical interface: raw text, uncertainty, partition, provenance, correction, eligibility.	Must remain the interface for larger corpus/search work.
Provenance search	How does the system answer questions over real audio?	Retrieval layer over raw/corrected transcripts, timestamps, speakers, papers, and decisions.	Parked branch after phone ASR proof, not discarded.
Speaker atlas / diarization / TTS	How does this become a speech system, not only ASR?	Persistent speaker graph, adaptation lane, and high-confidence TTS subset.	Parked deployment branch with its own evidence gates.
Coinage / meaning seed	Can the system grow language resources from ASR output?	Corpus-growth and semantic-resource flywheel.	Research branch, not a dependency for first phone serving.
Tone / FAC / phonemic extension	How does the stack recover the script's unresolved axes?	Future reconstruction layer over text priors plus acoustic evidence.	Preserve as a parallel research pillar.
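
The uncertainty packet / row contract entry in the table lists six fields: raw text, uncertainty, partition, provenance, correction, eligibility. A minimal sketch of such a record follows. The class and field names, the scalar uncertainty, and the string-valued provenance are assumptions for illustration, not the ledger's actual schema; the four partition names are taken from the System Invariants section. The sample strings are Latin placeholders, not N'Ko.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import FrozenSet, Optional


class Partition(Enum):
    STABLE = "stable"
    BOUNDARY = "boundary"
    UNCERTAIN = "uncertain"
    NOVELTY = "novelty"


@dataclass(frozen=True)
class RowPacket:
    """Hypothetical row contract shared by ASR, correction, search, and TTS."""
    raw_text: str                      # greedy ASR output; never overwritten
    uncertainty: float                 # assumed scalar in [0, 1]
    partition: Partition
    provenance: str                    # assumed format: clip id plus time span
    correction: Optional[str] = None   # accepted bounded correction, if any
    eligibility: FrozenSet[str] = frozenset()  # downstream uses this row may feed

    def best_text(self) -> str:
        """Return the correction when one was accepted, else the raw text."""
        return self.correction if self.correction is not None else self.raw_text


def accept_correction(row: RowPacket, corrected: str) -> RowPacket:
    """Attach a correction without erasing the acoustic-side raw text."""
    return replace(row, correction=corrected)


row = RowPacket("abc", 0.2, Partition.STABLE, "clip-001:0.0-3.2")
fixed = accept_correction(row, "abd")
print(fixed.raw_text, fixed.best_text())
```

Keeping the record frozen and storing the correction beside the raw text, rather than in place of it, is one way to keep acoustic provenance intact when later layers consume the row.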

System Invariants

- N'Ko is the substrate, not decoration. The target script carries the metric,
edit geometry, and cultural/linguistic fidelity.
- Audio evidence stays upstream. LLMs, rankers, and gates may propose or select;
they do not erase acoustic provenance.
- Correction is bounded and auditable. The live path uses local candidates,
CTC support, featural cost, and a tiny gate, not unconstrained rewriting.
- Feature provenance is non-negotiable. A CTC head belongs to the extractor and
frame layout it was trained on.
- Deployment claims are separate from accuracy claims. GPU/PyTorch establishes
clean accuracy; CoreML/ANE/TurboQuant establish offline serving and efficiency.
- Partitions are policy. Stable, boundary, uncertain, and novelty rows should
route differently for correction, search, TTS, and corpus growth.
- Negative results are assets. Gemma full-string correction, schema collapse,
all-blank feature mismatch, and failed reproduction gates explain the final shape.
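
The feature-provenance invariant can be enforced as a load-time check. The sketch below is illustrative only: `FeatureLayout`, `require_matching_layout`, and the extractor labels are hypothetical names, and the frame counts (1500 for the anchor, 375 for the pre-downsampled pilot features) are the ones this ledger reports in the Layer Map.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureLayout:
    """Hypothetical description of what a CTC head was trained on."""
    extractor: str   # label for the feature extractor
    frames: int      # number of time frames the head expects


class FeatureProvenanceError(ValueError):
    """Raised when a head is paired with features it was not trained on."""


def require_matching_layout(trained_on: FeatureLayout, served: FeatureLayout) -> None:
    # Refuse to serve rather than silently decode blanks from mismatched input.
    if trained_on != served:
        raise FeatureProvenanceError(
            f"head trained on {trained_on} but served with {served}"
        )


ANCHOR_LAYOUT = FeatureLayout("whisper-encoder", 1500)
PILOT_LAYOUT = FeatureLayout("pre-downsampled-pilot", 375)

require_matching_layout(ANCHOR_LAYOUT, ANCHOR_LAYOUT)  # passes silently
try:
    require_matching_layout(ANCHOR_LAYOUT, PILOT_LAYOUT)
except FeatureProvenanceError as err:
    print("blocked:", err)
```

Failing loudly at load time turns the all-blank decode described above into an explicit error instead of a silent accuracy collapse.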

Layer Map

Layer	Role	Current Evidence	Status
Script substrate	N'Ko is not cosmetic transliteration; it is the phonemic target space where CER is more meaningful than Latin for Manding speech.	Archived paper/handoff program; anchor vocab = N'Ko block + space.	Survives as the foundation.
Clean anchor ASR	Canonical acoustic base. The contaminated 297k/ANE pilot is no longer the accuracy substrate.	20.57
Feature provenance rule	A head must be served with the same extractor layout it was trained on.	GPU anchor expects 1500-frame Whisper features; ANE pilot features were pre-downsampled 375-frame tensors and caused all-blank output when mismatched.	Architectural law.
Preservation / data selection	Reference-free acoustic self-score is real, but not as large as first measured.	Clean revalidation lowered preservation AUC from 0.923 to about 0.739; harvest remains low-yield but usable.	Deployable guardrail, not headline correction.
Full-string LLM correction	Free-form Gemma correction is the wrong live interface.	Clean LoRA trained, but full generation failed latency gate; clean eval did not reduce CER.	Publishable negative result.
Bounded edit candidates	Candidate generator emits COPY/SUB/DEL/INS local repairs, not whole strings.	500-row serving slice: baseline CER 0.435201 -> corrected 0.402613, -3.26pp, 381 better / 15 same / 0 worse.	First real correction win.
Featural objective	Edits should be scored by phonetic/feature distance, so one-feature slips are cheap and local.	`featural_edit.py`; edit-op convergence; voicing SUB costs about 0.33 instead of whole-line rewrite.	Scoring principle for edit layer.
Swift correction engine	Tiny deterministic logistic gate plus bounded candidate/CTC scorer selects safe candidates on-device.	`NKOCandidateRankerV1.swift` + `NKOCorrectionEngineV1.swift`; Swift package tests pass; real-audio iOS harness builds with correction engine.	Serving component.
CoreML acoustic path	Move acoustic inference onto iOS: Whisper encoder -> anchor CTC head -> greedy decode.	Whole Whisper encoder fails in iPhone execution-plan construction, including CPU-only after explicit-position MIL patch. Split probes pass: conv+position, standalone layer 0, prefix chunks `layers00_01` and `layers00_02`, and standalone layer 3 after `layers00_02`. The four-layer prefix `layers00_03` executes but is numerically wrong on iPhone. The full sequential split chain through `layers28_30`, `layer31`, and `finalnorm` is exported, compiled, bundled, and proven in the real-audio split XCTest: audio/Whisper mel -> full split encoder -> CoreML head -> greedy -> Swift correction/ranker. The final artifact has passing runtime, passing traced runtime, and CoreML trace analysis with paired CPU/GPU markers. The external harness app is now a visible SwiftUI diagnostic console, not just an XCTest host.	Full on-device serving proof complete.
ANE	Hardware acceleration target for the heavy frozen encoder, not the source of correctness.	Final trace has unpaired ANE hardware markers but no paired CoreML-ANE marker; paired CoreML CPU/GPU markers are present.	ANE acceleration not proven.
TurboQuant	Compression/bit-budget layer around features/weights/tensors, not a text corrector.	500-row serving stack: affine8 preserves corrected CER at 1.78x compression; affine4 essentially preserves corrected CER at 3.20x.	Serving/compression component.
Meaning / coinage loop	Self-extension layer: harvested coinages and meaning pairs can grow language resources.	Stage 1 harvest: 305 plausible coinages from 1,381 hyps; filtered to 44 defensible additions for meaning SFT prep.	Research branch, not live ASR dependency.
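
The bounded-edit numbers in the Layer Map can be checked with a few lines of arithmetic. The function name and output keys below are assumptions; the inputs are the figures reported in the Bounded edit candidates row.

```python
def summarize_correction(baseline_cer, corrected_cer, better, same, worse):
    """Summarize a correction run from aggregate CER and per-row outcome counts."""
    delta = baseline_cer - corrected_cer
    return {
        "delta_pp": delta * 100,                     # absolute gain in percentage points
        "relative_reduction": delta / baseline_cer,  # fraction of baseline CER removed
        "harm_free": worse == 0,                     # no row got worse
        "rows_counted": better + same + worse,
    }


summary = summarize_correction(0.435201, 0.402613, better=381, same=15, worse=0)
print(round(summary["delta_pp"], 2), round(summary["relative_reduction"], 4))
```

The absolute gain rounds to the reported 3.26 percentage points, which is roughly a 7.5% relative reduction against the baseline CER.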

Visible Device Harness

The iOS harness has two jobs, and they should not be confused:

- XCTest harness: produces auditable device evidence and result bundles.
- Installed app: lets a human open the app and run diagnostic slices directly.

The first installed app was only a placeholder screen because all useful work lived
in `AnchorHeadDeviceHarnessTests`. That was a harness packaging gap, not evidence
that the ASR stack had disappeared.

As of 2026-06-02, the external harness at
`/Volumes/HD1/nko_coreml/AnchorHeadDeviceHarness` has a visible SwiftUI dashboard:

- resource checks for the bundled CoreML models and fixtures;
- one-tap diagnostics for the CoreML anchor CTC head, Whisper conv+position,
sequential Whisper `layers00_02 -> layer03`, Whisper `layers04_06`,
`layers07_09`, `layers10_12`, and the frozen Swift ranker;
- greedy CTC decode and deterministic correction-engine output from the app
target itself;
- a `Run All` path for a quick device smoke test.

The app target now links the local `NKORanker` Swift package and bundles a trimmed
resource set. It intentionally excludes the known-bad 2.4GB full Whisper encoder
from the visible app; the full encoder remains XCTest evidence/negative-result
material until the split graph is complete. Verified build/install:

```text
xcodebuild build -scheme AnchorHeadDeviceHarness \
  -destination 'platform=iOS,id=00008140-001818491A88801C' \
  -derivedDataPath /Volumes/HD1/tmp/NKOVisibleHarnessDD

App bundle: /Volumes/HD1/tmp/NKOVisibleHarnessDD/Build/Products/Debug-iphoneos/AnchorHeadDeviceHarness.app
Bundle size: 779MB
Installed and launched bundle id: com.mohameddiomande.nko.AnchorHeadDeviceHarness
```

Follow-up on 2026-06-03: the XCTest-side split encoder is now staged through the
full Whisper-large-v3 encoder, ending at
`/Volumes/HD1/nko_coreml/device_harness_resources/whisper_large_v3_finalnorm_fp32.mlmodelc`.
The later chunks pass macOS CoreML parity through finalnorm, and the built
XCTest plug-in contains the real-audio Djoko fixtures plus the final split
chunks. Generic iOS `build-for-testing` passed with these full-split resources,
including the active
`testRealAudioWhisperSplitEncoderToHeadToRankerPipelineOnDevice` benchmark
marker `WHISPER_SPLIT_REAL_AUDIO_PIPELINE_DEVICE_BENCH`.

The final physical proof completed on iPhone (7) at
`/Volumes/HD1/tmp/nko_real_audio_device_watch_active_20260603/20260603_075734_iPhone7_proof_attempt1`.
The proof ran real audio through:

```text
audio -> Whisper mel -> 14-stage split Whisper/CoreML encoder
      -> CoreML anchor CTC head -> greedy decode
      -> deterministic candidates -> Swift ranker correction
```
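
The greedy decode stage in this path is standard CTC best-path decoding: take the argmax per frame, collapse consecutive repeats, and drop blanks. The sketch below is in pure Python for illustration. The blank index of 0 and the toy three-symbol vocabulary are assumptions; the real head's vocabulary ordering is not given in this ledger.

```python
from typing import List, Sequence


def argmax_frames(logits: Sequence[Sequence[float]]) -> List[int]:
    """Pick the highest-scoring vocabulary id for every frame."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


def greedy_ctc_decode(frame_ids: Sequence[int], id_to_char: Sequence[str],
                      blank_id: int = 0) -> str:
    """Collapse repeated ids, then remove blanks (CTC best-path decode)."""
    out = []
    prev = None
    for idx in frame_ids:
        if idx != prev and idx != blank_id:
            out.append(id_to_char[idx])
        prev = idx
    return "".join(out)


toy_vocab = ["", "a", "b"]                 # index 0 is the assumed blank
toy_frames = [0, 1, 1, 0, 1, 2, 2, 0]
decoded = greedy_ctc_decode(toy_frames, toy_vocab)
all_blank = greedy_ctc_decode([0, 0, 0, 0], toy_vocab)
print(repr(decoded), repr(all_blank))
```

An all-blank frame sequence decodes to the empty string, which is why the proof requires a nonblank greedy decode as one of its runtime markers.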

Runtime-marker analysis is automated by
`experiments/acoustic_gate/analyze_pipeline_runtime_log.py`. In the completed
proof, both `pipeline-runtime-analysis.json` and
`pipeline-runtime-analysis-traced.json` report
`runtime_marker_requirements_passed`: 480000 audio samples, 14 split encoder
stages in both compute modes, encoder/head parity, nonblank greedy decode, 120
bounded candidates, and Swift ranker acceptance.

Trace analysis is automated by `experiments/acoustic_gate/analyze_coreml_trace.py`.
The completed process-attached CoreML trace exports successfully and
`coreml-trace-analysis.json` reports `placement_markers_found`. The trace has
paired CoreML CPU/GPU markers
(`paired_coreml_compute_marker_totals={ane:0,cpu:33,gpu:3,coreml:652}`), plus
unpaired ANE hardware markers. That is enough for CoreML CPU/GPU
placement/fallback evidence, but not enough for an ANE acceleration claim.
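
The claim boundary described here can be written as a small decision rule over the paired marker totals. This is a sketch under assumptions: the function name and the returned labels are hypothetical and are not the output format of `analyze_coreml_trace.py`. Only the totals dictionary comes from the trace quoted above.

```python
def placement_claim(paired_totals: dict) -> str:
    """Map paired CoreML compute-marker counts to the strongest supportable claim."""
    if paired_totals.get("ane", 0) > 0:
        # Paired CoreML-ANE markers would be the minimum needed before any ANE claim.
        return "ane_placement_evidence"
    if paired_totals.get("cpu", 0) > 0 or paired_totals.get("gpu", 0) > 0:
        return "cpu_gpu_placement_evidence"
    return "no_placement_evidence"


totals = {"ane": 0, "cpu": 33, "gpu": 3, "coreml": 652}
claim = placement_claim(totals)
print(claim)
```

Unpaired ANE hardware markers are deliberately not an input to this rule, matching the ledger's position that they do not support an ANE acceleration claim.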

Completion gating is automated in
`experiments/acoustic_gate/audit_ondevice_asr_goal.py`, which writes
`experiments/acoustic_gate/ondevice_asr_goal_audit.json`. Current status is
`complete`/`completion_ready=true`. The runner also writes `proof-summary.json`
via `summarize_ondevice_asr_proof.py`; the completed summary is
`proof_complete`.

What Did Not Get Overwritten

The earlier papers are now different layers of one stack:

- Script advantage / phonemic substrate: why N'Ko itself matters as the output
space and why Latinized targets are scientifically weaker.
- Trajectory / clean anchor: the canonical ASR model and feature-provenance
discipline.
- AGP / acoustic gate: the safety and selection layer. It prevents damage and
helps harvest data, even when free-form correction fails.
- Featural edit work: the reason edit operations are scored as local acoustic
repairs instead of whole-glyph rewrites.
- TurboQuant / ANE: deployment infrastructure, not a replacement for the ASR
or correction science.
- Coinage / meaning seed: the self-extending language layer, useful for corpus
growth and future semantic models, but not required for the first phone ASR
proof.
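
To make the featural edit idea concrete, the sketch below prices a substitution by the fraction of phonetic features that differ, then uses that price inside an ordinary weighted edit distance. It is not the contents of `featural_edit.py`: the three-feature table (place, manner, voicing), the Latin phoneme labels standing in for N'Ko glyphs, and the unit insert/delete cost are all assumptions chosen so that a voicing-only slip costs 1/3, close to the "about 0.33" quoted in the Layer Map.

```python
# Hypothetical feature table: phoneme -> (place, manner, voicing).
FEATURES = {
    "p": ("labial", "stop", "voiceless"),
    "b": ("labial", "stop", "voiced"),
    "t": ("alveolar", "stop", "voiceless"),
    "d": ("alveolar", "stop", "voiced"),
    "s": ("alveolar", "fricative", "voiceless"),
    "m": ("labial", "nasal", "voiced"),
}


def sub_cost(a: str, b: str) -> float:
    """Substitution cost = share of features that differ (1.0 if unknown)."""
    if a == b:
        return 0.0
    fa, fb = FEATURES.get(a), FEATURES.get(b)
    if fa is None or fb is None:
        return 1.0
    return sum(x != y for x, y in zip(fa, fb)) / len(fa)


def featural_distance(ref: str, hyp: str, indel: float = 1.0) -> float:
    """Weighted Levenshtein distance using featural substitution costs."""
    rows, cols = len(ref) + 1, len(hyp) + 1
    d = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        d[i][0] = i * indel
    for j in range(1, cols):
        d[0][j] = j * indel
    for i in range(1, rows):
        for j in range(1, cols):
            d[i][j] = min(
                d[i - 1][j] + indel,                                  # deletion
                d[i][j - 1] + indel,                                  # insertion
                d[i - 1][j - 1] + sub_cost(ref[i - 1], hyp[j - 1]),   # substitution
            )
    return d[-1][-1]


voicing_slip = featural_distance("pat", "bat")
print(round(voicing_slip, 2), sub_cost("p", "m"), featural_distance("pat", "pa"))
```

Under this pricing a one-feature slip is much cheaper than an unrelated substitution, so the scorer favors small local repairs over larger rewrites.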

Publishable Shape

The strongest paper is no longer "LLM corrects N'Ko ASR." It is:

> Full-string LLM correction is too slow and too unsafe for live low-resource ASR,
> but bounded script-native edit candidates, scored by acoustic CTC evidence and
> gated by a tiny ranker/featural objective, can reduce N'Ko CER without observed
> harm and are small enough for on-device iOS serving.

That paper has three clean claims:

1. Negative result: free-form transcript correction trains but fails latency
and does not improve clean-anchor CER.
2. Positive correction result: deterministic edit candidates + ranker produce
a measured CER gain on a held-out serving slice.
3. Systems result: the correction layer is small enough to ship as a Swift package;
the acoustic path runs in CoreML; TurboQuant/ANE define the on-device compression
and acceleration path.
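
The second claim rests on candidates that differ from the hypothesis by at most one local edit, with COPY as the fallback. The sketch below shows one way to enumerate such candidates deterministically and to select among them with a margin rule. The function names, the enumeration order, the margin rule, and the toy scorer are assumptions. The cap of 120 echoes the "120 bounded candidates" marker in the device proof, but whether that number is a configured cap is not stated in this ledger.

```python
from typing import Callable, List, Tuple

Candidate = Tuple[str, str]  # (operation, candidate text)


def generate_candidates(hyp: str, alphabet: str, max_candidates: int = 120) -> List[Candidate]:
    """Enumerate COPY plus single-character DEL/SUB/INS edits, in a fixed order."""
    out: List[Candidate] = [("COPY", hyp)]
    seen = {hyp}

    def add(op: str, text: str) -> None:
        if text not in seen and len(out) < max_candidates:
            seen.add(text)
            out.append((op, text))

    for i in range(len(hyp)):
        add("DEL", hyp[:i] + hyp[i + 1:])
        for ch in alphabet:
            if ch != hyp[i]:
                add("SUB", hyp[:i] + ch + hyp[i + 1:])
    for i in range(len(hyp) + 1):
        for ch in alphabet:
            add("INS", hyp[:i] + ch + hyp[i:])
    return out


def select(hyp: str, candidates: List[Candidate],
           score: Callable[[str], float], min_margin: float) -> str:
    """Keep the hypothesis unless an edit beats COPY by more than min_margin."""
    best_op, best_text = max(candidates, key=lambda c: score(c[1]))
    if best_op != "COPY" and score(best_text) - score(hyp) > min_margin:
        return best_text
    return hyp


cands = generate_candidates("ab", "abc")
toy_score = lambda text: 1.0 if text == "abc" else 0.0   # stand-in for CTC support
print(len(cands), select("ab", cands, toy_score, min_margin=0.5))
```

Because COPY is always the first candidate and an edit must clear a margin over it, a scorer with no clear preference leaves the transcript unchanged. In this sketch that fallback is what keeps the correction conservative.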

Final Integration Target

The final phone stack is:

```text
audio
  -> log-mel [1,128,3000]
  -> CoreML Whisper-large-v3 encoder [1,1500,1280]
  -> CoreML clean-anchor CTC head [1,375,66]
  -> greedy N'Ko decode
  -> deterministic COPY/SUB/DEL/INS candidate generator
  -> CTC candidate scoring
  -> featural/ranker gate
  -> corrected N'Ko
```
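
The tensor shapes in this stack follow from a short chain of arithmetic, sketched below. The 16 kHz sample rate, 160-sample mel hop, and stride-2 convolutional front end are standard Whisper properties. The 4x time reduction inside the CTC head is an assumption inferred from the 1500 -> 375 frame counts, and the constant names are illustrative.

```python
SAMPLE_RATE = 16_000    # Whisper input sample rate (Hz)
HOP_LENGTH = 160        # Whisper mel hop: 10 ms per frame
N_MELS = 128            # mel bins, from the [1,128,3000] shape
ENCODER_STRIDE = 2      # Whisper conv front end halves the frame count
D_MODEL = 1280          # encoder width, from the [1,1500,1280] shape
HEAD_DOWNSAMPLE = 4     # assumption: inferred from 1500 -> 375
VOCAB_SIZE = 66         # head output classes, from the [1,375,66] shape


def stack_shapes(num_samples: int) -> dict:
    """Derive the per-stage tensor shapes for one audio window."""
    mel_frames = num_samples // HOP_LENGTH
    encoder_frames = mel_frames // ENCODER_STRIDE
    head_frames = encoder_frames // HEAD_DOWNSAMPLE
    return {
        "seconds": num_samples / SAMPLE_RATE,
        "log_mel": (1, N_MELS, mel_frames),
        "encoder": (1, encoder_frames, D_MODEL),
        "ctc_head": (1, head_frames, VOCAB_SIZE),
    }


shapes = stack_shapes(480_000)   # sample count reported by the runtime analysis
print(shapes)
```

With the 480000 samples reported by the runtime-marker analysis, this reproduces a 30-second window and the three shapes listed in the stack.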

Remaining Work After This Proof

- Preserve the whole-encoder negative result while using the split encoder as the
active path. CPU-only already fails for the whole encoder, even after the
explicit-position shape patch, so this is not only a Neural Engine scheduling
failure.
- Treat iPad replication as a follow-up, not as the current completion blocker.
The completed full audio-to-correction proof is on physical iPhone; earlier
iPad evidence covers the CoreML CTC head and Swift ranker.
- Do not claim ANE acceleration from the completed proof. A future optimization
pass may pursue paired CoreML-ANE markers, CoreML-safe quantization, or model
restructuring, but this proof supports only on-device CoreML with CPU/GPU
placement/fallback evidence.
- Write the final report around the stack above, not around any single experiment.

Promotion Decision

Attach run IDs, datasets, metrics, and reproduction commands.

Source Anchor

nko-brain-scanner/experiments/acoustic_gate/SYSTEM-ARCHITECTURE-LEDGER.md
