Grand Diomande Research · Full HTML Reader

N'Ko Acoustic Coding — Featural Acoustic Coding (FAC)

Research code and paper for the N'Ko tone-resolution seam: using acoustic evidence, especially F0, to restore tone marks that a toneless N'Ko ASR pipeline cannot recover from text alone.

Language as Infrastructure technical note experiment writeup candidate score 36 .md

Full Public Reader

N'Ko Acoustic Coding — Featural Acoustic Coding (FAC)

Research code and paper for the N'Ko tone-resolution seam: using acoustic
evidence, especially F0, to restore tone marks that a toneless N'Ko ASR pipeline
cannot recover from text alone.

The one-line thesis

The recognizer emits toneless N'Ko, but N'Ko tone is written in the signal:
speaker-relative F0 supplies evidence that a text-only prior cannot see. FAC is
the acoustic likelihood side of that estimator; AGP is the governance gate that
decides where a correction is safe.

Why N'Ko actually wins (the grounded version, not hype)

N'Ko slot	Acoustic axis it encodes	Native?
Tone marks (7 marks, `U+07EB`–`U+07F1`)	pitch register + contour, with short/long variants	native
Nucleus (7 vowels)	spectral centroid / formant color	native
Onset (consonant manner)	attack transient type	native
Nasal coda	resonant sustain	native
harmonicity / spread / roughness / dynamics	timbre	designed extension

The correction that matters: falling tone is native. Unicode names U+07EE as
`NKO COMBINING LONG DESCENDING TONE`; no designed pitch mark is needed for
falling. Designed FAC extensions are reserved for higher timbral descriptors,
not pitch.

The measured corpus prior is now generated by code, not copied by hand:
4,139 parsed syllables across 105 entries show roughly 65.8
33.3
problem register-first and confidence-gated, not a broad claim that contour
dominates all tone.

What makes this real, not speculative

It's built on assets that already exist in your stack:

- `Desktop/NKo/nko/syllable_codebook.py` — already enumerates a few thousand tonal
N'Ko syllables as a discrete featural codebook (onset/nucleus/coda/tone),
explicitly built as "the retrieval target for joint-embedding ASR." The
sound→N'Ko-code direction already exists. FAC is the inverse: code→sound.
- `Desktop/NKo/nko/phonetics.py` — the IPA / tone / Unicode layer.
- N'Ko Brain Scanner (`Desktop/nko-brain-scanner`) — your released
N'Ko-adapted model: custom BPE tokenizer (2.75× compression) + finite-state
CV/CVN constraint giving 99.8–100
obvious objection** ("frontier models can't read N'Ko"): FAC's decoder is a
constrained-decoding problem on a model that already manipulates the codebook.

Honest status

- The Unicode inventory, tone prior, text-only baseline, synthetic tone seam,
fusion self-test, and pitch-channel representational studies are runnable.
- The decisive acoustic TDER is still blocked by data: a read-speech WAV aligned
to known tone-marked N'Ko text.
- The end-to-end toned-CER claim is still blocked by one checkpoint inference
pass after that read-speech input exists.
- No result is fabricated from TTS or unaligned lesson commentary.

Build

bash

cd Desktop/nko-acoustic-coding
export PATH="/Library/TeX/texbin:$PATH"
pdflatex main.tex && bibtex main && pdflatex main.tex && pdflatex main.tex

Expected: compiles clean under MacTeX pdflatex, 6 pages, references resolved,
zero undefined citations. N'Ko is shown via Unicode codepoints + Latin
transliteration + IPA (tipa) so it compiles everywhere without an N'Ko font.

Verification

bash

cd Desktop/nko-acoustic-coding/experiments
python3 -W ignore fac_implementation_scorecard.py

Or from the repo root:

bash

make verify       # FAC scorecard + paper compile
make verify-full  # also checks the local NKo Python/Swift tone layers
make test         # FAC contract tests only
make clean        # remove local LaTeX/Python build products

The scorecard refreshes `experiments/corpus/tone_prior.json`, reruns the core
scripts, scans for stale tone-inventory claims, and writes:

`experiments/artifacts/fac_implementation_scorecard.json`
`experiments/artifacts/fac_implementation_scorecard.md`

Files

`main.tex` — the paper (ACL format, matches your nko-brain-scanner conventions)
`references.bib` — LAC, brain-scanner self-cite, codecs, featural-script lit
`acl.sty`, `acl_natbib.bst` — copied from nko-brain-scanner
`main.pdf` — built output, regenerated by `make paper`
`figures/` — preview render

Next moves (ranked)

1. Record or obtain `read.wav` plus `gold_nko.txt`: the same speaker reading the
exact tone-marked N'Ko text.
2. Run `make read-speech AUDIO=/path/read.wav TEXT=/path/gold_nko.txt` for real
H1-H3 acoustic TDER. See `experiments/READ_SPEECH_RUNBOOK.md`.
3. Run the archived checkpoint once to convert the same correction into an
end-to-end toned-CER number. No retraining is needed for that pass.

Promotion Decision

Attach run IDs, datasets, metrics, and reproduction commands.

Source Anchor

nko-acoustic-coding/README.md

Detected Structure

Method · Evaluation · References · Figures · Code Anchors · Architecture