Grand Diomande Research · Full HTML Reader

FAC / N'Ko Tone Experiments

This file is the human-readable summary of the runnable experiment scripts. The
canonical machine-readable status is generated by:

```bash
python3 -W ignore fac_implementation_scorecard.py
```

That writes:

- `artifacts/fac_implementation_scorecard.json`
- `artifacts/fac_implementation_scorecard.md`
- `corpus/tone_prior.json`

Corrected Tone Inventory

N'Ko has seven Unicode combining tone marks:

- `U+07EB` short high
- `U+07EC` short low
- `U+07ED` short rising
- `U+07EE` long descending
- `U+07EF` long high
- `U+07F0` long low
- `U+07F1` long rising

Collapsing the short/long distinction leaves five tone classes: high, low,
rising, falling, and unmarked mid. The important correction is that falling is
native: `U+07EE` is long descending. The older interpretation that treated
falling as an extension rather than a native tone is deprecated.
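
The folding described above can be sketched as a lookup table. This is a minimal illustration, not the project's parser: the names `TONE_MARKS` and `fold_tone_class` are hypothetical, and the rule that the first mark in a syllable wins is an assumption of the sketch.

```python
# Map each N'Ko combining tone mark to (length, folded tone class).
# Code points and labels follow the inventory listed above.
TONE_MARKS = {
    "\u07EB": ("short", "high"),
    "\u07EC": ("short", "low"),
    "\u07ED": ("short", "rising"),
    "\u07EE": ("long", "falling"),  # long descending: the native falling tone
    "\u07EF": ("long", "high"),
    "\u07F0": ("long", "low"),
    "\u07F1": ("long", "rising"),
}


def fold_tone_class(syllable: str) -> str:
    """Return the folded tone class of a syllable, ignoring length.

    A syllable with no tone mark counts as unmarked mid. If several marks
    are present, the first one wins (an assumption of this sketch).
    """
    for ch in syllable:
        if ch in TONE_MARKS:
            return TONE_MARKS[ch][1]
    return "mid"


if __name__ == "__main__":
    ka = "\u07DE\u07CA"  # N'Ko letters KA + A, no tone mark
    print(fold_tone_class(ka))             # mid
    print(fold_tone_class(ka + "\u07EE"))  # falling
```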

Corpus Prior

The current lesson corpus contains 105 entries from 20 videos, 12,541 N'Ko
characters, 3,316 tone marks, and 4,139 parsed syllables.

Current parsed tone distribution:

| class | count | share (%) |
| --- | ---: | ---: |
| low | 1,660 | 40.1 |
| mid / unmarked | 1,378 | 33.3 |
| high | 1,062 | 25.7 |
| falling | 23 | 0.6 |
| rising | 16 | 0.4 |

Aggregates:

- Marked register, high + low: 65.8%
- Non-contour, high + low + mid: 99.1%
- Contour, rising + falling: 0.9%

Earlier register/contour headline numbers came from an older corpus snapshot
and are stale; the figures in this section supersede them.
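
The shares and aggregates in this section can be recomputed from the class counts alone. The snippet below is a small arithmetic check using the counts reported above; `COUNTS` and `share` are illustrative names and do not reflect the layout of `corpus/tone_prior.json`.

```python
# Parsed tone-class counts as reported in the table above.
COUNTS = {"low": 1660, "mid": 1378, "high": 1062, "falling": 23, "rising": 16}
TOTAL = sum(COUNTS.values())  # 4139 parsed syllables


def share(*classes: str) -> float:
    """Percentage of parsed syllables in the given classes, one decimal."""
    return round(100 * sum(COUNTS[c] for c in classes) / TOTAL, 1)


if __name__ == "__main__":
    for name in COUNTS:
        print(name, share(name))
    print("marked register", share("high", "low"))        # 65.8
    print("non-contour", share("high", "low", "mid"))     # 99.1
    print("contour", share("rising", "falling"))          # 0.9
```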

Text-Only Baseline

Run:

```bash
python3 -W ignore tone_lm_baseline.py
```

Current 5-fold lesson-disjoint TDER:

| model | TDER (%) |
| --- | ---: |
| majority class | 58.7 |
| unigram by syllable | 51.4 |
| bigram + previous tone | 50.8 |

The best text-only result, 50.8% TDER for the bigram model, remains the bar that
the acoustic channel must beat on aligned read speech.
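
To make "lesson-disjoint" concrete, here is a minimal sketch of the majority-class row under two assumptions: that TDER is scored as the percentage of syllables whose predicted tone class differs from gold, and that whole lessons are assigned to folds so no lesson appears in both train and test. The function names are hypothetical, and `tone_lm_baseline.py` may split and score differently.

```python
from collections import Counter


def lesson_folds(lesson_ids, k=5):
    """Assign each distinct lesson to one of k folds, round-robin."""
    lessons = sorted(set(lesson_ids))
    return {lesson: i % k for i, lesson in enumerate(lessons)}


def majority_tder(samples, k=5):
    """Pooled error percentage of a majority-class predictor.

    samples: list of (lesson_id, gold_tone_class) pairs, one per syllable.
    The majority class is re-estimated on the training lessons of each fold.
    """
    fold_of = lesson_folds([lesson for lesson, _ in samples], k)
    errors = total = 0
    for fold in range(k):
        train = [tone for lesson, tone in samples if fold_of[lesson] != fold]
        test = [tone for lesson, tone in samples if fold_of[lesson] == fold]
        if not train or not test:
            continue  # skip empty folds when there are fewer lessons than k
        majority = Counter(train).most_common(1)[0][0]
        errors += sum(tone != majority for tone in test)
        total += len(test)
    if total == 0:
        raise ValueError("no test syllables; need at least two lessons")
    return 100.0 * errors / total


if __name__ == "__main__":
    # Toy data, not the lesson corpus: five lessons, three syllables each.
    toy = [(f"lesson{i}", t) for i in range(5) for t in ("low", "low", "high")]
    print(round(majority_tder(toy), 1))  # 33.3
```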

Tone Seam

Run:

```bash
python3 -W ignore tone_seam_v0.py
python3 -W ignore tone_fusion_eval.py --selftest
```

`tone_seam_v0.py` proves the deterministic classifier mechanics on controlled
synthetic syllables and exercises the real-audio path when local parent audio is
available. `tone_fusion_eval.py --selftest` is a wiring sanity check, not a
scientific result; it verifies that text prior, acoustic classifier, and fusion
logic can run together before aligned read speech exists.
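
The fusion rule itself is not described in this summary. As a stand-in, the sketch below combines a text prior and an acoustic posterior with log-linear interpolation, which is one common choice and not necessarily what `tone_fusion_eval.py` does. The `fuse` function, its `weight` parameter, and the example acoustic scores are all hypothetical; the prior reuses the corpus shares rather than the contents of `corpus/tone_prior.json`.

```python
import math

CLASSES = ("high", "low", "mid", "rising", "falling")

# Stand-in text prior: the corpus shares from the Corpus Prior section.
PRIOR = {"low": 0.401, "mid": 0.333, "high": 0.257, "falling": 0.006, "rising": 0.004}


def fuse(prior, acoustic, weight=0.5, eps=1e-9):
    """Log-linear fusion of two distributions over tone classes.

    weight=0.0 trusts only the text prior, weight=1.0 only the acoustic
    classifier. Returns a normalized distribution over CLASSES.
    """
    scores = {
        c: (1.0 - weight) * math.log(prior.get(c, 0.0) + eps)
        + weight * math.log(acoustic.get(c, 0.0) + eps)
        for c in CLASSES
    }
    top = max(scores.values())  # subtract the max for numerical stability
    unnorm = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(unnorm.values())
    return {c: v / z for c, v in unnorm.items()}


if __name__ == "__main__":
    # Made-up acoustic posterior for one syllable, for illustration only.
    acoustic = {"falling": 0.7, "high": 0.1, "low": 0.1, "mid": 0.05, "rising": 0.05}
    for w in (0.0, 0.5, 1.0):
        fused = fuse(PRIOR, acoustic, weight=w)
        print(w, max(fused, key=fused.get))
```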

H2 Pitch-Fidelity Study

Run:

```bash
python3 -W ignore h2_pitch_fidelity.py
```

This is a controlled representational study. It does not prove real speech
TDER. The corrected comparison is:

| codec | contours | tokens/event | interpretation |
| --- | --- | ---: | --- |
| LAC-level | level | 1 | lexical pitch summary |
| LAC-contour | level, rising, falling | 2 | lexical register word + contour word |
| FAC-native | level, rising, falling | 1 | N'Ko native tone mark |

FAC-native and LAC-contour have the same pitch reconstruction power by design;
the difference is token cost. N'Ko writes register + contour in one glyph,
where a lexical channel needs an additional contour word. The decisive real
experiment is still the aligned read-speech TDER.
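
The token-cost difference can be made concrete with a small calculation over the table above. This sketch assumes every event costs the codec's listed tokens/event regardless of contour; `CODECS` and `token_cost` are illustrative names, not part of `h2_pitch_fidelity.py`.

```python
# Codec capabilities and per-event token cost, copied from the table above.
CODECS = {
    "LAC-level": {"contours": {"level"}, "tokens_per_event": 1},
    "LAC-contour": {"contours": {"level", "rising", "falling"}, "tokens_per_event": 2},
    "FAC-native": {"contours": {"level", "rising", "falling"}, "tokens_per_event": 1},
}


def token_cost(codec, events):
    """Tokens needed to encode a list of contour events.

    Returns None if the codec cannot represent one of the events.
    """
    spec = CODECS[codec]
    if any(event not in spec["contours"] for event in events):
        return None
    return spec["tokens_per_event"] * len(events)


if __name__ == "__main__":
    events = ["level", "level", "rising", "falling"]
    for name in CODECS:
        print(name, token_cost(name, events))
    # LAC-level None, LAC-contour 8, FAC-native 4
```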

Remaining Gates

The implementation is ready for the next input, but two end-to-end claims remain
blocked, each on a missing input:

1. A real `read.wav` aligned to known `gold_nko.txt`.
2. One archived-checkpoint inference pass to measure end-to-end toned CER.
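
Once both inputs exist, toned CER could be scored along the following lines. This is a sketch under an assumption: that "toned CER" means character error rate over Unicode code points with the combining tone marks kept in both reference and hypothesis. The function names are hypothetical and the strings are toy examples, not model output.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance over code points (tone marks count as characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = cur
    return prev[-1]


def toned_cer(ref: str, hyp: str) -> float:
    """Character error rate in percent, relative to the reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    gold = "\u07DE\u07CA\u07EE"       # KA + A + long descending tone
    wrong_tone = "\u07DE\u07CA\u07EF"  # same letters, long high tone
    no_tone = "\u07DE\u07CA"           # tone mark dropped
    print(round(toned_cer(gold, wrong_tone), 1))  # 33.3
    print(round(toned_cer(gold, no_tone), 1))     # 33.3
```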

Promotion Decision

Attach run IDs, datasets, metrics, and reproduction commands.

Source Anchor

nko-acoustic-coding/experiments/RESULTS.md
