FAC / N'Ko Tone Experiments
This file is the human-readable summary of the runnable experiment scripts. The canonical machine-readable status is generated by:
python3 -W ignore fac_implementation_scorecard.py
That writes:
- `artifacts/fac_implementation_scorecard.json`
- `artifacts/fac_implementation_scorecard.md`
- `corpus/tone_prior.json`
Corrected Tone Inventory
N'Ko has seven Unicode combining tone marks:
- `U+07EB` short high
- `U+07EC` short low
- `U+07ED` short rising
- `U+07EE` long descending
- `U+07EF` long high
- `U+07F0` long low
- `U+07F1` long rising
Folding length into class yields high, low, rising, falling, and unmarked mid.
The key correction is that falling is native: `U+07EE` is the long descending
tone mark. The earlier interpretation, which treated falling as an extension
beyond the native marks, is deprecated.
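The folding can be sketched as follows. Each of the seven marks maps to its class, and a syllable with no mark is treated as mid. The names `TONE_CLASS` and `syllable_tone` are hypothetical. The sketch also assumes at most one tone mark per syllable; that assumption is ours, and the parser that builds `corpus/tone_prior.json` may segment syllables differently.

```python
import unicodedata

# Seven N'Ko combining tone marks folded into tone classes (length dropped).
# U+07EE (long descending) is the only falling mark; it has no short form.
TONE_CLASS = {
    "\u07EB": "high",     # short high
    "\u07EC": "low",      # short low
    "\u07ED": "rising",   # short rising
    "\u07EE": "falling",  # long descending
    "\u07EF": "high",     # long high
    "\u07F0": "low",      # long low
    "\u07F1": "rising",   # long rising
}


def syllable_tone(syllable: str) -> str:
    """Return the folded tone class of one syllable.

    Hypothetical assumption: at most one tone mark per syllable.
    A syllable carrying no tone mark is treated as unmarked mid.
    """
    for ch in unicodedata.normalize("NFC", syllable):
        if ch in TONE_CLASS:
            return TONE_CLASS[ch]
    return "mid"


if __name__ == "__main__":
    # List each mark with its official Unicode name and folded class.
    for mark, cls in TONE_CLASS.items():
        print(f"U+{ord(mark):04X} {unicodedata.name(mark)} -> {cls}")
```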
Corpus Prior
The current lesson corpus contains 105 entries from 20 videos, 12,541 N'Ko
characters, 3,316 tone marks, and 4,139 parsed syllables.
Current parsed tone distribution:
| class | count | share (%) |
|---|---|---|
| low | 1,660 | 40.1 |
| mid / unmarked | 1,378 | 33.3 |
| high | 1,062 | 25.7 |
| falling | 23 | 0.6 |
| rising | 16 | 0.4 |
Aggregates:
- Marked register, high + low: 65.8%
- Non-contour, high + low + mid: 99.1%
- Contour, rising + falling: 0.9%
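The aggregate shares follow directly from the table counts. The short check below recomputes them from those counts alone and does no corpus parsing. `COUNTS` and `share` are hypothetical helpers, not part of the experiment scripts.

```python
# Parsed tone-class counts copied from the corpus-prior table.
COUNTS = {"low": 1660, "mid": 1378, "high": 1062, "falling": 23, "rising": 16}


def share(classes, counts=COUNTS):
    """Percentage of parsed syllables that fall in the given tone classes."""
    total = sum(counts.values())  # 4,139 parsed syllables
    return 100.0 * sum(counts[c] for c in classes) / total


print(f"marked register: {share(['high', 'low']):.1f}")         # 65.8
print(f"non-contour:     {share(['high', 'low', 'mid']):.1f}")  # 99.1
print(f"contour:         {share(['rising', 'falling']):.1f}")   # 0.9
```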
The earlier register/contour headline numbers were stale and came from an older
snapshot.
Text-Only Baseline
Run:
python3 -W ignore tone_lm_baseline.py
Current 5-fold lesson-disjoint TDER:
| model | TDER |
|---|---|
| majority class | 58.7 |
| unigram by syllable | 51.4 |
| bigram + previous tone | 50.8 |
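`tone_lm_baseline.py` is not reproduced here, so this is only an illustrative sketch of the simplest row. It makes three assumptions:

- TDER is the percentage of held-out syllables whose predicted tone class differs from gold.
- Folds are formed by assigning whole lessons (round-robin here, a hypothetical choice), so no lesson appears in both train and test.
- Errors are pooled across folds rather than averaged per fold. The text does not say which the script does.

```python
from collections import Counter


def lesson_folds(lesson_ids, k=5):
    """Assign whole lessons to k folds round-robin (hypothetical scheme)."""
    ordered = sorted(set(lesson_ids))
    return {lesson: i % k for i, lesson in enumerate(ordered)}


def majority_tder(data, k=5):
    """Pooled k-fold lesson-disjoint error of a majority-class tone predictor.

    data: list of (lesson_id, gold_tone_class) pairs, one per syllable.
    Returns the percentage of held-out syllables whose predicted class
    differs from gold (assumed TDER definition). Needs at least two lessons.
    """
    fold_of = lesson_folds([lesson for lesson, _ in data], k)
    errors = total = 0
    for fold in range(k):
        train = [t for lesson, t in data if fold_of[lesson] != fold]
        test = [t for lesson, t in data if fold_of[lesson] == fold]
        if not train or not test:
            continue  # empty fold when there are fewer lessons than k
        # The majority class is estimated on training lessons only.
        majority = Counter(train).most_common(1)[0][0]
        errors += sum(t != majority for t in test)
        total += len(test)
    return 100.0 * errors / total
```

Because every syllable in a held-out lesson gets the same label, this baseline's error is roughly one minus the share of the most frequent class in the held-out data.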
This remains the bar that the acoustic channel must beat on aligned read speech.
Tone Seam
Run:
python3 -W ignore tone_seam_v0.py
python3 -W ignore tone_fusion_eval.py --selftest
`tone_seam_v0.py` checks the deterministic classifier mechanics on controlled
synthetic syllables and exercises the real-audio path when local parent audio is
available. `tone_fusion_eval.py --selftest` is a wiring sanity check, not a
scientific result: it verifies that the text prior, acoustic classifier, and
fusion logic run together end to end before aligned read speech exists.
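The text does not describe the classifier's features or the fusion rule. The sketch below assumes the classifier reads a syllable's F0 track in semitones relative to a speaker reference. It also assumes fusion is a weighted log-linear combination of the text prior and a softened acoustic call. All names and thresholds are hypothetical and untuned.

```python
import math

CLASSES = ("high", "mid", "low", "rising", "falling")


def classify_contour(f0_semitones, slope_thr=1.5, register_thr=1.0):
    """Deterministic tone call from a non-empty F0 track (semitones re speaker).

    Contour is decided first from the end-minus-start pitch change;
    level syllables are then split into registers by mean pitch.
    """
    delta = f0_semitones[-1] - f0_semitones[0]
    if delta > slope_thr:
        return "rising"
    if delta < -slope_thr:
        return "falling"
    mean = sum(f0_semitones) / len(f0_semitones)
    if mean > register_thr:
        return "high"
    if mean < -register_thr:
        return "low"
    return "mid"


def fuse(prior, label, confidence=0.8, weight=0.5):
    """Log-linear fusion of a text prior with a hard acoustic call.

    prior: dict class -> nonzero probability from the text model.
    The hard label is softened to `confidence`, with the remainder spread
    evenly over the other classes; `weight` trades acoustic vs. text evidence.
    """
    rest = (1.0 - confidence) / (len(CLASSES) - 1)
    scores = {}
    for c in CLASSES:
        acoustic = confidence if c == label else rest
        scores[c] = weight * math.log(acoustic) + (1 - weight) * math.log(prior[c])
    top = max(scores.values())  # subtract max for numerical stability
    unnorm = {c: math.exp(s - top) for c, s in scores.items()}
    norm = sum(unnorm.values())
    return {c: v / norm for c, v in unnorm.items()}
```

With `weight=0.0` the fused posterior reduces to the text prior. That gives one easy selftest-style check that the wiring works before any real audio is involved.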
H2 Pitch-Fidelity Study
Run:
python3 -W ignore h2_pitch_fidelity.py
This is a controlled representational study; it does not establish TDER on real
speech. The corrected comparison is:
| codec | contours | tokens/event | interpretation |
|---|---|---|---|
| LAC-level | level | 1 | lexical pitch summary |
| LAC-contour | level, rising, falling | 2 | lexical register word + contour word |
| FAC-native | level, rising, falling | 1 | N'Ko native tone mark |
FAC-native and LAC-contour have the same pitch reconstruction power by design;
the difference is token cost. N'Ko writes register and contour in one glyph,
whereas a lexical channel needs an additional contour word. The decisive
experiment remains TDER on aligned read speech.
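The token-cost argument can be made concrete with abstract (register, contour) events. This sketch is illustrative and is not the representation used by `h2_pitch_fidelity.py`. The three codecs behave as follows:

- LAC-level keeps only the register word.
- LAC-contour emits a register word plus a contour word.
- FAC-native emits one combined token, standing in for a single N'Ko tone mark.

The token strings are hypothetical.

```python
CODECS = ("LAC-level", "LAC-contour", "FAC-native")


def encode(events, codec):
    """Encode (register, contour) events as token lists under one codec."""
    tokens = []
    for register, contour in events:
        if codec == "LAC-level":
            tokens.append(f"REG:{register}")             # contour is dropped
        elif codec == "LAC-contour":
            tokens += [f"REG:{register}", f"CON:{contour}"]  # two words
        elif codec == "FAC-native":
            tokens.append(f"MARK:{register}/{contour}")  # one glyph-like token
        else:
            raise ValueError(codec)
    return tokens


def decode(tokens, codec):
    """Invert encode(); LAC-level can only recover level contours."""
    if codec == "LAC-level":
        return [(t.split(":", 1)[1], "level") for t in tokens]
    if codec == "LAC-contour":
        return [(r.split(":", 1)[1], c.split(":", 1)[1])
                for r, c in zip(tokens[0::2], tokens[1::2])]
    if codec == "FAC-native":
        return [tuple(t.split(":", 1)[1].split("/")) for t in tokens]
    raise ValueError(codec)


if __name__ == "__main__":
    events = [("high", "level"), ("low", "rising"), ("high", "falling")]
    for codec in CODECS:
        toks = encode(events, codec)
        lossless = decode(toks, codec) == events
        print(f"{codec:12s} tokens/event={len(toks) / len(events):.0f} lossless={lossless}")
```

LAC-contour and FAC-native both round-trip the same events, while LAC-level loses the contour. The only difference between the two lossless codecs is the number of tokens spent per event.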
Remaining Gates
The implementation is ready for the next input, but two end-to-end claims remain
blocked on:
1. A real `read.wav` aligned to a known `gold_nko.txt`.
2. One inference pass with an archived checkpoint, to measure end-to-end toned CER.
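The document does not define toned CER. The sketch below assumes it means character error rate computed with the N'Ko tone marks (`U+07EB` to `U+07F1`) kept in both strings, and it also reports a toneless variant for comparison. The function names are hypothetical.

```python
import unicodedata

# N'Ko combining tone marks U+07EB..U+07F1 (seven code points).
TONE_MARKS = {chr(cp) for cp in range(0x07EB, 0x07F2)}


def edit_distance(ref, hyp):
    """Character-level Levenshtein distance (single-row dynamic programming)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,               # delete ref char
                           cur[j - 1] + 1,            # insert hyp char
                           prev[j - 1] + (r != h)))   # substitute or match
        prev = cur
    return prev[-1]


def cer(ref, hyp, keep_tones=True):
    """CER in percent; keep_tones=False strips tone marks before scoring."""
    ref = unicodedata.normalize("NFC", ref)
    hyp = unicodedata.normalize("NFC", hyp)
    if not keep_tones:
        ref = "".join(ch for ch in ref if ch not in TONE_MARKS)
        hyp = "".join(ch for ch in hyp if ch not in TONE_MARKS)
    return 100.0 * edit_distance(ref, hyp) / max(len(ref), 1)
```

Comparing toned and toneless CER on the same pass separates tone-mark errors from base-character errors.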
Promotion Decision
Attach run IDs, datasets, metrics, and reproduction commands.
Source Anchor
nko-acoustic-coding/experiments/RESULTS.md