Grand Diomande Research · Full HTML Reader

FAC / N'Ko Tone Experiments

This file is the human-readable summary of the runnable experiment scripts. The
canonical machine-readable status is generated by:

```bash
python3 -W ignore fac_implementation_scorecard.py
```

That writes:

  • `artifacts/fac_implementation_scorecard.json`
  • `artifacts/fac_implementation_scorecard.md`
  • `corpus/tone_prior.json`

Corrected Tone Inventory

N'Ko has seven Unicode combining tone marks:

  • `U+07EB` short high
  • `U+07EC` short low
  • `U+07ED` short rising
  • `U+07EE` long descending
  • `U+07EF` long high
  • `U+07F0` long low
  • `U+07F1` long rising

Folding length into tone class yields five classes: high, low, rising, falling,
and unmarked mid. The important correction is that falling is native to the
script: `U+07EE` is the long descending mark. The old extension interpretation
is deprecated.
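The folding described above can be sketched as a lookup table. This is a minimal illustration, not the code used by the experiment scripts: the names `TONE_CLASS`, `fold_tone`, and `tone_marks` are hypothetical, and the class labels are this sketch's reading of the inventory list.

```python
from typing import List, Optional

# Code points come from the inventory above; the folded labels are an
# assumption of this sketch (length is dropped, contour or register kept).
TONE_CLASS = {
    "\u07EB": "high",     # short high
    "\u07EC": "low",      # short low
    "\u07ED": "rising",   # short rising
    "\u07EE": "falling",  # long descending (the native falling mark)
    "\u07EF": "high",     # long high
    "\u07F0": "low",      # long low
    "\u07F1": "rising",   # long rising
}


def fold_tone(mark: Optional[str]) -> str:
    """Return the folded tone class; a missing mark counts as unmarked mid."""
    if mark is None:
        return "mid"
    return TONE_CLASS[mark]


def tone_marks(text: str) -> List[str]:
    """Extract the combining tone marks from a string, in order."""
    return [ch for ch in text if ch in TONE_CLASS]
```

For example, `fold_tone("\u07EE")` returns `"falling"`, and `fold_tone(None)` returns `"mid"`.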

Corpus Prior

The current lesson corpus contains 105 entries from 20 videos, 12,541 N'Ko
characters, 3,316 tone marks, and 4,139 parsed syllables.

Current parsed tone distribution:

| class | count | share (%) |
| --- | --- | --- |
| low | 1,660 | 40.1 |
| mid / unmarked | 1,378 | 33.3 |
| high | 1,062 | 25.7 |
| falling | 23 | 0.6 |
| rising | 16 | 0.4 |

Aggregates:

  • Marked register (high + low): 65.8%
  • Non-contour (high + low + mid): 99.1%
  • Contour (rising + falling): 0.9%
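The shares and aggregates follow directly from the per-class counts over the 4,139 parsed syllables. A short check, using only the counts reported in this section (the helper name `share` is hypothetical):

```python
# Per-class syllable counts as reported in the distribution above.
counts = {"low": 1660, "mid": 1378, "high": 1062, "falling": 23, "rising": 16}
total = sum(counts.values())  # 4139 parsed syllables


def share(*classes: str) -> float:
    """Percentage of parsed syllables in the given classes, to one decimal."""
    return round(100 * sum(counts[c] for c in classes) / total, 1)


marked_register = share("high", "low")
non_contour = share("high", "low", "mid")
contour = share("rising", "falling")
```

Because each figure is rounded separately, the rounded per-class shares need not sum exactly to the rounded aggregates.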

The earlier register/contour headline numbers were stale and came from an older
snapshot.

Text-Only Baseline

Run:

```bash
python3 -W ignore tone_lm_baseline.py
```

Current 5-fold lesson-disjoint TDER:

| model | TDER (%) |
| --- | --- |
| majority class | 58.7 |
| unigram by syllable | 51.4 |
| bigram + previous tone | 50.8 |

This remains the bar that the acoustic channel must beat on aligned read speech.
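To make the evaluation protocol concrete, here is a minimal sketch of a lesson-disjoint majority-class baseline. It is not the logic of `tone_lm_baseline.py`. It assumes TDER is the percentage of syllables whose predicted tone class differs from gold, that folds are averaged rather than pooled, and that lessons are assigned to folds round-robin; `lesson_folds` and `majority_tder` are hypothetical names.

```python
from collections import Counter


def lesson_folds(lesson_ids, k=5):
    """Assign whole lessons to folds so no lesson spans train and test."""
    unique = sorted(set(lesson_ids))
    return {lesson: i % k for i, lesson in enumerate(unique)}


def majority_tder(samples, k=5):
    """samples: list of (lesson_id, gold_tone_class). Returns mean TDER in %."""
    fold_of = lesson_folds([lesson for lesson, _ in samples], k)
    rates = []
    for fold in range(k):
        train = [tone for lesson, tone in samples if fold_of[lesson] != fold]
        test = [tone for lesson, tone in samples if fold_of[lesson] == fold]
        if not train or not test:
            continue  # skip empty folds on very small corpora
        guess = Counter(train).most_common(1)[0][0]  # majority class in train
        errors = sum(1 for tone in test if tone != guess)
        rates.append(100 * errors / len(test))
    return sum(rates) / len(rates)
```

Splitting by lesson rather than by syllable keeps repeated vocabulary within one lesson from leaking between train and test.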

Tone Seam

Run:

```bash
python3 -W ignore tone_seam_v0.py
python3 -W ignore tone_fusion_eval.py --selftest
```

`tone_seam_v0.py` proves the deterministic classifier mechanics on controlled
synthetic syllables and exercises the real-audio path when local parent audio is
available. `tone_fusion_eval.py --selftest` is a wiring sanity check, not a
scientific result; it verifies that text prior, acoustic classifier, and fusion
logic can run together before aligned read speech exists.
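As an illustration of what "deterministic classifier mechanics on controlled synthetic syllables" can mean, the sketch below builds linear F0 tracks and classifies them with fixed thresholds. It is not the algorithm in `tone_seam_v0.py`: the function names, the 150 Hz speaker midpoint, and the 15 Hz and 10 Hz thresholds are all assumptions chosen for the example.

```python
def synth_f0(start_hz, end_hz, n=20):
    """Linear F0 track (Hz) for one synthetic syllable."""
    step = (end_hz - start_hz) / (n - 1)
    return [start_hz + i * step for i in range(n)]


def classify_contour(f0, speaker_mid_hz=150.0, slope_hz=15.0, band_hz=10.0):
    """Rule-based tone class from an F0 track; thresholds are hypothetical."""
    delta = f0[-1] - f0[0]
    if delta >= slope_hz:
        return "rising"
    if delta <= -slope_hz:
        return "falling"
    # No clear movement: decide the register from the mean pitch.
    mean = sum(f0) / len(f0)
    if mean > speaker_mid_hz + band_hz:
        return "high"
    if mean < speaker_mid_hz - band_hz:
        return "low"
    return "mid"
```

On synthetic input the expected label is known by construction, which is what makes such a check a test of mechanics rather than a scientific result.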

H2 Pitch-Fidelity Study

Run:

```bash
python3 -W ignore h2_pitch_fidelity.py
```

This is a controlled representational study. It does not prove real speech
TDER. The corrected comparison is:

| codec | contours | tokens/event | interpretation |
| --- | --- | --- | --- |
| LAC-level | level | 1 | lexical pitch summary |
| LAC-contour | level, rising, falling | 2 | lexical register word + contour word |
| FAC-native | level, rising, falling | 1 | N'Ko native tone mark |

FAC-native and LAC-contour have the same pitch reconstruction power by design;
the difference is token cost. N'Ko writes register + contour in one glyph,
where a lexical channel needs an additional contour word. The decisive real
experiment is still the aligned read-speech TDER.
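The token-cost difference can be shown with a toy encoder. This is a sketch under stated assumptions, not the encoding in `h2_pitch_fidelity.py`: the `(register, contour)` event representation, the `encode` function, and the token strings are placeholders invented for the example.

```python
def encode(events, codec):
    """events: list of (register, contour) pairs.

    contour is one of "level", "rising", "falling".
    """
    tokens = []
    for register, contour in events:
        if codec == "LAC-level":
            tokens.append(register)                 # contour is dropped
        elif codec == "LAC-contour":
            tokens.extend([register, contour])      # register word + contour word
        elif codec == "FAC-native":
            tokens.append(f"{register}+{contour}")  # one combined, glyph-like token
        else:
            raise ValueError(f"unknown codec: {codec}")
    return tokens


events = [("high", "level"), ("low", "rising"), ("high", "falling")]
cost = {c: len(encode(events, c)) / len(events)
        for c in ("LAC-level", "LAC-contour", "FAC-native")}
```

Here `cost` reproduces the tokens/event column: 1 for LAC-level, 2 for LAC-contour, and 1 for FAC-native, with only the last two retaining contour information.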

Remaining Gates

The implementation is ready for the next input, but two end-to-end claims remain
blocked until the following are available:

1. A real `read.wav` aligned to known `gold_nko.txt`.
2. One archived-checkpoint inference pass to measure end-to-end toned CER.
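Once both inputs exist, a toned CER could be computed roughly as follows. This sketch uses the standard definition of character error rate (Levenshtein distance divided by reference length) and assumes that "toned" means combining tone marks are kept and counted as characters; `edit_distance` and `toned_cer` are hypothetical names, not functions from the repository.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences of code points."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def toned_cer(ref, hyp):
    """CER in percent, counting tone marks as characters (an assumption)."""
    if not ref:
        raise ValueError("reference text must not be empty")
    return 100 * edit_distance(ref, hyp) / len(ref)
```

Under this definition, a hypothesis that gets every base letter right but one tone mark wrong still incurs one substitution error.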

Promotion Decision

Attach run IDs, datasets, metrics, and reproduction commands.

Source Anchor

nko-acoustic-coding/experiments/RESULTS.md