N'Ko Phonemic Substrate Validation Report
This report freezes the evidence state before turning the current work into a paper. It separates what is mechanically validated, what is empirically supported, what failed, and what remains a hypothesis.
Date: 2026-06-01
Terminology
The correct term in this package is N'Ko. "NKL" was only a conversational typo;
it is not used as a technical term here.
Unicode uses the formal block/script spelling NKo because apostrophes are not
allowed in Unicode character and block names. The research text uses N'Ko for the
script/language tradition and NKo only when referring to Unicode identifiers.
External Anchors Checked
- Unicode Core Specification, Chapter 19, section 19.4.1, documents N'Ko as a
right-to-left, phonetic script with seven vowels, tone marks, nasalization, and
diacritics used for foreign sounds.
Source: https://www.unicode.org/versions/latest/core-spec/chapter-19/
- Unicode Table 19-3 documents concrete foreign-sound combinations, including
mappings involving U+07ED and U+07F3 for sounds such as [v], [theta], [esh],
[ezh], [schwa], and French [y].
Source: https://www.unicode.org/versions/latest/core-spec/chapter-19/
- ScriptSource independently summarizes N'Ko as a phonemic alphabet with 19
consonants, 7 vowels, 8 diacritics, and later combining marks for foreign sounds.
Source: https://scriptsource.org/scr/Nkoo
- Library of Congress romanization notes list foreign-sound and diacritic handling
for N'Ko cataloging practice.
Source: https://www.loc.gov/catdir/cpso/romanization/N
These sources validate the key standards claim: extending N'Ko with diacritics for
foreign sounds is not an invented premise. The paper's internal full-compositional
layer is still a computational encoding, not an official orthographic reform.
Mechanically Validated Claims
1. Representation coverage can cross the 90% threshold
Implemented evaluator:
`[home]/Desktop/NKo/scripts/evaluate_phoneme_coverage.py`
Reusable module:
`[home]/Desktop/NKo/nko/phonemic_extensions.py`
Verified command:
cd [home]/Desktop/NKo
python3 scripts/evaluate_phoneme_coverage.py --threshold 0.90
Result:
language  layer               covered  coverage
manding   baseline            27/27    100.0%
manding   unicode_extensions  27/27    100.0%
manding   full_compositional  27/27    100.0%
french    baseline            23/36     63.9%
french    unicode_extensions  29/36     80.6%
french    full_compositional  36/36    100.0%
english   baseline            24/41     58.5%
english   unicode_extensions  30/41     73.2%
english   full_compositional  41/41    100.0%
Interpretation: for the tested Manding, French, and English phoneme inventories, a
deterministic N'Ko extension layer (full_compositional) reaches at least 90% coverage.
The Unicode-documented extensions alone stay below the threshold for French (80.6%) and
English (73.2%). This is a representation result, not an ASR accuracy result.
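The coverage computation can be sketched as follows. The sketch assumes coverage is simply the fraction of an inventory's phonemes that a layer can write. The inventory, the layer membership, and the helper names `coverage` and `coverage_report` are hypothetical and are not taken from `evaluate_phoneme_coverage.py`.

```python
# Toy sketch of per-layer phoneme coverage with a pass/fail threshold.
# Inventories and layer membership are HYPOTHETICAL placeholders; they are
# not the package's actual layer assignments.


def coverage(inventory, layer):
    """Return (covered, total) for the phonemes a layer can write."""
    covered = sum(1 for phoneme in inventory if phoneme in layer)
    return covered, len(inventory)


def coverage_report(inventories, layers, threshold=0.90):
    """One row per (language, layer): counts, ratio, and threshold check."""
    rows = []
    for language, inventory in inventories.items():
        for layer_name, layer in layers.items():
            covered, total = coverage(inventory, layer)
            ratio = covered / total
            rows.append((language, layer_name, covered, total, ratio, ratio >= threshold))
    return rows


if __name__ == "__main__":
    inventories = {"toy": ["a", "i", "u", "v", "θ"]}
    layers = {  # illustrative membership only
        "baseline": {"a", "i", "u"},
        "unicode_extensions": {"a", "i", "u", "v"},
        "full_compositional": {"a", "i", "u", "v", "θ"},
    }
    for lang, layer, c, n, ratio, ok in coverage_report(inventories, layers):
        print(f"{lang:8} {layer:20} {c}/{n} {100 * ratio:.1f}% {'PASS' if ok else 'below'}")
```

The `threshold=0.90` default mirrors the `--threshold 0.90` flag in the verified command. The percentage formatting reproduces values such as 23/36 → 63.9%.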
2. The implementation is tested
Verified command:
cd [home]/Desktop/NKo
python3 -m pytest -q tests/test_phonemic_extensions.py tests/test_phonetics.py tests/test_transliterate.py
Result:
164 passed in 0.16s
The tests enforce extension coverage, composition examples, and non-regression of the
existing phonetics/transliteration tests.
3. IPA to N'Ko label generation is runnable
The bundle generated concrete label examples such as:
theta i-small eng k -> ߛ߳ߌ߳ߧߞ
v epsilon rhotic i -> ߝ߭ߐߙ߳ߌ
French y -> ߎ߳
French nasal vowel -> vowel + ߲
This validates that text/IPA-to-N'Ko label construction is a deterministic pipeline.
It does not validate acoustic recognition.
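Below is a minimal sketch of such a deterministic pipeline. It assumes input arrives as a list of IPA segments in NFD form, so a nasal vowel is the vowel plus U+0303. The table is a hypothetical subset that reuses this report's own examples (theta, v, French y, nasal vowel). `to_nko` is an illustrative name, not the harness API. Unknown segments are returned separately, mirroring the harness's `unknown symbols` count.

```python
# Illustrative IPA-segment -> N'Ko label builder. HYPOTHETICAL subset table:
# base letters are standard NKo code points, and the composite entries copy
# the examples shown in this report. Not the package's mapping table.

NASALIZATION = "\u07F2"     # NKO COMBINING NASALIZATION MARK
COMBINING_TILDE = "\u0303"  # IPA nasal diacritic, assuming NFD-decomposed input

TABLE = {
    "a": "\u07CA",        # NKO LETTER A
    "i": "\u07CC",        # NKO LETTER I
    "u": "\u07CE",        # NKO LETTER U
    "k": "\u07DE",        # NKO LETTER KA
    "s": "\u07DB",        # NKO LETTER SA
    "f": "\u07DD",        # NKO LETTER FA
    "θ": "\u07DB\u07F3",  # SA + COMBINING DOUBLE DOT ABOVE (theta example)
    "v": "\u07DD\u07ED",  # FA + U+07ED (v example)
    "y": "\u07CE\u07F3",  # U + COMBINING DOUBLE DOT ABOVE (French y example)
}


def to_nko(segments):
    """Map IPA segments to an N'Ko label; return (label, unknown_segments)."""
    label, unknown = [], []
    for segment in segments:
        nasal = segment.endswith(COMBINING_TILDE)
        base = segment[:-1] if nasal else segment
        if base in TABLE:
            # A nasal vowel becomes the vowel letter followed by the nasalization mark.
            label.append(TABLE[base] + (NASALIZATION if nasal else ""))
        else:
            # Omitted from the label but reported, so coverage gaps stay visible.
            unknown.append(segment)
    return "".join(label), unknown


if __name__ == "__main__":
    print(to_nko(["θ", "i", "k"]))
    print(to_nko(["v", "a\u0303", "x"]))
```

The same input always yields the same label, which is the property claimed above. Nothing in the sketch involves audio.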
Operational label harness:
`[home]/Desktop/nko-brain-scanner/experiments/phonemic_substrate/label_ipa_corpus.py`
Verified default-sample result:
rows: 4
layer: full_compositional
coverage: 1.0
covered symbols: 29/29
unknown symbols: 0
Empirically Supported Claims
1. Ungoverned N'Ko generation is harmful on the 500-row AGP pilot
Archived artifacts:
`[home]/Desktop/nko-brain-scanner/artifacts/agp_pilot/`
Bundle summary:
`[home]/Desktop/nko-brain-scanner/artifacts/phonemic_substrate/overnight_bundle_2026-06-01/bundle_summary.md`
Old adapter:
ASR CER: 0.3106
Blind proposal CER: 0.4701
Blind delta: +15.94pp
Direct better/same/worse: 21/169/310
Interpretation: a generic N'Ko-emitting LLM adapter can produce valid-looking N'Ko
while moving away from the reference. Script validity is not correction correctness.
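The metrics above can be sketched as follows. The sketch assumes CER is corpus-level (total character edits divided by total reference characters). It also assumes better/same/worse compares each row's proposal and ASR output by edit distance to the reference. The pilot scripts may instead average per-row CER, which would give different numbers. The helper names are hypothetical.

```python
# Sketch of corpus CER, blind delta in percentage points, and per-row
# better/same/worse counts. Corpus-level CER is an ASSUMPTION.


def edit_distance(a, b):
    """Character-level Levenshtein distance using a rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def corpus_cer(refs, hyps):
    """Total character edits divided by total reference characters."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return errors / sum(len(r) for r in refs)


def compare(refs, asr, proposals):
    """Return (better, same, worse, delta_pp) of proposals relative to ASR."""
    better = same = worse = 0
    for ref, a, p in zip(refs, asr, proposals):
        d_asr, d_prop = edit_distance(ref, a), edit_distance(ref, p)
        if d_prop < d_asr:
            better += 1
        elif d_prop == d_asr:
            same += 1
        else:
            worse += 1
    delta_pp = 100 * (corpus_cer(refs, proposals) - corpus_cer(refs, asr))
    return better, same, worse, delta_pp
```

A positive `delta_pp` means the proposals raised CER. This is the direction of the +15.94pp blind result.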
2. AGP governance neutralizes most harm but is not sufficient
Old adapter, gated:
Gated CER: 0.3120
Gated delta: +0.14pp
Accepted: 23/500
Accepted worse: 18
Interpretation: the gate reduced a +15.94pp blind catastrophe to a +0.14pp near-wash.
That supports the governance thesis, while the 18 accepted-worse rows prevent any
"perfect safety" claim.
3. Minimal-edit SFT did not close the loop
Minimal-edit adapter:
Blind proposal CER: 0.4269
Blind delta: +11.63pp
Direct better/same/worse: 14/225/261
Gated CER: 0.3156
Gated delta: +0.50pp
Accepted: 89/500
Accepted worse: 69
Interpretation: the SFT model learned smaller edits and more refusals, but small wrong
edits still passed the edit-size gate: 69 of the 89 accepted edits made the row worse.
Loop-v1 failed usefully: the next gate must score evidence/correctness, not edit size
alone.
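A sketch shows why an edit-size gate admits small wrong edits. It assumes the gate's only test is a character edit-distance cap between the ASR output and the proposal. The `edit_too_large` rejection reported below suggests such a test exists, but the real AGP gate may check more. `gate`, `gated_outputs`, and the `no_change` reason are hypothetical. The gate never sees the reference, so a one-character wrong edit passes exactly like a one-character fix.

```python
# Sketch of an edit-size-only gate. ASSUMPTION: acceptance depends solely on
# the edit distance between the ASR output and the proposal.


def edit_distance(a, b):
    """Character-level Levenshtein distance using a rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def gate(asr_hyp, proposal, cap=2):
    """Accept a proposal iff it changes something and the change is <= cap chars."""
    d = edit_distance(asr_hyp, proposal)
    if d == 0:
        return False, "no_change"
    if d > cap:
        return False, "edit_too_large"
    return True, "accepted"


def gated_outputs(refs, asr, proposals, cap=2):
    """Apply the gate; count accepted rows and accepted rows that got worse."""
    outputs, accepted, accepted_worse = [], 0, 0
    for ref, a, p in zip(refs, asr, proposals):
        ok, _reason = gate(a, p, cap)
        if ok:
            accepted += 1
            # The reference is used only for scoring, never by the gate itself.
            if edit_distance(ref, p) > edit_distance(ref, a):
                accepted_worse += 1
        outputs.append(p if ok else a)
    return outputs, accepted, accepted_worse
```

In the test rows, a wrong one-character edit ("abc" → "abd") is accepted and counted as accepted-worse. This is the loop-v1 failure mode in miniature.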
4. Oracle headroom shows the architecture could improve if proposals become trustworthy
Oracle facts from the full bridge:
Rows: 29,060
Accepted at cap=2: 3,302
Rejected: 25,758
edit_too_large rejections: 22,914
Median rejected edit size: 9
Accepted-worse: 0
Cap sweep, CER delta relative to the ASR output (negative means lower CER):
cap    oracle proposals   real proposals
2      -0.46pp            +0.14pp
8      -5.57pp            +2.36pp
12     -9.97pp            +3.99pp
999    -29.15pp           +15.59pp
Interpretation: proposal correctness is the bottleneck. With oracle proposals, every
increase in the edit cap lowers CER; with real proposals, every increase raises it. The
edit budget is therefore protective while proposals are bad, and becomes a limit on
gains only after proposals become trustworthy.
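The cap sweep can be sketched under the same assumptions: a pure edit-distance cap with corpus-level CER. `sweep` and the toy strings are hypothetical. With oracle proposals (proposal = reference), an accepted row can never get worse. The oracle curve can therefore only fall as the cap grows, consistent with the zero accepted-worse count in the oracle facts.

```python
# Sketch of a cap sweep: CER delta (pp) vs. the ASR output for each edit cap.
# ASSUMPTIONS: the gate is a pure edit-distance cap; CER is corpus-level.


def edit_distance(a, b):
    """Character-level Levenshtein distance using a rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def cer(refs, hyps):
    """Corpus CER: total edits over total reference characters."""
    return sum(edit_distance(r, h) for r, h in zip(refs, hyps)) / sum(len(r) for r in refs)


def sweep(refs, asr, proposals, caps):
    """Map each cap to the rounded CER delta in percentage points."""
    base = cer(refs, asr)
    deltas = {}
    for cap in caps:
        # Keep the proposal only if it is within `cap` edits of the ASR output.
        gated = [p if edit_distance(a, p) <= cap else a for a, p in zip(asr, proposals)]
        deltas[cap] = round(100 * (cer(refs, gated) - base), 2)
    return deltas


if __name__ == "__main__":
    refs = ["abcd", "abcdefgh"]
    asr = ["abcx", "abxxxxxh"]
    print("oracle:", sweep(refs, asr, refs, [2, 8, 12, 999]))
    print("wrong: ", sweep(refs, asr, ["abyy", "abxxxxxh"], [0, 2]))
```

Feeding the references in as proposals gives the oracle curve. Feeding a wrong edit gives a positive delta as soon as the cap admits it, mirroring the real-proposal column.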
Claims That Remain Hypotheses
1. Direct cross-lingual audio to N'Ko without retraining
The representation layer does not require retraining. Direct ASR usually does.
Claim boundary:
text/IPA -> N'Ko labels: deterministic, no retraining
audio -> phoneme labels: requires an acoustic recognizer
audio -> N'Ko directly: requires ASR training/adaptation unless the model is featural
The zero-shot acoustic claim becomes plausible only if the acoustic model predicts
features such as place, manner, voicing, vowel height, rounding, nasality, and tone.
That is the FAC hypothesis. It is not yet validated by the current artifacts.
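A toy illustration of the featural idea follows. If an acoustic model emitted feature values instead of phoneme IDs, a sound could still receive an N'Ko label by nearest feature match. The feature table, phoneme set, N'Ko mapping, and function names are all hypothetical. The sketch says nothing about whether real acoustic features can be predicted reliably, which is exactly the unvalidated part.

```python
# Toy featural labelling: predicted feature bundle -> nearest phoneme -> N'Ko.
# Everything here is a HYPOTHETICAL illustration of the FAC idea.

FEATURES = {
    "f": {"voicing": "voiceless", "place": "labiodental", "manner": "fricative"},
    "v": {"voicing": "voiced", "place": "labiodental", "manner": "fricative"},
    "s": {"voicing": "voiceless", "place": "alveolar", "manner": "fricative"},
    "θ": {"voicing": "voiceless", "place": "dental", "manner": "fricative"},
}
# N'Ko spellings reuse this report's examples (FA, FA+U+07ED, SA, SA+U+07F3).
NKO = {"f": "\u07DD", "v": "\u07DD\u07ED", "s": "\u07DB", "θ": "\u07DB\u07F3"}


def nearest_phoneme(predicted):
    """Phoneme whose feature bundle disagrees with `predicted` least often."""
    def mismatches(phoneme):
        return sum(predicted.get(name) != value for name, value in FEATURES[phoneme].items())
    return min(FEATURES, key=mismatches)


def features_to_nko(bundles):
    """Label a sequence of predicted feature bundles with N'Ko text."""
    return "".join(NKO[nearest_phoneme(b)] for b in bundles)
```

The design point is that the label is derived from features rather than from a language-specific phoneme classifier. That property is the one the FAC hypothesis would need to be true acoustically.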
2. Phrase-level transfer through N'Ko
The idea that phrase structures can move across languages through a shared N'Ko
phonemic/semantic layer is promising, but currently conceptual. It needs a separate
evaluation:
source phrase -> phonemic/semantic representation -> N'Ko substrate -> target language
The representation layer can carry sounds. Meaning transfer needs semantic alignment
and language-specific generation.
3. A self-improving correction loop that lowers CER
The infrastructure exists:
ASR -> proposer -> gate -> decisions -> SFT data -> new proposer
But loop-v1 did not lower CER. The next version needs:
- correctness/evidence gate;
- proposal confidence or likelihood ratio;
- acoustic support checks;
- edit localization, not just edit size;
- stricter handling of uncertain/boundary partitions.
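Here is one hypothetical shape for such a gate. It combines three of the listed requirements: an evidence score, edit localization, and a stricter bar on uncertain/boundary partitions. The `llr` field stands in for a log-likelihood ratio from an acoustic scorer that this report does not specify. `Candidate`, `evidence_gate`, the partition labels, and the threshold values are all assumptions. Edit spans come from Python's `difflib.SequenceMatcher.get_opcodes`.

```python
# HYPOTHETICAL evidence gate sketch; not the planned loop-v2 implementation.
import difflib
from dataclasses import dataclass


@dataclass
class Candidate:
    asr: str        # ASR hypothesis
    proposal: str   # proposer output
    llr: float      # assumed: log P(audio | proposal) - log P(audio | asr)
    partition: str  # assumed labels: "clean", "uncertain", "boundary"


def edit_spans(a, b):
    """Contiguous non-equal regions between two strings."""
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [op for op in matcher.get_opcodes() if op[0] != "equal"]


def evidence_gate(c, min_llr=2.0, strict_llr=4.0, max_spans=1):
    """Accept only localized edits backed by enough acoustic evidence."""
    spans = edit_spans(c.asr, c.proposal)
    if not spans:
        return False, "no_change"
    if len(spans) > max_spans:
        return False, "edit_not_localized"
    # Uncertain/boundary partitions must clear a higher evidence bar.
    threshold = strict_llr if c.partition in ("uncertain", "boundary") else min_llr
    if c.llr < threshold:
        return False, "insufficient_evidence"
    return True, "accepted"
```

Unlike the edit-size cap, the acceptance decision here turns on the evidence score. A small edit with weak support is rejected regardless of its size.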
Paper-Safe Interpretation
The strongest accurate claim is:
> N'Ko is an extensible phonemic substrate whose representation coverage can be
> raised mechanically using documented foreign-sound diacritics plus bounded internal
> composition; this enables automatic construction of N'Ko phonemic labels for
> cross-lingual ASR training, while the current AGP experiments show why any
> generative correction loop must be governed by evidence, not left to an LLM.
The claim to avoid:
> We can recognize any language in N'Ko without retraining.
Correct replacement:
> We can represent many languages in N'Ko without retraining; recognizing them from
> audio requires either ASR adaptation or a validated featural acoustic model.
Reproducible Artifacts
- Coverage artifact:
`[home]/Desktop/nko-brain-scanner/artifacts/phonemic_coverage/coverage_2026-06-01.md`
- Overnight bundle:
`[home]/Desktop/nko-brain-scanner/artifacts/phonemic_substrate/overnight_bundle_2026-06-01/bundle_summary.md`
- Bundle JSON:
`[home]/Desktop/nko-brain-scanner/artifacts/phonemic_substrate/overnight_bundle_2026-06-01/bundle_summary.json`
- IPA label harness:
`[home]/Desktop/nko-brain-scanner/experiments/phonemic_substrate/label_ipa_corpus.py`
- IPA label report:
`[home]/Desktop/nko-brain-scanner/artifacts/phonemic_substrate/overnight_bundle_2026-06-01/label_examples_report.json`
- AGP pilot artifacts:
`[home]/Desktop/nko-brain-scanner/artifacts/agp_pilot/`
- Paper proposal:
`[home]/Desktop/nko-brain-scanner/NKO-PHONEMIC-SUBSTRATE-PROPOSAL.md`
Promotion Decision
Attach run IDs, datasets, metrics, and reproduction commands.
Source Anchor
nko-brain-scanner/NKO-VALIDATION-REPORT.md