Grand Diomande Research · Full HTML Reader

NKO-1.2 Complete — `nko.phonetics` Module


Status: ✅ COMPLETE
Date: 2025-07-19
Tests: 100/100 passed (0.07s)

---

## What Was Built

Replaced the 4-line stub at `nko/phonetics.py` with a comprehensive 820-line unified phonetics module that consolidates IPA mappings, tone handling, character classification, and Unicode utilities from 13+ scattered implementations across the codebase.

### Source Files Analyzed
| File | What it contributed |
|------|-------------------|
| `core/audio/phoneme.py` | PhonemeMapper, NKO_CONSONANTS, NKO_VOWELS, NKO_TONES, IPA mappings |
| `core/transliteration/nko.py` | NkoHandler — full NKO_TO_IPA map, IPA_TO_NKO reverse, LATIN_TO_NKO, validation |
| `core/transliteration/bridge.py` | Script detection logic (Unicode ranges) |
| `core/audio/pronunciation.py` | PronunciationEngine — syllabification, difficulty estimation |
| `core/prediction/prosody_engine.py` | NKO_VOWELS/NKO_CONSONANTS sets, ToneType enum, NKO_HIGH_TONE/LOW_TONE constants |
| `core/prediction/tts_engine.py` | PhonemeMapping dataclass, TonePattern enum, dialect awareness |
| `tools/sound-sigils/definitions.py` | SigilDefinition data (N'Ko char → semantic/audio mappings) |
| `data/nko-unified.json` | 232 canonical records — characters (vowels/consonants/digits/tone_marks/punctuation), vocabulary, morphology, cognates, proverbs |

---

## API Summary

### Main Class: `NKoPhonetics`

```python
from nko.phonetics import NKoPhonetics

ph = NKoPhonetics()                    # Full instance (loads JSON)
ph = NKoPhonetics(load_json=False)     # Lightweight (tables only)
```

### Unicode Range Utilities
| Method | Returns | Example |
|--------|---------|---------|
| `is_nko_char(ch)` | `bool` | `is_nko_char('ߞ') → True` |
| `is_nko_text(text)` | `bool` | `is_nko_text('ߒߞߏ') → True` |
| `nko_purity(text)` | `float` | `nko_purity('ߒabc') → 0.25` |
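
The range checks in this table can be approximated with a plain codepoint comparison. The following is a minimal sketch, not the module's actual code: it assumes purity is counted over every character (whitespace included) and that empty text yields `0.0`, neither of which is confirmed above.

```python
NKO_BLOCK_START = 0x07C0
NKO_BLOCK_END = 0x07FF


def is_nko_char(ch: str) -> bool:
    """True when a single character lies inside the N'Ko Unicode block."""
    return len(ch) == 1 and NKO_BLOCK_START <= ord(ch) <= NKO_BLOCK_END


def nko_purity(text: str) -> float:
    """Fraction of characters inside the N'Ko block.

    Assumption: empty text returns 0.0 and every character counts,
    including spaces and punctuation from other scripts.
    """
    if not text:
        return 0.0
    return sum(is_nko_char(ch) for ch in text) / len(text)


# One N'Ko letter (U+07D2) followed by three Latin letters.
print(nko_purity("\u07d2abc"))
```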

### Character Classification
| Method | Returns | Example |
|--------|---------|---------|
| `classify(ch)` | `CharCategory` | `classify('ߞ') → CONSONANT` |
| `is_vowel(ch)` | `bool` | `is_vowel('ߊ') → True` |
| `is_consonant(ch)` | `bool` | `is_consonant('ߞ') → True` |
| `is_letter(ch)` | `bool` | `is_letter('ߊ') → True` |
| `is_tone_mark(ch)` | `bool` | `is_tone_mark('߫') → True` |
| `is_combining(ch)` | `bool` | `is_combining('߲') → True` |
| `is_digit(ch)` | `bool` | `is_digit('߁') → True` |
| `is_punctuation(ch)` | `bool` | `is_punctuation('߹') → True` |
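
One way to picture `classify` is as a lookup over contiguous codepoint ranges. The sketch below is an illustration only: the ranges are assumptions read off the Unicode N'Ko block layout, chosen because they reproduce the set sizes reported in this document (10 digits, 7 vowels, 26 consonants, 5 tone marks, 2 combining marks). Punctuation is left out because the six punctuation codepoints are not listed here, and the real module may use explicit tables instead of ranges.

```python
from enum import Enum


class CharCategory(Enum):
    VOWEL = "vowel"
    CONSONANT = "consonant"
    DIGIT = "digit"
    TONE_MARK = "tone_mark"
    COMBINING = "combining"
    OTHER = "other"


# (first, last, category), inclusive. Assumed ranges, not taken from the module.
_RANGES = (
    (0x07C0, 0x07C9, CharCategory.DIGIT),      # 10 digits
    (0x07CA, 0x07D0, CharCategory.VOWEL),      # 7 vowels
    (0x07D1, 0x07EA, CharCategory.CONSONANT),  # 26 consonants
    (0x07EB, 0x07EF, CharCategory.TONE_MARK),  # 5 tone marks
    (0x07F2, 0x07F3, CharCategory.COMBINING),  # 2 combining marks
)


def classify(ch: str) -> CharCategory:
    """Return the category of one character, or OTHER when no range matches."""
    if len(ch) != 1:
        return CharCategory.OTHER
    codepoint = ord(ch)
    for first, last, category in _RANGES:
        if first <= codepoint <= last:
            return category
    return CharCategory.OTHER


print(classify("\u07de"))  # the consonant used in the table above
```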

### IPA Conversion
| Method | Returns | Example |
|--------|---------|---------|
| `to_ipa(text)` | `str` | `to_ipa('ߒߞߏ') → 'nkɔ'` |
| `to_ipa(text, include_tones=True)` | `str` | Includes IPA diacritics for tones |
| `char_to_ipa(ch)` | `str` | `char_to_ipa('ߞ') → 'k'` |
| `to_phonemes(text)` | `List[Phoneme]` | List with symbol, source_char, tone, duration |
| `ipa_to_nko(ipa)` | `str` | `ipa_to_nko('nkɔ') → 'ߣߞߏ'` |
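
The forward and reverse examples in this table differ in their first character, which follows from a table-driven design where two N'Ko letters share one IPA symbol. The sketch below uses a hypothetical four-entry table built only from the examples in this document; it handles single-symbol IPA only and ignores digraphs and tones, which the real module covers.

```python
# Hypothetical subset of the mapping, taken from the examples in this document.
# Listing order decides which letter wins the reverse lookup.
NKO_TO_IPA = {
    "\u07e3": "n",       # listed first, so it is the primary reverse mapping
    "\u07d2": "n",       # variant letter that shares IPA 'n'
    "\u07de": "k",
    "\u07cf": "\u0254",  # open o, as in the to_ipa example above
}

# Build the reverse table, keeping the first N'Ko letter seen for each IPA symbol.
IPA_TO_NKO = {}
for nko_char, ipa_symbol in NKO_TO_IPA.items():
    IPA_TO_NKO.setdefault(ipa_symbol, nko_char)


def to_ipa(text: str) -> str:
    """Map each N'Ko character to IPA; unknown characters pass through."""
    return "".join(NKO_TO_IPA.get(ch, ch) for ch in text)


def ipa_to_nko(ipa: str) -> str:
    """Map each IPA symbol back to its primary N'Ko letter."""
    return "".join(IPA_TO_NKO.get(symbol, symbol) for symbol in ipa)


word = "\u07d2\u07de\u07cf"
round_trip = ipa_to_nko(to_ipa(word))
print(to_ipa(word), round_trip == word)
```

In this sketch the round trip does not return the original word, because the variant letter is replaced by the primary one.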

### Tone Handling
| Method | Returns | Example |
|--------|---------|---------|
| `get_tone(ch)` | `ToneType?` | `get_tone('߫') → ToneType.HIGH` |
| `strip_tones(text)` | `str` | `strip_tones('ߊ߫ߓ') → 'ߊߓ'` |
| `extract_tones(text)` | `List[(pos, ToneType)]` | Position + type for every tone mark |
| `has_tone_marks(text)` | `bool` | `has_tone_marks('ߊ߫') → True` |
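
The tone helpers can be sketched as filters over a small mark table. This is an illustrative reduction with two marks only: U+07EB as HIGH follows the `get_tone` example above, while U+07EC as LOW is an assumption based on its Unicode name (NKO COMBINING SHORT LOW TONE). Reporting the position as the index of the mark within the string is also an assumption.

```python
from enum import Enum
from typing import List, Optional, Tuple


class ToneType(Enum):
    HIGH = "high"
    LOW = "low"


# Reduced table for illustration; the module catalogs five tone marks.
TONE_MARKS = {
    "\u07eb": ToneType.HIGH,  # matches the get_tone example in the table
    "\u07ec": ToneType.LOW,   # assumed from the Unicode character name
}


def get_tone(ch: str) -> Optional[ToneType]:
    """Tone carried by a mark, or None for any other character."""
    return TONE_MARKS.get(ch)


def strip_tones(text: str) -> str:
    """Drop every tone mark and keep all other characters in order."""
    return "".join(ch for ch in text if ch not in TONE_MARKS)


def extract_tones(text: str) -> List[Tuple[int, ToneType]]:
    """Pairs of (index of the mark in the string, tone type)."""
    return [(i, TONE_MARKS[ch]) for i, ch in enumerate(text) if ch in TONE_MARKS]


def has_tone_marks(text: str) -> bool:
    return any(ch in TONE_MARKS for ch in text)


sample = "\u07ca\u07eb\u07d3"  # vowel + high tone mark + consonant
print(extract_tones(sample), len(strip_tones(sample)))
```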

### Character Info
| Method | Returns | Example |
|--------|---------|---------|
| `get_char_info(ch)` | `CharInfo?` | Full record: char, code, name, ipa, category, etc. |
| `get_all_chars()` | `Dict[str, CharInfo]` | All 56 cataloged N'Ko characters |
| `pronunciation_guide(text)` | `List[dict]` | Per-character guide with IPA, hints |

### Digit Utilities
| Method | Returns | Example |
|--------|---------|---------|
| `nko_digit_value(ch)` | `int?` | `nko_digit_value('߃') → 3` |
| `int_to_nko_digits(n)` | `str` | `int_to_nko_digits(42) → '߄߂'` |
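
Because the ten N'Ko digits occupy consecutive codepoints starting at U+07C0, both digit helpers reduce to offset arithmetic. The sketch below assumes digits are emitted most significant first in logical order, as in the `42` example above, and that negative input raises `ValueError`; the module's actual behavior for negatives is not stated in this document.

```python
from typing import Optional

NKO_DIGIT_ZERO = 0x07C0  # N'Ko digit zero; digits one to nine follow contiguously


def nko_digit_value(ch: str) -> Optional[int]:
    """Integer value of an N'Ko digit, or None for any other input."""
    if len(ch) != 1:
        return None
    offset = ord(ch) - NKO_DIGIT_ZERO
    return offset if 0 <= offset <= 9 else None


def int_to_nko_digits(n: int) -> str:
    """Render a non-negative integer with N'Ko digits, most significant first.

    Assumption: negative numbers are rejected rather than signed.
    """
    if n < 0:
        raise ValueError("negative numbers are not supported in this sketch")
    return "".join(chr(NKO_DIGIT_ZERO + int(digit)) for digit in str(n))


print([nko_digit_value(ch) for ch in int_to_nko_digits(42)])
```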

### Other
| Method | Returns | Example |
|--------|---------|---------|
| `detect_script(text)` | `str` | `'nko'`, `'arabic'`, `'latin'`, `'mixed'` |
| `syllabify_ipa(ipa)` | `List[str]` | `syllabify_ipa('baba') → ['ba', 'ba']` |
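
A CV-model syllabifier of the kind described for `syllabify_ipa` can be written as a single pass that closes a syllable at each vowel. The sketch below is an assumption-laden illustration: the vowel inventory is a guess at a seven-vowel Manding set, and attaching trailing consonants to the last syllable is one possible choice that the document does not confirm.

```python
from typing import List

# Assumed vowel inventory for illustration; the module exposes its own IPA_VOWELS.
IPA_VOWELS = frozenset("aeiou\u025b\u0254")


def syllabify_ipa(ipa: str) -> List[str]:
    """Split an IPA string into syllables, closing each one at its vowel."""
    syllables: List[str] = []
    current = ""
    for symbol in ipa:
        current += symbol
        if symbol in IPA_VOWELS:
            syllables.append(current)
            current = ""
    if current:
        # Leftover consonants: attach to the last syllable if there is one.
        if syllables:
            syllables[-1] += current
        else:
            syllables.append(current)
    return syllables


print(syllabify_ipa("baba"))
```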

### Constants & Sets

```python
NKO_BLOCK_START = 0x07C0
NKO_BLOCK_END = 0x07FF
UNICODE_RANGE = (0x07C0, 0x07FF)
VOWEL_CHARS       # frozenset, 7 chars
CONSONANT_CHARS   # frozenset, 26 chars
LETTER_CHARS      # frozenset, 33 chars (vowels + consonants)
DIGIT_CHARS       # frozenset, 10 chars
TONE_MARK_CHARS   # frozenset, 5 chars
COMBINING_CHARS   # frozenset, 2 chars (nasalization)
PUNCTUATION_CHARS # frozenset, 6 chars
ALL_NKO_CHARS     # frozenset, 56 chars total
IPA_VOWELS        # frozenset of IPA vowel symbols
```

### Enums
- `ToneType`: HIGH, LOW, RISING, LONG, VERY_LONG, NASAL, NASAL_ALT, MID, UNKNOWN
- `CharCategory`: VOWEL, CONSONANT, DIGIT, TONE_MARK, COMBINING, PUNCTUATION, OTHER

### Data Classes
- `CharInfo` (frozen): char, codepoint, code, name, category, ipa, latin, note, tone_type, digit_value, punctuation_eq
- `Phoneme` (frozen): symbol, source_char, audio_hint, duration_ms, tone

### Module-level Singleton

```python
from nko.phonetics import IPA
IPA.to_ipa('ߒߞߏ')  # works immediately, no instantiation needed
```

---

## Test Results

100 passed in 0.07s

### Test Coverage by Area (16 test classes, 100 tests):
1. Unicode Range (3 tests) — constants validation
2. is_nko_char / is_nko_text (10 tests) — char detection, edge cases
3. Classification (15 tests) — all categories, set counts, boolean helpers
4. CharInfo Lookup (8 tests) — all category types, unknown chars, immutability
5. IPA Conversion (12 tests) — forward conversion, tones, digraphs, spaces, empty
6. IPA Reverse (4 tests) — ipa_to_nko, digraphs, round-trip, spaces
7. Tone Handling (11 tests) — all 5 tone types + nasal, strip, extract, has_tone
8. Phoneme Generation (5 tests) — basic, source tracking, tone attachment, duration
9. Syllabification (4 tests) — CV, CVCV, edge cases
10. Script Detection (5 tests) — N'Ko, Latin, Arabic, empty, mixed
11. Digit Utilities (5 tests) — value lookup, conversion, zero, negative
12. Purity (3 tests) — pure, empty, mixed
13. Data Loading (4 tests) — JSON present/absent, vocabulary/proverbs access
14. Pronunciation Guide (2 tests) — basic, spaces
15. Singleton (4 tests) — existence, functionality, repr
16. Edge Cases (5 tests) — set sizes, N in consonants, exhaustive classification, passthrough

---

## Known Gaps / Future Work

1. `ipa_to_nko` round-trip for consonants with variants: characters such as ߒ (N Long Leg) and ߣ (Na) both map to IPA 'n', and the reverse lookup defaults to ߣ (primary). The N'Ko ↔ IPA mapping is inherently lossy for these variant pairs.
2. Syllabification is simplified (CV model). Manding has more complex codas (CVN, CVCC in loanwords). The prosody engine in `core/prediction/prosody_engine.py` has richer syllabification that could be integrated in NKO-1.3+.
3. Dialect-specific IPA — some consonants have different pronunciations in Bambara vs Maninka vs Jula. Current mappings are "standard N'Ko" (pan-Manding). Dialect layer could be added.
4. Arabic script support — `core/audio/phoneme.py` has Arabic IPA mappings. These aren't in the unified module yet (scope was N'Ko only). Could add in a future `nko.phonetics.arabic` submodule.
5. No U+07F0/U+07F1: these codepoints are assigned in Unicode (NKO COMBINING LONG LOW TONE and NKO COMBINING LONG RISING TONE) but are not in our tables, so they are classified as OTHER if tested. They could be added to the tone tables in a later revision.

---

## Files Modified/Created
- `nko/phonetics.py` — REPLACED stub with 820-line production module
- `tests/test_phonetics.py` — CREATED 100 unit tests (18KB)
- `NKO-1.2-COMPLETE.md` — this file

Source Anchor

NKo/NKO-1.2-COMPLETE.md
