NKO-1.3 Complete — `nko.transliterate` Canonical Engine
Status: ✅ DONE
Date: 2025-07-19
Tests: 58/58 passing (0.04s)
---
What Was Built
`nko/transliterate.py` — the canonical, unified transliteration engine for the N'Ko Unified Platform. Consolidates 6 scattered implementations into one authoritative module.
Implementations Consolidated
| # | Source | Type | Status |
|---|---|---|---|
| 1 | `core/transliteration/` (Bridge + NkoHandler + ArabicHandler + LatinHandler) | Python — IPA intermediary arch | Primary source — most structured |
| 2 | `core/prediction/translation_bridge.py` | Python — keyboard-ai with dictionary + SQLite | Superseded (duplicate transliteration logic) |
| 3 | `tools/telegram-bot/bridge_core.py` | Python — standalone fallback bridge | Superseded |
| 4 | `tools/pwa/js/bridge-core.js` | JavaScript — client-side bridge | Reference only (JS, different runtime) |
| 5 | `core/audio/phoneme.py` | Python — PhonemeMapper with char maps | Phoneme maps referenced |
| 6 | `core/pipeline/stages.py` | Python — TransliterationStage wrapping Bridge | Pipeline consumer, not source |
Canonical source chosen: `core/transliteration/` — best architecture (IPA intermediary), most complete character maps, cleanest separation of concerns. Enhanced with corrections from other implementations.
API Surface
Module-level functions (quick usage)
from nko.transliterate import transliterate, detect_script, convert_all, batch, to_ipa, analyze
# Auto-detect → Latin
transliterate("ߒߞߏ") # → "nkɔ"
# Explicit source/target
transliterate("baka", target="nko") # → "ߓߊߞߊ"
transliterate("سلام", target="latin") # → "slam"
# Script detection
detect_script("ߒߞߏ") # → Script.NKO
# All scripts at once
convert_all("ߒߞߏ") # → {"nko": "ߒߞߏ", "latin": "nkɔ", "arabic": "نكو"}
# Batch
batch(["ߒߞߏ", "ߓߊ"], target="latin") # → [TranslitResult(...), ...]
# IPA intermediary
to_ipa("ߒߞߏ") # → "nkɔ"
# Analysis
analyze("ߒߞߏ abc") # → {"dominant": "nko", "counts": {...}, ...}
Class API (full control)
from nko.transliterate import NkoTransliterator, Script, TranslitResult
t = NkoTransliterator()
result = t.convert("ߒߞߏ", source=Script.NKO, target=Script.LATIN)
# TranslitResult(source_text='ߒߞߏ', source_script=Script.NKO,
# target_text='nkɔ', target_script=Script.LATIN,
# ipa='nkɔ', confidence=1.0)
Exported character maps (for phonetics integration)
from nko.transliterate import (
NKO_TO_IPA, IPA_TO_NKO, IPA_TO_LATIN, IPA_TO_ARABIC,
ARABIC_TO_IPA, NKO_VOWELS_TO_IPA, NKO_CONSONANTS_TO_IPA,
NKO_TONE_MARKS, NKO_DIGITS_TO_WESTERN,
)
Scripts Supported
| Direction | Status |
|---|---|
| N'Ko → Latin | ✅ Full (7 vowels, 19+ consonants, digits, tone marks, punctuation) |
| Latin → N'Ko | ✅ Full (single chars + digraphs: ny, ng, gb, ch, sh, dj, rr) |
| N'Ko → Arabic | ✅ Via IPA intermediary |
| Arabic → N'Ko | ✅ Via IPA intermediary |
| Arabic → Latin | ✅ Full Arabic consonants + vowel diacritics |
| Latin → Arabic | ✅ Via IPA intermediary |
| Script detection | ✅ Unicode-range voting (NKO: U+07C0-07FF, Arabic: U+0600-06FF+) |
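The Unicode-range voting in the last row can be sketched as follows. This is a minimal illustration, not the module's actual code: the `Script.UNKNOWN` member, the `classify_char` helper, the Latin cutoff at U+02B0, and the restriction to the basic Arabic block are all assumptions made for the example.

```python
from collections import Counter
from enum import Enum
from typing import Optional


class Script(Enum):
    NKO = "nko"
    LATIN = "latin"
    ARABIC = "arabic"
    UNKNOWN = "unknown"  # assumption: fallback when no character votes


def classify_char(ch: str) -> Optional[Script]:
    """Return the script a character votes for, or None if it is neutral."""
    cp = ord(ch)
    if 0x07C0 <= cp <= 0x07FF:        # N'Ko block
        return Script.NKO
    if 0x0600 <= cp <= 0x06FF:        # basic Arabic block only (assumption)
        return Script.ARABIC
    if ch.isalpha() and cp < 0x02B0:  # Latin letters plus IPA Extensions (ɔ, ɛ)
        return Script.LATIN
    return None                       # ASCII digits, spaces, punctuation, others


def detect_script(text: str) -> Script:
    """Each character casts one vote; the script with the most votes wins."""
    votes = Counter(s for s in map(classify_char, text) if s is not None)
    if not votes:
        return Script.UNKNOWN
    return votes.most_common(1)[0][0]
```

With this sketch, a mixed string such as `"ߒߞߏ ab"` resolves to `Script.NKO` because three N'Ko letters outvote two Latin ones; how the real engine breaks ties is not stated in this report.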
N'Ko Character Coverage
- 7 vowels: ߊ(a) ߋ(o) ߌ(i) ߍ(e) ߎ(u) ߏ(ɔ) ߐ(ɛ)
- 21 consonants: ߒ(n) ߓ(b) ߔ(p) ߕ(t) ߖ(dʒ) ߗ(tʃ) ߘ(d) ߙ(r) ߚ(rr) ߛ(s) ߜ(gb) ߝ(f) ߞ(k) ߟ(l) ߡ(m) ߢ(ɲ) ߣ(n) ߤ(h) ߥ(w) ߦ(j) ߧ(ŋ)
- 4 alternates: ߠ(na) ߨ(p) ߩ(r) ߪ(s)
- 10 digits: ߀-߉
- 5 tone marks: ߫(high) ߬(low) ߭(falling) ߮(rising) ߯(long)
- 4 punctuation: ߸(,) ߹(.) ߷(!) ߺ(-)
- 2 combining: ߲(nasalization) ߳(tilde)
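As a small illustration of how the digit and tone-mark entries above might be handled, the sketch below builds a digit table from the N'Ko digit code points (U+07C0 to U+07C9) and filters tone marks. The dictionary is a stand-in that reuses the exported name `NKO_DIGITS_TO_WESTERN`; the helper functions and the assumption that the five tone marks are U+07EB to U+07EF are illustrative, and the report does not say whether the engine strips tone marks.

```python
# Stand-in for the exported map; N'Ko digits occupy U+07C0..U+07C9.
NKO_DIGITS_TO_WESTERN = {chr(0x07C0 + i): str(i) for i in range(10)}
_DIGIT_TABLE = str.maketrans(NKO_DIGITS_TO_WESTERN)

# Assumption: the five tone marks listed above are U+07EB..U+07EF.
_TONE_MARKS = {chr(cp) for cp in range(0x07EB, 0x07F0)}


def digits_to_western(text: str) -> str:
    """Replace each N'Ko digit with its Western counterpart, one to one."""
    return text.translate(_DIGIT_TABLE)


def strip_tone_marks(text: str) -> str:
    """Hypothetical helper: drop combining tone marks, keep base letters."""
    return "".join(ch for ch in text if ch not in _TONE_MARKS)


print(digits_to_western("\u07c1\u07c2\u07c3"))  # prints 123
```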
Test Results
58 passed in 0.04s
Test classes:
TestScriptDetection ......... 11 tests (detect NKO/Latin/Arabic/empty/mixed/extended, is_* helpers)
TestNkoToLatin .............. 9 tests (basic word, vowels, consonants, digraphs, nasals, spaces, digits, convenience, auto-detect)
TestLatinToNko .............. 4 tests (basic, digraphs, spaces, convenience)
TestNkoArabic ............... 3 tests (N'Ko→Arabic, Arabic→N'Ko, Arabic→Latin)
TestIPA ..................... 3 tests (N'Ko→IPA, Latin→IPA, convenience)
TestEdgeCases ............... 8 tests (empty, identity, whitespace, tones, punctuation, mixed, invalid script, long vowel)
TestBatchAndConvertAll ...... 5 tests (batch, empty batch, convert_all, convenience functions)
TestTranslitResult .......... 5 tests (str, repr, IPA, confidence, frozen)
TestRoundTrip ............... 3 tests (simple CV, consonant cluster, vowels)
TestAnalyze ................. 3 tests (NKO text, mixed, convenience)
TestCharacterMaps ........... 4 tests (vowels complete, consonants complete, reverse map, digits)
Architecture
Input Text
│
▼
detect_script() ──→ Script.NKO / Script.LATIN / Script.ARABIC
│
▼
_to_ipa(text, source_script)
│ NKO: char-by-char lookup in NKO_TO_IPA
│ Latin: digraph-first matching (ny,ng,gb,ch...) then single chars
│ Arabic: char-by-char lookup in ARABIC_TO_IPA
│
▼
IPA String (phonetic intermediary)
│
▼
_from_ipa(ipa, target_script)
│ Latin: longest-match against IPA_TO_LATIN (dʒ→j, tʃ→c, ŋ→ng...)
│ NKO: longest-match against IPA_TO_NKO
│ Arabic: longest-match against IPA_TO_ARABIC
│
▼
Target Text
Integration with nko.phonetics
The module exports all character maps as public symbols, so `nko.phonetics` (NKO-1.2) can import them directly once it is ready:
from nko.transliterate import NKO_TO_IPA, NKO_VOWELS_TO_IPA, NKO_CONSONANTS_TO_IPA, NKO_TONE_MARKS
These maps are the single source of truth for N'Ko → IPA mappings across the entire platform.
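A downstream consumer could then rely on the shared maps instead of redefining its own. The sketch below is hypothetical: `cv_pattern` is not part of `nko.phonetics`, and the two dictionaries are tiny stand-ins for the real imports so that the snippet runs on its own. Their entries follow the coverage list in this document.

```python
# Stand-ins for:
#   from nko.transliterate import NKO_VOWELS_TO_IPA, NKO_CONSONANTS_TO_IPA
NKO_VOWELS_TO_IPA = {"ߊ": "a", "ߌ": "i", "ߏ": "ɔ"}
NKO_CONSONANTS_TO_IPA = {"ߒ": "n", "ߓ": "b", "ߞ": "k"}


def cv_pattern(text: str) -> str:
    """Hypothetical phonetics helper: label each letter as C or V.

    Characters found in neither map (spaces, marks) are skipped.
    """
    out = []
    for ch in text:
        if ch in NKO_VOWELS_TO_IPA:
            out.append("V")
        elif ch in NKO_CONSONANTS_TO_IPA:
            out.append("C")
    return "".join(out)


print(cv_pattern("ߓߊߞߊ"))  # prints CVCV
```

Because the helper only reads the maps, a correction made once in `nko.transliterate` would reach every consumer without further changes.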
Files Modified/Created
- Created: `nko/transliterate.py` (22.7 KB — canonical engine)
- Created: `tests/test_transliterate.py` (15.6 KB — 58 tests)
- Updated: `nko/__init__.py` (clean imports)
- Created: `NKO-1.3-COMPLETE.md` (this file)
Key Design Decisions
1. IPA intermediary — all conversions route through IPA. This ensures phonetic accuracy and makes adding new scripts trivial (just add IPA↔NewScript maps).
2. ɔ and ɛ preserved in Latin output — Manding Latin orthography uses ɔ and ɛ. We don't lossy-compress to "o"/"e".
3. Longest-match for multi-char tokens — digraphs (ny, ng, gb, dʒ, tʃ) are matched before single chars to avoid ambiguity.
4. Frozen TranslitResult — immutable dataclass prevents accidental mutation.
5. Module-level singleton — `_DEFAULT_ENGINE` avoids re-initialization cost for the convenience functions.
6. Character maps exported — enables downstream modules (phonetics, audio, prediction) to use the same maps.
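Decisions 1, 3, and 4 can be illustrated together with a compact sketch. It is not the engine's code: the maps are toy subsets whose entries follow the coverage list in this document, `LATIN_TO_IPA`, `longest_match`, and `convert` are names invented for the example, and the three-field `TranslitResult` is a reduced stand-in for the real class.

```python
from dataclasses import dataclass

# Toy subsets; the real maps are far larger.
NKO_TO_IPA = {"ߒ": "n", "ߓ": "b", "ߞ": "k", "ߊ": "a", "ߏ": "ɔ",
              "ߢ": "ɲ", "ߖ": "dʒ", "ߧ": "ŋ"}
LATIN_TO_IPA = {"ny": "ɲ", "ng": "ŋ", "j": "dʒ", "n": "n", "b": "b",
                "k": "k", "a": "a", "ɔ": "ɔ"}
IPA_TO_LATIN = {"dʒ": "j", "ɲ": "ny", "ŋ": "ng", "n": "n", "b": "b",
                "k": "k", "a": "a", "ɔ": "ɔ"}
IPA_TO_NKO = {ipa: nko for nko, ipa in NKO_TO_IPA.items()}


@dataclass(frozen=True)
class TranslitResult:
    """Reduced stand-in: frozen, so fields cannot be reassigned."""
    source_text: str
    target_text: str
    ipa: str


def longest_match(text: str, table: dict) -> str:
    """Greedy left-to-right lookup that tries the longest key first."""
    max_len = max(map(len, table))
    out, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in table:
                out.append(table[chunk])
                i += size
                break
        else:
            out.append(text[i])  # unknown character: pass through unchanged
            i += 1
    return "".join(out)


def convert(text: str, to_ipa: dict, from_ipa: dict) -> TranslitResult:
    """Route every conversion through IPA: source -> IPA -> target."""
    ipa = longest_match(text, to_ipa)
    return TranslitResult(text, longest_match(ipa, from_ipa), ipa)


print(convert("nyaba", LATIN_TO_IPA, IPA_TO_NKO).ipa)  # prints ɲaba
```

In this sketch, adding a script means supplying one table into IPA and one out of it. The digraph `ny` is consumed as a single token before `n` is tried, which is the ambiguity that longest-match resolves.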