Grand Diomande Research

NKO-2.4 COMPLETE — NKoPrediction: Predictive Text Engine

**Task:** NKO-2.4 — Build NKoPrediction — predictive text engine in Swift with CoreML stub **Status:** ✅ COMPLETE **Date:** 2026-02-19 **Wave:** 2 (FINAL TASK)

---

Summary

NKoPrediction is the unified predictive text engine that powers the N'Ko keyboard. It orchestrates 6 sub-engines into a single latency-budgeted pipeline that produces context-aware, morphology-informed, culturally sensitive word and phrase predictions for Manding languages in N'Ko script.

2,039 lines of Swift source across 7 files, with 43 tests (0 failures).

---

Architecture

```
┌──────────────────────────────────────────────────┐
│             NKoPredictionEngine                  │
│  (orchestrator — merges, deduplicates, re-ranks) │
├─────────────┬──────────────┬─────────────────────┤
│ Frequency   │ ContextEngine│ SmartCompose        │
│ Predictor   │ (topic +     │ (full-phrase        │
│ (ngram +    │  grammar     │  completions,       │
│  lexicon)   │  boosts)     │  greeting protocol) │
├─────────────┴──────────────┼─────────────────────┤
│ NKoMorphology              │ NKoPhonetics        │
│ (root+suffix expansion)    │ (IPA similarity     │
│                            │  fallback)          │
├────────────────────────────┴─────────────────────┤
│            CoreML Slot (stub)                    │
│  NKoLanguageModelProvider protocol               │
│  → pluggable when .mlmodelc is trained           │
└──────────────────────────────────────────────────┘
```

Prediction Pipeline (per keystroke)

1. FrequencyPredictor — Prefix match against 40+ lexicon entries + n-gram continuation
2. ContextEngine — Topic detection (8 categories) + Manding SOV grammar analysis → boost scores
3. MorphologyExpander — Root extraction via NKoMorphology → suffix expansion candidates
4. PhoneticFallback — IPA Levenshtein similarity via NKoPhonetics (only when fewer than 3 candidates have been found)
5. CoreML — Neural inference (when a model is loaded; the current stub is a no-op)
6. Merge & Rank — Deduplicate, apply context boosts, filter by minScore, return top-K
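Step 6 can be sketched in a few lines. This is an illustration under stated assumptions, not the engine's code. The `Candidate` type and the `mergeAndRank` function are hypothetical. Keeping the highest score when several sub-engines propose the same word is an assumed deduplication policy. Only the sequence itself comes from the pipeline above: deduplicate, apply context boosts, filter by `minScore`, and return the top K.

```swift
/// Hypothetical per-engine candidate (not the package's PredictionCandidate).
struct Candidate {
    let word: String
    let score: Double  // raw score from one sub-engine, assumed to lie in 0...1
}

/// Sketch of "Merge & Rank": deduplicate, apply context boosts, filter, take top-K.
func mergeAndRank(_ lists: [[Candidate]],
                  boosts: [String: Double],
                  minScore: Double,
                  topK: Int) -> [(word: String, score: Double)] {
    // Deduplicate across sub-engines, keeping the best raw score per word (assumed policy).
    var best: [String: Double] = [:]
    for list in lists {
        for c in list {
            best[c.word] = max(best[c.word] ?? 0, c.score)
        }
    }
    // Multiply in the context boost (1.0 when absent), then drop anything below minScore.
    let kept: [(word: String, score: Double)] = best.compactMap { entry in
        let s = entry.value * (boosts[entry.key] ?? 1.0)
        guard s >= minScore else { return nil }
        return (word: entry.key, score: s)
    }
    // Highest score first; ties broken alphabetically so the output is deterministic.
    let ranked = kept.sorted { $0.score != $1.score ? $0.score > $1.score : $0.word < $1.word }
    return Array(ranked.prefix(topK))
}
```

Keeping the maximum rather than summing scores avoids rewarding a word merely for being proposed by several sub-engines. Summing would be an equally plausible policy if agreement between engines should count as evidence.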

---

Files Created

| File | Lines | Role |
|---|---:|---|
| `NKoPrediction.swift` | 415 | Main engine orchestrator, pipeline, merge/rank |
| `FrequencyPredictor.swift` | 312 | Lexicon (40+ entries), n-gram seeding, prefix/next-word prediction |
| `ContextEngine.swift` | 289 | Topic detection (8 categories), grammar analysis (SOV), contextual boosts |
| `SmartCompose.swift` | 317 | Full-phrase completions: greetings, blessings, proverbs, welfare chains, responses |
| `CoreMLSlot.swift` | 159 | `NKoLanguageModelProvider` protocol + `CoreMLStub` no-op + `CoreMLStatus` |
| `PredictionResult.swift` | 301 | `PredictionCandidate`, `PredictionRequest/Response`, enums (TopicCategory, GrammarExpectation, ComposeIntent, ComposeSuggestion) |
| `NGramModel.swift` | 246 | Thread-safe n-gram model (bi/trigram), backoff, prefix search, vocabulary tracking |
| Total source | 2,039 | |
| `PredictionTests.swift` | 358 | 43 tests across 7 test classes |

---

Public API Summary

`NKoPredictionEngine` (main entry point)

```swift
let engine = NKoPredictionEngine()

// Word prediction
let response = engine.predict("ߒ ߓߍ߬")
response.candidates        // [PredictionCandidate] ranked by finalScore
response.detectedTopic     // .casual, .family, .religion, ...
response.expectedNextType  // .verb (after TAM particle)
response.latencySeconds    // < 0.01s typical

// Quick API
let words = engine.topPredictions("ߒ ߓߍ߬", count: 3) // [String]

// Smart compose
let phrases = engine.smartSuggest("ߌ ߣߌ", limit: 3) // [ComposeSuggestion]
let responses = engine.responseSuggestions(for: "ߌ ߣߌ ߛߐ߲߬ߜߐ߫ߡߊ")

// Learning
engine.learn(text: "ߒ ߧߴ ߌ ߝߏ߫")

// CoreML (future)
engine.registerMLProvider(myTrainedModel)
engine.coreMLStatus // CoreMLStatus

// Session
engine.resetSession()
```

`NGramModel`

```swift
let model = NGramModel(order: 3)
model.train(text: "ߒ ߓߍ߬ ߕߊ߯")
model.predict(context: "ߒ ߓߍ߬", topK: 5) // [(word, probability)]
model.wordsWithPrefix("ߕ", topK: 5)
model.unigramProbability(of: "ߒ")
model.vocabulary       // Set<String>
model.vocabularySize   // Int
```
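The design notes describe three properties of `NGramModel`. It is guarded by `NSLock`, it stores counts in dictionaries, and it backs off from trigram to bigram counts when no trigram matches. The hypothetical `TinyNGram` class below sketches all three; it is not the package's implementation. It uses raw relative frequencies and a plain fallback with no discounting or backoff weights, so it is deliberately simpler than true Katz backoff.

```swift
import Foundation

/// Hypothetical, minimal bigram/trigram counter (not the package's NGramModel).
/// An NSLock serializes every read and write so keyboard threads can share one instance.
final class TinyNGram {
    private let lock = NSLock()
    // Context -> (next word -> count); dictionaries give average O(1) lookup by context.
    private var bigrams: [String: [String: Int]] = [:]
    private var trigrams: [String: [String: Int]] = [:]

    func train(text: String) {
        let words = text.split(separator: " ").map(String.init)
        lock.lock()
        defer { lock.unlock() }
        for i in words.indices {
            if i >= 1 {
                bigrams[words[i - 1], default: [:]][words[i], default: 0] += 1
            }
            if i >= 2 {
                let key = words[i - 2] + " " + words[i - 1]
                trigrams[key, default: [:]][words[i], default: 0] += 1
            }
        }
    }

    /// Uses trigram counts if the last two words were seen together; otherwise
    /// falls back to bigram counts for the last word. No discounting or
    /// backoff weights are applied, unlike full Katz backoff.
    func predict(context: String, topK: Int) -> [(word: String, probability: Double)] {
        let words = context.split(separator: " ").map(String.init)
        lock.lock()
        defer { lock.unlock() }
        var counts: [String: Int]? = nil
        if words.count >= 2 {
            counts = trigrams[words[words.count - 2] + " " + words[words.count - 1]]
        }
        if counts == nil, let last = words.last {
            counts = bigrams[last]
        }
        guard let table = counts else { return [] }
        let total = Double(table.values.reduce(0, +))
        let ranked: [(word: String, probability: Double)] = table
            .map { (word: $0.key, probability: Double($0.value) / total) }
            .sorted { $0.probability != $1.probability
                ? $0.probability > $1.probability
                : $0.word < $1.word }
        return Array(ranked.prefix(topK))
    }
}
```

A single lock around both reads and writes is the simplest correct choice here. A reader-writer lock would only be worth considering if contention became measurable.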

`ContextEngine`

```swift
let ctx = ContextEngine()
ctx.detectTopic("ߘߋ߲ ߡߎ߬ߛߏ") // (.family, 0.65)
ctx.analyzeGrammar("ߒ ߓߍ߬")    // .verb
ctx.getBoosts(for: "ߒ ߓߍ߬")    // [word: boost_score]
ctx.timeGreeting()               // (nko: "ߌ ߣߌ ߕߟߋ", english: "Good afternoon")
```

`CoreMLSlot` (future integration)

```swift
protocol NKoLanguageModelProvider {
    var isReady: Bool { get }
    func predict(context: [String], topK: Int) -> [PredictionCandidate]
    func embeddings(for word: String) -> [Float]?
    func similarity(between: String, and: String) -> Double?  // default impl
}
```
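The `// default impl` comment indicates that `similarity(between:and:)` ships with a default. In Swift, that is normally supplied by a protocol extension. The sketch below assumes the default is cosine similarity over `embeddings(for:)`, which is an assumption rather than something stated above. `PredictionCandidate` here is a pared-down stand-in with assumed fields. `ToyProvider` is a hypothetical conformer showing the shape a Core ML wrapper would take before being passed to `registerMLProvider()`.

```swift
// Pared-down stand-in for the package's PredictionCandidate (fields assumed).
struct PredictionCandidate {
    let word: String
    let score: Double
}

protocol NKoLanguageModelProvider {
    var isReady: Bool { get }
    func predict(context: [String], topK: Int) -> [PredictionCandidate]
    func embeddings(for word: String) -> [Float]?
    func similarity(between: String, and: String) -> Double?
}

extension NKoLanguageModelProvider {
    /// Assumed default: cosine similarity of the two embeddings, or nil when an
    /// embedding is missing, the lengths differ, or either vector is all zeros.
    func similarity(between a: String, and b: String) -> Double? {
        guard let x = embeddings(for: a), let y = embeddings(for: b),
              x.count == y.count, !x.isEmpty else { return nil }
        var dot = 0.0, nx = 0.0, ny = 0.0
        for (u, v) in zip(x, y) {
            dot += Double(u) * Double(v)
            nx += Double(u) * Double(u)
            ny += Double(v) * Double(v)
        }
        guard nx > 0, ny > 0 else { return nil }
        return dot / (nx.squareRoot() * ny.squareRoot())
    }
}

/// Hypothetical provider backed by a fixed embedding table (stands in for a Core ML wrapper).
struct ToyProvider: NKoLanguageModelProvider {
    let table: [String: [Float]]
    var isReady: Bool { !table.isEmpty }
    func predict(context: [String], topK: Int) -> [PredictionCandidate] { [] }
    func embeddings(for word: String) -> [Float]? { table[word] }
}
```

With such a default in place, a wrapper only needs to implement `isReady`, `predict(context:topK:)`, and `embeddings(for:)`.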

---

Cross-Module Integration

| Dependency | How used |
|---|---|
| NKoMorphology | `analyzeWord()` → root extraction; `listSuffixes()` → morpheme expansion candidates |
| NKoPhonetics | `toIPA()` → phonetic representation; Levenshtein similarity for phonetic fallback |
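The phonetic fallback compares IPA strings produced by `NKoPhonetics.toIPA()`. A minimal normalized Levenshtein similarity is sketched below. The function names and the normalization `1 - distance / max(length)` are assumptions, not the module's documented behavior.

```swift
/// Classic dynamic-programming Levenshtein distance over Swift Characters
/// (extended grapheme clusters), keeping only two rows of the DP table.
func levenshtein(_ a: String, _ b: String) -> Int {
    let s = Array(a), t = Array(b)
    if s.isEmpty { return t.count }
    if t.isEmpty { return s.count }
    var prev = Array(0...t.count)
    var curr = [Int](repeating: 0, count: t.count + 1)
    for i in 1...s.count {
        curr[0] = i
        for j in 1...t.count {
            let cost = s[i - 1] == t[j - 1] ? 0 : 1
            curr[j] = min(prev[j] + 1,         // deletion
                          curr[j - 1] + 1,     // insertion
                          prev[j - 1] + cost)  // substitution
        }
        swap(&prev, &curr)
    }
    return prev[t.count]
}

/// Similarity in 0...1, where 1 means identical IPA strings (assumed normalization).
func phoneticSimilarity(_ ipaA: String, _ ipaB: String) -> Double {
    let longest = max(ipaA.count, ipaB.count)
    guard longest > 0 else { return 1.0 }
    return 1.0 - Double(levenshtein(ipaA, ipaB)) / Double(longest)
}
```

Because the sketch counts Characters, a vowel plus a combining nasal or tone diacritic is a single edit unit. Counting Unicode scalars instead would treat the diacritic as a separate edit.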

---

Test Results

```
Test Suite 'All tests' passed at 2026-02-19
  Executed 339 tests, with 0 failures (0 unexpected) in 0.472 seconds

NKoPrediction-specific tests:
  NGramModelTests:           6 tests ✅
  FrequencyPredictorTests:   6 tests ✅
  ContextEngineTests:        8 tests ✅
  SmartComposeTests:         5 tests ✅
  CoreMLSlotTests:           4 tests ✅
  PredictionResultTests:     3 tests ✅
  NKoPredictionEngineTests: 11 tests ✅
  ─────────────────────────────────
  Total:                    43 tests, 0 failures
```

Test Coverage

  • NGramModel: train/predict, unigram probability, vocabulary, prefix matching, reset, backoff
  • FrequencyPredictor: lexicon loading (40+ entries), prefix prediction, next-word, Latin lookup, incremental learning, empty input
  • ContextEngine: topic detection (family, religion), grammar analysis (pronoun→TAM, TAM→verb, verb→object), boosts, time greeting, conversation history
  • SmartCompose: greeting completion, blessing completion, proverb completion, response suggestions, blessing acknowledgment ("Amen")
  • CoreML: stub not-ready, empty predictions, nil embeddings, status description
  • PredictionResult: score clamping, contextBoost multiplication, Comparable conformance
  • Engine integration: predict after TAM, empty input, latency <1s, topPredictions, smartSuggest, learn→improve, CoreML status, reset session, isNKo, topic detection metadata, topic hint override
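The PredictionResult tests cover score clamping, contextBoost multiplication, and Comparable conformance. Below is a minimal sketch of a candidate type with those three behaviors. `ScoredCandidate` is hypothetical. The fields `baseScore` and `contextBoost`, the clamp range 0...1, and the ordering (higher `finalScore` ranks higher) are assumptions. Only `finalScore` is a name used elsewhere in this note.

```swift
/// Hypothetical ranked candidate; only `finalScore` is a name taken from the note.
struct ScoredCandidate: Comparable {
    let word: String
    let baseScore: Double
    let contextBoost: Double

    init(word: String, baseScore: Double, contextBoost: Double = 1.0) {
        self.word = word
        self.baseScore = min(max(baseScore, 0), 1)  // clamp to the assumed 0...1 range
        self.contextBoost = contextBoost
    }

    /// Context boosts multiply the base score; the product is clamped again.
    var finalScore: Double { min(max(baseScore * contextBoost, 0), 1) }

    // Equality and ordering use the same fields, as Comparable expects.
    static func == (lhs: ScoredCandidate, rhs: ScoredCandidate) -> Bool {
        lhs.finalScore == rhs.finalScore && lhs.word == rhs.word
    }

    /// "Less than" means "ranks lower"; on ties the alphabetically later word ranks
    /// lower, so `sorted(by: >)` yields best-first, then alphabetical order.
    static func < (lhs: ScoredCandidate, rhs: ScoredCandidate) -> Bool {
        lhs.finalScore != rhs.finalScore
            ? lhs.finalScore < rhs.finalScore
            : lhs.word > rhs.word
    }
}
```

Defining `==` explicitly from the same fields as `<` keeps the conformance consistent with Comparable's requirement that exactly one of `<`, `==`, and `>` holds for any pair.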

---

Design Decisions

1. No SQLite — Everything is kept in memory for keyboard responsiveness. The Python engine used SQLite for persistence; the Swift version defers persistence to the iOS app layer (UserDefaults or lightweight file I/O).

2. Thread-safe NGramModel — Uses `NSLock` to guard concurrent access from keyboard-extension threads. Dictionary-based storage (rather than arrays) gives average O(1) lookup by context.

3. Katz-style backoff — The trigram model backs off to bigram statistics when there is no trigram match, so predictions degrade gracefully when training data is limited.

4. CoreML as protocol — `NKoLanguageModelProvider` is a clean protocol that the stub satisfies. When a trained .mlmodelc is available, wrap it in a type that conforms to the protocol and pass it to `registerMLProvider()`; the engine itself needs no code changes.

5. Manding SOV grammar awareness — The context engine encodes three rules. After a subject pronoun (ߒ, ߌ, ߊ߬) it expects a TAM particle or a verb. After a TAM particle (ߓߍ߬, ߞߊ߬) it expects a verb. After a verb it expects an object or a postposition. This grammar model maps directly onto Manding's isolating, auxiliary-particle structure.

6. Levenshtein on IPA, not N'Ko — Phonetic similarity is computed on IPA representations (via `NKoPhonetics.toIPA()`) rather than on raw N'Ko Unicode, so it reflects phonetic distance rather than visual similarity.

7. Culturally grounded smart compose — The greeting protocol follows the traditional Manding multi-turn welfare-inquiry chain. Proverb completion is built in. Blessing responses include "ߊ߬ߡߌ߬ߣߊ߬" (Amen).
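The grammar rules in item 5 amount to a small transition table keyed on the class of the previous token. The sketch below encodes exactly those three rules. The enum names, the hand-listed token sets (taken only from the examples in item 5), the `knownVerbs` parameter, and the `unknown` fallback are illustrative assumptions. The real `ContextEngine` returns a `GrammarExpectation`, whose full set of cases is not shown in this note.

```swift
/// Hypothetical word classes; only the example tokens named in the text are listed.
enum WordClass { case pronoun, tam, verb, other }

/// Expected class of the next token, following the three rules in item 5.
enum Expectation: Equatable { case tamOrVerb, verb, objectOrPostposition, unknown }

let subjectPronouns: Set<String> = ["ߒ", "ߌ", "ߊ߬"]
let tamParticles: Set<String> = ["ߓߍ߬", "ߞߊ߬"]

/// Classifies one token; a real engine would consult a lexicon or NKoMorphology for verbs.
func classify(_ token: String, knownVerbs: Set<String>) -> WordClass {
    if subjectPronouns.contains(token) { return .pronoun }
    if tamParticles.contains(token) { return .tam }
    if knownVerbs.contains(token) { return .verb }
    return .other
}

/// Looks only at the last token: pronoun -> TAM/verb, TAM -> verb, verb -> object/postposition.
func expectedNext(after text: String, knownVerbs: Set<String> = []) -> Expectation {
    guard let last = text.split(separator: " ").last else { return .unknown }
    switch classify(String(last), knownVerbs: knownVerbs) {
    case .pronoun: return .tamOrVerb
    case .tam:     return .verb
    case .verb:    return .objectOrPostposition
    case .other:   return .unknown
    }
}
```

Inspecting only the previous token keeps the check constant-time per keystroke. Recognizing verbs, objects, and postpositions reliably would need the lexicon and morphology data described elsewhere in this note.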

---

Wave 2 Final Status

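| Task | Module | Lines | Tests | Status |
|---|---|---:|---:|---|
| NKO-2.1 | NKoPhonetics | 1,586 | 71 | |
| NKO-2.2 | NKoTransliteration | 1,198 | 116 | |
| NKO-2.3 | NKoMorphology | 2,447 | 74 | |
| NKO-2.4 | NKoPrediction | 2,039 | 43 | |
| NKO-2.5 | NKoCulture | 1,227 | 34 | |
| Total | NKoCore Swift Package | 8,497 | 339 | ✅ WAVE 2 COMPLETE |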

All 339 tests pass with zero failures; `swift build` and `swift test` run clean.

Source Anchor

NKo/NKO-2.4-COMPLETE.md
