research note · experiment writeup candidate · score 24

NKO-4.2 COMPLETE — CoreML Prediction Model from Corpus

Trained an interpolated n-gram language model on the N'Ko corpus, exported it to CoreML format, wrote Swift integration code, and evaluated prediction quality. The model provides real-time next-word prediction for the N'Ko keyboard.



### Interpolated Trigram Model (Primary)

- **Type:** Interpolated trigram/bigram/unigram with add-k smoothing
- **Formula:** `P(w|w₋₂, w₋₁) = λ₃·P_tri + λ₂·P_bi + λ₁·P_uni`
- **Weights:** λ₃=0.55, λ₂=0.30, λ₁=0.15 (summing to 1), add_k=0.01
- **Rationale:** With only ~4.7K sentences (~34K tokens), a well-tuned n-gram model is expected to outperform neural approaches, which typically need far more training data. Interpolation degrades gracefully for unseen contexts. When the trigram has no counts, the bigram and unigram terms still supply probability mass.

### CoreML Neural Model (Secondary)

- **Type:** Embedding + Dense + Softmax
- **Input:** Two word indices (the two preceding words, i.e. a bigram context)
- **Architecture:** 2×Embedding(3001→32) → Concat(64) → Dense(64→3000) → Softmax
- **Training:** 20 epochs of SGD, distilled from the n-gram statistics
- **Size:** 1.5 MB (.mlmodel)

### Corpus Statistics

| Metric | Value |
|--------|-------|
| Total sentences | 4,650 |
| Sentences ≥2 words | 3,377 |
| Total tokens | 33,932 |
| Vocabulary size | 7,364 |
| Unique bigram contexts | 7,363 |
| Unique trigram contexts | 20,869 |
| Train/test split | 90%/10% (3,039/338 sentences) |

### Evaluation Results

| Metric | Value |
|--------|-------|
| **Top-1 Accuracy** | **13.4%** |
| **Top-3 Accuracy** | **22.9%** |
| Top-5 Accuracy | 27.8% |
| Top-10 Accuracy | 35.6% |
| Mean Reciprocal Rank (MRR) | 0.1965 |
| Perplexity (test) | 673.59 |
| Perplexity (train sample) | 73.87 |
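The following Python sketch illustrates the interpolated trigram formula for the primary model. It is a minimal illustration, not the project's training code. Several parts are assumptions: the class name `InterpolatedTrigramLM`, the pre-tokenized input, and the absence of sentence-boundary padding. Applying add-k smoothing separately to each order is also assumed, so an unseen context falls back to a uniform 1/|V| term. Only the weights λ₃=0.55, λ₂=0.30, λ₁=0.15 and add_k=0.01 come from the write-up.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# Weights and smoothing constant as reported in the write-up.
LAMBDA_TRI, LAMBDA_BI, LAMBDA_UNI = 0.55, 0.30, 0.15
ADD_K = 0.01


class InterpolatedTrigramLM:
    """Hypothetical sketch of an add-k smoothed trigram/bigram/unigram mixture."""

    def __init__(self, sentences: Sequence[Sequence[str]]) -> None:
        self.uni: Counter = Counter()
        self.bi: Counter = Counter()
        self.tri: Counter = Counter()
        self.bi_ctx: Counter = Counter()   # how often word w-1 is followed by anything
        self.tri_ctx: Counter = Counter()  # how often pair (w-2, w-1) is followed by anything
        # Assumption: no sentence-boundary padding, so n-grams stay within one sentence.
        for sent in sentences:
            self.uni.update(sent)
            for a, b in zip(sent, sent[1:]):
                self.bi[(a, b)] += 1
                self.bi_ctx[a] += 1
            for a, b, c in zip(sent, sent[1:], sent[2:]):
                self.tri[(a, b, c)] += 1
                self.tri_ctx[(a, b)] += 1
        self.vocab = sorted(self.uni)
        self.total = sum(self.uni.values())

    def _add_k(self, count: int, context_total: int) -> float:
        # (c + k) / (N_ctx + k*|V|); an unseen context yields the uniform value 1/|V|.
        return (count + ADD_K) / (context_total + ADD_K * len(self.vocab))

    def prob(self, w: str, w2: str, w1: str) -> float:
        """P(w | w2, w1), where w2 is the word two back and w1 the previous word."""
        p_tri = self._add_k(self.tri[(w2, w1, w)], self.tri_ctx[(w2, w1)])
        p_bi = self._add_k(self.bi[(w1, w)], self.bi_ctx[w1])
        p_uni = self._add_k(self.uni[w], self.total)
        return LAMBDA_TRI * p_tri + LAMBDA_BI * p_bi + LAMBDA_UNI * p_uni

    def top_k(self, w2: str, w1: str, k: int = 3) -> List[Tuple[str, float]]:
        """Score every vocabulary word for the context and return the k best."""
        scored = [(w, self.prob(w, w2, w1)) for w in self.vocab]
        scored.sort(key=lambda item: item[1], reverse=True)
        return scored[:k]
```

Each smoothed component sums to one over the vocabulary, and the λ weights sum to one, so the mixture is itself a proper distribution. Calling `top_k(w2, w1, k=3)` returns the three candidates that the Top-3 accuracy metric scores.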
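As a sanity check on the reported 1.5 MB size, the parameter count of the secondary CoreML model can be worked out from its layer shapes. The sketch makes several assumptions: two separate embedding tables, a bias on the dense layer, and float32 weights. It also ignores `.mlmodel` container overhead. Reading index 3001 as 3,000 words plus one extra unknown/padding index is likewise an assumption.

```python
# Layer shapes from the write-up: 2×Embedding(3001→32) → Concat(64) → Dense(64→3000) → Softmax.
VOCAB_IN = 3001       # assumed: 3,000 words plus one unknown/padding index
EMBED_DIM = 32
VOCAB_OUT = 3000
BYTES_PER_WEIGHT = 4  # assumed: float32 weights, no quantization


def param_count(separate_embeddings: bool = True, dense_bias: bool = True) -> int:
    """Count trainable parameters; the softmax layer has none."""
    n_tables = 2 if separate_embeddings else 1
    embedding = n_tables * VOCAB_IN * EMBED_DIM
    concat_dim = 2 * EMBED_DIM  # two context words, concatenated
    dense = concat_dim * VOCAB_OUT + (VOCAB_OUT if dense_bias else 0)
    return embedding + dense


params = param_count()
size_mb = params * BYTES_PER_WEIGHT / 1e6
print(params, round(size_mb, 2))  # prints: 387064 1.55
```

Under these assumptions the model has 387,064 parameters, or about 1.55 MB, which is consistent with the reported size. A single shared embedding table would give 291,032 parameters (about 1.16 MB), which suggests the two embeddings are separate.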
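The 3,039/338 split follows from taking 90% of the 3,377 sentences with at least two words and rounding down. A minimal sketch of that preparation step follows. It assumes the sentences are already tokenized into word lists and shuffled with a fixed seed. The function name `split_corpus`, the shuffle, and the seed value are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def split_corpus(
    sentences: Sequence[List[str]], train_frac: float = 0.9, seed: int = 0
) -> Tuple[List[List[str]], List[List[str]]]:
    """Hypothetical helper: keep sentences with >= 2 words, shuffle, then split."""
    # A single-word sentence yields no (context, next word) pair to train or score.
    usable = [s for s in sentences if len(s) >= 2]
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(usable)
    n_train = int(len(usable) * train_frac)  # floor: int(3377 * 0.9) == 3039
    return usable[:n_train], usable[n_train:]
```

Given the 4,650 tokenized sentences, this would keep the 3,377 multi-word sentences and return 3,039 for training and 338 for testing, matching the counts in the statistics table.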
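The evaluation metrics in the results table can be computed from the rank and the probability that the model assigns to each true next word in the test set. The sketch makes two assumptions. First, ranks are 1-based over the model's full candidate list, with `None` when the true word is not ranked. Second, perplexity is the exponentiated mean negative log-probability per predicted token. The helper names are hypothetical.

```python
import math
from typing import Optional, Sequence


def top_k_accuracy(ranks: Sequence[Optional[int]], k: int) -> float:
    """Fraction of test positions whose true next word is ranked within the top k."""
    return sum(1 for r in ranks if r is not None and r <= k) / len(ranks)


def mean_reciprocal_rank(ranks: Sequence[Optional[int]]) -> float:
    """Average of 1/rank; positions where the true word is unranked contribute 0."""
    return sum(1.0 / r for r in ranks if r is not None) / len(ranks)


def perplexity(true_word_probs: Sequence[float]) -> float:
    """exp of the mean negative log-probability assigned to each true next word."""
    mean_nll = -sum(math.log(p) for p in true_word_probs) / len(true_word_probs)
    return math.exp(mean_nll)


# Toy example: four test positions, the last one not ranked at all.
ranks = [1, 3, 2, None]
print(top_k_accuracy(ranks, 1), top_k_accuracy(ranks, 3))
print(mean_reciprocal_rank(ranks))
print(perplexity([0.5, 0.25]))
```

Under this definition, a test perplexity of 673.59 means the model is, on average, about as uncertain as a uniform choice among roughly 674 words. On the training sample the equivalent figure is roughly 74 words.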

Promotion decision

What has to happen next

Attach run IDs, datasets, metrics, and reproduction commands.

Why this is not always a full paper yet

Corpus pages are public-safe readers for discovered workspace artifacts. They are not automatically final papers. A corpus item becomes a polished paper only after the editable source, evidence checkpoints, references, figures, render path, and release status are attached through the paper schema.