The Script That Machines Can't Read: Adapting Large Language Models for N'Ko
This preprint studies script invisibility in modern language models: it asks how much competence adaptation can recover when the writing system barely appears in the training distribution. The target is N'Ko, but the argument generalizes to any script that is encoded in Unicode yet absent from model competence.
Paper workspace
Live draft structure
Artifacts
Draft PDF
Rendered draft for the script-invisibility line. Treat as a live manuscript, not fixed publication copy.
Final split-paper render
Final split-paper artifact from the N'Ko paper release set. Still treated as editable public draft copy here.
Editable source
Submission-ready draft exists, but it should remain editable until venue packaging is chosen.
Source anchors
nko-brain-scanner/paper/final/01-script-invisibility/paper.tex
nko-brain-scanner/paper/current/paper1_dead_circuits.tex
nko-brain-scanner/paper/current/paper3_cross_model.tex
Method tags
Ingest intersections
Status
Submission-ready.
Key claims
01
Unicode support is not the same thing as model visibility.
02
Script adaptation should be measured internally, not only by output fluency.
03
N'Ko is a useful stress test for low-resource script competence.
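Claim 01 can be illustrated with a minimal Python sketch that uses only the standard library. It assumes the worst case for an unadapted tokenizer: no N'Ko vocabulary entries, so every character falls back to one token per UTF-8 byte. The names `is_nko` and `byte_fallback_tokens` and the sample string are illustrative assumptions, not taken from the paper.

```python
import unicodedata

# The Unicode "NKo" block spans U+07C0..U+07FF.
NKO_BLOCK = range(0x07C0, 0x0800)

def is_nko(ch: str) -> bool:
    """True if the character's code point lies in the N'Ko block."""
    return ord(ch) in NKO_BLOCK

def byte_fallback_tokens(text: str) -> list[int]:
    """Assumed worst case (hypothetical tokenizer): one token per UTF-8 byte."""
    return list(text.encode("utf-8"))

# Three assigned N'Ko letters, chosen only for illustration.
sample = "\u07d2\u07de\u07cf"

# Unicode side: every character is encoded and has a standard name.
names = [unicodedata.name(ch) for ch in sample]

# Model side, under the byte-fallback assumption: each character fragments.
tokens = byte_fallback_tokens(sample)
tokens_per_char = len(tokens) / len(sample)

print(names)
print(tokens_per_char)  # 2.0: every code point in this block takes two UTF-8 bytes
```

The Unicode lookup succeeds for every character, yet the assumed tokenizer never sees a whole letter. Real tokenizers vary, so the fragmentation of any specific model should be measured rather than inferred from this sketch.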
Public reading note
Ready to attach once you choose the public preprint release path.
Standard skeleton
What this paper must keep proving
problem
A script can be present in Unicode while remaining functionally invisible to model internals.
method
Probe activation behavior and adaptation effects when models encounter N'Ko text.
implementation
Model-family probes, adaptation runs, and activation-profile comparisons.
data
N'Ko text probes and controlled adaptation examples. Release must preserve dataset provenance.
evaluation
Output behavior plus internal activation evidence, because fluency alone is not competence.
references
Tokenizer coverage, multilingual representation learning, activation patching, low-resource scripts.
open questions
Which failures originate in the tokenizer, which in the pretraining distribution, and which are downstream alignment artifacts?
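The evaluation entry above pairs output behavior with internal activation evidence. One way such an internal comparison could look is sketched below. The metric (fraction of units whose mean absolute activation falls below a threshold), the function names, the threshold `eps`, and the toy numbers are all assumptions made for illustration; they are not the paper's measurements or its harness.

```python
from statistics import fmean

def dead_unit_fraction(layer, eps=1e-3):
    """layer: list of token rows, each a list of unit activations.
    Returns the fraction of units whose mean |activation| is below eps."""
    n_units = len(layer[0])
    dead = sum(
        1 for u in range(n_units)
        if fmean(abs(row[u]) for row in layer) < eps
    )
    return dead / n_units

def profile_gap(reference_layers, target_layers, eps=1e-3):
    """Per-layer dead-unit fraction of the target minus that of the reference."""
    return [
        dead_unit_fraction(t, eps) - dead_unit_fraction(r, eps)
        for r, t in zip(reference_layers, target_layers)
    ]

# Toy activations (made up): one layer, two tokens, four units.
reference_layers = [[[0.5, -0.2, 0.1, 0.3],
                     [0.4, 0.1, -0.3, 0.2]]]
target_layers = [[[0.5, 0.0, 0.0, 0.3],
                  [0.4, 0.0, 0.0, 0.0002]]]

gap = profile_gap(reference_layers, target_layers)
print(gap)  # [0.5]: two of four units are silent on the target input only
```

In practice the rows would come from hooks on a real model run over matched reference-script and N'Ko probes, and the same comparison would be repeated before and after adaptation.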
Checkpoints and references
Proof chain
Claim checkpoint
central-claim slot
Every central claim must point to a proof anchor or remain labeled as speculative.
Implementation checkpoint
implementation-map slot
Every method should identify the code path, harness, schema, or protocol that embodies it.
Evidence checkpoint
evidence-manifest slot
Every reported result should point to run IDs, packet IDs, data snapshots, commits, or review artifacts.
Reference checkpoint
references slot
Every external claim should resolve to a cited paper, benchmark, standard, or documented prior system.
Release checkpoint
release-gate slot
Every PDF needs a named condition before it can move from draft to citation-ready.
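The claim and release checkpoints above can be read as a small data contract. The sketch below is one hypothetical encoding: the `Claim` class, the `label` property, and the `release_ready` gate are invented for illustration, and the gate condition (no speculative claims) is an assumed example of a "named condition", not the project's actual rule. The pairing of a claim with a source path is likewise illustrative only.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Claim:
    text: str
    # A proof anchor could be a run ID, commit, data snapshot, or source path.
    proof_anchor: Optional[str] = None

    @property
    def label(self) -> str:
        """A claim without an anchor stays labeled as speculative."""
        return "anchored" if self.proof_anchor else "speculative"

def release_ready(claims: list[Claim]) -> bool:
    """Assumed release gate: citation-ready only if every claim is anchored."""
    return all(c.label == "anchored" for c in claims)

claims = [
    Claim(
        "Unicode support is not the same thing as model visibility.",
        # Illustrative pairing; the real anchor may differ.
        proof_anchor="nko-brain-scanner/paper/final/01-script-invisibility/paper.tex",
    ),
    Claim("Script adaptation should be measured internally, not only by output fluency."),
]

for c in claims:
    print(c.label, "|", c.text)
print(release_ready(claims))  # False: the second claim has no anchor yet
```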