research note | experiment writeup candidate | score 22

When Unicode Is Not Understanding

Extracted abstract or opening context

Most people think the machine has crossed the line once the script appears on the screen. The characters render. The cursor moves right to left. The prompt accepts the input. The model returns something confident. From the outside, the system looks as if it can read.

N'Ko makes the difference visible because it is not a broken script, not an informal spelling habit, and not a niche encoding trick. It is a real writing system designed for Manding languages, standardized in Unicode from U+07C0 through U+07FF, used in books, education, religious texts, newspapers, keyboards, and online writing. If the machine fails on N'Ko, the failure is not that N'Ko is ill-formed. The failure is that the machine's world is incomplete.

That was the first experiment: look inside the model and ask whether N'Ko has a real internal life there. The method was simple in spirit. Take a large language model. Feed it matched text in scripts the model should process. Then extract the hidden states layer by layer, the internal vectors the model builds before it ever speaks.
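
The claim that N'Ko input is well-formed can be checked at the encoding level before any model is involved. The sketch below is illustrative only and is not the experiment's code: it uses Python's standard library, the helper names `is_nko` and `nko_fraction` are hypothetical, and the three-character sample is an assumption chosen from the Unicode range cited above. It shows what "the characters render" amounts to: valid code points, a bidirectional class, and a byte sequence, none of which says anything about the hidden states.

```python
import unicodedata

# N'Ko block boundaries as stated in the text (U+07C0 through U+07FF).
NKO_START, NKO_END = 0x07C0, 0x07FF


def is_nko(ch: str) -> bool:
    """Return True if the single character falls inside the N'Ko block."""
    return NKO_START <= ord(ch) <= NKO_END


def nko_fraction(text: str) -> float:
    """Share of non-whitespace characters that belong to the N'Ko block."""
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return 0.0
    return sum(is_nko(ch) for ch in chars) / len(chars)


# Illustrative sample (an assumption): three code points from the block.
sample = "\u07d2\u07de\u07cf"

for ch in sample:
    # A default is passed because unassigned code points have no name.
    name = unicodedata.name(ch, "<unnamed>")
    bidi = unicodedata.bidirectional(ch)  # directionality class, e.g. right-to-left
    print(f"U+{ord(ch):04X}  {name}  bidi={bidi}")

# Every character is in the block, and each one takes two bytes in UTF-8.
print(nko_fraction(sample))
print(len(sample.encode("utf-8")))
```

A check like this passing is the surface condition the essay describes. Whether the model builds useful internal vectors for the same text is a separate question that only the layer-by-layer extraction can answer.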

Promotion decision

What has to happen next

Attach run IDs, datasets, metrics, and reproduction commands.

Why this is not always a full paper yet

Corpus pages are public-safe readers for discovered workspace artifacts. They are not automatically final papers. A corpus item becomes a polished paper only after the editable source, evidence checkpoints, references, figures, render path, and release status are attached through the paper schema.