N'Ko Inscription System - Phase 2 Implementation Report
Executive Summary
Phase 2 transforms cc-inscription from a foundational type system into a living discipline with:
- Rigorous basin lifecycle management (split/merge with cryptographic provenance)
- Graph Kernel governance for ontology operations
- RAG++ as laboratory assistant for predictability evaluation
- Lexicon version chain traversal and reinterpretation layer
- Information-theoretic phrase emergence
Status: COMPLETE (268 tests passing)
---
Architecture Overview
```text
┌─────────────────────────────────────────────────────────────────────────┐
│                       CC-INSCRIPTION ARCHITECTURE                       │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│        ┌──────────────┐    ┌──────────────┐    ┌──────────────┐         │
│        │    Claims    │    │    Basin     │    │   Lexicon    │         │
│        │  (10 types)  │───▶│  Lifecycle   │───▶│  Versioning  │         │
│        └──────────────┘    └──────────────┘    └──────────────┘         │
│               │                   │                   │                 │
│               ▼                   ▼                   ▼                 │
│        ┌──────────────┐    ┌──────────────┐    ┌──────────────┐         │
│        │    Phrase    │    │   Ontology   │    │   Surface    │         │
│        │  Detection   │◀───│  Operations  │───▶│   Renderer   │         │
│        └──────────────┘    └──────────────┘    └──────────────┘         │
│               │                   │                   │                 │
│               └───────────────────┼───────────────────┘                 │
│                                   ▼                                     │
│                       ┌───────────────────────┐                         │
│                       │   Integration Layer   │                         │
│                       │ (Graph Kernel, RAG++) │                         │
│                       └───────────────────────┘                         │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
```
---
Phase 2 Implementation Steps
Step 1: Ontology Operations Module
File: `src/ontology/mod.rs`
Central governance for all lexicon-mutating operations.
Key Structures
```rust
/// Slice declaration required for any ontology operation.
pub struct SliceDeclaration {
    pub slice_id: SliceFingerprint,
    pub [sensitive field redacted],
    pub policy_id: String,
    pub evidence_turns: Vec<String>,
}

/// Ontology operation with governance metadata.
pub struct OntologyOperation<T> {
    pub op_type: OntologyOperationType,
    pub slice: SliceDeclaration,
    pub predictability_assessment: PredictabilityAssessment,
    pub payload: T,
}

/// Operation types that can mutate the lexicon.
pub enum OntologyOperationType {
    ProposeSplit(BasinId),
    ProposeMerge(Vec<BasinId>),
    ProposeRetire(BasinId),
    GraduateProto(ProtoBasinId),
    RegisterPhrase(PhraseId),
}
```
Design Rationale
- Slice Governance: Every ontology change requires a slice declaration proving the operation is justified by evidence within the current temporal window
- Predictability Assessment: Before/after metrics ensure changes improve the system's predictive consistency
- Builder Pattern: `OntologyOperationBuilder` validates all required fields before construction
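The builder validation described above can be sketched as follows. This is a minimal, self-contained sketch under stated assumptions. The crate's `SliceDeclaration`, `PredictabilityAssessment`, and `OntologyOperationType` are replaced by simplified hypothetical stand-ins (`Slice`, `Assessment`, `OpType`). `OperationBuilder`, `BuildError`, and the rule that a slice must cite at least one evidence turn are illustrative assumptions, not the actual `OntologyOperationBuilder` API.

```rust
// Hypothetical, simplified stand-ins for the crate's types.
#[derive(Debug, Clone, PartialEq)]
pub enum OpType {
    ProposeSplit(u64),
    ProposeMerge(Vec<u64>),
}

#[derive(Debug, Clone)]
pub struct Slice {
    pub slice_id: String,
    pub policy_id: String,
    pub evidence_turns: Vec<String>,
}

#[derive(Debug, Clone)]
pub struct Assessment {
    pub before: f64,
    pub after: f64,
}

#[derive(Debug)]
pub struct Operation<T> {
    pub op_type: OpType,
    pub slice: Slice,
    pub assessment: Assessment,
    pub payload: T,
}

/// Hypothetical error type: which requirement failed.
#[derive(Debug, PartialEq)]
pub enum BuildError {
    MissingField(&'static str),
    NoEvidence,
}

/// Collects fields, then validates all of them in `build()`.
pub struct OperationBuilder<T> {
    op_type: Option<OpType>,
    slice: Option<Slice>,
    assessment: Option<Assessment>,
    payload: Option<T>,
}

impl<T> OperationBuilder<T> {
    pub fn new() -> Self {
        Self { op_type: None, slice: None, assessment: None, payload: None }
    }
    pub fn op_type(mut self, op: OpType) -> Self { self.op_type = Some(op); self }
    pub fn slice(mut self, s: Slice) -> Self { self.slice = Some(s); self }
    pub fn assessment(mut self, a: Assessment) -> Self { self.assessment = Some(a); self }
    pub fn payload(mut self, p: T) -> Self { self.payload = Some(p); self }

    /// Fails unless every field is set and the slice cites evidence (assumed rule).
    pub fn build(self) -> Result<Operation<T>, BuildError> {
        let op_type = self.op_type.ok_or(BuildError::MissingField("op_type"))?;
        let slice = self.slice.ok_or(BuildError::MissingField("slice"))?;
        if slice.evidence_turns.is_empty() {
            return Err(BuildError::NoEvidence);
        }
        let assessment = self
            .assessment
            .ok_or(BuildError::MissingField("predictability_assessment"))?;
        let payload = self.payload.ok_or(BuildError::MissingField("payload"))?;
        Ok(Operation { op_type, slice, assessment, payload })
    }
}
```

Returning a `Result` from `build()` keeps every governance failure explicit at the call site instead of allowing a half-populated operation to exist.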
---
Step 2a: Graph Kernel Governance
File: `src/integration/graph_kernel.rs`
Integration with the graph kernel for slice-based evidence governance.
Key Structures
```rust
/// Client for graph kernel slice governance.
pub struct GraphKernelClient {
    endpoint: Option<String>,
}

/// Request for an ontology-specific slice.
pub struct OntologySliceRequest {
    pub op_type: OntologyOperationType,
    pub basin_ids: Vec<BasinId>,
    pub time_window: TimeWindow,
    pub min_evidence_turns: usize,
}

/// Evidence sufficiency requirements.
pub struct EvidenceSufficiency {
    pub min_turns: usize,    // Default: 10
    pub min_time_span: f64,  // Default: 86400.0 (1 day)
    pub min_sessions: usize, // Default: 3
}
```
Key Methods
- `request_ontology_slice()`: Request a slice for ontology operations
- `verify_slice_declaration()`: Verify a slice declaration is valid
- `check_evidence_sufficiency()`: Ensure evidence meets minimum requirements
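Here is a minimal sketch of `check_evidence_sufficiency()` that uses the defaults listed above (10 turns, 86,400 s, 3 sessions). It assumes each evidence turn carries a timestamp in seconds and a session identifier. `EvidenceTurn`, `Insufficient`, and the free-function signature are hypothetical, and the real method may report failures differently.

```rust
use std::collections::HashSet;

/// Minimum evidence requirements (defaults as listed in the report).
#[derive(Debug, Clone)]
pub struct EvidenceSufficiency {
    pub min_turns: usize,
    pub min_time_span: f64, // seconds
    pub min_sessions: usize,
}

impl Default for EvidenceSufficiency {
    fn default() -> Self {
        Self { min_turns: 10, min_time_span: 86_400.0, min_sessions: 3 }
    }
}

/// Hypothetical evidence record: timestamp (seconds) and session id.
#[derive(Debug, Clone)]
pub struct EvidenceTurn {
    pub timestamp: f64,
    pub session_id: u32,
}

/// Hypothetical failure report: which requirement was not met.
#[derive(Debug, PartialEq)]
pub enum Insufficient {
    TooFewTurns(usize),
    SpanTooShort(f64),
    TooFewSessions(usize),
}

/// Checks all three requirements and reports the first one that fails.
pub fn check_evidence_sufficiency(
    turns: &[EvidenceTurn],
    req: &EvidenceSufficiency,
) -> Result<(), Insufficient> {
    if turns.len() < req.min_turns {
        return Err(Insufficient::TooFewTurns(turns.len()));
    }
    // Time span = latest timestamp minus earliest timestamp.
    let earliest = turns.iter().map(|t| t.timestamp).fold(f64::INFINITY, f64::min);
    let latest = turns.iter().map(|t| t.timestamp).fold(f64::NEG_INFINITY, f64::max);
    let span = latest - earliest;
    if span < req.min_time_span {
        return Err(Insufficient::SpanTooShort(span));
    }
    // Distinct sessions guard against one long session dominating the evidence.
    let sessions: HashSet<u32> = turns.iter().map(|t| t.session_id).collect();
    if sessions.len() < req.min_sessions {
        return Err(Insufficient::TooFewSessions(sessions.len()));
    }
    Ok(())
}
```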
---
Step 2b: RAG++ Predictability Evaluation
File: `src/integration/rag.rs`
RAG++ serves as a bounded critic, measuring predictability deltas without generating ontology changes.
Key Structures
```rust
/// Predictability assessment result.
pub struct PredictabilityAssessment {
    pub before: f64,
    pub after: f64,
    pub delta: f64,
    pub comparable_cases: Vec<ComparableCase>,
    pub confidence: f64,
}

/// Comparable historical case for evaluation.
pub struct ComparableCase {
    pub turn_id: String,
    pub similarity: f64,
    pub outcome: Option<String>,
}

/// Claim type distribution with Dirichlet smoothing.
pub struct ClaimTypeDistribution {
    pub counts: [u32; 10],
    pub alpha: QuantizedFloat,
}
```
Key Methods
- `evaluate_predictability()`: Measure before/after predictability for proposed changes
- `find_comparable_cases()`: Retrieve similar historical cases within slice
- `compute_prediction_error()`: Information-theoretic error measurement
- `kl_divergence()`: Symmetrized KL divergence between claim-type distributions (plain KL is asymmetric, so both directions are combined)
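The `kl_divergence()` entry can be illustrated with a sketch over the 10 claim types. It assumes the symmetrized form D_KL(P‖Q) + D_KL(Q‖P); the report does not say whether the two directions are summed or averaged. It also assumes both distributions are Dirichlet-smoothed as in the Dirichlet Smoothing section. `smoothed` and `symmetric_kl` are hypothetical names, and `f64` replaces `QuantizedFloat`.

```rust
/// Counts for the 10 claim types (same shape as `ClaimTypeDistribution::counts`).
pub type ClaimTypeCounts = [u32; 10];

/// Dirichlet-smoothed probabilities: p_i = (c_i + alpha) / (N + 10 * alpha).
pub fn smoothed(counts: &ClaimTypeCounts, alpha: f64) -> [f64; 10] {
    let total = counts.iter().map(|&c| c as f64).sum::<f64>() + 10.0 * alpha;
    let mut p = [0.0; 10];
    for (pi, &c) in p.iter_mut().zip(counts.iter()) {
        *pi = (c as f64 + alpha) / total;
    }
    p
}

/// KL(P || Q) in bits. Needs q_i > 0 wherever p_i > 0, which alpha > 0 guarantees.
fn kl(p: &[f64; 10], q: &[f64; 10]) -> f64 {
    p.iter()
        .zip(q.iter())
        .filter(|&(&pi, _)| pi > 0.0)
        .map(|(&pi, &qi)| pi * (pi / qi).log2())
        .sum()
}

/// Hypothetical symmetrized KL: KL(P||Q) + KL(Q||P). An averaged variant would halve it.
pub fn symmetric_kl(a: &ClaimTypeCounts, b: &ClaimTypeCounts, alpha: f64) -> f64 {
    let p = smoothed(a, alpha);
    let q = smoothed(b, alpha);
    kl(&p, &q) + kl(&q, &p)
}
```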
---
Step 3: Version Chain Traversal
File: `src/lexicon/version.rs`
Enables walking lexicon history for reinterpretation and auditing.
Key Structures
```rust
/// Versioned lexicon with parent chain.
pub struct VersionedLexicon {
    pub lexicon: Lexicon,
    pub parent: Option<Box<VersionedLexicon>>,
    pub children: Vec<LexiconVersion>,
    pub created_at: f64,
    pub content_hash: [u8; 32],
}
```
Key Methods
- `genesis()`: Create the initial lexicon version
- `evolve()`: Create a new version with changes applied
- `ancestors()`: Walk the parent chain
- `at_version()`: Find lexicon at a specific version
- `change_path()`: Reconstruct the change sequence between versions
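The parent chain can be walked with `std::iter::successors`, as in this sketch of `ancestors()` and `at_version()`. `Version` is a hypothetical stand-in for `VersionedLexicon` that keeps only a label and the owned `parent` link. The `children`, `created_at`, and `content_hash` fields are omitted, and lookup by label is an assumption.

```rust
/// Hypothetical, simplified stand-in for `VersionedLexicon`:
/// only a version label and the owned parent link are kept.
#[derive(Debug)]
pub struct Version {
    pub label: String,
    pub parent: Option<Box<Version>>,
}

impl Version {
    /// Creates the root of the chain (no parent).
    pub fn genesis(label: &str) -> Self {
        Version { label: label.to_string(), parent: None }
    }

    /// Creates a child version that takes ownership of its parent.
    pub fn evolve(self, label: &str) -> Self {
        Version { label: label.to_string(), parent: Some(Box::new(self)) }
    }

    /// Iterates from the immediate parent back to genesis.
    pub fn ancestors(&self) -> impl Iterator<Item = &Version> + '_ {
        std::iter::successors(self.parent.as_deref(), |v| v.parent.as_deref())
    }

    /// Finds this version or one of its ancestors by label.
    pub fn at_version(&self, label: &str) -> Option<&Version> {
        std::iter::once(self)
            .chain(self.ancestors())
            .find(|v| v.label == label)
    }
}
```

Because each version owns its parent through a `Box`, the walk needs no reference counting or indices. The trade-off in this sketch is that `evolve` consumes the previous version by value.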
---
Step 4: Reinterpretation Layer
File: `src/lexicon/reinterpret.rs`
Derived view of old inscriptions under new ontology without rewriting history.
Key Structures
```rust
/// Reinterpretation context with full provenance.
pub struct ReinterpretationContext {
    pub source_version: LexiconVersion,
    pub target_version: LexiconVersion,
    pub splits: Vec<SplitReinterpretation>,
    pub merges: Vec<MergeReinterpretation>,
    pub transform_hash: [u8; 32],
}

/// A reinterpreted claim preserves original provenance.
pub struct ReinterpretedClaim {
    pub original_id: InscriptionId,
    pub original_claim: Claim,
    pub reinterpreted_basin: Option<BasinId>,
    pub reason: ReinterpretationReason,
}

/// Why a claim was reinterpreted.
pub enum ReinterpretationReason {
    NoChange,
    SplitClassified { parent: BasinId, child: BasinId, classifier_hash: [u8; 32] },
    MergedInto { original: BasinId, merged_into: BasinId },
    Retired { original: BasinId, retirement_type: RetirementType },
}
```
Design Principle
> No Retroactive Rewriting: Old inscriptions remain untouched. Reinterpretation is a DERIVED VIEW, not new truth. The original InscriptionId is preserved.
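The derived-view idea can be sketched as a function that borrows an old claim and returns a new value describing where it lands under the target ontology, without modifying the claim itself. This illustration rests on several assumptions: `BasinId` is a plain `u64`, the split classifier is a function pointer, retirement carries no `RetirementType`, and only one transition step is applied. The report does not specify how `Uncertain` split assignments surface in `ReinterpretedClaim`, so the sketch omits them. `Claim`, `Reason`, `Reinterpreted`, `Split`, `Merge`, and `reinterpret` are hypothetical.

```rust
pub type BasinId = u64;

/// Hypothetical simplified claim; `id` stands in for `InscriptionId`.
#[derive(Debug, Clone, PartialEq)]
pub struct Claim {
    pub id: String,
    pub basin: BasinId,
    pub features: Vec<f64>,
}

#[derive(Debug, PartialEq)]
pub enum Reason {
    NoChange,
    SplitClassified { parent: BasinId, child: BasinId },
    MergedInto { original: BasinId, merged_into: BasinId },
    Retired { original: BasinId },
}

/// Derived view: it borrows the original claim instead of copying or mutating it.
#[derive(Debug)]
pub struct Reinterpreted<'a> {
    pub original: &'a Claim,
    pub basin: Option<BasinId>, // None when the basin was retired
    pub reason: Reason,
}

pub struct Split {
    pub parent: BasinId,
    pub classify: fn(&[f64]) -> BasinId, // stands in for the split classifier
}

pub struct Merge {
    pub from: Vec<BasinId>,
    pub into: BasinId,
}

/// Maps an old claim into the target ontology (one transition step).
pub fn reinterpret<'a>(
    claim: &'a Claim,
    splits: &[Split],
    merges: &[Merge],
    retired: &[BasinId],
) -> Reinterpreted<'a> {
    let b = claim.basin;
    let (basin, reason) = if let Some(s) = splits.iter().find(|s| s.parent == b) {
        let child = (s.classify)(&claim.features);
        (Some(child), Reason::SplitClassified { parent: b, child })
    } else if let Some(m) = merges.iter().find(|m| m.from.contains(&b)) {
        (Some(m.into), Reason::MergedInto { original: b, merged_into: m.into })
    } else if retired.contains(&b) {
        (None, Reason::Retired { original: b })
    } else {
        (Some(b), Reason::NoChange)
    };
    Reinterpreted { original: claim, basin, reason }
}
```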
---
Step 5: Basin Split Logic
File: `src/basin/lifecycle.rs`
Two-phase split detection: multimodality + downstream dynamics divergence.
Key Structures
```rust
/// Split classification result: ChildA, ChildB, or UNCERTAIN.
pub enum SplitAssignment {
    ChildA,    // Clearly belongs to child A
    ChildB,    // Clearly belongs to child B
    Uncertain, // Too close to boundary
}

/// Whitening transform for dimensionality reduction.
pub struct WhiteningTransform {
    pub components: Vec<Vec<QuantizedFloat>>,
    pub mean: Vec<QuantizedFloat>,
    pub n_components: usize,
    pub variance_explained: QuantizedFloat,
    pub content_hash: [u8; 32],
}

/// Enhanced split classifier with tri-state output.
pub struct EnhancedSplitClassifier {
    pub normal: Vec<QuantizedFloat>,
    pub bias: QuantizedFloat,
    pub margin: QuantizedFloat,
    pub whitening: WhiteningTransform,
    pub content_hash: [u8; 32],
}

/// Split detection result.
pub struct SplitDetection {
    pub is_multimodal: bool,
    pub modality_score: QuantizedFloat,
    pub dynamics_divergent: bool,
    pub divergence_score: QuantizedFloat,
    pub proposed_separator: Option<EnhancedSplitClassifier>,
    pub cluster_centroids: Option<[Vec<QuantizedFloat>; 2]>,
    pub cluster_assignments: Vec<SplitAssignment>,
}
```
Algorithms
| Algorithm | Purpose |
|---|---|
| Whitening/PCA | Dimensionality reduction before k-means, which is fragile in high dimensions (currently an identity transform) |
| k-means (k=2) | Bimodality detection on whitened samples |
| Silhouette score | Cluster separation quality |
| KL divergence | Dynamics divergence on claim type distributions |
| Hyperplane separator | Deterministic classifier from centroid difference |
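The k-means and silhouette rows can be sketched as follows: Lloyd's algorithm with k = 2, followed by the mean silhouette coefficient of the resulting labelling. Several details are assumptions, not taken from the report: the seeding rule (first sample, then the sample farthest from it, which keeps the result deterministic), the empty-cluster handling, and the function names. Whitening is omitted and `f64` replaces `QuantizedFloat`.

```rust
/// Squared Euclidean distance.
fn dist2(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum()
}

/// Lloyd's algorithm with k = 2; returns (centroids, labels).
/// Assumed deterministic seeding: sample 0, then the sample farthest from it.
pub fn kmeans2(xs: &[Vec<f64>], max_iter: usize) -> ([Vec<f64>; 2], Vec<usize>) {
    assert!(xs.len() >= 2, "need at least two samples");
    let far = (0..xs.len())
        .max_by(|&i, &j| dist2(&xs[0], &xs[i]).total_cmp(&dist2(&xs[0], &xs[j])))
        .unwrap();
    let mut c = [xs[0].clone(), xs[far].clone()];
    let mut labels = vec![0usize; xs.len()];
    for _ in 0..max_iter {
        // Assignment step: nearest centroid.
        let new: Vec<usize> = xs
            .iter()
            .map(|x| if dist2(x, &c[0]) <= dist2(x, &c[1]) { 0 } else { 1 })
            .collect();
        // Update step: mean of members (an empty cluster keeps its centroid).
        for k in 0..2 {
            let members: Vec<&Vec<f64>> =
                xs.iter().zip(&new).filter(|&(_, &l)| l == k).map(|(x, _)| x).collect();
            if !members.is_empty() {
                let dim = members[0].len();
                c[k] = (0..dim)
                    .map(|d| members.iter().map(|m| m[d]).sum::<f64>() / members.len() as f64)
                    .collect();
            }
        }
        if new == labels {
            break; // converged: assignments unchanged
        }
        labels = new;
    }
    (c, labels)
}

/// Mean silhouette coefficient for a two-cluster labelling (range -1..=1).
pub fn silhouette(xs: &[Vec<f64>], labels: &[usize]) -> f64 {
    let n = xs.len();
    let mut total = 0.0;
    for i in 0..n {
        // a = mean distance to own cluster, b = mean distance to the other cluster.
        let (mut sum_own, mut n_own, mut sum_other, mut n_other) = (0.0, 0, 0.0, 0);
        for j in 0..n {
            if i == j {
                continue;
            }
            let d = dist2(&xs[i], &xs[j]).sqrt();
            if labels[j] == labels[i] { sum_own += d; n_own += 1; } else { sum_other += d; n_other += 1; }
        }
        // Convention: s(i) = 0 for singletons or when only one cluster exists.
        if n_own == 0 || n_other == 0 {
            continue;
        }
        let a = sum_own / n_own as f64;
        let b = sum_other / n_other as f64;
        let m = a.max(b);
        if m > 0.0 {
            total += (b - a) / m;
        }
    }
    total / n as f64
}
```

A silhouette near 1 indicates well-separated clusters. Per the two-phase design, multimodality alone does not warrant a split; the downstream dynamics must also diverge.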
Tri-State Classification
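```rust
/// Tri-state assignment from the signed distance to the separating hyperplane.
/// Points within ±margin of the hyperplane are UNCERTAIN.
/// (Shown with f64 for clarity; the classifier stores `margin` as a QuantizedFloat.)
fn assign(signed_distance: f64, margin: f64) -> SplitAssignment {
    if signed_distance > margin {
        SplitAssignment::ChildA
    } else if signed_distance < -margin {
        SplitAssignment::ChildB
    } else {
        SplitAssignment::Uncertain // Honest uncertainty
    }
}
```
---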
Step 6: Basin Merge Logic
File: `src/basin/lifecycle.rs`
Indistinguishability test with regularized covariance.
Key Structures
/// Merge detection result.
pub struct MergeDetection {
pub is_indistinguishable: bool,
pub distinguishability_score: QuantizedFloat,
pub overlap_fraction: QuantizedFloat,
pub reason: Option<MergeReason>,
}
/// Merge detector configuration with regularization.
pub struct MergeDetectorConfig {
pub distinguishability_threshold: QuantizedFloat,
pub min_overlap_fraction: QuantizedFloat,
pub regularization_lambda: QuantizedFloat, // λI for stability
pub use_constitutional_bounds: bool,
}
/// Reason for basin merge.
pub enum MergeReason {
SensorCoverageChanged,
ContextsAbandoned,
BehaviorConverged,
}Algorithms
| Algorithm | Purpose |
|---|---|
| Regularized Mahalanobis | Distance with λI regularization for low-sample stability |
| Pooled covariance | Average of diagonal covariances |
| Typical set overlap | Analytical approximation based on distance/variance |
| Signature similarity | Cosine similarity on claim type distributions |
Merge Criteria
A merge is warranted when:
1. Low distinguishability: Centroids close relative to pooled spread
2. High overlap: Typical sets have significant intersection
3. Same basis: Both basins must be expressed in the same coordinate system (`BasisId`); basins in different bases are rejected rather than compared
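The distance part of the merge test can be sketched as a diagonal Mahalanobis distance between centroids. It uses the pooled covariance (the average of the two diagonal covariances) with λ added to each diagonal entry, as in D4. `BasinSummary`, `MergeCheck`, and the threshold parameter are hypothetical. The typical-set overlap term is omitted because the report does not give its formula.

```rust
/// Hypothetical per-basin summary: basis id, centroid, diagonal variances.
#[derive(Debug, Clone)]
pub struct BasinSummary {
    pub basis_id: [u8; 32],
    pub centroid: Vec<f64>,
    pub variance: Vec<f64>,
}

/// Hypothetical outcome of the distance test (distance included).
#[derive(Debug, PartialEq)]
pub enum MergeCheck {
    DifferentBasis,
    Indistinguishable(f64),
    Distinguishable(f64),
}

/// Regularized diagonal Mahalanobis distance between two centroids:
/// d^2 = sum_i (mu_a_i - mu_b_i)^2 / ((var_a_i + var_b_i) / 2 + lambda)
pub fn regularized_mahalanobis(a: &BasinSummary, b: &BasinSummary, lambda: f64) -> f64 {
    a.centroid
        .iter()
        .zip(&b.centroid)
        .zip(a.variance.iter().zip(&b.variance))
        .map(|((ma, mb), (va, vb))| {
            let pooled = 0.5 * (va + vb); // pooled diagonal covariance entry
            (ma - mb).powi(2) / (pooled + lambda)
        })
        .sum::<f64>()
        .sqrt()
}

/// Same-basis check first, then compare the distance with a threshold.
pub fn check_merge(a: &BasinSummary, b: &BasinSummary, lambda: f64, threshold: f64) -> MergeCheck {
    if a.basis_id != b.basis_id {
        return MergeCheck::DifferentBasis;
    }
    let d = regularized_mahalanobis(a, b, lambda);
    if d < threshold {
        MergeCheck::Indistinguishable(d)
    } else {
        MergeCheck::Distinguishable(d)
    }
}
```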
---
Step 7: Phrase Emergence
File: `src/phrase/detection.rs`
Information-theoretic phrase detection using claim types as stable vocabulary.
Key Structures
```rust
/// Pattern over claim types (stable 10-element vocabulary).
pub struct ClaimTypePattern {
    pub claim_types: Vec<ClaimType>,
    pub max_gap_micros: i64,
    pub content_hash: [u8; 32],
}

/// Candidate phrase with evidence provenance.
pub struct PhraseCandidate {
    pub pattern: ClaimTypePattern,
    pub frequency: u32,
    pub compression_ratio: QuantizedFloat,
    pub predictive_gain: QuantizedFloat,
    pub effect_size: QuantizedFloat,
    pub evidence_claim_ids: Vec<InscriptionId>,
}

/// Detection configuration.
pub struct PhraseDetectionConfig {
    pub min_frequency: u32,
    pub min_predictive_gain: QuantizedFloat,
    pub max_pattern_length: usize,
    pub entropy_prior_alpha: QuantizedFloat, // Dirichlet smoothing
}
```
Information-Theoretic Metrics
| Metric | Formula | Purpose |
|---|---|---|
| Baseline entropy | H(next_claim_type) | Uncertainty without pattern |
| Conditional entropy | H(next_claim_type \| pattern) | Uncertainty after pattern |
| Predictive gain | H_baseline - H_conditional | Information value of pattern |
| Effect size | (H_baseline - H_conditional) / pooled_std | Practical significance |
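Here is a minimal sketch of predictive gain for a single pattern over a sequence of claim-type indices (0 to 9). Baseline entropy is computed over every claim type that has a predecessor. Conditional entropy is computed over the claim types that immediately follow a contiguous occurrence of the pattern. Three things are assumptions: contiguous matching (the `max_gap_micros` constraint is listed as future work), the use of the same Dirichlet smoothing as `entropy_from_claim_type_counts`, and the hypothetical names `entropy` and `predictive_gain`.

```rust
/// Dirichlet-smoothed Shannon entropy (bits) over the 10 claim types.
fn entropy(counts: &[u32; 10], alpha: f64) -> f64 {
    let total = counts.iter().sum::<u32>() as f64 + 10.0 * alpha;
    counts
        .iter()
        .map(|&c| (c as f64 + alpha) / total)
        .filter(|&p| p > 0.0)
        .map(|p| -p * p.log2())
        .sum()
}

/// Predictive gain of `pattern` over `seq` (claim-type indices, each < 10).
/// Returns (baseline entropy, conditional entropy, gain) in bits.
pub fn predictive_gain(seq: &[usize], pattern: &[usize], alpha: f64) -> (f64, f64, f64) {
    let mut baseline = [0u32; 10];
    let mut conditional = [0u32; 10];
    // Baseline: H(next_claim_type) over every position with a predecessor.
    for &t in seq.iter().skip(1) {
        baseline[t] += 1;
    }
    // Conditional: the claim type right after each contiguous pattern match.
    let k = pattern.len();
    for start in 0..seq.len().saturating_sub(k) {
        if &seq[start..start + k] == pattern {
            conditional[seq[start + k]] += 1;
        }
    }
    let h_base = entropy(&baseline, alpha);
    let h_cond = entropy(&conditional, alpha);
    (h_base, h_cond, h_base - h_cond)
}
```

With smoothing, a pattern that never occurs gets the uniform conditional entropy log2(10) ≈ 3.32 bits, so its gain comes out negative rather than undefined.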
Dirichlet Smoothing
```rust
/// Shannon entropy (bits) of the Dirichlet-smoothed claim-type distribution.
/// `prior_alpha` = 1.0 gives Laplace smoothing; 10.0 is the number of claim types.
pub fn entropy_from_claim_type_counts(
    counts: &ClaimTypeCounts,
    prior_alpha: QuantizedFloat,
) -> QuantizedFloat {
    let alpha = prior_alpha.to_f64();
    let total = counts.iter().sum::<u32>() as f64 + 10.0 * alpha;
    let mut h = 0.0;
    for &c in counts {
        let p = (c as f64 + alpha) / total;
        if p > 0.0 {
            // Zero-probability terms can only occur when alpha == 0; skip them.
            h -= p * p.log2();
        }
    }
    QuantizedFloat::from_f64(h)
}
```
Detection Pipeline
```text
Claims → Mine N-grams → Filter by Frequency → Calculate Predictive Gain
       → Calculate Effect Size → Rank by Effect Size → Return Candidates
```
---
Type System Foundation (Phase 1 Recap)
Phase 2 builds on the deterministic type system from Phase 1:
Core Types
| Type | Purpose | Key Property |
|---|---|---|
| `WallTime` | Wall-clock (calendar) timestamp | i64 microseconds since epoch |
| `MonoTicks` | Monotonic ordering | Never goes backwards |
| `Timestamp` | Dual time | Both wall + mono for safety |
| `QuantizedFloat` | Deterministic floats | i64 fixed-point value at 10^-6 scale |
| `BasisId` | Coordinate system | Full pipeline hash |
| `SliceFingerprint` | Evidence scope | Graph kernel authority |
| `InscriptionId` | Cryptographic commitment | SHA-256 of all provenance |
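A sketch of the fixed-point idea behind `QuantizedFloat`: the value is stored as an `i64` count of 10^-6 units, so arithmetic on stored values is integer arithmetic and bit-identical across platforms. For example, 0.1 + 0.2 equals 0.3 after quantization, whereas it does not in raw `f64`. The `Q` name and the rounding rule (`f64::round`, ties away from zero) are assumptions; the crate's actual `from_f64`/`to_f64` may differ.

```rust
/// Hypothetical fixed-point type: value = raw * 10^-6.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct Q(pub i64);

const SCALE: f64 = 1_000_000.0;

impl Q {
    /// Rounds to the nearest micro-unit (ties away from zero).
    /// Note: `as i64` saturates on overflow and maps NaN to 0.
    pub fn from_f64(x: f64) -> Q {
        Q((x * SCALE).round() as i64)
    }

    pub fn to_f64(self) -> f64 {
        self.0 as f64 / SCALE
    }

    /// Integer addition: exact and order-independent, unlike f64 addition.
    pub fn checked_add(self, other: Q) -> Option<Q> {
        self.0.checked_add(other.0).map(Q)
    }
}
```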
The Provenance Law
> Fundamental Theorem of Replayability: For any InscriptionId, given the archived evidence, lexicon, basis, and detector config, the claim can be deterministically recomputed and the InscriptionId MUST match.
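The replay check implied by this law can be sketched as follows: recompute the claim from the archived inputs, recompute the identifier, and compare. SHA-256 is not in Rust's standard library, so this sketch substitutes 64-bit FNV-1a purely to stay self-contained. FNV-1a is not collision-resistant and is not what `InscriptionId` uses. `ReplayInputs`, the length-prefixed byte encoding (the crate uses canonical CBOR), and the function names are hypothetical.

```rust
/// 64-bit FNV-1a: a deterministic, NON-cryptographic stand-in for SHA-256.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for &b in bytes {
        h ^= b as u64;
        h = h.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    h
}

/// Hypothetical archived inputs, each already canonically encoded.
pub struct ReplayInputs<'a> {
    pub evidence: &'a [u8],
    pub lexicon: &'a [u8],
    pub basis: &'a [u8],
    pub detector_config: &'a [u8],
}

/// Commits to every input plus the claim. Length prefixes keep field
/// boundaries unambiguous ("ab"+"c" and "a"+"bc" encode differently).
pub fn inscription_id(inputs: &ReplayInputs, claim: &[u8]) -> u64 {
    let mut buf = Vec::new();
    for part in [inputs.evidence, inputs.lexicon, inputs.basis, inputs.detector_config, claim] {
        buf.extend_from_slice(&(part.len() as u64).to_be_bytes());
        buf.extend_from_slice(part);
    }
    fnv1a(&buf)
}

/// Replays the (pure) detector and checks the recomputed ID matches the recorded one.
pub fn verify_replay<F>(inputs: &ReplayInputs, recorded_id: u64, detect: F) -> bool
where
    F: Fn(&ReplayInputs) -> Vec<u8>,
{
    let claim = detect(inputs);
    inscription_id(inputs, &claim) == recorded_id
}
```

The check only holds if `detect` is referentially transparent, which is the same requirement as the "Pure Verification Sandbox" item under Future Work.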
---
Test Coverage
Test Summary by Module
| Module | Tests | Status |
|---|---|---|
| `claims/` | 38 | ✅ |
| `basin/` | 42 | ✅ |
| `lexicon/` | 35 | ✅ |
| `phrase/` | 18 | ✅ |
| `ontology/` | 12 | ✅ |
| `integration/` | 15 | ✅ |
| `surface/` | 28 | ✅ |
| `types/` | 72 | ✅ |
| `canonical/` | 5 | ✅ |
| `provenance/` | 3 | ✅ |
| Total | 268 | ✅ |
Key Test Categories
1. Split Detection Tests
- Insufficient samples handling
- Unimodal distribution (no split)
- Bimodal with divergent dynamics (split warranted)
- k-means convergence
- Silhouette coefficient calculation
2. Merge Detection Tests
- Identical constitutions (indistinguishable)
- Distinct constitutions (distinguishable)
- Different basis rejection
- Regularization stability (low variance)
- Threshold configuration
3. Phrase Detection Tests
- Entropy calculations
- Predictive gain for deterministic sequences
- Predictive gain for unpredictable sequences
- Pattern occurrence finding
- N-gram mining
- Candidate promotion
---
File Structure
```text
core/cc-inscription/
├── Cargo.toml
├── src/
│   ├── lib.rs              # Module exports
│   ├── claims/
│   │   ├── mod.rs          # Claim enum, 10 types
│   │   ├── stabilize.rs    # Claim 1: ߛ
│   │   ├── disperse.rs     # Claim 2: ߜ
│   │   ├── transition.rs   # Claim 3: ߕ
│   │   ├── return_.rs      # Claim 4: ߙ
│   │   ├── dwell.rs        # Claim 5: ߡ
│   │   ├── oscillate.rs    # Claim 6: ߚ
│   │   ├── recover.rs      # Claim 7: ߞ
│   │   ├── novel.rs        # Claim 8: ߣ
│   │   ├── place_shift.rs  # Claim 9: ߠ
│   │   └── echo.rs         # Claim 10: ߥ
│   ├── basin/
│   │   ├── mod.rs
│   │   ├── proto.rs        # Proto-basin state
│   │   ├── lifecycle.rs    # Split/merge/retire (Step 5-6)
│   │   ├── graduation.rs   # Proto → Basin criteria
│   │   └── constitution.rs # Basin invariants
│   ├── lexicon/
│   │   ├── mod.rs
│   │   ├── version.rs      # Version chain (Step 3)
│   │   ├── tokens.rs       # Basin/Place tokens
│   │   ├── changelog.rs    # Change tracking
│   │   └── reinterpret.rs  # Reinterpretation layer (Step 4)
│   ├── ontology/
│   │   └── mod.rs          # Ontology operations (Step 1)
│   ├── phrase/
│   │   ├── mod.rs
│   │   ├── detection.rs    # Phrase emergence (Step 7)
│   │   ├── compression.rs  # Description length
│   │   └── registration.rs # Phrase → grammar
│   ├── integration/
│   │   ├── mod.rs
│   │   ├── graph_kernel.rs # Slice governance (Step 2a)
│   │   ├── rag.rs          # Predictability eval (Step 2b)
│   │   └── dell.rs         # z-trajectory source
│   ├── surface/
│   │   ├── mod.rs
│   │   ├── renderer.rs     # Claim → N'Ko line
│   │   ├── grammar.rs      # Grammar skeletons
│   │   ├── slots.rs        # Slot renderers
│   │   └── normalize.rs    # NFC + BiDi handling
│   ├── types/
│   │   ├── mod.rs          # Public re-exports
│   │   ├── time.rs         # Timestamp types
│   │   ├── quantized.rs    # QuantizedFloat + math
│   │   ├── basis.rs        # BasisId, BasinConstitution
│   │   ├── evidence.rs     # Evidence sum type
│   │   └── session.rs      # Session segmentation
│   ├── canonical/
│   │   └── mod.rs          # CBOR canonical serialization
│   └── provenance/
│       └── mod.rs          # Provenance law verification
├── lexicons/
│   └── v1.0.json           # Initial lexicon
├── docs/
│   └── PHASE2_IMPLEMENTATION.md # This document
└── tests/
    └── (integrated in src/ as #[cfg(test)])
```
---
Design Decisions
D1: Tri-State Split Classification
Problem: Binary classification forces bad assignments near hyperplane boundaries.
Solution: Three-state classification with explicit "Uncertain" zone.
```rust
pub enum SplitAssignment {
    ChildA,    // Confident: distance > margin
    ChildB,    // Confident: distance < -margin
    Uncertain, // Honest uncertainty: |distance| <= margin
}
```
Rationale: Points in the uncertain zone may represent transitions between basins. Forcing a binary choice loses information.
D2: Claim Types as Stable Vocabulary
Problem: Basin IDs evolve with ontology changes, making historical comparisons unreliable.
Solution: Use the 10 claim types as a stable vocabulary for entropy calculations.
```rust
// WRONG: Basin IDs evolve (signatures only; bodies elided)
fn divergence_old(basin_dist_a: &[BasinId], basin_dist_b: &[BasinId]) -> f64 { todo!() }

// RIGHT: Claim types are stable
fn divergence_new(claim_dist_a: &ClaimTypeCounts, claim_dist_b: &ClaimTypeCounts) -> f64 { todo!() }
```
Rationale: The 10 claim types (Stabilize, Disperse, Transition, etc.) are defined by the system architecture, not by learning. They form a stable reference frame.
D3: Dirichlet Smoothing for Sparse Data
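Problem: Zero counts produce log(0) terms. Entropy then depends on the 0·log 0 = 0 convention, and the KL divergence between two claim distributions becomes infinite whenever a claim type is absent from one of them.
Solution: Dirichlet prior with α=1.0 (Laplace smoothing).
```rust
// `total` is the raw count sum; 10 is the number of claim types.
let p = (count as f64 + alpha) / (total + 10.0 * alpha);
```
Rationale: Adds pseudo-counts to prevent zero probabilities while preserving relative frequencies for large samples.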
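A worked example of the smoothing formula with K = 10 claim types, α = 1, and N = 10 observations split 8/2 across two types:

```math
\begin{aligned}
p_i &= \frac{c_i + \alpha}{N + K\alpha}, \qquad K = 10,\ \alpha = 1,\ N = 10 \\
p_1 &= \frac{8 + 1}{20} = 0.45, \qquad p_2 = \frac{2 + 1}{20} = 0.15, \qquad p_3 = \dots = p_{10} = \frac{0 + 1}{20} = 0.05 \\
\sum_i p_i &= 0.45 + 0.15 + 8 \times 0.05 = 1
\end{aligned}
```

The eight unseen types each keep probability 0.05, so every log term stays finite. As N grows, the α terms become negligible relative to the counts.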
D4: Regularized Mahalanobis Distance
Problem: Low-sample regimes can produce singular covariance matrices.
Solution: Add λI regularization (default λ=10^-4).
```rust
let regularized_var = variance + lambda;
let mahalanobis_component = diff_squared / regularized_var;
```
Rationale: Prevents numerical instability without significantly affecting distances when variance is well-estimated.
D5: Reinterpretation Without Rewriting
Problem: Ontology changes could invalidate historical inscriptions.
Solution: Reinterpretation is a derived view, not a mutation.
```rust
pub struct ReinterpretedClaim {
    pub original_id: InscriptionId,            // PRESERVED
    pub original_claim: Claim,                 // PRESERVED
    pub reinterpreted_basin: Option<BasinId>,  // NEW VIEW
    pub reason: ReinterpretationReason,        // WHY
}
```
Rationale: The provenance chain must remain intact. Reinterpretation adds a layer; it never removes the original.
---
Performance Considerations
P1: Streaming Serialization
Canonical serialization can be a bottleneck. Structure code to enable streaming:
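```rust
use sha2::Sha256; // assumed: the RustCrypto `sha2` crate

pub struct StreamingCanonicalSerializer {
    hasher: Sha256,   // digest updated as canonical bytes are emitted
    writer: Vec<u8>,  // output sink
}
```
P2: Epoch Boundaries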
}P2: Epoch Boundaries
Long lexicon chains slow verification. Introduce epoch anchors:
```rust
pub struct LexiconEpoch {
    pub epoch_number: u64,
    pub lexicon_hash: [u8; 32],
    pub committed_at: Timestamp,
    pub parent_epoch_hash: [u8; 32],
}
```
P3: Delta-Encoded Timestamps
Microsecond timestamps generate large logs. Use delta encoding:
```rust
pub struct CompressedSessionLog {
    pub base_timestamp: Timestamp,
    pub tick_deltas: Vec<u8>, // Varint encoded
}
```
---
Future Work
Phase 3: Live Operation
1. Sensor Jitter Alignment: Align timestamps to monotonic clock before z-segment computation
2. Pure Verification Sandbox: Ensure verification is referentially transparent (no I/O)
3. RAG++ Integration: Connect to live RAG++ service for predictability evaluation
Extensions
1. Full PCA Implementation: Currently using identity whitening; implement actual PCA
2. Temporal Pattern Constraints: Enforce max_gap_micros in pattern matching
3. Evidence Provenance: Populate evidence_claim_ids in phrase candidates
---
Conclusion
Phase 2 completes the transformation of cc-inscription from a type system into a living discipline capable of:
- Detecting when basins should split or merge based on empirical evidence
- Governing ontology changes through slice-based evidence requirements
- Preserving provenance through reinterpretation without rewriting
- Discovering phrase-level patterns through information-theoretic metrics
The system is now ready for integration with live sensor data and the broader Comp-Core ecosystem.
---
Document generated: 2026-01-04
Tests: 268 passing
Phase: 2 COMPLETE
Source Anchor
Comp-Core/core/semantic/cc-inscription/docs/PHASE2_IMPLEMENTATION.md