Grand Diomande Research · Full HTML Reader

N'Ko Inscription System - Phase 2 Implementation Report

Executive Summary

Phase 2 transforms cc-inscription from a foundational type system into a living discipline with:
- Rigorous basin lifecycle management (split/merge with cryptographic provenance)
- Graph Kernel governance for ontology operations
- RAG++ as laboratory assistant for predictability evaluation
- Lexicon version chain traversal and reinterpretation layer
- Information-theoretic phrase emergence

Status: COMPLETE (268 tests passing)

---

Architecture Overview

┌─────────────────────────────────────────────────────────────────────────┐
│                     CC-INSCRIPTION ARCHITECTURE                          │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐               │
│  │   Claims     │    │    Basin     │    │   Lexicon    │               │
│  │  (10 types)  │───▶│  Lifecycle   │───▶│  Versioning  │               │
│  └──────────────┘    └──────────────┘    └──────────────┘               │
│         │                   │                   │                        │
│         ▼                   ▼                   ▼                        │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐               │
│  │   Phrase     │    │  Ontology    │    │   Surface    │               │
│  │  Detection   │◀───│  Operations  │───▶│  Renderer    │               │
│  └──────────────┘    └──────────────┘    └──────────────┘               │
│         │                   │                   │                        │
│         └───────────────────┼───────────────────┘                        │
│                             ▼                                            │
│                  ┌──────────────────────┐                               │
│                  │   Integration Layer   │                               │
│                  │  (Graph Kernel, RAG++) │                              │
│                  └──────────────────────┘                               │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘

---

Phase 2 Implementation Steps

Step 1: Ontology Operations Module

File: `src/ontology/mod.rs`

Central governance for all lexicon-mutating operations.

Key Structures

rust
/// Slice declaration required for any ontology operation.
pub struct SliceDeclaration {
    pub slice_id: SliceFingerprint,
    pub [sensitive field redacted],
    pub policy_id: String,
    pub evidence_turns: Vec<String>,
}

/// Ontology operation with governance metadata.
pub struct OntologyOperation<T> {
    pub op_type: OntologyOperationType,
    pub slice: SliceDeclaration,
    pub predictability_assessment: PredictabilityAssessment,
    pub payload: T,
}

/// Operation types that can mutate the lexicon.
pub enum OntologyOperationType {
    ProposeSplit(BasinId),
    ProposeMerge(Vec<BasinId>),
    ProposeRetire(BasinId),
    GraduateProto(ProtoBasinId),
    RegisterPhrase(PhraseId),
}

Design Rationale

  • Slice Governance: Every ontology change requires a slice declaration proving the operation is justified by evidence within the current temporal window
  • Predictability Assessment: Before/after metrics ensure changes improve the system's predictive consistency
  • Builder Pattern: `OntologyOperationBuilder` validates all required fields before construction
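
The builder idea can be sketched as follows. This is a minimal illustration, not the actual `OntologyOperationBuilder`: the structs are simplified stand-ins (string IDs, a single operation variant, no redacted field), and `BuildError` plus the non-empty-evidence rule are assumptions made for the example.

```rust
// Simplified stand-ins for the real types (assumptions for illustration).
#[derive(Debug, Clone, PartialEq)]
pub struct SliceDeclaration {
    pub slice_id: String,
    pub policy_id: String,
    pub evidence_turns: Vec<String>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct PredictabilityAssessment {
    pub before: f64,
    pub after: f64,
}

#[derive(Debug, Clone, PartialEq)]
pub enum OntologyOperationType {
    ProposeRetire(u64),
}

#[derive(Debug, PartialEq)]
pub struct OntologyOperation<T> {
    pub op_type: OntologyOperationType,
    pub slice: SliceDeclaration,
    pub predictability_assessment: PredictabilityAssessment,
    pub payload: T,
}

/// Hypothetical error type: one variant per validation failure.
#[derive(Debug, PartialEq)]
pub enum BuildError {
    MissingOpType,
    MissingSlice,
    MissingAssessment,
    EmptyEvidence,
}

pub struct OntologyOperationBuilder<T> {
    op_type: Option<OntologyOperationType>,
    slice: Option<SliceDeclaration>,
    assessment: Option<PredictabilityAssessment>,
    payload: T,
}

impl<T> OntologyOperationBuilder<T> {
    pub fn new(payload: T) -> Self {
        Self { op_type: None, slice: None, assessment: None, payload }
    }

    pub fn op_type(mut self, op_type: OntologyOperationType) -> Self {
        self.op_type = Some(op_type);
        self
    }

    pub fn slice(mut self, slice: SliceDeclaration) -> Self {
        self.slice = Some(slice);
        self
    }

    pub fn assessment(mut self, assessment: PredictabilityAssessment) -> Self {
        self.assessment = Some(assessment);
        self
    }

    /// Fails unless every governance field is present.
    pub fn build(self) -> Result<OntologyOperation<T>, BuildError> {
        let op_type = self.op_type.ok_or(BuildError::MissingOpType)?;
        let slice = self.slice.ok_or(BuildError::MissingSlice)?;
        // Assumed rule: a slice with no evidence turns cannot justify anything.
        if slice.evidence_turns.is_empty() {
            return Err(BuildError::EmptyEvidence);
        }
        let predictability_assessment =
            self.assessment.ok_or(BuildError::MissingAssessment)?;
        Ok(OntologyOperation { op_type, slice, predictability_assessment, payload: self.payload })
    }
}
```

Because `build()` is the only way to obtain an `OntologyOperation` in this sketch, an operation without a slice or an assessment cannot be constructed at all.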

---

Step 2a: Graph Kernel Governance

File: `src/integration/graph_kernel.rs`

Integration with the graph kernel for slice-based evidence governance.

Key Structures

rust
/// Client for graph kernel slice governance.
pub struct GraphKernelClient {
    endpoint: Option<String>,
}

/// Request for an ontology-specific slice.
pub struct OntologySliceRequest {
    pub op_type: OntologyOperationType,
    pub basin_ids: Vec<BasinId>,
    pub time_window: TimeWindow,
    pub min_evidence_turns: usize,
}

/// Evidence sufficiency requirements.
pub struct EvidenceSufficiency {
    pub min_turns: usize,        // Default: 10
    pub min_time_span: f64,      // Default: 86400.0 (1 day)
    pub min_sessions: usize,     // Default: 3
}

Key Methods

  • `request_ontology_slice()`: Request a slice for ontology operations
  • `verify_slice_declaration()`: Verify a slice declaration is valid
  • `check_evidence_sufficiency()`: Ensure evidence meets minimum requirements
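
A sufficiency check against the documented defaults might look like the sketch below. The `EvidenceTurn` record, the `Insufficiency` error, and the free-function signature are assumptions for illustration; only the `EvidenceSufficiency` fields and their default values come from the report.

```rust
use std::collections::HashSet;

pub struct EvidenceSufficiency {
    pub min_turns: usize,
    pub min_time_span: f64, // seconds
    pub min_sessions: usize,
}

impl Default for EvidenceSufficiency {
    fn default() -> Self {
        // Defaults as documented: 10 turns, 1 day, 3 sessions.
        Self { min_turns: 10, min_time_span: 86_400.0, min_sessions: 3 }
    }
}

/// Hypothetical evidence record: when the turn happened and in which session.
pub struct EvidenceTurn {
    pub timestamp_secs: f64,
    pub session_id: String,
}

/// Hypothetical error type naming the first requirement that failed.
#[derive(Debug, PartialEq)]
pub enum Insufficiency {
    TooFewTurns,
    TimeSpanTooShort,
    TooFewSessions,
}

pub fn check_evidence_sufficiency(
    turns: &[EvidenceTurn],
    req: &EvidenceSufficiency,
) -> Result<(), Insufficiency> {
    if turns.len() < req.min_turns {
        return Err(Insufficiency::TooFewTurns);
    }
    // Time span covered by the evidence: latest minus earliest timestamp.
    let (earliest, latest) = turns.iter().fold(
        (f64::INFINITY, f64::NEG_INFINITY),
        |(lo, hi), t| (lo.min(t.timestamp_secs), hi.max(t.timestamp_secs)),
    );
    if latest - earliest < req.min_time_span {
        return Err(Insufficiency::TimeSpanTooShort);
    }
    // Count distinct sessions contributing evidence.
    let sessions: HashSet<&str> = turns.iter().map(|t| t.session_id.as_str()).collect();
    if sessions.len() < req.min_sessions {
        return Err(Insufficiency::TooFewSessions);
    }
    Ok(())
}
```

Checking the three requirements in a fixed order keeps the reported failure deterministic for a given slice.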

---

Step 2b: RAG++ Predictability Evaluation

File: `src/integration/rag.rs`

RAG++ serves as a bounded critic, measuring predictability deltas without generating ontology changes.

Key Structures

rust
/// Predictability assessment result.
pub struct PredictabilityAssessment {
    pub before: f64,
    pub after: f64,
    pub delta: f64,
    pub comparable_cases: Vec<ComparableCase>,
    pub confidence: f64,
}

/// Comparable historical case for evaluation.
pub struct ComparableCase {
    pub turn_id: String,
    pub similarity: f64,
    pub outcome: Option<String>,
}

/// Claim type distribution with Dirichlet smoothing.
pub struct ClaimTypeDistribution {
    pub counts: [u32; 10],
    pub alpha: QuantizedFloat,
}

Key Methods

  • `evaluate_predictability()`: Measure before/after predictability for proposed changes
  • `find_comparable_cases()`: Retrieve similar historical cases within slice
  • `compute_prediction_error()`: Information-theoretic error measurement
  • `kl_divergence()`: Symmetrized KL divergence between claim distributions (plain KL divergence is asymmetric)
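
One common way to symmetrize KL divergence is to sum both directions, KL(P‖Q) + KL(Q‖P). The sketch below does that over smoothed claim type counts. It is an assumption that `kl_divergence()` uses this particular form; the function names and the use of plain `f64` instead of `QuantizedFloat` are also illustrative.

```rust
pub const NUM_CLAIM_TYPES: usize = 10;

/// Turn raw counts into probabilities with a symmetric Dirichlet prior.
fn smoothed(counts: &[u32; NUM_CLAIM_TYPES], alpha: f64) -> [f64; NUM_CLAIM_TYPES] {
    let observed: f64 = counts.iter().map(|&c| c as f64).sum();
    let total = observed + NUM_CLAIM_TYPES as f64 * alpha;
    let mut p = [0.0; NUM_CLAIM_TYPES];
    for i in 0..NUM_CLAIM_TYPES {
        p[i] = (counts[i] as f64 + alpha) / total;
    }
    p
}

/// One-directional KL(P || Q) in bits. Requires all entries to be positive,
/// which smoothing with alpha > 0 guarantees.
fn kl_bits(p: &[f64; NUM_CLAIM_TYPES], q: &[f64; NUM_CLAIM_TYPES]) -> f64 {
    let mut d = 0.0;
    for i in 0..NUM_CLAIM_TYPES {
        d += p[i] * (p[i] / q[i]).log2();
    }
    d
}

/// Symmetrized KL: KL(P || Q) + KL(Q || P).
pub fn symmetric_kl(
    a: &[u32; NUM_CLAIM_TYPES],
    b: &[u32; NUM_CLAIM_TYPES],
    alpha: f64,
) -> f64 {
    assert!(alpha > 0.0, "smoothing must be positive so no probability is zero");
    let p = smoothed(a, alpha);
    let q = smoothed(b, alpha);
    kl_bits(&p, &q) + kl_bits(&q, &p)
}
```

The result is zero for identical count vectors and does not depend on argument order, which is what makes it usable as a divergence score between two candidate child basins.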

---

Step 3: Version Chain Traversal

File: `src/lexicon/version.rs`

Enables walking lexicon history for reinterpretation and auditing.

Key Structures

rust
/// Versioned lexicon with parent chain.
pub struct VersionedLexicon {
    pub lexicon: Lexicon,
    pub parent: Option<Box<VersionedLexicon>>,
    pub children: Vec<LexiconVersion>,
    pub created_at: f64,
    pub content_hash: [u8; 32],
}

Key Methods

  • `genesis()`: Create the initial lexicon version
  • `evolve()`: Create a new version with changes applied
  • `ancestors()`: Walk the parent chain
  • `at_version()`: Find lexicon at a specific version
  • `change_path()`: Reconstruct the change sequence between versions
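
Walking the parent chain can be expressed as an iterator. The sketch below uses a reduced `VersionedLexicon` (a `u32` version number and a parent pointer only); the `Ancestors` iterator type and the exact method signatures are assumptions, not the crate's actual API.

```rust
/// Reduced stand-in: only the fields needed to walk the chain.
pub struct VersionedLexicon {
    pub version: u32, // stand-in for LexiconVersion
    pub parent: Option<Box<VersionedLexicon>>,
}

impl VersionedLexicon {
    /// The initial version has no parent.
    pub fn genesis() -> Self {
        Self { version: 0, parent: None }
    }

    /// Consume the current version and return its successor.
    pub fn evolve(self) -> Self {
        let version = self.version + 1;
        Self { version, parent: Some(Box::new(self)) }
    }

    /// Iterate from the direct parent back to genesis.
    pub fn ancestors(&self) -> Ancestors<'_> {
        Ancestors { next: self.parent.as_deref() }
    }

    /// Find this version or any ancestor by version number.
    pub fn at_version(&self, version: u32) -> Option<&VersionedLexicon> {
        std::iter::once(self)
            .chain(self.ancestors())
            .find(|lex| lex.version == version)
    }
}

/// Hypothetical iterator over the parent chain.
pub struct Ancestors<'a> {
    next: Option<&'a VersionedLexicon>,
}

impl<'a> Iterator for Ancestors<'a> {
    type Item = &'a VersionedLexicon;

    fn next(&mut self) -> Option<Self::Item> {
        let current = self.next?;
        self.next = current.parent.as_deref();
        Some(current)
    }
}
```

The iterator borrows the chain rather than cloning it, so auditing a long history allocates nothing beyond the chain itself.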

---

Step 4: Reinterpretation Layer

File: `src/lexicon/reinterpret.rs`

Derived view of old inscriptions under new ontology without rewriting history.

Key Structures

rust
/// Reinterpretation context with full provenance.
pub struct ReinterpretationContext {
    pub source_version: LexiconVersion,
    pub target_version: LexiconVersion,
    pub splits: Vec<SplitReinterpretation>,
    pub merges: Vec<MergeReinterpretation>,
    pub transform_hash: [u8; 32],
}

/// A reinterpreted claim preserves original provenance.
pub struct ReinterpretedClaim {
    pub original_id: InscriptionId,
    pub original_claim: Claim,
    pub reinterpreted_basin: Option<BasinId>,
    pub reason: ReinterpretationReason,
}

/// Why a claim was reinterpreted.
pub enum ReinterpretationReason {
    NoChange,
    SplitClassified { parent: BasinId, child: BasinId, classifier_hash: [u8; 32] },
    MergedInto { original: BasinId, merged_into: BasinId },
    Retired { original: BasinId, retirement_type: RetirementType },
}

Design Principle

> No Retroactive Rewriting: Old inscriptions remain untouched. Reinterpretation is a DERIVED VIEW, not new truth. The original InscriptionId is preserved.
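
The principle can be illustrated with a function that takes the original claim by shared reference and returns a new view. Everything below is a reduced sketch: `BasinId` and `InscriptionId` are aliased to simple types, `Claim` is given hypothetical `id` and `basin` fields, only merges are handled, and the `reinterpret` function is an assumption rather than the module's real entry point.

```rust
use std::collections::HashMap;

pub type BasinId = u32; // stand-in for the real BasinId
pub type InscriptionId = [u8; 32];

/// Hypothetical minimal claim: an identity plus the basin it was inscribed under.
#[derive(Debug, Clone, PartialEq)]
pub struct Claim {
    pub id: InscriptionId,
    pub basin: BasinId,
}

#[derive(Debug, Clone, PartialEq)]
pub enum ReinterpretationReason {
    NoChange,
    MergedInto { original: BasinId, merged_into: BasinId },
}

#[derive(Debug, Clone, PartialEq)]
pub struct ReinterpretedClaim {
    pub original_id: InscriptionId,
    pub original_claim: Claim,
    pub reinterpreted_basin: Option<BasinId>,
    pub reason: ReinterpretationReason,
}

/// Build a derived view of `claim` under a newer ontology.
/// `merges` maps a retired basin to the basin it was merged into.
/// The input is borrowed immutably, so the original cannot be altered.
pub fn reinterpret(claim: &Claim, merges: &HashMap<BasinId, BasinId>) -> ReinterpretedClaim {
    let (basin, reason) = match merges.get(&claim.basin) {
        Some(&target) => (
            target,
            ReinterpretationReason::MergedInto { original: claim.basin, merged_into: target },
        ),
        None => (claim.basin, ReinterpretationReason::NoChange),
    };
    ReinterpretedClaim {
        original_id: claim.id,
        original_claim: claim.clone(),
        reinterpreted_basin: Some(basin),
        reason,
    }
}
```

Taking `&Claim` instead of `&mut Claim` lets the compiler enforce the no-rewriting rule for this code path.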

---

Step 5: Basin Split Logic

File: `src/basin/lifecycle.rs`

Two-phase split detection: multimodality + downstream dynamics divergence.

Key Structures

rust
/// Split classification result: ChildA, ChildB, or UNCERTAIN.
pub enum SplitAssignment {
    ChildA,      // Clearly belongs to child A
    ChildB,      // Clearly belongs to child B
    Uncertain,   // Too close to boundary
}

/// Whitening transform for dimensionality reduction.
pub struct WhiteningTransform {
    pub components: Vec<Vec<QuantizedFloat>>,
    pub mean: Vec<QuantizedFloat>,
    pub n_components: usize,
    pub variance_explained: QuantizedFloat,
    pub content_hash: [u8; 32],
}

/// Enhanced split classifier with tri-state output.
pub struct EnhancedSplitClassifier {
    pub normal: Vec<QuantizedFloat>,
    pub bias: QuantizedFloat,
    pub margin: QuantizedFloat,
    pub whitening: WhiteningTransform,
    pub content_hash: [u8; 32],
}

/// Split detection result.
pub struct SplitDetection {
    pub is_multimodal: bool,
    pub modality_score: QuantizedFloat,
    pub dynamics_divergent: bool,
    pub divergence_score: QuantizedFloat,
    pub proposed_separator: Option<EnhancedSplitClassifier>,
    pub cluster_centroids: Option<[Vec<QuantizedFloat>; 2]>,
    pub cluster_assignments: Vec<SplitAssignment>,
}

Algorithms

| Algorithm | Purpose |
|-----------|---------|
| Whitening/PCA | Dimensionality reduction before k-means (k-means is fragile in high dimensions) |
| k-means (k=2) | Bimodality detection on whitened samples |
| Silhouette score | Cluster separation quality |
| KL divergence | Dynamics divergence on claim type distributions |
| Hyperplane separator | Deterministic classifier from centroid difference |

Tri-State Classification

rust
// Points within ±margin of hyperplane are UNCERTAIN
if signed_distance > margin {
    SplitAssignment::ChildA
} else if signed_distance < -margin {
    SplitAssignment::ChildB
} else {
    SplitAssignment::Uncertain  // Honest uncertainty
}
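
To show where `signed_distance` could come from, here is a sketch of a hyperplane separator derived from the two cluster centroids. It assumes the hyperplane passes through the centroid midpoint with a unit normal pointing from child B toward child A, and it uses plain `f64` rather than `QuantizedFloat`; the `Separator` type and its methods are illustrative names.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum SplitAssignment {
    ChildA,
    ChildB,
    Uncertain,
}

/// Hypothetical reduced classifier: unit normal, bias, and margin.
pub struct Separator {
    pub normal: Vec<f64>,
    pub bias: f64,
    pub margin: f64,
}

impl Separator {
    /// Build the separator from two centroids.
    /// Returns None if the dimensions differ or the centroids coincide.
    pub fn from_centroids(a: &[f64], b: &[f64], margin: f64) -> Option<Self> {
        if a.len() != b.len() {
            return None;
        }
        let diff: Vec<f64> = a.iter().zip(b).map(|(x, y)| x - y).collect();
        let norm = diff.iter().map(|d| d * d).sum::<f64>().sqrt();
        if norm == 0.0 {
            return None;
        }
        let normal: Vec<f64> = diff.iter().map(|d| d / norm).collect();
        // Bias places the hyperplane at the midpoint between the centroids.
        let bias = -normal
            .iter()
            .zip(a.iter().zip(b))
            .map(|(n, (x, y))| n * 0.5 * (x + y))
            .sum::<f64>();
        Some(Self { normal, bias, margin })
    }

    /// Signed distance of `point` from the hyperplane (positive toward A).
    pub fn signed_distance(&self, point: &[f64]) -> f64 {
        self.normal.iter().zip(point).map(|(n, p)| n * p).sum::<f64>() + self.bias
    }

    /// Tri-state decision: points within the margin stay Uncertain.
    pub fn classify(&self, point: &[f64]) -> SplitAssignment {
        let d = self.signed_distance(point);
        if d > self.margin {
            SplitAssignment::ChildA
        } else if d < -self.margin {
            SplitAssignment::ChildB
        } else {
            SplitAssignment::Uncertain
        }
    }
}
```

Because the normal has unit length, `margin` is expressed in the same units as the (whitened) sample coordinates.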

---

Step 6: Basin Merge Logic

File: `src/basin/lifecycle.rs`

Indistinguishability test with regularized covariance.

Key Structures

rust
/// Merge detection result.
pub struct MergeDetection {
    pub is_indistinguishable: bool,
    pub distinguishability_score: QuantizedFloat,
    pub overlap_fraction: QuantizedFloat,
    pub reason: Option<MergeReason>,
}

/// Merge detector configuration with regularization.
pub struct MergeDetectorConfig {
    pub distinguishability_threshold: QuantizedFloat,
    pub min_overlap_fraction: QuantizedFloat,
    pub regularization_lambda: QuantizedFloat,  // λI for stability
    pub use_constitutional_bounds: bool,
}

/// Reason for basin merge.
pub enum MergeReason {
    SensorCoverageChanged,
    ContextsAbandoned,
    BehaviorConverged,
}

Algorithms

| Algorithm | Purpose |
|-----------|---------|
| Regularized Mahalanobis | Distance with λI regularization for low-sample stability |
| Pooled covariance | Average of diagonal covariances |
| Typical set overlap | Analytical approximation based on distance/variance |
| Signature similarity | Cosine similarity on claim type distributions |

Merge Criteria

A merge is warranted when:
1. Low distinguishability: Centroids close relative to pooled spread
2. High overlap: Typical sets have significant intersection
3. Same basis: Must be in the same coordinate system
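
The three criteria can be combined as in the sketch below. It assumes diagonal covariances (per-dimension variances), pools them by averaging, and adds λ to each pooled variance. The overlap fraction is taken as an input because the report does not spell out its formula; `BasinStats` and both function names are hypothetical.

```rust
/// Hypothetical summary of a basin: its basis, centroid, and diagonal variance.
pub struct BasinStats {
    pub basis_id: u64,
    pub centroid: Vec<f64>,
    pub variance: Vec<f64>,
}

/// Regularized Mahalanobis distance under a pooled diagonal covariance.
/// Returns None when the basins are not comparable (different basis or shape).
pub fn regularized_mahalanobis(a: &BasinStats, b: &BasinStats, lambda: f64) -> Option<f64> {
    let dim = a.centroid.len();
    if a.basis_id != b.basis_id
        || b.centroid.len() != dim
        || a.variance.len() != dim
        || b.variance.len() != dim
    {
        return None;
    }
    let mut sum = 0.0;
    for i in 0..dim {
        let diff = a.centroid[i] - b.centroid[i];
        // Pooled variance: average of the two diagonal entries.
        let pooled = 0.5 * (a.variance[i] + b.variance[i]);
        // Adding lambda keeps the denominator positive even at zero variance.
        sum += diff * diff / (pooled + lambda);
    }
    Some(sum.sqrt())
}

/// Merge decision: same basis, low distinguishability, high overlap.
pub fn merge_warranted(
    a: &BasinStats,
    b: &BasinStats,
    overlap_fraction: f64,
    distinguishability_threshold: f64,
    min_overlap_fraction: f64,
    lambda: f64,
) -> bool {
    match regularized_mahalanobis(a, b, lambda) {
        Some(d) => d < distinguishability_threshold && overlap_fraction >= min_overlap_fraction,
        None => false, // different basis: never merge
    }
}
```

With both variances at zero and λ = 10^-4, a centroid gap of 0.01 yields a finite distance of about 1.0 instead of a division by zero.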

---

Step 7: Phrase Emergence

File: `src/phrase/detection.rs`

Information-theoretic phrase detection using claim types as stable vocabulary.

Key Structures

rust
/// Pattern over claim types (stable 10-element vocabulary).
pub struct ClaimTypePattern {
    pub claim_types: Vec<ClaimType>,
    pub max_gap_micros: i64,
    pub content_hash: [u8; 32],
}

/// Candidate phrase with evidence provenance.
pub struct PhraseCandidate {
    pub pattern: ClaimTypePattern,
    pub frequency: u32,
    pub compression_ratio: QuantizedFloat,
    pub predictive_gain: QuantizedFloat,
    pub effect_size: QuantizedFloat,
    pub evidence_claim_ids: Vec<InscriptionId>,
}

/// Detection configuration.
pub struct PhraseDetectionConfig {
    pub min_frequency: u32,
    pub min_predictive_gain: QuantizedFloat,
    pub max_pattern_length: usize,
    pub entropy_prior_alpha: QuantizedFloat,  // Dirichlet smoothing
}

Information-Theoretic Metrics

| Metric | Formula | Purpose |
|--------|---------|---------|
| Baseline entropy | H(next_claim_type) | Uncertainty without pattern |
| Conditional entropy | H(next_claim_type \| pattern) | Uncertainty after pattern |
| Predictive gain | H_baseline - H_conditional | Information value of pattern |
| Effect size | (H_baseline - H_conditional) / pooled_std | Practical significance |
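
Predictive gain can be computed directly from two count vectors: the next claim type counted everywhere (baseline) and counted only after occurrences of the pattern (conditional). The sketch below is illustrative; it uses `f64` rather than `QuantizedFloat`, and the function names are assumptions. Effect size is omitted because the report does not define how `pooled_std` is estimated.

```rust
pub type ClaimTypeCounts = [u32; 10];

/// Shannon entropy in bits of a smoothed categorical distribution.
pub fn entropy_bits(counts: &ClaimTypeCounts, alpha: f64) -> f64 {
    let total = counts.iter().sum::<u32>() as f64 + counts.len() as f64 * alpha;
    if total <= 0.0 {
        return 0.0; // no observations and no prior: treat as zero entropy
    }
    counts
        .iter()
        .map(|&c| (c as f64 + alpha) / total)
        .filter(|&p| p > 0.0) // skip 0 * log(0) terms
        .map(|p| -p * p.log2())
        .sum()
}

/// H_baseline - H_conditional: bits of uncertainty removed by seeing the pattern.
pub fn predictive_gain(
    baseline: &ClaimTypeCounts,
    after_pattern: &ClaimTypeCounts,
    alpha: f64,
) -> f64 {
    entropy_bits(baseline, alpha) - entropy_bits(after_pattern, alpha)
}
```

A pattern that is always followed by the same claim type drives the conditional entropy toward zero, so its gain approaches the full baseline entropy.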

Dirichlet Smoothing

rust
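/// Shannon entropy in bits over the 10 claim types, with Dirichlet smoothing.
/// `prior_alpha` = 1.0 corresponds to Laplace smoothing.
pub fn entropy_from_claim_type_counts(
    counts: &ClaimTypeCounts,
    prior_alpha: QuantizedFloat,
) -> QuantizedFloat {
    let alpha = prior_alpha.to_f64();
    // Each of the 10 claim types receives `alpha` pseudo-counts.
    let total = counts.iter().sum::<u32>() as f64 + 10.0 * alpha;

    let mut h = 0.0;
    for &c in counts {
        let p = (c as f64 + alpha) / total;
        // Zero-probability terms (possible only when alpha == 0) contribute nothing.
        if p > 0.0 {
            h -= p * p.log2();
        }
    }
    QuantizedFloat::from_f64(h)
}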

Detection Pipeline

Claims → Mine N-grams → Filter by Frequency → Calculate Predictive Gain
       → Calculate Effect Size → Rank by Effect Size → Return Candidates
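
The first two pipeline stages (mine n-grams, filter by frequency) can be sketched as below. The `ClaimType` variant names are inferred from the claim module file names, overlapping occurrences are counted, and `mine_ngrams` is a hypothetical helper; the later stages (predictive gain, effect size, ranking) are not shown.

```rust
use std::collections::HashMap;

/// The 10 claim types; variant names inferred from the claim module file names.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub enum ClaimType {
    Stabilize,
    Disperse,
    Transition,
    Return,
    Dwell,
    Oscillate,
    Recover,
    Novel,
    PlaceShift,
    Echo,
}

/// Count every contiguous n-gram of length 2..=max_len and keep those that
/// occur at least `min_frequency` times. Output is sorted by descending
/// frequency, then by pattern, so results are deterministic.
pub fn mine_ngrams(
    seq: &[ClaimType],
    max_len: usize,
    min_frequency: u32,
) -> Vec<(Vec<ClaimType>, u32)> {
    let mut counts: HashMap<&[ClaimType], u32> = HashMap::new();
    for n in 2..=max_len {
        for window in seq.windows(n) {
            *counts.entry(window).or_insert(0) += 1;
        }
    }
    let mut out: Vec<(Vec<ClaimType>, u32)> = counts
        .into_iter()
        .filter(|&(_, c)| c >= min_frequency)
        .map(|(w, c)| (w.to_vec(), c))
        .collect();
    out.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    out
}
```

Sorting with an explicit tie-break matters here because `HashMap` iteration order is not stable, and the rest of the system depends on deterministic output.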

---

Type System Foundation (Phase 1 Recap)

Phase 2 builds on the deterministic type system from Phase 1:

Core Types

| Type | Purpose | Key Property |
|------|---------|--------------|
| `WallTime` | Human-readable timestamp | i64 microseconds since epoch |
| `MonoTicks` | Monotonic ordering | Never goes backwards |
| `Timestamp` | Dual time | Both wall + mono for safety |
| `QuantizedFloat` | Deterministic floats | i64 mantissa, 10^-6 scale |
| `BasisId` | Coordinate system | Full pipeline hash |
| `SliceFingerprint` | Evidence scope | Graph kernel authority |
| `InscriptionId` | Cryptographic commitment | SHA-256 of all provenance |
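
As an illustration of the "i64 mantissa, 10^-6 scale" property, a fixed-point wrapper could look like this. Only `from_f64` and `to_f64` appear elsewhere in this report; the rounding mode, the `mantissa` accessor, and `checked_add` are assumptions made for the sketch.

```rust
/// Fixed-point value stored as an integer count of millionths.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct QuantizedFloat(i64);

impl QuantizedFloat {
    /// Number of mantissa units per 1.0 (10^-6 resolution).
    pub const SCALE: i64 = 1_000_000;

    /// Quantize by rounding to the nearest millionth (assumed rounding mode).
    /// Non-finite or out-of-range inputs saturate under Rust's `as` cast rules.
    pub fn from_f64(x: f64) -> Self {
        Self((x * Self::SCALE as f64).round() as i64)
    }

    pub fn to_f64(self) -> f64 {
        self.0 as f64 / Self::SCALE as f64
    }

    /// Raw integer mantissa, suitable for hashing and canonical serialization.
    pub fn mantissa(self) -> i64 {
        self.0
    }

    /// Integer addition: exact and identical on every platform.
    pub fn checked_add(self, other: Self) -> Option<Self> {
        self.0.checked_add(other.0).map(Self)
    }
}
```

Equality and hashing operate on the integer mantissa, so two values that differ only by floating-point noise below 10^-6 compare equal after quantization.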

The Provenance Law

> Fundamental Theorem of Replayability: For any InscriptionId, given the archived evidence, lexicon, basis, and detector config, the claim can be deterministically recomputed and the InscriptionId MUST match.

---

Test Coverage

Test Summary by Module

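| Module | Tests | Status |
|--------|-------|--------|
| `claims/` | 38 | Passing |
| `basin/` | 42 | Passing |
| `lexicon/` | 35 | Passing |
| `phrase/` | 18 | Passing |
| `ontology/` | 12 | Passing |
| `integration/` | 15 | Passing |
| `surface/` | 28 | Passing |
| `types/` | 72 | Passing |
| `canonical/` | 5 | Passing |
| `provenance/` | 3 | Passing |
| Total | 268 | Passing |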

Key Test Categories

1. Split Detection Tests
- Insufficient samples handling
- Unimodal distribution (no split)
- Bimodal with divergent dynamics (split warranted)
- k-means convergence
- Silhouette coefficient calculation

2. Merge Detection Tests
- Identical constitutions (indistinguishable)
- Distinct constitutions (distinguishable)
- Different basis rejection
- Regularization stability (low variance)
- Threshold configuration

3. Phrase Detection Tests
- Entropy calculations
- Predictive gain for deterministic sequences
- Predictive gain for unpredictable sequences
- Pattern occurrence finding
- N-gram mining
- Candidate promotion

---

File Structure

core/cc-inscription/
├── Cargo.toml
├── src/
│   ├── lib.rs                          # Module exports
│   ├── claims/
│   │   ├── mod.rs                      # Claim enum, 10 types
│   │   ├── stabilize.rs                # Claim 1: ߛ
│   │   ├── disperse.rs                 # Claim 2: ߜ
│   │   ├── transition.rs               # Claim 3: ߕ
│   │   ├── return_.rs                  # Claim 4: ߙ
│   │   ├── dwell.rs                    # Claim 5: ߡ
│   │   ├── oscillate.rs                # Claim 6: ߚ
│   │   ├── recover.rs                  # Claim 7: ߞ
│   │   ├── novel.rs                    # Claim 8: ߣ
│   │   ├── place_shift.rs              # Claim 9: ߠ
│   │   └── echo.rs                     # Claim 10: ߥ
│   ├── basin/
│   │   ├── mod.rs
│   │   ├── proto.rs                    # Proto-basin state
│   │   ├── lifecycle.rs                # Split/merge/retire (Steps 5-6)
│   │   ├── graduation.rs               # Proto → Basin criteria
│   │   └── constitution.rs             # Basin invariants
│   ├── lexicon/
│   │   ├── mod.rs
│   │   ├── version.rs                  # Version chain (Step 3)
│   │   ├── tokens.rs                   # Basin/Place tokens
│   │   ├── changelog.rs                # Change tracking
│   │   └── reinterpret.rs              # Reinterpretation layer (Step 4)
│   ├── ontology/
│   │   └── mod.rs                      # Ontology operations (Step 1)
│   ├── phrase/
│   │   ├── mod.rs
│   │   ├── detection.rs                # Phrase emergence (Step 7)
│   │   ├── compression.rs              # Description length
│   │   └── registration.rs             # Phrase → grammar
│   ├── integration/
│   │   ├── mod.rs
│   │   ├── graph_kernel.rs             # Slice governance (Step 2a)
│   │   ├── rag.rs                      # Predictability eval (Step 2b)
│   │   └── dell.rs                     # z-trajectory source
│   ├── surface/
│   │   ├── mod.rs
│   │   ├── renderer.rs                 # Claim → N'Ko line
│   │   ├── grammar.rs                  # Grammar skeletons
│   │   ├── slots.rs                    # Slot renderers
│   │   └── normalize.rs                # NFC + BiDi handling
│   ├── types/
│   │   ├── mod.rs                      # Public re-exports
│   │   ├── time.rs                     # Timestamp types
│   │   ├── quantized.rs                # QuantizedFloat + math
│   │   ├── basis.rs                    # BasisId, BasinConstitution
│   │   ├── evidence.rs                 # Evidence sum type
│   │   └── session.rs                  # Session segmentation
│   ├── canonical/
│   │   └── mod.rs                      # CBOR canonical serialization
│   └── provenance/
│       └── mod.rs                      # Provenance law verification
├── lexicons/
│   └── v1.0.json                       # Initial lexicon
├── docs/
│   └── PHASE2_IMPLEMENTATION.md        # This document
└── tests/
    └── (integrated in src/ as #[cfg(test)])

---

Design Decisions

D1: Tri-State Split Classification

Problem: Binary classification forces bad assignments near hyperplane boundaries.

Solution: Three-state classification with explicit "Uncertain" zone.

rust
pub enum SplitAssignment {
    ChildA,      // Confident: distance > margin
    ChildB,      // Confident: distance < -margin
    Uncertain,   // Honest uncertainty: |distance| <= margin
}

Rationale: Points in the uncertain zone may represent transitions between basins. Forcing a binary choice loses information.

D2: Claim Types as Stable Vocabulary

Problem: Basin IDs evolve with ontology changes, making historical comparisons unreliable.

Solution: Use the 10 claim types as a stable vocabulary for entropy calculations.

rust
// WRONG: Basin IDs evolve
fn divergence_old(basin_dist_a: &[BasinId], basin_dist_b: &[BasinId]) -> f64

// RIGHT: Claim types are stable
fn divergence_new(claim_dist_a: &ClaimTypeCounts, claim_dist_b: &ClaimTypeCounts) -> f64

Rationale: The 10 claim types (Stabilize, Disperse, Transition, etc.) are defined by the system architecture, not by learning. They form a stable reference frame.

D3: Dirichlet Smoothing for Sparse Data

Problem: A zero count gives p = 0, and the naive entropy term p · log(p) then evaluates 0 · log(0), which is undefined (NaN in floating point).

Solution: Dirichlet prior with α=1.0 (Laplace smoothing).

rust
let p = (count as f64 + alpha) / (total + 10.0 * alpha);

Rationale: Adds pseudo-counts to prevent zero probabilities while preserving relative frequencies for large samples.
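
Written out, with c_i the count for claim type i, N the total count, and K = 10 claim types (notation introduced here for the worked case), the smoothed estimate and its behavior with no data are:

```latex
p_i = \frac{c_i + \alpha}{N + K\alpha}, \qquad N = \sum_{i=1}^{K} c_i, \qquad K = 10

% Worked case: no observations yet (all c_i = 0) and \alpha = 1
p_i = \frac{0 + 1}{0 + 10 \cdot 1} = \frac{1}{10}, \qquad
H = -\sum_{i=1}^{10} \frac{1}{10} \log_2 \frac{1}{10} = \log_2 10 \approx 3.32 \text{ bits}
```

With no evidence the estimate is the uniform distribution, which has the maximum possible entropy for 10 outcomes. As N grows, the Kα term becomes negligible and p_i approaches the empirical frequency c_i / N.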

D4: Regularized Mahalanobis Distance

Problem: Low-sample regimes can produce singular covariance matrices.

Solution: Add λI regularization (default λ=10^-4).

rust
let regularized_var = variance + lambda;
let mahalanobis_component = diff_squared / regularized_var;

Rationale: Prevents numerical instability without significantly affecting distances when variance is well-estimated.

D5: Reinterpretation Without Rewriting

Problem: Ontology changes could invalidate historical inscriptions.

Solution: Reinterpretation is a derived view, not a mutation.

rust
pub struct ReinterpretedClaim {
    pub original_id: InscriptionId,  // PRESERVED
    pub original_claim: Claim,       // PRESERVED
    pub reinterpreted_basin: Option<BasinId>,  // NEW VIEW
    pub reason: ReinterpretationReason,        // WHY
}

Rationale: The provenance chain must remain intact. Reinterpretation adds a layer; it never removes the original.

---

Performance Considerations

P1: Streaming Serialization

Canonical serialization can be a bottleneck. Structure code to enable streaming:

rust
pub struct StreamingCanonicalSerializer {
    hasher: Sha256,
    writer: Vec<u8>,
}

P2: Epoch Boundaries

Long lexicon chains slow verification. Introduce epoch anchors:

rust
pub struct LexiconEpoch {
    pub epoch_number: u64,
    pub lexicon_hash: [u8; 32],
    pub committed_at: Timestamp,
    pub parent_epoch_hash: [u8; 32],
}

P3: Delta-Encoded Timestamps

Microsecond timestamps generate large logs. Use delta encoding:

rust
pub struct CompressedSessionLog {
    pub base_timestamp: Timestamp,
    pub tick_deltas: Vec<u8>,  // Varint encoded
}
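
A sketch of how `tick_deltas` could be produced and read back is shown below, using unsigned LEB128-style varints (7 payload bits per byte, high bit as continuation flag). The exact varint flavor is an assumption, the base is a bare `u64` tick count rather than a `Timestamp`, and all function names are hypothetical.

```rust
/// Append `value` as an unsigned LEB128 varint.
pub fn encode_varint(mut value: u64, out: &mut Vec<u8>) {
    loop {
        let byte = (value & 0x7F) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte); // last byte: continuation bit clear
            break;
        }
        out.push(byte | 0x80); // more bytes follow
    }
}

/// Read one varint starting at `*pos`, advancing `*pos` past it.
/// Returns None on truncated or over-long input.
pub fn decode_varint(bytes: &[u8], pos: &mut usize) -> Option<u64> {
    let mut value = 0u64;
    let mut shift = 0u32;
    while *pos < bytes.len() {
        let byte = bytes[*pos];
        *pos += 1;
        if shift >= 64 {
            return None; // more than 10 bytes cannot fit in a u64
        }
        value |= u64::from(byte & 0x7F) << shift;
        if byte & 0x80 == 0 {
            return Some(value);
        }
        shift += 7;
    }
    None
}

/// Encode monotonic ticks as (first tick, varint deltas).
/// Returns None for empty input or if a tick goes backwards.
pub fn compress_ticks(ticks: &[u64]) -> Option<(u64, Vec<u8>)> {
    let (&base, rest) = ticks.split_first()?;
    let mut out = Vec::new();
    let mut prev = base;
    for &t in rest {
        encode_varint(t.checked_sub(prev)?, &mut out);
        prev = t;
    }
    Some((base, out))
}

/// Inverse of `compress_ticks`.
pub fn decompress_ticks(base: u64, deltas: &[u8]) -> Option<Vec<u64>> {
    let mut ticks = vec![base];
    let mut pos = 0;
    let mut prev = base;
    while pos < deltas.len() {
        prev = prev.checked_add(decode_varint(deltas, &mut pos)?)?;
        ticks.push(prev);
    }
    Some(ticks)
}
```

Deltas below 128 ticks take a single byte each, compared with 8 bytes for a raw 64-bit tick value.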

---

Future Work

Phase 3: Live Operation

1. Sensor Jitter Alignment: Align timestamps to monotonic clock before z-segment computation
2. Pure Verification Sandbox: Ensure verification is referentially transparent (no I/O)
3. RAG++ Integration: Connect to live RAG++ service for predictability evaluation

Extensions

1. Full PCA Implementation: Currently using identity whitening; implement actual PCA
2. Temporal Pattern Constraints: Enforce max_gap_micros in pattern matching
3. Evidence Provenance: Populate evidence_claim_ids in phrase candidates

---

Conclusion

Phase 2 completes the transformation of cc-inscription from a type system into a living discipline capable of:

  • Detecting when basins should split or merge based on empirical evidence
  • Governing ontology changes through slice-based evidence requirements
  • Preserving provenance through reinterpretation without rewriting
  • Discovering phrase-level patterns through information-theoretic metrics

The system is now ready for integration with live sensor data and the broader Comp-Core ecosystem.

---

Document generated: 2026-01-04
Tests: 268 passing
Phase: 2 COMPLETE

Source Anchor

Comp-Core/core/semantic/cc-inscription/docs/PHASE2_IMPLEMENTATION.md