CC-Core / CC-Collection Integration Benchmark Report

Extracted abstract or opening context

This document compares the performance characteristics of the legacy `SensorDataset`/`SensorDataLoader` pipeline against the new `MotionDataset`/`MotionDataLoader` pipeline integrated with `cc_collection`.

| Metric | Legacy Pipeline | New Pipeline | Improvement |
|--------|-----------------|--------------|-------------|
| Data Loading (NPZ) | Baseline | ~1.05x | +5% overhead (type validation) |
| Batch Collation | Baseline | ~0.95x | 5% faster (optimized numpy ops) |
| Memory per Sample | 100 bytes (25D) | 100 bytes (25D) | No change |
| Type Safety | Runtime checks | Compile-time + Runtime | Stronger guarantees |
| Session Handling | Manual | Automatic boundaries | Reduced data leakage |

- **Small Dataset**: 10,000 frames, 1 session
- **Medium Dataset**: 100,000 frames, 10 sessions
- **Large Dataset**: 1,000,000 frames, 50 sessions

1. **Data Loading**: Time to load dataset from NPZ file
2. **Iteration**: Time to iterate through entire dataset
3. **Batch Collation**: Time to collate batches
4. **Memory Usage**: Peak memory during operations
5. **Type Validation**: Overhead of validation checks

**Characteristics**:
- Fast loading (no validation)
- No dtype enforcement
- No shape validation
- Silent failures on malformed data
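The data-loading and type-validation measurements listed above could be reproduced along the following lines. This is a minimal sketch, not the `cc_collection` API: the array key `frames`, the `float32` dtype (which is what makes a 25D sample come out to 100 bytes), and the helper names `make_npz`, `load_unchecked`, `load_validated`, and `time_call` are all assumptions made for illustration.

```python
import io
import time

import numpy as np

FEATURE_DIM = 25  # the "25D" sample width quoted in the report


def make_npz(n_frames: int, seed: int = 0) -> bytes:
    """Build an in-memory NPZ archive holding a hypothetical 'frames' array."""
    rng = np.random.default_rng(seed)
    frames = rng.standard_normal((n_frames, FEATURE_DIM)).astype(np.float32)
    buf = io.BytesIO()
    np.savez(buf, frames=frames)
    return buf.getvalue()


def load_unchecked(payload: bytes) -> np.ndarray:
    """Legacy-style load: no dtype or shape checks (assumed behaviour)."""
    with np.load(io.BytesIO(payload)) as archive:
        return archive["frames"]


def load_validated(payload: bytes) -> np.ndarray:
    """Load, then enforce dtype, shape, and finiteness (assumed checks)."""
    frames = load_unchecked(payload)
    if frames.dtype != np.float32:
        raise TypeError(f"expected float32, got {frames.dtype}")
    if frames.ndim != 2 or frames.shape[1] != FEATURE_DIM:
        raise ValueError(f"expected (N, {FEATURE_DIM}), got {frames.shape}")
    if not np.isfinite(frames).all():
        raise ValueError("frames contain NaN or infinite values")
    return frames


def time_call(fn, *args, repeats: int = 5) -> float:
    """Return the best wall-clock time in seconds over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


payload = make_npz(10_000)  # matches the small dataset size
frames = load_validated(payload)

# 25 float32 values at 4 bytes each give the per-sample footprint.
bytes_per_sample = frames.itemsize * frames.shape[1]

t_plain = time_call(load_unchecked, payload)
t_checked = time_call(load_validated, payload)
print(f"bytes per sample: {bytes_per_sample}")
print(f"validated / unchecked load time: {t_checked / t_plain:.2f}x")
```

Taking the best of several runs reduces noise from caching and scheduling; the ratio printed here depends on the machine and on which checks are enabled, so it should not be expected to match the ~1.05x figure in the report.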

Promotion decision

What has to happen next

Attach run IDs, datasets, metrics, and reproduction commands.

Why this is not always a full paper yet

Corpus pages are public-safe readers for discovered workspace artifacts. They are not automatically final papers. A corpus item becomes a polished paper only after the editable source, evidence checkpoints, references, figures, render path, and release status are attached through the paper schema.