Motion Training Data Pipeline Protocol

> **Version**: 1.0
> **Status**: DRAFT
> **Scope**: End-to-end data architecture for motion capture, processing, and training

---

I. Overview

This document defines the complete data pipeline from sensor capture to ML training corpus. The system is designed as an "infinite training grind": a continuously running motion-data collection environment that:

1. Captures motion from multiple sensor sources
2. Synchronizes with audio/phrase playback
3. Processes and normalizes data in real-time
4. Stores the results for immediate and future training
5. Supports upstream (generation) and downstream (analysis) tasks

---

II. Sensor Sources & Data Formats

II.1 Supported Sensor Types

| Source | Data Type | Frequency | Landmarks | Priority |
|---|---|---|---|---|
| MediaPipe (Webcam) | Holistic pose | 30 fps | 33 pose + 21×2 hands + 468 face | Primary |
| Mocopi | Full body IMU | 50 fps | 27 bones (Sony BVH) | Primary |
| Dual Phones | Accelerometer + Gyro | 100 Hz | 2 devices (hands) | Secondary |
| Apple Watch | Motion + Heart Rate | 50 Hz | Wrist orientation + HR | Secondary |
| Headphone Sensors | Head orientation | 100 Hz | 3-axis rotation | Tertiary |
| LIDAR/Depth | Point cloud | 30 fps | Sparse body points | Experimental |

II.2 Canonical Data Schema

All sensor data is normalized to this schema:

typescript
interface MotionFrame {
  // Identity
  frame_id: string                    // UUID
  session_id: string                  // Parent session
  phrase_id?: string                  // Linked audio phrase

  // Timing
  timestamp_ms: number                // Relative to session start
  audio_position_ms?: number          // Position in phrase audio
  beat_position?: number              // Beat number (fractional)

  // Source metadata
  source_type: SensorSource
  source_device_id: string
  source_confidence: number           // 0-1 overall confidence

  // Normalized body state (canonical representation)
  body: {
    // Root position (world space)
    root_position: Vec3               // x, y, z in meters
    root_rotation: Quaternion         // World-space orientation

    // Joint rotations (local space, relative to parent)
    joints: Record<JointName, {
      rotation: Quaternion
      confidence: number
    }>

    // Derived features (computed)
    center_of_mass: Vec3
    facing_direction: Vec3
    velocity: Vec3
    angular_velocity: Vec3
  }

  // Hand state (if available)
  hands?: {
    left?: HandState
    right?: HandState
  }

  // Face state (if available)
  face?: {
    landmarks?: FaceLandmark[]
    expression?: ExpressionWeights
    gaze_direction?: Vec3
  }

  // Raw sensor data (preserved for reprocessing)
  raw_data?: {
    format: string
    data: Uint8Array | object
  }
}

type SensorSource =
  | 'mediapipe_holistic'
  | 'mocopi_bvh'
  | 'phone_imu'
  | 'watch_motion'
  | 'headphone_imu'
  | 'lidar_depth'

type JointName =
  | 'hips' | 'spine' | 'chest' | 'neck' | 'head'
  | 'shoulder_l' | 'upper_arm_l' | 'lower_arm_l' | 'hand_l'
  | 'shoulder_r' | 'upper_arm_r' | 'lower_arm_r' | 'hand_r'
  | 'hip_l' | 'upper_leg_l' | 'lower_leg_l' | 'foot_l' | 'toes_l'
  | 'hip_r' | 'upper_leg_r' | 'lower_leg_r' | 'foot_r' | 'toes_r'

interface HandState {
  fingers: {
    thumb: FingerState
    index: FingerState
    middle: FingerState
    ring: FingerState
    pinky: FingerState
  }
  open_amount: number                 // 0 = fist, 1 = fully open
  gesture?: string                    // Detected gesture name
}

---

III. Session Architecture

III.1 Session Hierarchy

TrainingCorpus
└── Session (one continuous recording)
    ├── Metadata (performer, date, sensors, quality)
    ├── PhraseSegments[] (aligned to audio phrases)
    │   ├── phrase_id → motion_phrases table
    │   ├── frames[] → motion data
    │   └── features → computed motion features
    └── FreeformSegments[] (unaligned exploration)
        ├── frames[]
        └── detected_patterns[]

III.2 Session States

┌─────────┐    start()    ┌─────────┐   phrase_start   ┌──────────────┐
│  IDLE   │ ────────────▶ │ ACTIVE  │ ───────────────▶ │ PHRASE_SYNC  │
└─────────┘               └─────────┘                   └──────────────┘
     ▲                         │                              │
     │         stop()          │        phrase_end            │
     └─────────────────────────┴──────────────────────────────┘
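
The state diagram can be read as a small transition table. The sketch below is one possible encoding, not the project's implementation: the names `CaptureState`, `CaptureEvent`, `TRANSITIONS`, and `nextState` are hypothetical, and it assumes that `phrase_end` returns the session to ACTIVE and that `stop()` is only accepted from ACTIVE, which the diagram leaves open.

```typescript
// Session capture states and the events that move between them.
type CaptureState = 'IDLE' | 'ACTIVE' | 'PHRASE_SYNC'
type CaptureEvent = 'start' | 'phrase_start' | 'phrase_end' | 'stop'

// Transition table. Assumption: phrase_end returns to ACTIVE and stop()
// is only valid from ACTIVE; the diagram does not settle either point.
const TRANSITIONS: Record<CaptureState, Partial<Record<CaptureEvent, CaptureState>>> = {
  IDLE: { start: 'ACTIVE' },
  ACTIVE: { phrase_start: 'PHRASE_SYNC', stop: 'IDLE' },
  PHRASE_SYNC: { phrase_end: 'ACTIVE' }
}

// Returns the next state, or throws on a transition the table does not allow.
function nextState(current: CaptureState, event: CaptureEvent): CaptureState {
  const target = TRANSITIONS[current][event]
  if (target === undefined) {
    throw new Error(`Invalid transition: ${event} while ${current}`)
  }
  return target
}
```

Keeping the transitions in a table makes it cheap to change the reading of the diagram, for example by adding `stop: 'IDLE'` to `PHRASE_SYNC`.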

III.3 Session Table Schema

sql
-- The enum must exist before the table that uses it
CREATE TYPE session_status AS ENUM (
  'active',
  'completed',
  'error',
  'processing',
  'validated',
  'archived'
);

CREATE TABLE motion_sessions (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),

  -- Identity
  corpus_id UUID REFERENCES training_corpora(id),
  performer_id UUID REFERENCES performers(id),
  session_name TEXT,

  -- Timing
  started_at TIMESTAMPTZ NOT NULL,
  ended_at TIMESTAMPTZ,
  duration_seconds FLOAT,

  -- Sensor config
  sensor_sources JSONB NOT NULL,  -- Array of active sensors
  primary_source TEXT NOT NULL,   -- Which sensor is authoritative

  -- Quality metrics
  total_frames INTEGER DEFAULT 0,
  avg_fps FLOAT,
  avg_confidence FLOAT,
  tracking_gaps_count INTEGER DEFAULT 0,
  tracking_gaps_total_ms INTEGER DEFAULT 0,

  -- Training metadata
  is_validated BOOLEAN DEFAULT FALSE,
  validation_score FLOAT,
  notes TEXT,
  tags TEXT[],

  -- Status
  status session_status NOT NULL DEFAULT 'active',
  error_message TEXT,

  created_at TIMESTAMPTZ DEFAULT NOW()
);

---

IV. Phrase Synchronization Protocol

IV.1 The Phrase-Motion Binding

When a phrase plays, all captured motion becomes bound to that phrase:

PHRASE TIMELINE
│ phrase.t_start                                    phrase.t_end │
▼                                                                ▼
┌────────────────────────────────────────────────────────────────┐
│  Audio Waveform                                                │
└────────────────────────────────────────────────────────────────┘
     ↕ synchronized ↕
┌────────────────────────────────────────────────────────────────┐
│  Motion Frames (body_energy, arm_spread, etc.)                 │
└────────────────────────────────────────────────────────────────┘
     ↕ beat-aligned ↕
┌────────────────────────────────────────────────────────────────┐
│  Beat Grid (from phrase.tempo_bpm)                             │
│  │    │    │    │    │    │    │    │    │    │    │    │     │
│  1    2    3    4    1    2    3    4    1    2    3    4     │
└────────────────────────────────────────────────────────────────┘

IV.2 Phrase Segment Table

sql
CREATE TABLE phrase_motion_segments (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),

  -- Links
  session_id UUID REFERENCES motion_sessions(id) ON DELETE CASCADE,
  phrase_id UUID REFERENCES motion_phrases(id),

  -- Timing (relative to phrase)
  phrase_start_beat FLOAT,          -- When recording started (beat number)
  phrase_end_beat FLOAT,            -- When recording ended
  motion_start_ms INTEGER,          -- Session-relative start
  motion_end_ms INTEGER,            -- Session-relative end

  -- Alignment quality
  audio_motion_offset_ms INTEGER,   -- Latency correction applied
  beat_alignment_score FLOAT,       -- How well motion aligns to beats

  -- Frame references
  start_frame_id UUID,
  end_frame_id UUID,
  frame_count INTEGER,

  -- Computed features (denormalized for fast access)
  features JSONB,                   -- Aggregated motion features

  created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Index for fast phrase lookups
CREATE INDEX idx_phrase_motion_segments_phrase
ON phrase_motion_segments(phrase_id);

IV.3 Real-Time Sync Protocol

typescript
interface PhraseSyncState {
  phrase_id: string
  phrase: MotionPhrase

  // Audio state
  audio_started_at: number          // System timestamp
  audio_position_ms: number         // Current playback position

  // Beat tracking
  tempo_bpm: number
  beat_offset: number               // Phase offset
  current_beat: number              // Fractional beat number

  // Motion binding
  motion_start_frame_id: string
  frame_count: number
}

// Protocol messages
type SyncMessage =
  | { type: 'PHRASE_START', phrase: MotionPhrase, timestamp: number }
  | { type: 'PHRASE_BEAT', beat: number, timestamp: number }
  | { type: 'PHRASE_END', phrase_id: string, timestamp: number }
  | { type: 'MOTION_FRAME', frame: MotionFrame }

---

V. Multi-Sensor Fusion

V.1 Fusion Strategy

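When multiple sensors capture simultaneously, their streams are time-aligned, weighted by confidence, and fused per joint into a single canonical MotionFrame: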

┌─────────────────────────────────────────────────────────────────┐
│                    SENSOR FUSION PIPELINE                       │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  MediaPipe ──┐                                                  │
│              │    ┌──────────────┐    ┌─────────────────┐      │
│  Mocopi ─────┼───▶│ Time Align   │───▶│ Confidence     │      │
│              │    │ (sync clocks)│    │ Weighting      │      │
│  Phones ─────┤    └──────────────┘    └────────┬────────┘      │
│              │                                  │               │
│  Watch ──────┘                                  ▼               │
│                                         ┌──────────────┐        │
│                                         │ Joint-Level  │        │
│                                         │ Fusion       │        │
│                                         └──────┬───────┘        │
│                                                │               │
│                                                ▼               │
│                                         ┌──────────────┐        │
│                                         │ Canonical    │        │
│                                         │ MotionFrame  │        │
│                                         └──────────────┘        │
└─────────────────────────────────────────────────────────────────┘
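
Because the sources run at different rates (30 fps, 50 fps, 100 Hz), the "Time Align" stage has to pick, for each output timestamp, the closest sample from every stream. The sketch below shows one way to do that with a binary search. It assumes all timestamps are already on a shared clock; `TimedSample` and `nearestSample` are hypothetical names.

```typescript
// A timestamped sample from one sensor stream (timestamps in ms on a shared clock).
interface TimedSample<T> {
  timestamp_ms: number
  value: T
}

// Binary search for the sample nearest to `targetMs` in a stream sorted by time.
// Returns null when the stream is empty or the nearest sample is further away
// than `toleranceMs`, so a stale reading is never silently reused.
function nearestSample<T>(
  stream: TimedSample<T>[],
  targetMs: number,
  toleranceMs: number
): TimedSample<T> | null {
  if (stream.length === 0) return null

  // Find the first index whose timestamp is >= targetMs.
  let lo = 0
  let hi = stream.length
  while (lo < hi) {
    const mid = (lo + hi) >> 1
    if (stream[mid].timestamp_ms < targetMs) lo = mid + 1
    else hi = mid
  }

  // The nearest sample is either at `lo` or just before it.
  let best = lo < stream.length ? lo : stream.length - 1
  if (lo > 0 && lo < stream.length) {
    const before = targetMs - stream[lo - 1].timestamp_ms
    const after = stream[lo].timestamp_ms - targetMs
    if (before <= after) best = lo - 1
  }

  const chosen = stream[best]
  return Math.abs(chosen.timestamp_ms - targetMs) <= toleranceMs ? chosen : null
}
```

A tolerance of roughly half the slowest stream's frame interval is a reasonable starting point, though the right value depends on how well the clocks are synchronized.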

V.2 Confidence-Weighted Fusion

typescript
interface FusionConfig {
  // Per-joint source priorities
  joint_sources: Record<JointName, {
    primary: SensorSource
    fallbacks: SensorSource[]
    min_confidence: number
  }>

  // Fusion weights by source
  source_weights: Record<SensorSource, number>

  // Smoothing
  temporal_smoothing: number        // 0-1, higher = more smoothing
  confidence_threshold: number      // Below this = interpolate
}

function fuseFrame(
  frames: Map<SensorSource, MotionFrame>,
  config: FusionConfig
): MotionFrame {
  const fused: MotionFrame = createEmptyFrame()

  for (const joint of ALL_JOINTS) {
    const sources = config.joint_sources[joint]
    let bestRotation: Quaternion | null = null
    let bestConfidence = 0

    // Try sources in priority order
    for (const source of [sources.primary, ...sources.fallbacks]) {
      const frame = frames.get(source)
      if (!frame) continue

      const jointData = frame.body.joints[joint]
      if (!jointData) continue

      const weightedConf = jointData.confidence * config.source_weights[source]

      if (weightedConf > bestConfidence && weightedConf >= sources.min_confidence) {
        bestRotation = jointData.rotation
        bestConfidence = weightedConf
      }
    }

    if (bestRotation) {
      fused.body.joints[joint] = {
        rotation: bestRotation,
        confidence: bestConfidence
      }
    }
  }

  return fused
}
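
`FusionConfig` declares `temporal_smoothing`, but `fuseFrame` does not apply it. One common way to smooth joint rotations is exponential smoothing with normalized linear interpolation (nlerp). The sketch below illustrates that technique. It is an assumption about how the setting could be used, and `Quat`, `nlerp`, and `smoothRotation` are hypothetical names.

```typescript
interface Quat { x: number; y: number; z: number; w: number }

// Normalized linear interpolation between two unit quaternions.
// t = 0 returns a, t = 1 returns b (both up to normalization).
function nlerp(a: Quat, b: Quat, t: number): Quat {
  // q and -q encode the same rotation; flip b so we take the shorter arc.
  const dot = a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w
  const sign = dot < 0 ? -1 : 1

  const x = a.x + (sign * b.x - a.x) * t
  const y = a.y + (sign * b.y - a.y) * t
  const z = a.z + (sign * b.z - a.z) * t
  const w = a.w + (sign * b.w - a.w) * t

  const len = Math.sqrt(x * x + y * y + z * z + w * w)
  // Guard against a degenerate zero-length result (e.g. non-unit inputs).
  if (len === 0) return a
  return { x: x / len, y: y / len, z: z / len, w: w / len }
}

// Exponential smoothing of a joint rotation.
// smoothing = 0 passes the new rotation through; values near 1 stay close to
// the previous rotation, matching "higher = more smoothing".
function smoothRotation(previous: Quat, current: Quat, smoothing: number): Quat {
  const clamped = Math.min(1, Math.max(0, smoothing))
  return nlerp(previous, current, 1 - clamped)
}
```

Nlerp is cheaper than spherical interpolation and close to it for the small frame-to-frame rotations typical of capture data; for large angular steps slerp would track the arc more evenly.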

V.3 Default Fusion Priorities

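| Body Region | Primary | Fallback 1 | Fallback 2 |
|---|---|---|---|
| Head | Headphones | MediaPipe | Mocopi |
| Torso | Mocopi | MediaPipe | - |
| Arms | Mocopi | MediaPipe | Phones |
| Hands | MediaPipe | Phones | Mocopi |
| Legs | Mocopi | MediaPipe | - |
| Feet | Mocopi | MediaPipe | - |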

---

VI. Feature Extraction Pipeline

VI.1 Real-Time Features (Computed Every Frame)

typescript
interface RealtimeFeatures {
  // Energy metrics
  body_energy: number               // Overall movement intensity 0-1
  upper_body_energy: number         // Arms + torso
  lower_body_energy: number         // Legs + hips

  // Pose metrics
  arm_spread: number                // 0 = arms down, 1 = T-pose
  arm_raise: number                 // 0 = down, 1 = overhead
  crouch_level: number              // 0 = standing, 1 = crouching
  lean_angle: number                // Forward/back lean in degrees

  // Dynamics
  velocity_magnitude: number        // Overall movement speed
  acceleration_magnitude: number    // Rate of speed change
  angular_velocity: number          // Rotation speed

  // Hand features
  left_hand_open: number            // 0 = fist, 1 = open
  right_hand_open: number
  hands_together: boolean           // Are hands near each other?

  // Face features (if available)
  head_yaw: number                  // Left/right turn
  head_pitch: number                // Up/down tilt
  head_roll: number                 // Side tilt
  mouth_open: number
  smile_intensity: number

  // Beat alignment (if phrase active)
  on_beat: boolean                  // Peak motion near beat?
  beat_phase: number                // 0-1 position in beat cycle
}
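
The dynamics features can be estimated from consecutive root positions by finite differences. The sketch below is illustrative only: `Vec3Like`, `RootSample`, `velocityBetween`, and `dynamics` are hypothetical names, the outputs are in physical units (m/s and m/s²) rather than normalized, and acceleration is computed as the magnitude of the change in the velocity vector.

```typescript
interface Vec3Like { x: number; y: number; z: number }

// Root position in meters at a session-relative time.
interface RootSample { timestamp_ms: number; position: Vec3Like }

function magnitude(v: Vec3Like): number {
  return Math.sqrt(v.x * v.x + v.y * v.y + v.z * v.z)
}

// Velocity in m/s between two consecutive samples (backward difference).
function velocityBetween(prev: RootSample, curr: RootSample): Vec3Like {
  const dt = (curr.timestamp_ms - prev.timestamp_ms) / 1000
  // Duplicate or out-of-order timestamps produce zero rather than Infinity.
  if (dt <= 0) return { x: 0, y: 0, z: 0 }
  return {
    x: (curr.position.x - prev.position.x) / dt,
    y: (curr.position.y - prev.position.y) / dt,
    z: (curr.position.z - prev.position.z) / dt
  }
}

// Speed and acceleration magnitude at the newest of three samples.
function dynamics(
  s0: RootSample,
  s1: RootSample,
  s2: RootSample
): { velocity_magnitude: number; acceleration_magnitude: number } {
  const v1 = velocityBetween(s0, s1)
  const v2 = velocityBetween(s1, s2)
  const dt = (s2.timestamp_ms - s1.timestamp_ms) / 1000
  const accel = dt > 0
    ? { x: (v2.x - v1.x) / dt, y: (v2.y - v1.y) / dt, z: (v2.z - v1.z) / dt }
    : { x: 0, y: 0, z: 0 }
  return {
    velocity_magnitude: magnitude(v2),
    acceleration_magnitude: magnitude(accel)
  }
}
```

Finite differences amplify sensor noise, so in practice these values would usually be smoothed before they feed a mapping or an intent estimate.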

VI.2 Segment-Level Features (Computed Per Phrase)

typescript
interface SegmentFeatures {
  // Statistical aggregates
  energy_mean: number
  energy_std: number
  energy_min: number
  energy_max: number

  // Pattern detection
  repetition_score: number          // How repetitive is the motion?
  complexity_score: number          // Movement variety
  symmetry_score: number            // Left/right balance

  // Rhythm analysis
  beat_hit_rate: number             // % of beats with motion peak
  off_beat_rate: number             // % motion on off-beats
  rhythmic_consistency: number      // How steady is the rhythm?

  // Pose vocabulary
  dominant_poses: string[]          // Most common pose clusters
  pose_transition_matrix: number[][]  // Markov transitions

  // Quality metrics
  tracking_coverage: number         // % frames with good tracking
  jitter_score: number              // Unwanted noise level
}
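
The statistical aggregates follow directly from the per-frame `body_energy` values. The sketch below shows one way to compute them, along with a simple beat hit rate. The names `EnergyStats`, `energyStats`, and `beatHitRate` are hypothetical; the standard deviation is the population form, and the hit rate is returned as a fraction in [0, 1] rather than a percentage.

```typescript
interface EnergyStats {
  energy_mean: number
  energy_std: number
  energy_min: number
  energy_max: number
}

// Aggregates per-frame body_energy values over one phrase segment.
// Uses the population standard deviation; an empty segment yields zeros.
function energyStats(energies: number[]): EnergyStats {
  if (energies.length === 0) {
    return { energy_mean: 0, energy_std: 0, energy_min: 0, energy_max: 0 }
  }
  let sum = 0
  let min = energies[0]
  let max = energies[0]
  for (const e of energies) {
    sum += e
    if (e < min) min = e
    if (e > max) max = e
  }
  const mean = sum / energies.length

  let squaredDiffs = 0
  for (const e of energies) {
    squaredDiffs += (e - mean) * (e - mean)
  }
  return {
    energy_mean: mean,
    energy_std: Math.sqrt(squaredDiffs / energies.length),
    energy_min: min,
    energy_max: max
  }
}

// Fraction of beats that received at least one on-beat frame.
// `onBeatFrameBeats` holds, for every frame flagged on_beat, the index of its beat.
function beatHitRate(onBeatFrameBeats: number[], totalBeats: number): number {
  if (totalBeats <= 0) return 0
  const seen: Record<number, boolean> = {}
  let hits = 0
  for (const beat of onBeatFrameBeats) {
    if (beat >= 0 && beat < totalBeats && !seen[beat]) {
      seen[beat] = true
      hits++
    }
  }
  return hits / totalBeats
}
```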

VI.3 Feature Storage

sql
-- Real-time features (one per frame)
CREATE TABLE motion_frame_features (
  frame_id UUID PRIMARY KEY REFERENCES motion_frames(id),

  -- Core features (fast access)
  body_energy FLOAT,
  arm_spread FLOAT,
  arm_raise FLOAT,
  crouch_level FLOAT,
  velocity_magnitude FLOAT,

  -- All features (JSONB for flexibility)
  features JSONB NOT NULL,

  -- Beat alignment
  beat_phase FLOAT,
  on_beat BOOLEAN
);

-- Segment features (one per phrase recording)
CREATE TABLE segment_features (
  segment_id UUID PRIMARY KEY REFERENCES phrase_motion_segments(id),

  -- Aggregates
  energy_mean FLOAT,
  energy_std FLOAT,
  beat_hit_rate FLOAT,
  complexity_score FLOAT,

  -- Full feature set
  features JSONB NOT NULL,

  -- Embedding for similarity search (VECTOR is provided by the pgvector extension)
  motion_embedding VECTOR(256)
);

---

VII. Training Corpus Organization

VII.1 Corpus Structure

TrainingCorpus
├── metadata.json                   # Corpus config and stats
├── performers/
│   ├── performer_001/
│   │   ├── profile.json
│   │   └── sessions/
│   │       ├── session_001/
│   │       │   ├── metadata.json
│   │       │   ├── frames.parquet
│   │       │   └── features.parquet
│   │       └── ...
│   └── ...
├── phrases/
│   ├── phrase_001/
│   │   ├── audio_features.json
│   │   ├── motion_samples/         # All recordings for this phrase
│   │   │   ├── sample_001.parquet
│   │   │   ├── sample_002.parquet
│   │   │   └── ...
│   │   └── aggregated_features.json
│   └── ...
└── exports/
    ├── training_v1/               # Versioned training sets
    │   ├── train.parquet
    │   ├── val.parquet
    │   └── test.parquet
    └── ...

VII.2 Corpus Table

sql
CREATE TABLE training_corpora (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),

  name TEXT NOT NULL,
  description TEXT,
  version TEXT NOT NULL,

  -- Stats
  total_sessions INTEGER DEFAULT 0,
  total_frames BIGINT DEFAULT 0,
  total_duration_hours FLOAT DEFAULT 0,
  unique_performers INTEGER DEFAULT 0,
  unique_phrases INTEGER DEFAULT 0,

  -- Quality thresholds
  min_confidence_threshold FLOAT DEFAULT 0.7,
  min_segment_duration_sec FLOAT DEFAULT 2.0,

  -- Export config
  export_format TEXT DEFAULT 'parquet',
  feature_version TEXT,

  created_at TIMESTAMPTZ DEFAULT NOW(),
  updated_at TIMESTAMPTZ DEFAULT NOW()
);

---

VIII. Real-Time Parameter Signaling

VIII.1 The Dance-to-Parameter Protocol

While the performer dances, motion features map to controllable parameters:

typescript
interface ParameterMapping {
  // Source feature
  feature: keyof RealtimeFeatures

  // Target parameter
  parameter: string                 // e.g., "synth.filter_cutoff"

  // Mapping curve
  curve: 'linear' | 'exponential' | 'sigmoid' | 'stepped'

  // Range mapping
  input_range: [number, number]     // Feature value range
  output_range: [number, number]    // Parameter value range

  // Smoothing
  smoothing: number                 // 0-1

  // Activation
  active_condition?: string         // e.g., "body_energy > 0.3"
}

// Example mappings
const DEFAULT_MAPPINGS: ParameterMapping[] = [
  {
    feature: 'arm_raise',
    parameter: 'synth.filter_cutoff',
    curve: 'exponential',
    input_range: [0, 1],
    output_range: [200, 8000],
    smoothing: 0.8
  },
  {
    feature: 'body_energy',
    parameter: 'effects.reverb_wet',
    curve: 'linear',
    input_range: [0, 1],
    output_range: [0.1, 0.7],
    smoothing: 0.9
  },
  {
    feature: 'crouch_level',
    parameter: 'synth.pitch_bend',
    curve: 'linear',
    input_range: [0, 1],
    output_range: [0, -12],       // Semitones
    smoothing: 0.7
  }
]
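
The `active_condition` field holds an expression such as `"body_energy > 0.3"`, but the protocol does not define its grammar. The sketch below assumes the simplest useful form, a single comparison between one feature and one numeric literal, and parses it without `eval`. `FeatureValues` is a hypothetical stand-in for `RealtimeFeatures`.

```typescript
type FeatureValues = Record<string, number | boolean>

// Evaluates a condition of the form "<feature> <op> <number>", e.g. "body_energy > 0.3".
// Assumption: only this single-comparison grammar is supported.
// Unknown features and malformed conditions evaluate to false rather than throwing.
function evaluateCondition(condition: string, features: FeatureValues): boolean {
  const match = /^\s*([A-Za-z_]\w*)\s*(>=|<=|==|!=|>|<)\s*(-?\d+(?:\.\d+)?)\s*$/.exec(condition)
  if (!match) return false

  const raw = features[match[1]]
  if (raw === undefined) return false

  const value = Number(raw) // booleans compare as 0 or 1
  const threshold = parseFloat(match[3])

  switch (match[2]) {
    case '>': return value > threshold
    case '>=': return value >= threshold
    case '<': return value < threshold
    case '<=': return value <= threshold
    case '==': return value === threshold
    case '!=': return value !== threshold
    default: return false
  }
}
```

Failing closed (returning false) means a typo in a condition silences the mapping instead of crashing the real-time loop. Whether that is preferable to a loud error is a design decision the protocol leaves open.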

VIII.2 Parameter Event Stream

typescript
interface ParameterEvent {
  timestamp: number
  parameter: string
  value: number
  source_feature: string
  raw_feature_value: number
}

// Real-time emission
function emitParameterEvents(
  features: RealtimeFeatures,
  mappings: ParameterMapping[]
): ParameterEvent[] {
  return mappings
    .filter(m => !m.active_condition || evaluateCondition(m.active_condition, features))
    .map(mapping => {
      const rawValue = Number(features[mapping.feature])  // booleans such as on_beat become 0 or 1
      const mappedValue = applyMapping(rawValue, mapping)

      return {
        timestamp: Date.now(),
        parameter: mapping.parameter,
        value: mappedValue,
        source_feature: mapping.feature,
        raw_feature_value: rawValue
      }
    })
}
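
`emitParameterEvents` relies on an `applyMapping` helper that is not shown. A minimal sketch follows, assuming the input is clamped to `input_range`, shaped by the curve, and then scaled to `output_range`. The curve formulas, the steepness constants, the four-step quantization, and the names `MappingSpec`, `shape`, and `smoothValue` are all assumptions chosen for illustration.

```typescript
type Curve = 'linear' | 'exponential' | 'sigmoid' | 'stepped'

// The subset of ParameterMapping that the value conversion needs.
interface MappingSpec {
  curve: Curve
  input_range: [number, number]
  output_range: [number, number]
}

// Shapes a normalized position t in [0, 1]; every curve maps 0 to 0 and 1 to 1.
function shape(t: number, curve: Curve): number {
  switch (curve) {
    case 'exponential': {
      const k = 4 // growth rate (assumed)
      return (Math.exp(k * t) - 1) / (Math.exp(k) - 1)
    }
    case 'sigmoid': {
      const k = 10 // steepness (assumed)
      const s = (x: number) => 1 / (1 + Math.exp(-k * (x - 0.5)))
      return (s(t) - s(0)) / (s(1) - s(0))
    }
    case 'stepped':
      return Math.round(t * 4) / 4 // quantize to quarters (assumed)
    default:
      return t // linear
  }
}

// Converts a raw feature value into a parameter value.
function applyMapping(raw: number, spec: MappingSpec): number {
  const [inLo, inHi] = spec.input_range
  const [outLo, outHi] = spec.output_range
  // Normalize into [0, 1], clamping values outside the input range.
  const span = inHi - inLo
  const t = span === 0 ? 0 : Math.min(1, Math.max(0, (raw - inLo) / span))
  return outLo + (outHi - outLo) * shape(t, spec.curve)
}

// One-pole smoothing toward the newly mapped value; higher smoothing reacts more slowly.
function smoothValue(previous: number, target: number, smoothing: number): number {
  return previous * smoothing + target * (1 - smoothing)
}
```

Because the output range may be descending, as in the `[0, -12]` pitch-bend mapping, the scaling step uses signed arithmetic rather than assuming `outLo < outHi`.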

VIII.3 Parameter Recording

All parameter changes during a session are recorded:

sql
CREATE TABLE session_parameter_events (
  id BIGSERIAL PRIMARY KEY,
  session_id UUID REFERENCES motion_sessions(id),

  timestamp_ms INTEGER NOT NULL,
  parameter TEXT NOT NULL,
  value FLOAT NOT NULL,

  -- Source tracking
  source_feature TEXT,
  raw_feature_value FLOAT,
  mapping_id UUID
);

-- Efficient time-series queries
CREATE INDEX idx_param_events_session_time
ON session_parameter_events(session_id, timestamp_ms);

---

IX. LIMRPS Integration

IX.1 LIMRPS Feature Mapping

LIMRPS (Latent Intent Motion - Reactive Parameter System) treats motion features as intent signals:

┌─────────────────────────────────────────────────────────────────┐
│                    LIMRPS ARCHITECTURE                          │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Motion Input           Intent Layer           Output Layer     │
│  ────────────           ────────────           ────────────     │
│                                                                 │
│  body_energy ──────────▶ INTENSITY ──────────▶ tempo_scale     │
│                                    ──────────▶ volume           │
│                                    ──────────▶ density          │
│                                                                 │
│  arm_spread ───────────▶ OPENNESS ───────────▶ chord_spread    │
│  arm_raise                         ───────────▶ register        │
│                                    ───────────▶ reverb          │
│                                                                 │
│  crouch_level ─────────▶ TENSION ────────────▶ filter_cutoff   │
│  velocity                          ────────────▶ distortion     │
│                                    ────────────▶ attack         │
│                                                                 │
│  hand_gestures ────────▶ ARTICULATION ───────▶ note_duration   │
│  finger_state                        ─────────▶ staccato/legato│
│                                      ─────────▶ ornaments       │
│                                                                 │
│  head_direction ───────▶ ATTENTION ──────────▶ spatial_pan     │
│  gaze                              ──────────▶ focus_freq      │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

IX.2 Intent-to-Generation Pipeline

typescript
interface MotionIntent {
  intensity: number       // 0-1: How much energy/urgency
  openness: number        // 0-1: Expansive vs contained
  tension: number         // 0-1: Tight/aggressive vs relaxed
  articulation: number    // 0-1: Precise vs flowing
  attention: Vec3         // Direction of focus
}

function extractIntent(features: RealtimeFeatures): MotionIntent {
  return {
    intensity: weightedAverage([
      [features.body_energy, 0.5],
      [features.velocity_magnitude, 0.3],
      [features.acceleration_magnitude, 0.2]
    ]),

    openness: weightedAverage([
      [features.arm_spread, 0.6],
      [features.arm_raise, 0.4]
    ]),

    tension: weightedAverage([
      [1 - features.crouch_level, 0.4],
      [features.angular_velocity, 0.3],
      [features.lean_angle / 45, 0.3]
    ]),

    articulation: weightedAverage([
      [features.left_hand_open, 0.25],
      [features.right_hand_open, 0.25],
      [features.beat_phase < 0.1 ? 1 : 0, 0.5]  // On-beat = articulated
    ]),

    attention: calculateAttentionVector(features)
  }
}
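
`extractIntent` depends on a `weightedAverage` helper that is not defined. A possible sketch follows. Since inputs such as `angular_velocity` and `lean_angle / 45` are not guaranteed to lie in [0, 1], this version clamps each value before weighting; the clamping is an assumption, not something the protocol specifies.

```typescript
// Weighted mean of [value, weight] pairs. Each value is clamped to [0, 1] first,
// so an unnormalized input cannot push an intent dimension out of range.
function weightedAverage(pairs: [number, number][]): number {
  let total = 0
  let weightSum = 0
  for (const [value, weight] of pairs) {
    const clamped = Math.min(1, Math.max(0, value))
    total += clamped * weight
    weightSum += weight
  }
  // Dividing by the weight sum keeps the result in [0, 1] even if weights do not sum to 1.
  return weightSum > 0 ? total / weightSum : 0
}
```

With clamping, a backward lean (negative `lean_angle`) contributes zero tension; passing `Math.abs(features.lean_angle) / 45` would treat both directions alike if that is the intent.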

---

X. CC-Motion-Gen Integration

X.1 Training Data Format

Each training sample for the motion generation model pairs audio features (input) with a motion sequence (output):

typescript
interface TrainingSample {
  // Input: Audio features
  audio: {
    mel_spectrogram: Float32Array   // [n_frames, n_mels]
    tempo_bpm: number
    beat_positions: number[]        // Frame indices of beats
    key: string
    energy_curve: number[]          // Audio energy over time
  }

  // Output: Motion sequence
  motion: {
    joint_rotations: Float32Array   // [n_frames, n_joints, 4] quaternions
    root_velocities: Float32Array   // [n_frames, 3]
    features: Float32Array          // [n_frames, n_features]
  }

  // Metadata
  metadata: {
    phrase_id: string
    performer_id: string
    session_id: string
    quality_score: number
  }
}
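
The `joint_rotations` field is a flat `Float32Array` with the logical shape `[n_frames, n_joints, 4]`. The sketch below shows one row-major layout and the matching index arithmetic. The component order (x, y, z, w) and the names `QuatXYZW`, `packRotations`, and `rotationAt` are assumptions.

```typescript
interface QuatXYZW { x: number; y: number; z: number; w: number }

// Packs per-frame joint rotations into one row-major Float32Array with
// logical shape [n_frames, n_joints, 4]. Component order is (x, y, z, w).
function packRotations(frames: QuatXYZW[][], nJoints: number): Float32Array {
  const out = new Float32Array(frames.length * nJoints * 4)
  for (let f = 0; f < frames.length; f++) {
    if (frames[f].length !== nJoints) {
      throw new Error(`Frame ${f} has ${frames[f].length} joints, expected ${nJoints}`)
    }
    for (let j = 0; j < nJoints; j++) {
      const base = (f * nJoints + j) * 4
      const q = frames[f][j]
      out[base] = q.x
      out[base + 1] = q.y
      out[base + 2] = q.z
      out[base + 3] = q.w
    }
  }
  return out
}

// Reads one quaternion back out of the packed buffer.
function rotationAt(packed: Float32Array, nJoints: number, frame: number, joint: number): QuatXYZW {
  const base = (frame * nJoints + joint) * 4
  return {
    x: packed[base],
    y: packed[base + 1],
    z: packed[base + 2],
    w: packed[base + 3]
  }
}
```

Whatever order is chosen, it needs to be recorded alongside the export (for example next to `feature_version`) so that training code reads the buffer the same way it was written.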

X.2 Export Pipeline

typescript
async function exportTrainingData(
  corpusId: string,
  outputPath: string,
  config: ExportConfig
): Promise<void> {
  const corpus = await loadCorpus(corpusId)

  const samples: TrainingSample[] = []

  for (const session of corpus.sessions) {
    for (const segment of session.phrase_segments) {
      // Skip low quality
      if (segment.quality_score < config.min_quality) continue

      // Load audio features
      const audioFeatures = await loadPhraseAudioFeatures(segment.phrase_id)

      // Load motion frames
      const frames = await loadMotionFrames(segment)

      // Resample to fixed frame rate
      const resampledMotion = resampleMotion(frames, config.target_fps)

      // Align to audio
      const alignedMotion = alignToAudio(
        resampledMotion,
        audioFeatures,
        segment.audio_motion_offset_ms
      )

      samples.push({
        audio: audioFeatures,
        motion: alignedMotion,
        metadata: {
          phrase_id: segment.phrase_id,
          performer_id: session.performer_id,
          session_id: session.id,
          quality_score: segment.quality_score
        }
      })
    }
  }

  // Split into train/val/test
  const splits = splitDataset(samples, config.split_ratios)

  // Write parquet files
  await writeParquet(`${outputPath}/train.parquet`, splits.train)
  await writeParquet(`${outputPath}/val.parquet`, splits.val)
  await writeParquet(`${outputPath}/test.parquet`, splits.test)
}
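
The export relies on a `splitDataset` helper. A reproducible version could use a seeded shuffle so that the same corpus and seed always produce the same partitions. The sketch below is one such approach; `SplitRatios`, `DatasetSplits`, `makeRng`, and the default seed are hypothetical, and the generator is a simple linear congruential one that is adequate for shuffling but not for anything security-sensitive.

```typescript
interface SplitRatios { train: number; val: number; test: number }
interface DatasetSplits<T> { train: T[]; val: T[]; test: T[] }

// Small linear congruential generator so a given seed always yields the same split.
function makeRng(seed: number): () => number {
  let state = seed >>> 0
  return () => {
    state = (state * 1664525 + 1013904223) % 4294967296
    return state / 4294967296
  }
}

function splitDataset<T>(samples: T[], ratios: SplitRatios, seed: number = 42): DatasetSplits<T> {
  const total = ratios.train + ratios.val + ratios.test
  if (total <= 0) throw new Error('Split ratios must sum to a positive number')

  // Fisher-Yates shuffle on a copy so the caller's array is left untouched.
  const shuffled = samples.slice()
  const rng = makeRng(seed)
  for (let i = shuffled.length - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1))
    const tmp = shuffled[i]
    shuffled[i] = shuffled[j]
    shuffled[j] = tmp
  }

  // The small epsilon absorbs floating-point error in ratios such as 0.8 / 0.1 / 0.1.
  const n = shuffled.length
  const trainEnd = Math.floor((n * ratios.train) / total + 1e-9)
  const valEnd = trainEnd + Math.floor((n * ratios.val) / total + 1e-9)

  return {
    train: shuffled.slice(0, trainEnd),
    val: shuffled.slice(trainEnd, valEnd),
    test: shuffled.slice(valEnd)
  }
}
```

A sample-level shuffle can place recordings from the same performer or phrase in both train and test. Splitting by `performer_id` or `phrase_id` groups instead would avoid that kind of leakage, at the cost of less exact ratios.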

---

XI. Quality Assurance Pipeline

XI.1 Automated Quality Checks

typescript
interface QualityReport {
  session_id: string

  // Tracking quality
  frame_coverage: number            // % frames with valid tracking
  avg_confidence: number
  tracking_gaps: {
    count: number
    total_duration_ms: number
    longest_gap_ms: number
  }

  // Motion quality
  jitter_score: number              // Lower = smoother
  physics_violations: number        // Impossible poses
  frozen_frames: number             // Duplicate poses

  // Sync quality
  beat_alignment: number            // How well motion matches beats
  latency_estimate_ms: number       // Audio-motion delay

  // Overall
  usable: boolean
  usable_segments: number
  quality_score: number             // 0-100
  flags: string[]                   // Issues found
}

async function assessQuality(sessionId: string): Promise<QualityReport> {
  const session = await loadSession(sessionId)
  const frames = await loadFrames(sessionId)

  // Measured metrics
  const metrics = {
    session_id: sessionId,

    frame_coverage: calculateCoverage(frames),
    avg_confidence: calculateAvgConfidence(frames),
    tracking_gaps: findTrackingGaps(frames),

    jitter_score: calculateJitter(frames),
    physics_violations: countPhysicsViolations(frames),
    frozen_frames: countFrozenFrames(frames),

    beat_alignment: calculateBeatAlignment(frames, session.phrase_segments),
    latency_estimate_ms: estimateLatency(frames, session)
  }

  // Derived verdict: usable, usable_segments, quality_score, flags.
  // summarizeQuality is a placeholder name; the scoring rule is not specified here.
  return { ...metrics, ...summarizeQuality(metrics, session) }
}
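
As an illustration of one of the helpers used above, tracking gaps can be found by walking the frames and measuring the time between consecutive well-tracked frames. This sketch takes explicit thresholds, unlike the single-argument call in `assessQuality`, and `TrackedFrame` and `GapSummary` are hypothetical names. Low-confidence runs at the very start or end of a session are not counted as gaps here.

```typescript
interface TrackedFrame { timestamp_ms: number; confidence: number }

interface GapSummary {
  count: number
  total_duration_ms: number
  longest_gap_ms: number
}

// A gap is any stretch between two consecutive good frames that is longer than
// `maxIntervalMs`. This covers both dropped frames and runs of low-confidence frames.
function findTrackingGaps(
  frames: TrackedFrame[],
  minConfidence: number,
  maxIntervalMs: number
): GapSummary {
  const summary: GapSummary = { count: 0, total_duration_ms: 0, longest_gap_ms: 0 }
  let lastGoodMs: number | null = null

  for (const frame of frames) {
    if (frame.confidence < minConfidence) continue // skip untracked frames

    if (lastGoodMs !== null) {
      const delta = frame.timestamp_ms - lastGoodMs
      if (delta > maxIntervalMs) {
        summary.count += 1
        summary.total_duration_ms += delta
        summary.longest_gap_ms = Math.max(summary.longest_gap_ms, delta)
      }
    }
    lastGoodMs = frame.timestamp_ms
  }
  return summary
}
```

The resulting `count` and `total_duration_ms` line up with the `tracking_gaps_count` and `tracking_gaps_total_ms` columns on `motion_sessions`.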

XI.2 Quality Thresholds

| Metric | Acceptable | Good | Excellent |
|---|---|---|---|
| Frame Coverage | > 80 | | |
| Avg Confidence | > 0.6 | > 0.75 | > 0.9 |
| Jitter Score | < 0.3 | < 0.15 | < 0.05 |
| Beat Alignment | > 0.5 | > 0.7 | > 0.85 |
| Quality Score | > 50 | > 70 | > 90 |
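
The thresholds can be applied mechanically. The sketch below encodes the rows that list all three values and classifies a measurement into a tier. `Tier`, `TierRule`, `TIER_RULES`, and `classify` are hypothetical names, and the comparisons are strict to match the `>` and `<` in the table.

```typescript
type Tier = 'below' | 'acceptable' | 'good' | 'excellent'

interface TierRule {
  higherIsBetter: boolean
  acceptable: number
  good: number
  excellent: number
}

// Thresholds copied from the table above (only rows with all three values).
const TIER_RULES: Record<string, TierRule> = {
  avg_confidence: { higherIsBetter: true, acceptable: 0.6, good: 0.75, excellent: 0.9 },
  jitter_score: { higherIsBetter: false, acceptable: 0.3, good: 0.15, excellent: 0.05 },
  beat_alignment: { higherIsBetter: true, acceptable: 0.5, good: 0.7, excellent: 0.85 },
  quality_score: { higherIsBetter: true, acceptable: 50, good: 70, excellent: 90 }
}

// Returns the best tier whose threshold the value strictly passes.
function classify(metric: string, value: number): Tier {
  const rule = TIER_RULES[metric]
  if (!rule) throw new Error(`No thresholds defined for ${metric}`)

  const passes = (limit: number) => (rule.higherIsBetter ? value > limit : value < limit)

  if (passes(rule.excellent)) return 'excellent'
  if (passes(rule.good)) return 'good'
  if (passes(rule.acceptable)) return 'acceptable'
  return 'below'
}
```

Jitter is the one metric where lower is better, which is why each rule carries its own direction flag instead of the function assuming one.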

---

XII. API Endpoints

XII.1 Session Management

POST   /api/motion/sessions                    # Start new session
GET    /api/motion/sessions/:id                # Get session details
PATCH  /api/motion/sessions/:id                # Update session
DELETE /api/motion/sessions/:id                # Delete session

POST   /api/motion/sessions/:id/frames         # Upload frames (batch)
GET    /api/motion/sessions/:id/frames         # Get frames (paginated)

POST   /api/motion/sessions/:id/complete       # End and process session
GET    /api/motion/sessions/:id/quality        # Get quality report

XII.2 Phrase Sync

POST   /api/motion/sync/phrase_start           # Signal phrase started
POST   /api/motion/sync/phrase_end             # Signal phrase ended
GET    /api/motion/sync/current                # Get current sync state

WS     /api/motion/sync/stream                 # Real-time sync events

XII.3 Training Corpus

GET    /api/motion/corpus                      # List corpora
POST   /api/motion/corpus                      # Create corpus
GET    /api/motion/corpus/:id/stats            # Corpus statistics
POST   /api/motion/corpus/:id/export           # Trigger export

GET    /api/motion/corpus/:id/phrases          # Phrases with motion data
GET    /api/motion/corpus/:id/phrases/:phrase_id/samples  # Samples for phrase

---

XIII. Implementation Phases

### Phase 1: Foundation (Current)
- [x] MediaPipe capture to Supabase
- [x] Session management
- [x] Phrase synchronization
- [ ] Basic feature extraction

### Phase 2: Multi-Sensor
- [ ] Mocopi integration
- [ ] Phone IMU capture
- [ ] Sensor fusion pipeline
- [ ] Clock synchronization

### Phase 3: Quality & Processing
- [ ] Automated quality assessment
- [ ] Jitter filtering
- [ ] Gap interpolation
- [ ] Physics constraints

### Phase 4: Training Pipeline
- [ ] Corpus management
- [ ] Export pipeline
- [ ] CC-Motion-Gen integration
- [ ] Incremental training

### Phase 5: Real-Time Control
- [ ] Parameter mapping UI
- [ ] LIMRPS integration
- [ ] Live performance mode
- [ ] Latency optimization

---

XIV. Boundaries & Rules

### XIV.1 Data Retention
- Raw frames: 30 days (then archived)
- Features: Indefinite
- Sessions: Indefinite
- Exports: Version-controlled

### XIV.2 Privacy
- Performer consent required
- Facial data can be excluded from exports (optional)
- Anonymization for public datasets

### XIV.3 Quality Gates
- Sessions < 50
- Jitter > 0.5: Flagged for review
- Physics violations > 10

### XIV.4 Versioning
- Feature extraction version tracked
- Training exports versioned
- Model compatibility matrix maintained

---

This protocol defines the complete data pipeline from sensor to training. Implementation proceeds phase by phase, with each phase validated before the next begins.

Source Anchor

Comp-Core/.governance/architecture/DATA_PIPELINE_PROTOCOL.md
