Grand Diomande Research · Full HTML Reader

CC-Protocol Technical Documentation

Introduction

CC-Protocol is a unified communication protocol designed for the Computational Choreography system. It provides standardized message formats for real-time sensor data streaming, latent state visualization, and control commands across distributed devices and services. The protocol enables seamless integration between iOS devices capturing motion data and backend services running machine learning models for motion analysis and synthesis.

The protocol is designed with three primary goals: low-latency real-time communication suitable for interactive applications, flexible extensibility to support new sensor types and message formats, and cross-platform compatibility between Swift/iOS clients and Rust backend services.

Architecture Overview

cc-protocol uses a layered architecture with three layers. The application layer contains domain-specific logic such as motion streaming and visualization services. The protocol layer handles message encoding, decoding, and type-safe serialization. The transport layer provides the underlying communication mechanism; it currently supports WebSocket, with HTTP and UDP support planned.

This separation of concerns allows the protocol to evolve independently of the transport mechanism. The same message structures can be transmitted over different transports depending on the requirements of latency, reliability, and bandwidth.

Core Message Structure

Every message transmitted through cc-protocol is wrapped in a NetworkMessage envelope, which carries the metadata needed for routing, delivery guarantees, and message ordering. The envelope contains: a protocol version string identifying the protocol revision (currently "0.1.0"); a unique 64-bit message identifier used for tracking and acknowledgment; a microsecond-precision timestamp recording when the message was created; a sender identifier and an optional target identifier for routing; the message payload itself; a priority value from 0 to 255, where 0 is the highest priority; a boolean flag indicating whether acknowledgment is required; and an optional reply-to field that links a response to its originating message.

In the Rust implementation, the NetworkMessage struct uses strongly-typed fields with Serde serialization attributes to ensure correct JSON encoding. The Swift implementation mirrors this structure with Codable conformance and appropriate snake_case to camelCase key mapping via custom CodingKeys.

The envelope design supports both point-to-point and broadcast communication patterns. When the target_id field is None or null, the message is treated as a broadcast to all connected clients. When specified, the backend routes the message only to the designated recipient.
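The envelope and its broadcast rule can be sketched in plain Rust. This is a minimal illustration, not the project's actual definition: only `target_id` and `requires_ack` are named in the text, so the other field names, the concrete integer and string types, the placeholder `Payload` enum, and the `recipients` helper are all assumptions.

```rust
/// Placeholder payload; the real protocol uses a richer tagged union.
#[derive(Debug, Clone, PartialEq)]
pub enum Payload {
    Ping,
    Pong,
    Text(String), // stand-in for Data/Control/Sync content
}

/// Envelope fields as described in the text; names and types are assumed.
#[derive(Debug, Clone, PartialEq)]
pub struct NetworkMessage {
    pub version: String,           // protocol revision, e.g. "0.1.0"
    pub message_id: u64,           // unique 64-bit identifier
    pub timestamp_us: u64,         // creation time in microseconds
    pub sender_id: String,
    pub target_id: Option<String>, // None means broadcast
    pub payload: Payload,
    pub priority: u8,              // 0 is the highest priority
    pub requires_ack: bool,
    pub reply_to: Option<u64>,     // id of the message being answered
}

/// Returns the connected clients that should receive `msg`:
/// everyone when `target_id` is None, otherwise only the matching client.
pub fn recipients<'a>(msg: &NetworkMessage, connected: &'a [String]) -> Vec<&'a str> {
    match &msg.target_id {
        None => connected.iter().map(String::as_str).collect(),
        Some(target) => connected
            .iter()
            .filter(|id| *id == target)
            .map(String::as_str)
            .collect(),
    }
}
```

Modeling the optional target as `Option<String>` lets the routing decision be a single `match`, so a broadcast cannot be confused with an empty target string.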

Message Payload Types

The payload field of a NetworkMessage can contain one of several payload variants. The MessagePayload enum defines these variants as a tagged union, ensuring type safety at compile time while allowing flexible message content.

The Data payload variant contains sensor readings, state updates, and other domain data. This is the primary payload type for streaming motion data and receiving latent state updates. The Control payload carries commands and configuration messages for controlling device behavior or backend processing. The Sync payload handles clock synchronization messages to align timestamps across distributed devices.

Additionally, the protocol defines lightweight payload types for connection management. The Ping payload implements keep-alive functionality to detect broken connections. The Pong payload responds to Ping messages. The Ack payload acknowledges receipt of messages when requires_ack is true. The Error payload communicates protocol-level errors with a numeric code and descriptive message.

This enumeration approach allows the protocol to add new payload types without breaking existing implementations. Clients can safely ignore unknown payload types while processing recognized messages.
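One way to make "ignore unknown payload types" concrete is to classify the wire tag before decoding the body. The sketch below assumes lowercase tag strings such as "data" and "ping"; the actual tag names, and the `PayloadKind`, `classify`, and `should_process` identifiers, are hypothetical.

```rust
/// Payload variants named in the text, plus a catch-all for tags this
/// build does not recognize. The tag strings used below are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub enum PayloadKind {
    Data,
    Control,
    Sync,
    Ping,
    Pong,
    Ack,
    Error,
    Unknown(String),
}

/// Maps a wire tag to a payload kind without failing on unknown tags.
pub fn classify(tag: &str) -> PayloadKind {
    match tag {
        "data" => PayloadKind::Data,
        "control" => PayloadKind::Control,
        "sync" => PayloadKind::Sync,
        "ping" => PayloadKind::Ping,
        "pong" => PayloadKind::Pong,
        "ack" => PayloadKind::Ack,
        "error" => PayloadKind::Error,
        other => PayloadKind::Unknown(other.to_string()),
    }
}

/// Unknown payloads are skipped rather than treated as fatal.
pub fn should_process(kind: &PayloadKind) -> bool {
    !matches!(kind, PayloadKind::Unknown(_))
}
```

Keeping the unrecognized tag inside `Unknown` means it can still be logged, which helps when a newer peer starts sending a payload type the receiver predates.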

Sensor Data Format

Sensor data represents raw measurements from inertial measurement units (IMUs) and other motion sensors. The SensorFrame structure captures a complete snapshot of sensor readings at a specific moment in time.

Each SensorFrame includes a microsecond-precision timestamp indicating when the sensors were sampled, a device identifier string such as "left", "right", "watch", or "head" to distinguish between multiple concurrent devices, and arrays of sensor measurements. The accelerometer data contains three floating-point values representing linear acceleration in meters per second squared along the x, y, and z axes, excluding gravity. The gyroscope data provides three values for rotational velocity in radians per second around each axis. The gravity vector separates the gravitational component of acceleration as a three-element array. The quaternion represents device orientation as four floating-point values in w, x, y, z order, providing a singularity-free rotation representation.

Optional sensor data includes magnetometer readings as a three-element array when available, and heart rate in beats per minute for devices with biometric sensors. These optional fields use Option types in Rust and optional types in Swift to clearly indicate their potential absence.

The iOS implementation creates SensorFrame instances from CoreMotion's CMDeviceMotion objects. The conversion extracts the userAcceleration property for acceleration without gravity, rotationRate for gyroscope data, the gravity vector directly, and the attitude quaternion. Magnetometer readings are included only when CoreMotion reports a magnetic field calibration accuracy other than uncalibrated, which filters out unreliable data.
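A backend-side view of the frame might look like the sketch below. Field names, the choice of `f64`, and both helper functions are assumptions for illustration. One detail worth noting: CoreMotion reports userAcceleration and gravity in units of g, so a conversion is needed somewhere in the pipeline if the protocol carries meters per second squared as described above.

```rust
/// Standard gravity in m/s^2, used to convert from units of g.
pub const STANDARD_GRAVITY: f64 = 9.80665;

/// Sensor snapshot as described in the text; names and types are assumed.
#[derive(Debug, Clone, PartialEq)]
pub struct SensorFrame {
    pub timestamp_us: u64,              // sample time in microseconds
    pub device_id: String,              // e.g. "left", "right", "watch", "head"
    pub accelerometer: [f64; 3],        // m/s^2, gravity removed
    pub gyroscope: [f64; 3],            // rad/s
    pub gravity: [f64; 3],
    pub quaternion: [f64; 4],           // w, x, y, z
    pub magnetometer: Option<[f64; 3]>, // absent when uncalibrated
    pub heart_rate: Option<f64>,        // beats per minute
}

/// Converts a three-axis reading from units of g to m/s^2.
pub fn g_to_ms2(v: [f64; 3]) -> [f64; 3] {
    [
        v[0] * STANDARD_GRAVITY,
        v[1] * STANDARD_GRAVITY,
        v[2] * STANDARD_GRAVITY,
    ]
}

/// Euclidean norm of a quaternion; a valid rotation has norm close to 1.
pub fn quaternion_norm(q: [f64; 4]) -> f64 {
    q.iter().map(|c| c * c).sum::<f64>().sqrt()
}
```

Checking `quaternion_norm` on receipt is a cheap way to reject frames whose orientation was corrupted or left uninitialized.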

Multi-Device Synchronization

Many choreographic applications involve multiple devices working in concert. The MultiDeviceFrame structure provides synchronized sensor data from multiple devices captured at approximately the same time.

Each MultiDeviceFrame contains a reference timestamp and optional SensorFrame instances for up to five device positions. The left and right fields represent hand-held devices or controllers, the body field captures data from a device mounted on the torso, the head field contains data from headphones or AR glasses, and the watch field includes smartwatch sensor data.

All fields are optional, allowing the system to operate with any combination of available devices. The backend can reconstruct the full-body pose from whatever subset of devices is currently active.

Timestamp synchronization borrows concepts from the Network Time Protocol (NTP). Each device maintains a local clock and periodically synchronizes with the backend. The backend compensates for network latency and clock drift when assembling multi-device frames, so that sensor readings from different devices are aligned in time.
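The standard NTP-style estimate uses four timestamps from one request-response exchange. The sketch below shows that calculation only; the `SyncSample` type and the function names are hypothetical, and the source does not specify which estimator the backend actually uses.

```rust
/// Four timestamps from one sync exchange, all in microseconds.
/// t0 and t3 are read from the device clock, t1 and t2 from the backend clock.
#[derive(Debug, Clone, Copy)]
pub struct SyncSample {
    pub t0: i64, // device sends request
    pub t1: i64, // backend receives request
    pub t2: i64, // backend sends response
    pub t3: i64, // device receives response
}

/// Estimated offset of the backend clock relative to the device clock.
/// Assumes the network delay is roughly symmetric in both directions.
pub fn clock_offset_us(s: SyncSample) -> i64 {
    ((s.t1 - s.t0) + (s.t2 - s.t3)) / 2
}

/// Round-trip network delay, excluding backend processing time.
pub fn round_trip_delay_us(s: SyncSample) -> i64 {
    (s.t3 - s.t0) - (s.t2 - s.t1)
}

/// Maps a device-local timestamp onto the backend timeline.
pub fn to_backend_time_us(local_us: i64, offset_us: i64) -> i64 {
    local_us + offset_us
}
```

As a worked case: with a device clock 500 µs behind the backend, 2,000 µs of latency each way, and 100 µs of backend processing, the timestamps 10,000 / 12,500 / 12,600 / 14,100 yield an offset of 500 µs and a round-trip delay of 4,000 µs.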

Data Message Types

Data messages are further categorized by their streaming semantics. The DataMessage enum distinguishes between three transmission patterns.

Stream messages carry real-time sensor data with minimal buffering. These messages prioritize low latency over reliability, making them suitable for live visualization and immediate response. Dropped frames are acceptable since newer data will arrive shortly.

Batch messages group multiple sensor frames together for efficient transmission. This mode is used when recording sessions for later playback or when network conditions make frequent small messages inefficient. Each batch includes a sequence number to detect gaps in the stream.

Response messages answer queries for historical data or computed results. These are part of request-response pairs rather than ongoing streams.

The DataPayload enum specifies the actual content within each DataMessage. The SensorFrame variant carries a single frame of sensor data, while the SensorBatch variant contains a vector of frames with an associated sequence number for ordering.
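The nesting of the two enums, and the use of batch sequence numbers to detect gaps, can be sketched as follows. The exact shape of `DataMessage` (each variant wrapping a `DataPayload`) is an assumption, `Frame` is a placeholder for the real sensor frame, and the gap check assumes sequence numbers increase by one per batch.

```rust
/// Placeholder for the real sensor frame; only the timestamp is kept here.
#[derive(Debug, Clone, PartialEq)]
pub struct Frame {
    pub timestamp_us: u64,
}

#[derive(Debug, Clone, PartialEq)]
pub enum DataPayload {
    SensorFrame(Frame),
    SensorBatch { sequence: u64, frames: Vec<Frame> },
}

/// Transmission pattern wrapping the payload (assumed layout).
#[derive(Debug, Clone, PartialEq)]
pub enum DataMessage {
    Stream(DataPayload),
    Batch(DataPayload),
    Response(DataPayload),
}

/// Returns the sequence number if this is a batch message carrying a batch.
pub fn batch_sequence(msg: &DataMessage) -> Option<u64> {
    match msg {
        DataMessage::Batch(DataPayload::SensorBatch { sequence, .. }) => Some(*sequence),
        _ => None,
    }
}

/// Lists sequence numbers skipped between consecutive received batches.
/// Expects `received` in arrival order with increasing values.
pub fn missing_sequences(received: &[u64]) -> Vec<u64> {
    let mut missing = Vec::new();
    for pair in received.windows(2) {
        let (prev, next) = (pair[0], pair[1]);
        if next > prev + 1 {
            missing.extend(prev + 1..next);
        }
    }
    missing
}
```

A receiver recording a session could run `missing_sequences` over the batches it has stored and request the gaps through a Response-style query.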

Latent State Representation

The latent state represents the output of the LIM-RPS (Learned Implicit Manifold with Riemannian Phase Synchronization) solver. This machine learning model learns a low-dimensional manifold embedding of the full-body motion dynamics, capturing the essential structure of movement patterns.

The LatentState structure contains the current position in this learned space as a vector of floating-point coordinates. The dimensionality typically ranges from 2 to 8, depending on the complexity of the captured motion vocabulary. The velocity field provides the rate of change of the latent coordinates, enabling prediction and smooth interpolation.

Several derived metrics characterize the motion quality and dynamics. The coherence value ranges from 0.0 to 1.0, indicating how well the current motion matches learned patterns. High coherence suggests the movement is within the trained distribution, while low coherence indicates novel or poorly-executed movements. The periodicity metric distinguishes rhythmic, cyclical motions (values near 1.0) from chaotic or irregular movements (values near 0.0).

The energy field quantifies the total kinetic energy in the system, roughly corresponding to movement intensity. The equilibrium value measures proximity to the nearest equilibrium point in the dynamical system: values near 1.0 indicate stable rest states, and values near 0.0 indicate transitional or energetic states.

Optional geometric properties include curvature, measuring the rate of change of the velocity vector's direction, and divergence, quantifying whether trajectories in the phase space are expanding or contracting. These properties are only computed when needed for advanced visualization or analysis.
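The fields described above might be represented as in this sketch, with a validation step and a first-order prediction from the velocity. Field names and types are assumptions. The [0, 1] bounds for periodicity and equilibrium are inferred from the "near 0.0" and "near 1.0" wording, and the non-negative energy check and per-second velocity unit are likewise assumed.

```rust
/// Latent state fields as described in the text; names and types are assumed.
#[derive(Debug, Clone, PartialEq)]
pub struct LatentState {
    pub timestamp_us: u64,
    pub position: Vec<f64>,      // typically 2 to 8 coordinates
    pub velocity: Vec<f64>,      // assumed to be in latent units per second
    pub coherence: f64,          // 0.0 to 1.0
    pub periodicity: f64,        // assumed 0.0 to 1.0
    pub energy: f64,             // assumed non-negative
    pub equilibrium: f64,        // assumed 0.0 to 1.0
    pub curvature: Option<f64>,  // computed only when needed
    pub divergence: Option<f64>, // computed only when needed
}

impl LatentState {
    /// Checks dimensional consistency and the assumed metric ranges.
    pub fn validate(&self) -> Result<(), String> {
        if self.position.len() != self.velocity.len() {
            return Err("position and velocity dimensions differ".to_string());
        }
        let unit_metrics = [
            ("coherence", self.coherence),
            ("periodicity", self.periodicity),
            ("equilibrium", self.equilibrium),
        ];
        for (name, value) in unit_metrics {
            if !(0.0..=1.0).contains(&value) {
                return Err(format!("{} out of range: {}", name, value));
            }
        }
        if self.energy < 0.0 {
            return Err("energy must be non-negative".to_string());
        }
        Ok(())
    }

    /// First-order extrapolation: position + velocity * dt.
    pub fn predict(&self, dt_s: f64) -> Vec<f64> {
        self.position
            .iter()
            .zip(&self.velocity)
            .map(|(p, v)| p + v * dt_s)
            .collect()
    }
}
```

Because a NaN metric fails the range check, `validate` also catches values that a diverging solver might emit.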

The iOS client receives LatentState messages and converts them into LatentStateUpdate structures suitable for UI rendering. This conversion extracts position and velocity into Vector3 instances for the visualization engine, maps the learned metrics to display properties, and fills in placeholder values for embodied features not yet present in the protocol.

Visualization Messages

Visualization messages aggregate multiple data types for real-time display. The VisualizationMessage structure optionally includes a LatentState for the current position in learned space, TrajectoryData showing predicted future positions, AudioUpdate containing phrase structure and beat information, and PatternUpdate describing active movement or musical patterns.

The TrajectoryData structure provides a sequence of predicted positions in latent space extending into the future. Each point is represented as a vector of coordinates, with corresponding microsecond timestamps indicating when each position is expected. This enables smooth animation and anticipatory visual effects synchronized with the motion.
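To animate between predicted points, a renderer needs the position at an arbitrary display time. The sketch below uses linear interpolation between the two bracketing samples; the `points` and `timestamps_us` field names and the `position_at` method are hypothetical, and the source does not say which interpolation scheme the client uses.

```rust
/// Predicted latent positions with matching timestamps (assumed layout).
#[derive(Debug, Clone, PartialEq)]
pub struct TrajectoryData {
    pub points: Vec<Vec<f64>>,   // one coordinate vector per prediction
    pub timestamps_us: Vec<u64>, // expected time of each point, ascending
}

impl TrajectoryData {
    /// Linearly interpolates the predicted position at `t_us`.
    /// Returns None outside the predicted range or for malformed data.
    pub fn position_at(&self, t_us: u64) -> Option<Vec<f64>> {
        if self.points.len() != self.timestamps_us.len() {
            return None;
        }
        for i in 1..self.points.len() {
            let (t0, t1) = (self.timestamps_us[i - 1], self.timestamps_us[i]);
            if t1 > t0 && t_us >= t0 && t_us <= t1 {
                let alpha = (t_us - t0) as f64 / (t1 - t0) as f64;
                let blended: Vec<f64> = self.points[i - 1]
                    .iter()
                    .zip(&self.points[i])
                    .map(|(a, b)| a + alpha * (b - a))
                    .collect();
                return Some(blended);
            }
        }
        None
    }
}
```

Returning `None` outside the predicted window leaves the caller free to decide whether to hold the last known position or wait for the next update.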

The iOS implementation receives these visualization messages through a WebSocket callback, extracts the latent state if present, and dispatches updates to the main thread for rendering. The PerformanceView displays this information as an animated orb moving through space, with color and size reflecting the energy and coherence metrics.

Protocol Encoding

The protocol currently uses JSON as its primary encoding format. JSON provides human-readable messages that simplify debugging and development. The encoding uses snake_case for field names to match Rust conventions, maintains compact formatting without unnecessary whitespace in production, and preserves numeric precision for floating-point sensor values.

The ProtocolEncoder class in Swift handles bidirectional conversion between message structures and JSON data. The encoder uses JSONEncoder with convertToSnakeCase key encoding strategy to match the Rust backend's naming conventions. The decoder applies the inverse transformation with convertFromSnakeCase.

For transmission efficiency, the protocol is designed to support MessagePack encoding as an alternative to JSON. MessagePack is a binary format that reduces message size by approximately 30-50%.

WebSocket Transport

WebSocket provides the default transport mechanism for cc-protocol, offering full-duplex communication over a single TCP connection. The connection lifecycle begins with the client initiating a connection to the backend WebSocket endpoint at a path like `/visualization`. The server responds with an HTTP 101 Switching Protocols response, upgrading the connection from HTTP to WebSocket.

Once established, both sides can send messages asynchronously without request-response pairing. The protocol maintains connection health through periodic ping-pong exchanges. Every 30 seconds, the client sends a Ping message. The server responds with a Pong message. If no Pong arrives within a timeout period, the connection is considered dead and must be re-established.
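The keep-alive decision can be expressed as a pure function of two elapsed times, which keeps it easy to test without a real socket. The 30-second ping interval comes from the text; the 10-second pong timeout and all identifiers here are assumptions, since the source only mentions "a timeout period".

```rust
use std::time::Duration;

/// Ping interval stated in the text.
pub const PING_INTERVAL: Duration = Duration::from_secs(30);
/// Assumed value; the source does not give the pong timeout.
pub const PONG_TIMEOUT: Duration = Duration::from_secs(10);

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum KeepAliveAction {
    Idle,      // nothing to do yet
    SendPing,  // interval elapsed, send a Ping
    Reconnect, // Pong overdue, treat the connection as dead
}

/// Decides the next keep-alive step.
/// `awaiting_pong_for` is Some(elapsed) while a Ping is unanswered.
pub fn keep_alive_action(
    since_last_ping: Duration,
    awaiting_pong_for: Option<Duration>,
) -> KeepAliveAction {
    match awaiting_pong_for {
        Some(waited) if waited >= PONG_TIMEOUT => KeepAliveAction::Reconnect,
        Some(_) => KeepAliveAction::Idle,
        None if since_last_ping >= PING_INTERVAL => KeepAliveAction::SendPing,
        None => KeepAliveAction::Idle,
    }
}
```

A timer loop would call this once per tick and act on the result, resetting the pong wait whenever a Pong arrives.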

The iOS WebSocketService class manages the entire connection lifecycle. It uses URLSessionWebSocketTask for WebSocket communication, maintains connection state as an observable property for UI updates, implements exponential backoff for automatic reconnection after failures, provides callback closures for received visualization and latent state messages, and exposes methods for sending sensor frames and control messages.

The connection moves through four primary states. The disconnected state indicates no active connection, connecting indicates an in-progress connection attempt, connected represents a healthy active connection, and error carries a descriptive message when a connection fails. These states are published through SwiftUI's observation system, allowing the UI to display connection status reactively.
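The four states form a small state machine. The sketch below expresses it in Rust for consistency with the backend examples, although the source implements it in the Swift client; the `ConnectionEvent` type and the specific transition rules are assumptions chosen to match the description.

```rust
/// The four connection states named in the text.
#[derive(Debug, Clone, PartialEq)]
pub enum ConnectionState {
    Disconnected,
    Connecting,
    Connected,
    Error(String), // carries a descriptive message
}

/// Events that drive transitions (hypothetical names).
#[derive(Debug, Clone, PartialEq)]
pub enum ConnectionEvent {
    ConnectRequested,
    HandshakeSucceeded,
    Failed(String),
    Closed,
}

/// Returns the next state; events that do not apply leave the state unchanged.
pub fn transition(state: ConnectionState, event: ConnectionEvent) -> ConnectionState {
    match (state, event) {
        (ConnectionState::Disconnected, ConnectionEvent::ConnectRequested)
        | (ConnectionState::Error(_), ConnectionEvent::ConnectRequested) => {
            ConnectionState::Connecting
        }
        (ConnectionState::Connecting, ConnectionEvent::HandshakeSucceeded) => {
            ConnectionState::Connected
        }
        (_, ConnectionEvent::Failed(reason)) => ConnectionState::Error(reason),
        (_, ConnectionEvent::Closed) => ConnectionState::Disconnected,
        (state, _) => state,
    }
}
```

Writing the transitions as one exhaustive `match` makes it obvious which combinations are ignored, for example a second connect request while already connected.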

Message Flow Patterns

The typical message flow for sensor streaming follows a straightforward pattern. The iOS device captures motion data from CoreMotion at 50-100 Hz. Each sample is packaged into a SensorFrame structure with timestamp and device identifier. The frame is encoded into a ProtocolSensorFrame matching the protocol schema. This protocol frame is wrapped in a DataMessage with Stream message type. The DataMessage is wrapped in a NetworkMessage envelope with routing metadata. The encoder serializes the NetworkMessage to JSON. The WebSocket task sends the JSON string to the backend.

The backend receives this data, deserializes the JSON into Rust structures, extracts the SensorFrame from the nested envelope, and feeds the sensor data into the LIM-RPS solver for processing. The solver produces latent state updates at a slightly lower rate, typically 30-60 Hz to match the desired visualization frame rate.

For visualization updates, the flow reverses direction. The LIM-RPS solver outputs latent coordinates and metrics. The backend constructs a LatentState structure with timestamp. This state is optionally combined with trajectory predictions in a VisualizationMessage. The visualization message is wrapped in a NetworkMessage and encoded to JSON. The WebSocket transmits the message to all connected clients.

The iOS client receives the JSON string, decodes it into a ProtocolVisualizationMessage, extracts the ProtocolLatentState if present, converts it to a LatentStateUpdate for the UI layer, and dispatches to the main thread to update the rendering. The PerformanceView observes these updates and animates the visualization accordingly.

Performance Characteristics

The protocol is designed to meet strict latency requirements for interactive applications. The target end-to-end latency budget allocates less than 10 milliseconds for sensor capture at the device, under 2 milliseconds for JSON encoding, approximately 5 milliseconds for network transmission on a local WiFi network, roughly 20 milliseconds for backend processing including LIM-RPS inference, another 2 milliseconds for response encoding, and less than 16 milliseconds for UI updates to maintain 60 frames per second. This totals to a target of under 55 milliseconds from sensor measurement to visual feedback.
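The budget arithmetic can be checked directly: 10 + 2 + 5 + 20 + 2 + 16 = 55 milliseconds. The sketch below encodes the stages as a table and flags any stage whose measured time exceeds its allocation; the stage labels, the `over_budget` helper, and the measured numbers in the usage note are illustrative assumptions.

```rust
/// Per-stage latency allocations from the text, in milliseconds.
pub const LATENCY_BUDGET_MS: [(&str, u32); 6] = [
    ("sensor capture", 10),
    ("request encoding", 2),
    ("network transmission", 5),
    ("backend processing", 20),
    ("response encoding", 2),
    ("ui update", 16),
];

/// Sum of all stage allocations.
pub fn total_budget_ms() -> u32 {
    LATENCY_BUDGET_MS.iter().map(|(_, ms)| *ms).sum()
}

/// Names of stages whose measured time exceeds the allocation.
/// `measured_ms` must list stages in the same order as the budget table.
pub fn over_budget(measured_ms: &[u32; 6]) -> Vec<&'static str> {
    let mut stages = Vec::new();
    for (i, (name, budget)) in LATENCY_BUDGET_MS.iter().enumerate() {
        if measured_ms[i] > *budget {
            stages.push(*name);
        }
    }
    stages
}
```

For hypothetical measurements of `[8, 1, 5, 25, 2, 12]`, the total is 53 ms and within the overall target, yet `over_budget` still reports "backend processing", which is the kind of per-stage regression a single end-to-end number would hide.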

Actual performance depends heavily on network conditions and device capabilities. On a stable local network with modern devices, the system typically achieves 30-40 millisecond end-to-end latency. This is perceptually instantaneous for most interactive scenarios.

Throughput requirements are moderate compared to video streaming. Each SensorFrame message consumes approximately 300-500 bytes in JSON format, depending on the presence of optional fields. At 50 Hz streaming rate, this represents about 15-25 kilobytes per second per device. LatentState messages are slightly smaller at 200-300 bytes, sent at 30-60 Hz for another 6-18 kilobytes per second. A typical session with bidirectional communication consumes 40-100 kilobytes per second, well within the capacity of modern WiFi networks.
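The per-stream figures follow from multiplying message size by message rate. The helper names below are hypothetical; the inputs are the ranges quoted in the text.

```rust
/// Bandwidth of one stream in bytes per second.
pub fn bytes_per_second(message_bytes: u32, rate_hz: u32) -> u32 {
    message_bytes * rate_hz
}

/// Low and high bandwidth estimates from size and rate ranges.
pub fn bandwidth_range(bytes: (u32, u32), rate_hz: (u32, u32)) -> (u32, u32) {
    (
        bytes_per_second(bytes.0, rate_hz.0),
        bytes_per_second(bytes.1, rate_hz.1),
    )
}

/// Uplink estimate for one device sending SensorFrame messages at 50 Hz.
pub fn sensor_uplink_range() -> (u32, u32) {
    bandwidth_range((300, 500), (50, 50)) // 15,000 to 25,000 bytes/s
}

/// Downlink estimate for LatentState messages at 30 to 60 Hz.
pub fn latent_downlink_range() -> (u32, u32) {
    bandwidth_range((200, 300), (30, 60)) // 6,000 to 18,000 bytes/s
}
```

For a single device these two ranges sum to roughly 21-43 kilobytes per second, so the 40-100 kilobytes per second session figure presumably reflects more than one streaming device or additional message types.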

Error Handling Strategy

The protocol distinguishes between recoverable and fatal errors. Recoverable errors include transient network failures, which trigger automatic reconnection with exponential backoff. Malformed messages that fail to decode are logged and skipped, with processing continuing on subsequent messages. Timeouts waiting for responses result in retry attempts with backoff.

Fatal errors require user intervention or system restart. These include protocol version mismatches between client and server, authentication failures when security is enabled, and resource exhaustion on the backend preventing new connections.

The WebSocketService implements a robust reconnection strategy. When a connection drops, it waits for an initial backoff period of 1 second. If reconnection fails, it doubles the backoff period up to a maximum of 32 seconds. Once reconnected, it resets the backoff period to the initial value. This approach prevents connection storms while ensuring prompt recovery from transient failures.
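The reconnection schedule described above (start at 1 second, double on each failure, cap at 32 seconds, reset on success) can be sketched as a small stateful helper. The source implements this in the Swift WebSocketService; the Rust `Backoff` type and its method names are hypothetical.

```rust
use std::time::Duration;

/// Exponential backoff with the limits given in the text.
#[derive(Debug, Clone)]
pub struct Backoff {
    current: Duration,
}

impl Backoff {
    pub const INITIAL: Duration = Duration::from_secs(1);
    pub const MAX: Duration = Duration::from_secs(32);

    pub fn new() -> Self {
        Backoff { current: Self::INITIAL }
    }

    /// Returns the delay to wait before the next attempt, then doubles
    /// the stored delay up to the maximum.
    pub fn next_delay(&mut self) -> Duration {
        let delay = self.current;
        self.current = (self.current * 2).min(Self::MAX);
        delay
    }

    /// Call after a successful reconnection.
    pub fn reset(&mut self) {
        self.current = Self::INITIAL;
    }
}
```

Successive calls to `next_delay` yield 1, 2, 4, 8, 16, 32, 32 seconds. Production clients often add random jitter to each delay so that many devices do not retry in lockstep, though the source does not mention it.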

Decode errors are handled gracefully by logging the problematic message for debugging and incrementing an error counter published to the UI. The connection remains active and continues processing subsequent messages. This resilience is critical for development when protocol changes may temporarily cause incompatibilities.

Security Considerations

The current development implementation operates on local networks without encryption or authentication. This simplified security model is acceptable for prototyping and testing but unsuitable for production deployment or use on untrusted networks.

Production deployments require several security enhancements. Transport layer security must be enabled by upgrading from ws:// to wss:// URLs, ensuring all traffic is encrypted using TLS. Device authentication should verify that only authorized devices can connect to the backend, potentially using JWT tokens or shared secrets provisioned during device setup. Message integrity can be ensured through digital signatures or message authentication codes, preventing tampering with sensor data or latent state messages.

Rate limiting protects the backend from denial-of-service attacks, both accidental and malicious. Each device should be limited to a reasonable message rate, with excess messages dropped or causing temporary connection suspension.
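A token bucket is one common way to enforce a per-device message rate while tolerating short bursts. The source does not name an algorithm, so this is a swapped-in technique for illustration; the `TokenBucket` type and its parameters are assumptions. Time is passed in explicitly so the limiter can be tested deterministically.

```rust
/// Token bucket rate limiter; one token is spent per accepted message.
#[derive(Debug, Clone)]
pub struct TokenBucket {
    capacity: f64,       // maximum burst size
    tokens: f64,         // tokens currently available
    refill_per_sec: f64, // sustained message rate
    last_us: u64,        // time of the previous call, in microseconds
}

impl TokenBucket {
    /// Creates a full bucket.
    pub fn new(capacity: f64, refill_per_sec: f64, now_us: u64) -> Self {
        TokenBucket {
            capacity,
            tokens: capacity,
            refill_per_sec,
            last_us: now_us,
        }
    }

    /// Returns true if a message arriving at `now_us` should be accepted.
    pub fn allow(&mut self, now_us: u64) -> bool {
        let elapsed_s = now_us.saturating_sub(self.last_us) as f64 / 1_000_000.0;
        self.tokens = (self.tokens + elapsed_s * self.refill_per_sec).min(self.capacity);
        self.last_us = now_us;
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```

For a device expected to stream at up to 100 Hz, a bucket such as `TokenBucket::new(20.0, 120.0, now)` would allow the normal rate plus some headroom while dropping a sustained flood.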

The protocol design supports these security features without requiring changes to the core message structure. Authentication tokens can be transmitted in the initial WebSocket handshake headers. Message signatures can be added as additional fields in the NetworkMessage envelope without breaking existing decoders.

Extension Mechanisms

The protocol is designed to evolve without breaking compatibility. New payload types can be added to the MessagePayload enum, and clients that do not recognize a new type can ignore those messages while continuing to process known types. The tagged union structure makes this forward compatibility possible: a decoder can read the tag, see that it does not know the variant, and skip the message.

New sensor types extend the SensorFrame structure with additional optional fields. For example, GPS data could be added as an optional gps field containing latitude, longitude, altitude, and accuracy. EMG muscle sensors could provide an optional emg field with electrode readings. EEG brain sensors could include an optional eeg field with channel data.

Existing clients ignore unknown fields during deserialization, while new clients can access the additional data. This allows gradual rollout of new sensor support without coordinating simultaneous updates to all devices and services.

Custom visualization data can be added to the VisualizationMessage structure following the same pattern. New renderers might require additional fields beyond latent state and trajectory. These can be added as optional fields in the message, with legacy visualization code ignoring what it does not understand.

Debugging and Development

Effective debugging requires visibility into message flow and content. The protocol includes several debugging facilities to support development.

The WebSocketService includes a debug mode flag that enables verbose logging of all sent and received messages. When enabled, it logs the full JSON content of each message, connection state transitions with timestamps, encoding and decoding errors with stack traces, and network errors with diagnostic information.

The ProtocolEncoder provides methods to convert messages to human-readable JSON strings without transmitting them. This allows inspection of message structure before sending, useful for validating message construction logic.

The backend can be configured through Rust's usual logging facilities to emit detailed protocol logs. Setting the RUST_LOG environment variable to cc_protocol=debug enables message-level logging on the server side.

For production monitoring, key metrics are tracked and exposed. These include message rate and latency histograms, connection success and failure rates, decode error frequency by message type, and payload size distributions. These metrics enable performance monitoring and capacity planning.

Implementation Examples

The iOS client integrates cc-protocol through several coordinated services. The MotionStreamer captures sensor data and creates SensorFrame instances from CoreMotion. The WebSocketService manages the connection and handles encoding/transmission. The VisualizationService receives latent state updates and provides them to the UI layer.

To send sensor data, the MotionStreamer calls WebSocketService.sendSensorFrame with a constructed SensorFrame. The WebSocketService converts this to a ProtocolSensorFrame, wraps it in a DataMessage with Stream type, wraps that in a NetworkMessage with generated message ID and timestamp, encodes the entire structure to JSON using ProtocolEncoder, and transmits the JSON string via the WebSocket connection.

To receive latent state updates, the application configures a callback on the WebSocketService. When a message arrives, the service decodes the JSON into a ProtocolNetworkMessage, extracts the payload and verifies that it is a visualization message, extracts the ProtocolLatentState if present, and invokes the registered callback with the latent state. The VisualizationService then converts the protocol state to a LatentStateUpdate for rendering.

This clean separation of concerns allows each component to focus on its domain. The protocol layer handles serialization and type safety. The service layer manages connections and callbacks. The application layer performs domain-specific processing and visualization.

Future Enhancements

Several planned enhancements will improve protocol efficiency and capability. MessagePack binary encoding will reduce message sizes and parsing overhead on mobile devices. The backend will auto-detect encoding format and respond in the same format, allowing negotiation on a per-connection basis.

Compression using algorithms like gzip or brotli can further reduce bandwidth for batch transmissions, though real-time streaming typically does not benefit due to the latency cost of compression.

Batching support will allow clients to bundle multiple sensor frames into a single message when operating in recording mode rather than live streaming. This reduces protocol overhead when latency is not critical.

Delta encoding can transmit only the differences between consecutive sensor frames, exploiting temporal coherence to reduce data volume. This is particularly effective for slowly-changing sensors like magnetometers.
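Delta encoding is easiest to show on integer samples, where the round trip is exact. The sketch below assumes sensor readings have already been quantized to `i32` (for example, a magnetometer axis in fixed-point units); the quantization step and the function names are hypothetical, since the source describes delta encoding only as a planned enhancement.

```rust
/// Replaces each sample with its difference from the previous one.
/// The first output equals the first sample (difference from zero).
pub fn delta_encode(samples: &[i32]) -> Vec<i32> {
    let mut prev = 0i32;
    samples
        .iter()
        .map(|&s| {
            let delta = s.wrapping_sub(prev);
            prev = s;
            delta
        })
        .collect()
}

/// Rebuilds the original samples by accumulating the differences.
pub fn delta_decode(deltas: &[i32]) -> Vec<i32> {
    let mut acc = 0i32;
    deltas
        .iter()
        .map(|&d| {
            acc = acc.wrapping_add(d);
            acc
        })
        .collect()
}
```

A slowly changing series such as `[1000, 1002, 1001, 1001]` encodes to `[1000, 2, -1, 0]`. The small deltas are what a variable-length integer format or a general compressor can then store in fewer bytes.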

Time synchronization will implement a network time protocol-style clock synchronization algorithm, allowing precise temporal alignment of multi-device frames even in the presence of clock drift.

Multi-transport support will enable UDP for ultra-low-latency streaming where some packet loss is acceptable, HTTP for reliable request-response patterns, and continued WebSocket support for bidirectional streaming.

Offline queuing will buffer sensor data when the network is unavailable, transmitting the backlog when connection is restored. This ensures no data loss during temporary connectivity issues, important for recording complete sessions.

Conclusion

CC-Protocol provides a robust, extensible foundation for real-time motion data streaming and visualization in the Computational Choreography system. Its layered architecture separates transport concerns from message semantics, enabling evolution and adaptation to new requirements. The type-safe message structure prevents common serialization errors while remaining flexible enough to accommodate new sensor types and visualization data.

The current implementation using JSON and WebSocket balances development velocity with performance, providing human-readable messages for debugging while achieving sub-100ms end-to-end latency on local networks. Future optimizations with binary encoding and compression will support higher device counts and remote network scenarios.

The protocol's design philosophy prioritizes correctness, extensibility, and developer experience. Strong typing catches errors at compile time. Comprehensive error handling maintains robustness in production. Clear separation of concerns makes the codebase maintainable and testable. These qualities establish cc-protocol as a solid foundation for the distributed real-time system powering Computational Choreography.

Source Anchor

projects/Documentation/04-reference/PROTOCOL.md
