Conversation Data Analysis Plan

Extracted abstract or opening context

- **File**: `data/conversations_new.json`
- **Size**: 63.8 MB
- **Format**: JSON array of conversation objects
- **Total Conversations**: 282
- **Total Messages**: 7,469
  - User Messages: 3,664
  - Assistant Messages: 3,805
- **Time Range**: February 17, 2025 → December 8, 2025 (294 days)
- **Average Messages per Conversation**: 26.5
- **Data Quality**: 281 non-empty conversations (99.6%)

| Model | Conversations |
|-------|---------------|
| gpt-4o | 105 (37%) |
| gpt-5 | 101 (36%) |
| gpt-5-1 | 39 (14%) |
| gpt-4-5 | 14 (5%) |
| auto | 5 (2%) |

| Length | Count | Percentage |
|--------|-------|------------|
| 1-5 messages | 119 | 42.2% |
| 6-10 messages | 53 | 18.8% |
| 11-20 messages | 38 | 13.5% |
| 21-50 messages | 33 | 11.7% |
| 50+ messages | 38 | 13.5% |

1. **Tree Structure**: Conversations are stored as trees (a `mapping` of nodes, each with parent/children links)
2. **Node Types**: Includes system, user, and assistant messages
3. **Content**: Text stored in the `content.parts` array
4. **Metadata Rich**: Extensive metadata (timestamps, moderation, memory scope, etc.)
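The tree layout described above can be walked with a short script. What follows is a minimal sketch, not the analysis code used to produce the figures in this plan. It assumes each conversation object has a `mapping` dictionary keyed by node ID, that each node carries `parent`, `children`, and an optional `message`, and that the role sits at `message.author.role`. Only `mapping`, parent/children, and `content.parts` are named in the source; the other field names, the helper names (`iter_messages`, `length_bucket`, `summarize`, `load_conversations`), and the choice to put exactly 50 messages in the 21-50 bucket are assumptions for illustration.

```python
import json
from collections import Counter


def load_conversations(path):
    """Load the JSON array of conversation objects from disk."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def iter_messages(conversation):
    """Yield (role, text) for every message node, depth-first from the roots.

    Assumed layout (hypothetical beyond `mapping` and `content.parts`):
    conversation["mapping"][node_id] = {"parent", "children", "message"}.
    """
    mapping = conversation.get("mapping") or {}
    # A root is any node whose parent is absent from the mapping (or None).
    roots = [nid for nid, node in mapping.items()
             if node.get("parent") not in mapping]
    stack = list(reversed(roots))
    seen = set()
    while stack:
        nid = stack.pop()
        if nid in seen or nid not in mapping:
            continue  # guard against cycles and dangling child IDs
        seen.add(nid)
        node = mapping[nid]
        message = node.get("message")
        if message:
            role = (message.get("author") or {}).get("role")
            parts = (message.get("content") or {}).get("parts") or []
            # Keep only string parts; non-text parts are skipped.
            text = "".join(p for p in parts if isinstance(p, str))
            yield role, text
        stack.extend(reversed(node.get("children") or []))


def length_bucket(n):
    """Map a message count to a length bucket (50 is assumed to fall in 21-50)."""
    if n == 0:
        return "empty"
    if n <= 5:
        return "1-5"
    if n <= 10:
        return "6-10"
    if n <= 20:
        return "11-20"
    if n <= 50:
        return "21-50"
    return "50+"


def summarize(conversations):
    """Count user/assistant messages and bucket conversations by length."""
    roles = Counter()
    buckets = Counter()
    for conv in conversations:
        n = 0
        for role, _text in iter_messages(conv):
            if role in ("user", "assistant"):
                roles[role] += 1
                n += 1
        buckets[length_bucket(n)] += 1
    return roles, buckets
```

Calling `summarize(load_conversations("data/conversations_new.json"))` would return the per-role message counts and the length distribution under these assumptions. The walk visits every branch of the tree, so edited or regenerated turns are counted too. Whether the totals reported above were computed that way is not stated, so the numbers may differ.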

Promotion decision

What has to happen next

Attach run IDs, datasets, metrics, and reproduction commands.

Why this is not always a full paper yet

Corpus pages are public-safe readers for discovered workspace artifacts. They are not automatically final papers. A corpus item becomes a polished paper only after the editable source, evidence checkpoints, references, figures, render path, and release status are attached through the paper schema.