Grand Diomande Research · Full HTML Reader

Music Pipeline Consolidation Summary

This document tracks the consolidation of music pipeline code into `backend/cc-music-pipeline/python/`.

Consolidated Files

From `tools/music-pipeline/`

| Old File | New Location | Status |
|----------|--------------|--------|
| `parse_soundcloud_likes.py` | `sources/soundcloud.py` | Merged |
| `parse_soundcloud_v2.py` | `sources/soundcloud.py` | Merged |
| `download_music.py` | `download/downloader.py` | Merged |
| `download_music_to_gcs.py` | `storage/gcs.py` | Merged |
| `process_all_tracks.py` | `pipeline.py` | Merged |
| `process_music_list.py` | `pipeline.py` | Merged |
| `process_soundcloud_likes.py` | `pipeline.py` | Merged |
| `reprocess_soundcloud.py` | `cli.py research` | Merged |
| `retry_failed_tracks.py` | `cli.py retry` | Merged |
| `run_pipeline.py` | `cli.py ingest` | Merged |
| `debug_search.py` | `search/youtube.py` | Merged |
| `test_improved_search.py` | - | Test only, not needed |
| `test_music_pipeline.py` | - | Test only, not needed |
| `test_query_cleaning.py` | - | Test only, not needed |
| `MUSIC_PIPELINE_ROADMAP.md` | Archived | Reference only |
| `RESUME_DOWNLOADS.md` | - | Obsolete |
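
Three of the rows above fold standalone scripts into `cli.py` subcommands (`research`, `retry`, `ingest`). As a rough illustration of that shape, the sketch below builds one `argparse` subparser per subcommand. It is not the actual `cli.py`: the help strings, the `build_parser` and `main` names, and the absence of any options are all assumptions made for the example.

```python
import argparse

# Legacy script -> subcommand, taken from the table above.
LEGACY_TO_SUBCOMMAND = {
    "reprocess_soundcloud.py": "research",
    "retry_failed_tracks.py": "retry",
    "run_pipeline.py": "ingest",
}


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one subcommand per consolidated script (illustrative)."""
    parser = argparse.ArgumentParser(prog="cli.py")
    subparsers = parser.add_subparsers(dest="command", required=True)
    for name in sorted(set(LEGACY_TO_SUBCOMMAND.values())):
        subparsers.add_parser(name, help=f"placeholder for the '{name}' subcommand")
    return parser


def main(argv=None) -> str:
    """Parse argv and return the chosen subcommand name."""
    args = build_parser().parse_args(argv)
    # A real implementation would dispatch to the pipeline here.
    return args.command
```

Calling `main(["retry"])` returns the selected subcommand name; an unknown subcommand makes `argparse` exit with a usage error.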

Data Files Copied

All output data files have been copied to `backend/cc-music-pipeline/python/data/`:

  • `all_liked.txt` - Original SoundCloud likes (478 tracks)
  • `soundcloud_likes_youtube_results.json` - YouTube search results (286 found)
  • `download_progress.json` - Download progress tracking
  • `all_music_search_results.json` / `.csv` - Additional search results
  • `soundcloud_youtube_urls.txt` - Found YouTube URLs
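
Since the legacy folder is a candidate for deletion, it may be worth confirming that each copied file matches its original byte for byte. The sketch below compares SHA-256 digests. The helper names are hypothetical, and it assumes the originals sit directly under `tools/music-pipeline/`, which this document does not state explicitly.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(old_dir: Path, new_dir: Path, names: list[str]) -> list[str]:
    """List file names that are missing on either side or differ in content."""
    mismatched = []
    for name in names:
        old_file, new_file = old_dir / name, new_dir / name
        if not (old_file.is_file() and new_file.is_file()):
            mismatched.append(name)
        elif sha256_of(old_file) != sha256_of(new_file):
            mismatched.append(name)
    return mismatched
```

For example, `find_mismatches(Path("tools/music-pipeline"), Path("backend/cc-music-pipeline/python/data"), ["all_liked.txt", "download_progress.json"])` would return an empty list when both copies are present and identical.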

---

Files Safe to Remove

The following directory can be removed once the new package has been verified:

```bash
# Option 1: remove the redundant tools folder
rm -rf tools/music-pipeline/

# Option 2 (instead of Option 1): archive rather than delete
mkdir -p archive
mv tools/music-pipeline/ archive/music-pipeline-legacy/
```

Before Removing

Verify the new package works:

```bash
cd backend/cc-music-pipeline/python

# Test parsing
python -c "from cc_music_ingestion import SoundCloudParser; print('OK')"

# Test CLI
python cli.py status

# Test paste mode (dry run)
python cli.py paste --dry-run
```
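
The import check above only covers `SoundCloudParser`. A slightly broader smoke test could look for the other classes this document names. In the sketch below, `missing_exports` is a hypothetical helper, and the idea that `YouTubeSearcher` and `GCSStorage` are re-exported from the package root is an assumption, not something the source confirms.

```python
import importlib

# SoundCloudParser is imported from the package root in the check above.
# YouTubeSearcher and GCSStorage are named in this document, but whether they
# are re-exported from the package root is an assumption.
EXPECTED_EXPORTS = ("SoundCloudParser", "YouTubeSearcher", "GCSStorage")


def missing_exports(module_name: str, names=EXPECTED_EXPORTS) -> list[str]:
    """Import a module and return the expected names it does not expose."""
    module = importlib.import_module(module_name)
    return [name for name in names if not hasattr(module, name)]
```

Run from `backend/cc-music-pipeline/python`, `missing_exports("cc_music_ingestion")` returning an empty list would mean all three names resolve. A non-empty list may only mean a class lives in a submodule rather than at the root.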

---

New Package Structure

```
backend/cc-music-pipeline/python/
├── cc_music_ingestion/          # Main package
│   ├── __init__.py              # Package exports
│   ├── pipeline.py              # Main orchestrator
│   ├── sources/                 # Source parsers
│   │   ├── soundcloud.py        # SoundCloud likes parser
│   │   └── playlist.py          # Generic playlist parser
│   ├── search/                  # Search engines
│   │   └── youtube.py           # YouTube search with query cleaning
│   ├── download/                # Downloaders
│   │   └── downloader.py        # yt-dlp with exponential backoff
│   └── storage/                 # Storage backends
│       └── gcs.py               # Google Cloud Storage
├── cli.py                       # Command-line interface
├── data/                        # Historical data files
├── requirements.txt             # Python dependencies
├── Dockerfile                   # Container image
├── cloudbuild.yaml              # Cloud Build config
├── API.md                       # API documentation
└── CONSOLIDATION.md             # This file
```

---

Feature Comparison

| Feature | Old (`tools/`) | New (`cc_music_ingestion`) |
|---------|----------------|----------------------------|
| SoundCloud parsing | Multiple scripts | `SoundCloudParser` class |
| YouTube search | In-line code | `YouTubeSearcher` with query variants |
| Download | Basic | Exponential backoff, retry logic |
| GCS upload | Script-only | `GCSStorage` class with database |
| Rate limiting | Manual | Automatic detection and backoff |
| Progress tracking | JSON files | Unified progress system |
| CLI | Multiple scripts | Single `cli.py` with subcommands |
| Paste mode | N/A | New feature |
| Cloud Run | Dockerfile.cloud | Integrated Dockerfile |
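
The "Exponential backoff, retry logic" row refers to a common pattern: after each retryable failure, wait roughly twice as long as before, up to a cap. The sketch below shows the general technique only and is not the code in `download/downloader.py`. The `RetryableError` class, the `retry_with_backoff` name, and the default delays and attempt count are all hypothetical.

```python
import random
import time


class RetryableError(Exception):
    """Hypothetical marker for failures worth retrying (e.g. rate limiting)."""


def retry_with_backoff(action, max_attempts=5, base_delay=1.0, max_delay=60.0,
                       sleep=time.sleep):
    """Call action(); on RetryableError wait base_delay * 2**attempt (capped) and retry."""
    for attempt in range(max_attempts):
        try:
            return action()
        except RetryableError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * (2 ** attempt))
            # Jitter spreads retries out so parallel workers do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
```

The `sleep` parameter is injectable so the waiting behaviour can be exercised in tests without real delays.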

---

Migration Complete

Date: 2025-12-27

All functionality from `tools/music-pipeline/` has been consolidated into the new unified package at `backend/cc-music-pipeline/python/cc_music_ingestion/`.

The new package includes:
- Cleaner modular architecture
- Improved error handling and retry logic
- Unified CLI with paste mode
- Comprehensive API documentation
- Cloud Run deployment support

Source Anchor

Comp-Core/backend/cc-music-pipeline/python/CONSOLIDATION.md
