Phrase Database Enhancement Guide
Overview
You currently have 68 phrases from 7 files using fixed segmentation. This guide shows how to:
1. Sub-segment existing phrases into shorter, more expressive sub-phrases
2. Analyze your database to understand what you have
3. Enhance structure while staying CPU-efficient
Current Status
- Method: Fixed segmentation (uniform chunks)
- Phrases: 68
- Source files: 7
- Average phrase length: ~12-16 bars (estimated)
Enhancement Strategy
Option 1: Sub-Segmentation (Recommended)
Apply a second layer of segmentation on top of your fixed segments to get more granular structure:
# Energy-based (fastest, CPU-efficient)
python3 scripts/sub_segment_phrases.py \
--phrase_db data_output/phrase_db \
--method energy \
--min_sub_bars 2 \
--max_sub_bars 8
# Onset-based (good for rhythmic boundaries)
python3 scripts/sub_segment_phrases.py \
--phrase_db data_output/phrase_db \
--method onset \
--min_sub_bars 2 \
--max_sub_bars 6
# Lightweight novelty (more accurate structure detection)
python3 scripts/sub_segment_phrases.py \
--phrase_db data_output/phrase_db \
--method lightweight \
--min_sub_bars 2 \
--max_sub_bars 8
What this does:
- Takes each existing 12-16 bar phrase
- Further segments it into 2-8 bar sub-phrases
- Uses lightweight, CPU-efficient methods
- Preserves all original phrases
- Creates new sub-phrases with `_sub1`, `_sub2` labels
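
To make the energy method concrete, here is a minimal sketch of how a phrase could be cut into sub-phrases from per-bar energy values. It is not the code in `sub_segment_phrases.py`, whose internals this guide does not show. The function name `split_by_energy` and its input (one energy value per bar, for example an RMS average) are assumptions made for illustration.

```python
from typing import List


def split_by_energy(bar_energy: List[float], min_bars: int = 2, max_bars: int = 8) -> List[range]:
    """Split one phrase into sub-phrases, cutting where bar-to-bar energy changes most.

    Hypothetical helper: `bar_energy` holds one energy value per bar of the phrase.
    Returns a list of ranges of bar indices, one range per sub-phrase.
    """
    if min_bars < 1 or max_bars + 1 < 2 * min_bars:
        raise ValueError("need min_bars >= 1 and max_bars >= 2 * min_bars - 1")

    n = len(bar_energy)
    segments = []
    start = 0
    # Keep cutting while the remaining material is too long for one sub-phrase.
    while n - start > max_bars:
        # Candidate cuts leave at least min_bars on both sides of the cut
        # and at most max_bars in the piece being closed.
        lo = start + min_bars
        hi = min(start + max_bars, n - min_bars)
        # Cut before the bar with the largest absolute energy jump from its predecessor.
        cut = max(range(lo, hi + 1), key=lambda i: abs(bar_energy[i] - bar_energy[i - 1]))
        segments.append(range(start, cut))
        start = cut
    segments.append(range(start, n))
    return segments


# A 12-bar phrase whose energy rises at bar 4 and falls again at bar 10.
energy = [1, 1, 1, 1, 5, 5, 5, 5, 5, 5, 1, 1]
print([(s.start, s.stop) for s in split_by_energy(energy)])  # [(0, 4), (4, 12)]
```

Only simple arithmetic on a short list is involved, which is why this style of method is cheap on CPU. A phrase shorter than `max_bars` is returned as a single segment.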
Expected result:
- 68 phrases → ~200-300 sub-phrases
- More granular structure
- Better for training (more examples)
- More expressive (shorter phrases capture micro-structure)
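
As a rough sanity check on the ~200-300 estimate, the possible number of sub-phrases can be bounded from the bar limits alone. The helper below is an illustrative calculation, not part of the scripts, and it assumes every phrase has the same length in bars.

```python
import math


def sub_phrase_count_bounds(num_phrases: int, phrase_bars: int,
                            min_sub_bars: int, max_sub_bars: int) -> tuple:
    """Return (fewest, most) sub-phrases when every phrase has `phrase_bars` bars."""
    # Fewest pieces: every sub-phrase is as long as allowed.
    fewest = num_phrases * math.ceil(phrase_bars / max_sub_bars)
    # Most pieces: every sub-phrase is as short as allowed.
    most = num_phrases * (phrase_bars // min_sub_bars)
    return fewest, most


print(sub_phrase_count_bounds(68, 12, 2, 8))  # (136, 408)
print(sub_phrase_count_bounds(68, 16, 2, 8))  # (136, 544)
```

The 200-300 estimate sits inside these bounds and corresponds to roughly 3 to 4.4 sub-phrases per original phrase (200/68 ≈ 2.9, 300/68 ≈ 4.4).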
Option 2: Hybrid Approach
1. Keep fixed segments for long-term structure
2. Add sub-segments for short-term expressiveness
3. Use both in training (weighted by length)
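
One way to read "weighted by length" is to draw each phrase with probability proportional to its bar count, so every bar of audio is equally likely to be seen whether it sits in a long phrase or a short sub-phrase. The sketch below assumes that interpretation. The function names and phrase IDs are hypothetical, and the training code may weight differently.

```python
import random
from typing import Dict, List, Optional


def length_weights(bars_by_id: Dict[str, int]) -> Dict[str, float]:
    """Map each phrase ID to a sampling weight proportional to its length in bars."""
    total = sum(bars_by_id.values())
    if total <= 0:
        raise ValueError("need at least one phrase with a positive bar count")
    return {pid: bars / total for pid, bars in bars_by_id.items()}


def sample_batch(bars_by_id: Dict[str, int], batch_size: int,
                 seed: Optional[int] = None) -> List[str]:
    """Draw phrase IDs with replacement, using the length-proportional weights."""
    rng = random.Random(seed)
    weights = length_weights(bars_by_id)
    ids = list(weights)
    return rng.choices(ids, weights=[weights[i] for i in ids], k=batch_size)


# One fixed 16-bar phrase plus two of its sub-phrases (IDs are made up).
phrases = {"phrase_001": 16, "phrase_001_sub1": 4, "phrase_001_sub2": 4}
print(length_weights(phrases))
print(sample_batch(phrases, batch_size=5, seed=0))
```

If the goal is instead to favour short sub-phrases, the same structure works with inverse-length weights.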
Option 3: Re-segment with Structure Method
If you want to rebuild with structure-based segmentation:
# Rebuild with structure method (slower but more accurate)
python3 scripts/build_phrase_database_incremental.py \
--rebuild \
--input_dir "/Volumes/USB DISK/Ghetto" \
--output_dir data_output/phrase_db
Then edit `configs/phrase_database.yaml`:
segmentation:
  method: "structure"  # Change from "fixed"
  min_bars: 4
  max_bars: 16
Methods Comparison
### Energy-Based (Fastest)
- Speed: ⚡⚡⚡ Very fast
- CPU: Low
- Accuracy: Good for energy-based boundaries
- Use case: Quick enhancement, large datasets
### Onset-Based (Balanced)
- Speed: ⚡⚡ Fast
- CPU: Low-Medium
- Accuracy: Good for rhythmic boundaries
- Use case: Rhythmic music, beat-driven tracks
### Lightweight Novelty (Most Accurate)
- Speed: ⚡ Medium
- CPU: Medium
- Accuracy: Best structure detection
- Use case: When you need accurate boundaries
Analysis
Get a full overview of your database:
python3 scripts/analyze_phrase_database.py \
--phrase_db data_output/phrase_db \
--output analysis.json
This shows:
- Total phrases, duration, tempos, keys
- Distribution of phrase lengths
- Energy statistics
- Source file breakdown
- Recommendations
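
The "distribution of phrase lengths" item can be pictured with a small summary like the one below. This is a sketch only: the layout of `analysis.json` is not documented here, so the function takes a plain list of bar counts, and the name `summarize_lengths` and its output keys are assumptions.

```python
from collections import Counter
from statistics import mean, median
from typing import Dict, List


def summarize_lengths(bar_counts: List[int]) -> Dict[str, object]:
    """Summarize phrase lengths: count, mean, median, and a histogram keyed by bars."""
    return {
        "count": len(bar_counts),
        "mean_bars": mean(bar_counts),
        "median_bars": median(bar_counts),
        # Histogram maps bar count -> number of phrases, sorted by bar count.
        "histogram": dict(sorted(Counter(bar_counts).items())),
    }


print(summarize_lengths([12, 12, 16, 14]))
```

Running the same summary before and after sub-segmentation gives a quick view of how far the length distribution has shifted toward 2-8 bars.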
Workflow
Recommended Workflow
1. Analyze current data:
python3 scripts/analyze_phrase_database.py --phrase_db data_output/phrase_db
2. Sub-segment with energy method (fastest):
python3 scripts/sub_segment_phrases.py \
--phrase_db data_output/phrase_db \
--method energy \
--min_sub_bars 2 \
--max_sub_bars 8
3. Re-analyze to see improvements:
python3 scripts/analyze_phrase_database.py --phrase_db data_output/phrase_db
4. Regenerate embeddings (if needed):
# This will be created as a separate script
# For now, embeddings are generated during main build
Performance Tips
CPU Efficiency
1. Use energy method for fastest processing
2. Process in batches (already done in `sub_segment_phrases.py`)
3. Skip very short phrases (< 2 bars)
4. Cache beat tracking (already implemented)
Memory Efficiency
- Sub-segmentation processes one phrase at a time
- Audio is loaded on-demand
- Features are computed incrementally
Expected Results
After sub-segmentation:
- Before: 68 phrases, ~12-16 bars each
- After: ~200-300 phrases, 2-8 bars each
- Improvement:
  - More training examples
  - More granular structure
  - Better expressiveness
  - Still CPU-efficient
Next Steps
1. Run analysis to see current state
2. Run sub-segmentation with energy method
3. Re-analyze to verify improvements
4. Use enhanced database for training
Troubleshooting
### "No phrases in database"
- Check database path: `data_output/phrase_db/phrases.db`
- Verify database exists and has data
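
A quick way to run both checks from Python is sketched below. It assumes `phrases.db` is a SQLite file, which this guide does not state, and it lists whatever tables exist instead of guessing a table name. `table_row_counts` is a hypothetical helper, not part of the scripts.

```python
import sqlite3
from pathlib import Path
from typing import Dict


def table_row_counts(db_path: str) -> Dict[str, int]:
    """Return {table name: row count} for every user table in a SQLite file."""
    path = Path(db_path)
    # Check first: sqlite3.connect() would silently create an empty file otherwise.
    if not path.is_file():
        raise FileNotFoundError(f"No database file at {path}")
    conn = sqlite3.connect(str(path))
    try:
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
        )]
        return {t: conn.execute(f'SELECT COUNT(*) FROM "{t}"').fetchone()[0] for t in tables}
    finally:
        conn.close()


if __name__ == "__main__":
    try:
        print(table_row_counts("data_output/phrase_db/phrases.db"))
    except FileNotFoundError as err:
        print(err)
```

An empty dictionary or all-zero counts means the file exists but holds no data, which points to the build step, not the path.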
### Sub-segmentation too slow
- Use `--method energy` (fastest)
- Process fewer phrases: `--max_phrases 10` (for testing)
### Too many/few sub-phrases
- Adjust `--min_sub_bars` and `--max_sub_bars`
- A lower `--min_sub_bars` allows shorter sub-phrases, so you get more of them
- A higher `--max_sub_bars` allows longer sub-phrases, so you get fewer of them
Source Anchor
Comp-Core/core/ml/cc-ml/diffusion/ENHANCEMENT_GUIDE.md