Grand Diomande Research · Full HTML Reader

Phrase Database Enhancement Guide

Overview

You currently have 68 phrases from 7 files using fixed segmentation. This guide shows how to:

1. Sub-segment existing phrases into shorter, more expressive sub-phrases
2. Analyze your database to understand what you have
3. Enhance structure while staying CPU-efficient

Current Status

  • Method: Fixed segmentation (uniform chunks)
  • Phrases: 68
  • Source files: 7
  • Average phrase length: ~12-16 bars (estimated)

Enhancement Strategy

Option 1: Sub-Segmentation (Recommended)

Apply a second layer of segmentation on top of your fixed segments to get more granular structure:

```bash
# Energy-based (fastest, CPU-efficient)
python3 scripts/sub_segment_phrases.py \
    --phrase_db data_output/phrase_db \
    --method energy \
    --min_sub_bars 2 \
    --max_sub_bars 8

# Onset-based (good for rhythmic boundaries)
python3 scripts/sub_segment_phrases.py \
    --phrase_db data_output/phrase_db \
    --method onset \
    --min_sub_bars 2 \
    --max_sub_bars 6

# Lightweight novelty (more accurate structure detection)
python3 scripts/sub_segment_phrases.py \
    --phrase_db data_output/phrase_db \
    --method lightweight \
    --min_sub_bars 2 \
    --max_sub_bars 8
```

What this does:
- Takes each existing 12-16 bar phrase
- Further segments it into 2-8 bar sub-phrases
- Uses lightweight, CPU-efficient methods
- Preserves all original phrases
- Creates new sub-phrases with `_sub1`, `_sub2` labels
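The guide does not show how `scripts/sub_segment_phrases.py` chooses its cut points, so the following is only a minimal sketch of one way an energy-based pass could work. It assumes a hypothetical `bar_energy` list holding one precomputed RMS-style energy value per bar, and it uses dynamic programming (a choice made for this sketch, not necessarily the script's) to cut at above-average energy jumps while keeping every sub-phrase between `--min_sub_bars` and `--max_sub_bars` bars long.

```python
def energy_sub_segments(bar_energy, min_sub_bars=2, max_sub_bars=8):
    """Split one phrase into (start_bar, end_bar) sub-phrases.

    Hypothetical helper. bar_energy holds one energy value per bar.
    A cut before bar i earns |energy[i] - energy[i-1]| minus the phrase's
    mean jump, so only above-average energy changes are worth cutting at.
    Dynamic programming picks the highest-scoring set of cuts whose pieces
    are all min_sub_bars..max_sub_bars bars long.
    """
    n = len(bar_energy)
    if n < 2 * min_sub_bars:
        return [(0, n)]  # cannot yield two valid pieces; keep phrase whole
    jumps = [abs(bar_energy[i] - bar_energy[i - 1]) for i in range(1, n)]
    mean_jump = sum(jumps) / len(jumps)

    def gain(i):
        # Ending at the phrase end is free; interior cuts must beat the mean.
        return 0.0 if i == n else jumps[i - 1] - mean_jump

    neg_inf = float("-inf")
    best = [neg_inf] * (n + 1)  # best[i]: top score for bars [0, i)
    prev = [-1] * (n + 1)       # prev[i]: start bar of the last piece ending at i
    best[0] = 0.0
    for i in range(1, n + 1):
        for length in range(min_sub_bars, max_sub_bars + 1):
            j = i - length
            if j >= 0 and best[j] > neg_inf and best[j] + gain(i) > best[i]:
                best[i], prev[i] = best[j] + gain(i), j
    if best[n] == neg_inf:
        return [(0, n)]  # no split satisfies the limits; keep phrase whole
    segments, i = [], n
    while i > 0:  # walk back from the phrase end to recover the cuts
        segments.append((prev[i], i))
        i = prev[i]
    return segments[::-1]
```

For a 12-bar phrase whose energy steps up at bar 4 and back down at bar 8, `energy_sub_segments([1, 1, 1, 1, 5, 5, 5, 5, 1, 1, 1, 1])` returns `[(0, 4), (4, 8), (8, 12)]`. Subtracting the mean jump is what keeps the optimizer from cutting at every allowed bar, since otherwise each extra cut would add a non-negative score.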

Expected result:
- 68 phrases → ~200-300 sub-phrases
- More granular structure
- Better for training (more examples)
- More expressive (shorter phrases capture micro-structure)

Option 2: Hybrid Approach

1. Keep fixed segments for long-term structure
2. Add sub-segments for short-term expressiveness
3. Use both in training (weighted by length)
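"Weighted by length" can be read several ways; the sketch below assumes it means sampling each phrase with probability proportional to its length in bars. Because the sub-phrases are cut from the same bars as the fixed segments, the two pools then receive roughly equal total weight instead of the more numerous sub-phrases dominating by count. `Phrase`, `length_weights`, and `sample_training_batch` are hypothetical names, not part of the existing scripts.

```python
import random
from dataclasses import dataclass


@dataclass
class Phrase:
    # Hypothetical record; the real database schema may differ.
    phrase_id: str
    n_bars: int
    is_sub: bool  # True for _subN sub-phrases, False for original fixed segments


def length_weights(phrases):
    """Sampling weight proportional to length in bars, so every bar of
    music counts equally regardless of how it was segmented."""
    total = sum(p.n_bars for p in phrases)
    return [p.n_bars / total for p in phrases]


def sample_training_batch(phrases, k, seed=0):
    """Draw k phrases (with replacement) using the length weights."""
    rng = random.Random(seed)
    return rng.choices(phrases, weights=length_weights(phrases), k=k)
```

For one 16-bar fixed segment and its two 8-bar sub-phrases, the weights are `[0.5, 0.25, 0.25]`: the long segment and its sub-phrases are equally likely to be drawn as a group.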

Option 3: Re-segment with Structure Method

If you want to rebuild with structure-based segmentation:

```bash
# Rebuild with structure method (slower but more accurate)
python3 scripts/build_phrase_database_incremental.py \
    --rebuild \
    --input_dir "/Volumes/USB DISK/Ghetto" \
    --output_dir data_output/phrase_db
```

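The rebuild command takes no segmentation flag, so the method comes from `configs/phrase_database.yaml`; make this change before running the rebuild: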

```yaml
segmentation:
  method: "structure"  # Change from "fixed"
  min_bars: 4
  max_bars: 16
```
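How the structure method applies `min_bars` and `max_bars` is not described here. One common pattern, sketched below under that assumption, is to post-process candidate boundaries: merge segments shorter than `min_bars` into the following segment, then split segments longer than `max_bars` into near-equal pieces. `enforce_bar_limits` is a hypothetical helper, and it assumes `min_bars` is at most `max_bars // 2` (true for the 4/16 values above) so that split pieces never fall below the minimum.

```python
import math


def enforce_bar_limits(boundaries, n_bars, min_bars=4, max_bars=16):
    """Turn candidate boundary bars into (start, end) segments whose lengths
    respect min_bars/max_bars. Hypothetical; assumes min_bars <= max_bars // 2."""
    cuts = sorted({b for b in boundaries if 0 < b < n_bars})
    edges = [0] + cuts + [n_bars]

    # 1) Merge: skip any cut that would leave a segment shorter than min_bars.
    merged = []
    start = 0
    for end in edges[1:]:
        if end - start >= min_bars:
            merged.append((start, end))
            start = end
    if start < n_bars:  # leftover tail shorter than min_bars
        if merged:
            merged[-1] = (merged[-1][0], n_bars)  # absorb it into the last segment
        else:
            merged.append((0, n_bars))  # whole track is shorter than min_bars

    # 2) Split: cut over-long segments into ceil(length / max_bars) near-equal pieces.
    result = []
    for a, b in merged:
        k = math.ceil((b - a) / max_bars)
        for i in range(k):
            result.append((a + (b - a) * i // k, a + (b - a) * (i + 1) // k))
    return result
```

For a 40-bar track with candidate boundaries at bars 2, 8, and 30, `enforce_bar_limits([2, 8, 30], 40)` drops the 2-bar opening cut and splits the 22-bar middle section, giving `[(0, 8), (8, 19), (19, 30), (30, 40)]`.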

Methods Comparison

### Energy-Based (Fastest)
- Speed: ⚡⚡⚡ Very fast
- CPU: Low
- Accuracy: Good when section changes coincide with changes in loudness
- Use case: Quick enhancement, large datasets

### Onset-Based (Balanced)
- Speed: ⚡⚡ Fast
- CPU: Low-Medium
- Accuracy: Good for rhythmic boundaries
- Use case: Rhythmic music, beat-driven tracks
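As a rough illustration of what "rhythmic boundaries" can mean, the sketch below counts onsets per bar and flags bars where onset density changes sharply. It assumes onset times and bar start times (in seconds) come from an upstream onset detector and beat tracker; `onset_density_per_bar`, `onset_boundaries`, and the `min_change` threshold are hypothetical, and the candidates would still need filtering against `--min_sub_bars`/`--max_sub_bars`.

```python
from bisect import bisect_left


def onset_density_per_bar(onset_times, bar_times):
    """Count onsets inside each bar. bar_times lists bar start times in
    seconds plus the end time of the last bar (len = n_bars + 1).
    Bars are half-open: an onset exactly on a bar line belongs to that bar."""
    onsets = sorted(onset_times)
    counts = []
    for start, end in zip(bar_times, bar_times[1:]):
        counts.append(bisect_left(onsets, end) - bisect_left(onsets, start))
    return counts


def onset_boundaries(counts, min_change=2):
    """Candidate boundary bars: bar i starts where the onset count differs
    from bar i-1 by at least min_change."""
    return [i for i in range(1, len(counts)) if abs(counts[i] - counts[i - 1]) >= min_change]
```

With 2-second bars (`bar_times = [0, 2, 4, 6, 8]`) and onsets that give per-bar counts of `[2, 6, 1, 2]`, `onset_boundaries` returns `[1, 2]`, flagging both sides of the busy second bar. That would create a 1-bar piece, which is why the length limits still have to be applied afterwards.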

### Lightweight Novelty (Most Accurate)
- Speed: ⚡ Medium
- CPU: Medium
- Accuracy: Best structure detection of the three methods
- Use case: When you need accurate boundaries
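The guide does not say which novelty measure `--method lightweight` uses. A widely used lightweight option is Foote's checkerboard-kernel novelty over a self-similarity matrix; the sketch below applies it to hypothetical per-bar feature vectors (for example, averaged chroma) in plain Python. It illustrates the technique and is not the script's implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def novelty_curve(features, half_width=2):
    """Foote-style novelty: slide a checkerboard kernel along the diagonal of
    the bar-by-bar self-similarity matrix. novelty[t] is high when the bars
    just before t resemble each other, the bars from t on resemble each
    other, and the two groups differ."""
    n = len(features)
    sim = [[cosine(features[i], features[j]) for j in range(n)] for i in range(n)]
    novelty = [0.0] * n
    for t in range(n):
        score = 0.0
        for di in range(-half_width, half_width):
            for dj in range(-half_width, half_width):
                i, j = t + di, t + dj
                if 0 <= i < n and 0 <= j < n:
                    same_side = (di < 0) == (dj < 0)  # past-past or future-future
                    score += sim[i][j] if same_side else -sim[i][j]
        novelty[t] = score
    return novelty


def novelty_peaks(novelty, threshold=0.0):
    """Local maxima above threshold; the first and last bars are skipped
    because the kernel is truncated there."""
    return [t for t in range(1, len(novelty) - 1)
            if novelty[t] > threshold
            and novelty[t] >= novelty[t - 1]
            and novelty[t] > novelty[t + 1]]
```

For eight bars whose features switch from `[1, 0]` to `[0, 1]` halfway through, the curve peaks at bar 4 and `novelty_peaks` returns `[4]`. Building the full similarity matrix is quadratic in the number of bars, which is cheap for 12-16 bar phrases.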

Analysis

Get a full overview of your database:

```bash
python3 scripts/analyze_phrase_database.py \
    --phrase_db data_output/phrase_db \
    --output analysis.json
```

This shows:
- Total phrases, duration, tempos, keys
- Distribution of phrase lengths
- Energy statistics
- Source file breakdown
- Recommendations
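The schema of `phrases.db` is not documented in this guide, so the sketch below assumes a hypothetical `phrases` table with `source_file`, `n_bars`, and `energy` columns. It computes a subset of the statistics listed above (phrase count, length distribution, per-source breakdown, energy range) from an open `sqlite3` connection; it is not the implementation of `analyze_phrase_database.py`.

```python
from collections import Counter
from statistics import mean


def summarize(conn):
    """Summary statistics for a phrase table (hypothetical schema:
    phrases(phrase_id, source_file, n_bars, energy))."""
    rows = conn.execute("SELECT source_file, n_bars, energy FROM phrases").fetchall()
    if not rows:
        return {"total_phrases": 0}  # matches the "No phrases in database" case
    energies = [r[2] for r in rows]
    return {
        "total_phrases": len(rows),
        # Number of phrases at each length, sorted by length in bars.
        "length_histogram": dict(sorted(Counter(r[1] for r in rows).items())),
        # Number of phrases contributed by each source file.
        "per_source": dict(Counter(r[0] for r in rows)),
        "energy": {"mean": mean(energies), "min": min(energies), "max": max(energies)},
    }
```

Running it before and after sub-segmentation gives a quick way to confirm that the length histogram has shifted toward 2-8 bars.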

Workflow

Recommended Workflow

1. Analyze current data:

   ```bash
   python3 scripts/analyze_phrase_database.py --phrase_db data_output/phrase_db
   ```

2. Sub-segment with the energy method (fastest):

   ```bash
   python3 scripts/sub_segment_phrases.py \
       --phrase_db data_output/phrase_db \
       --method energy \
       --min_sub_bars 2 \
       --max_sub_bars 8
   ```

3. Re-analyze to see the improvements:

   ```bash
   python3 scripts/analyze_phrase_database.py --phrase_db data_output/phrase_db
   ```

4. Regenerate embeddings (if needed). A separate script for this is planned; for now, embeddings are generated during the main build.

Performance Tips

CPU Efficiency

1. Use the energy method for the fastest processing
2. Process in batches (already done in `sub_segment_phrases.py`)
3. Skip very short phrases (< 2 bars)
4. Cache beat-tracking results (already implemented)
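The guide notes that beat-tracking caching is already implemented but does not show how. As a sketch of the general idea only, the helper below stores beat times as JSON keyed on the audio file's absolute path, size, and modification time, so a changed file triggers a fresh analysis. `cached_beats` and its `track_beats` argument (any callable returning beat times for a path) are hypothetical.

```python
import hashlib
import json
import os
from pathlib import Path


def cached_beats(audio_path, track_beats, cache_dir=".beat_cache"):
    """Return beat times for audio_path, calling track_beats only on a cache miss.
    The key covers path, size, and mtime, so editing the file invalidates it."""
    st = os.stat(audio_path)
    key_src = f"{os.path.abspath(audio_path)}|{st.st_size}|{st.st_mtime_ns}"
    key = hashlib.sha1(key_src.encode()).hexdigest()
    cache_file = Path(cache_dir) / f"{key}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text())  # cache hit: skip beat tracking
    beats = list(track_beats(audio_path))           # cache miss: run the tracker once
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(beats))
    return beats
```

Keying on file metadata rather than hashing the audio itself keeps the cache check cheap, at the cost of re-tracking a file whose timestamp changes without its content changing.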

Memory Efficiency

  • Sub-segmentation processes one phrase at a time
  • Audio is loaded on-demand
  • Features are computed incrementally
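A minimal sketch of the same one-at-a-time pattern, assuming a hypothetical `phrases` table with `phrase_id` and `energy` columns: rows are streamed from an open `sqlite3` connection with `fetchmany`, and a running mean and variance are updated with Welford's online algorithm, so memory use does not grow with the number of phrases. `iter_phrase_rows` and `RunningStats` are illustrative names, not existing code.

```python
def iter_phrase_rows(conn, batch_size=32):
    """Yield (phrase_id, energy) rows lazily, batch_size rows at a time,
    so the whole table is never held in memory."""
    cur = conn.execute("SELECT phrase_id, energy FROM phrases")
    while True:
        rows = cur.fetchmany(batch_size)
        if not rows:
            break
        yield from rows


class RunningStats:
    """Welford's online mean and sample variance in O(1) memory."""

    def __init__(self):
        self.n, self.mean, self._m2 = 0, 0.0, 0.0

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)  # uses the updated mean

    @property
    def variance(self):
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0
```

Typical use is `for phrase_id, energy in iter_phrase_rows(conn): stats.update(energy)`; for energies 1, 2, 3, 4 this gives a mean of 2.5 and a sample variance of 5/3.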

Expected Results

After sub-segmentation:
- Before: 68 phrases, ~12-16 bars each
- After: ~200-300 sub-phrases, 2-8 bars each (plus the 68 original phrases, which are preserved)
- Improvements:
  - More training examples
  - More granular structure
  - Better expressiveness
  - Still CPU-efficient

Next Steps

1. Run analysis to see current state
2. Run sub-segmentation with energy method
3. Re-analyze to verify improvements
4. Use enhanced database for training

Troubleshooting

### "No phrases in database"
- Check database path: `data_output/phrase_db/phrases.db`
- Verify database exists and has data

### Sub-segmentation too slow
- Use `--method energy` (fastest)
- Process fewer phrases: `--max_phrases 10` (for testing)

### Too many/few sub-phrases
- Adjust `--min_sub_bars` and `--max_sub_bars`
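- Lowering `--min_sub_bars` allows shorter sub-phrases, so it tends to produce more of them
- Raising `--max_sub_bars` allows longer sub-phrases, so it tends to produce fewer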
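The effect of the two limits can be bounded exactly. A phrase of n bars split into pieces of `min_sub_bars` to `max_sub_bars` bars yields at least ceil(n / max_sub_bars) and at most floor(n / min_sub_bars) sub-phrases, and every count in between is achievable; if the lower bound exceeds the upper, no valid split exists. The hypothetical helpers below compute these bounds per phrase and for a whole database.

```python
def sub_phrase_count_range(n_bars, min_sub_bars, max_sub_bars):
    """(fewest, most) sub-phrases for one phrase when every piece must be
    min_sub_bars..max_sub_bars bars long. fewest > most means no valid split."""
    fewest = -(-n_bars // max_sub_bars)  # ceiling division
    most = n_bars // min_sub_bars
    return fewest, most


def database_range(bar_counts, min_sub_bars, max_sub_bars):
    """Sum the per-phrase bounds over a list of phrase lengths in bars
    (assumes every phrase can be split within the limits)."""
    lo = hi = 0
    for n in bar_counts:
        fewest, most = sub_phrase_count_range(n, min_sub_bars, max_sub_bars)
        lo += fewest
        hi += most
    return lo, hi
```

With 68 phrases of 12-16 bars and the default 2-8 bar limits, every phrase yields at least 2 sub-phrases, so the total would lie between 136 and at most 544 (reached only if all phrases are 16 bars long and cut into 2-bar pieces). The ~200-300 estimate corresponds to roughly 3 to 4.4 sub-phrases per phrase.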

Promotion Decision

Attach run IDs, datasets, metrics, and reproduction commands.

Source Anchor

Comp-Core/core/ml/cc-ml/diffusion/ENHANCEMENT_GUIDE.md
