research note · backlog reference · score 18

LoRA in Stable Audio 3

LoRA fine-tuning lets you adapt a Stable Audio 3 model to a specific style, sound, or domain without retraining the whole model. The result is a small `.safetensors` file (~50–200 MB) that you load on top of any base checkpoint at inference time — stackable, adjustable in strength, and swappable without touching the base weights.
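
To make "stackable" and "adjustable in strength" concrete, here is a minimal sketch of the plain LoRA update, where each adapter contributes a low-rank product `B @ A` scaled by a strength and added to a frozen base weight. It is an illustration under stated assumptions, not Stable Audio 3's loading code: the names `matmul` and `apply_adapters` and the toy matrices are hypothetical, and other adapter variants may combine with the base weights differently.

```python
# Sketch of stacking low-rank adapters on a frozen base weight.
# All names are illustrative; this is not Stable Audio 3's actual loader.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def apply_adapters(base, adapters):
    """Return base + sum(strength * B @ A) without mutating base.

    adapters: list of (B, A, strength), with B of shape d_out x r
    and A of shape r x d_in, where r is the adapter rank.
    """
    merged = [row[:] for row in base]  # copy, so the base weights stay untouched
    for b, a, strength in adapters:
        delta = matmul(b, a)  # low-rank update of shape d_out x d_in
        for i, row in enumerate(delta):
            for j, value in enumerate(row):
                merged[i][j] += strength * value
    return merged

base = [[1.0, 0.0], [0.0, 1.0]]
style = ([[1.0], [0.0]], [[0.0, 2.0]], 0.5)    # rank-1 adapter at strength 0.5
domain = ([[0.0], [1.0]], [[4.0, 0.0]], 0.25)  # second adapter stacked on top
merged = apply_adapters(base, [style, domain])
```

Because the merged weights are computed from a copy, swapping an adapter out amounts to calling `apply_adapters` again with a different list, and setting a strength to 0 disables that adapter.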

Extracted abstract or opening context

Fine-tuning requires:

- A dataset of audio files with matching text descriptions (at minimum ~20–50 clips; more is better)
- A CUDA GPU with sufficient VRAM:

  | Model | Standard | With `--base_precision bf16 --adapter_type lora-xs` |
  |---|---|---|
  | `medium` | ~6.5 GB | ~5.5 GB |
  | `small` | ~2.5 GB | ~2 GB |

- The `lora` extra installed: `uv sync --extra lora`

We don't claim these settings are optimal; LoRA behavior varies a lot with dataset size, style, and hardware. They are the configurations we've found to work well for most datasets, and they make good starting points before tuning.

Good default for most datasets. `dora-rows` is the default adapter and tends to generalize well.
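
The VRAM figures above can be turned into a quick pre-flight check before starting a run. The sketch below is only a convenience under stated assumptions: `pick_flags` is a hypothetical helper, not part of the project, and the numbers are the approximate values from the table, so leave yourself some headroom.

```python
# Approximate training VRAM in GB, copied from the table above.
VRAM_GB = {
    "medium": {"standard": 6.5, "reduced": 5.5},
    "small": {"standard": 2.5, "reduced": 2.0},
}

# The memory-saving flags named in the table's third column.
REDUCED_FLAGS = ["--base_precision", "bf16", "--adapter_type", "lora-xs"]

def pick_flags(model, free_vram_gb):
    """Hypothetical helper: return extra flags for `model`, or None if it likely won't fit.

    An empty list means the standard configuration should fit as-is.
    """
    need = VRAM_GB[model]
    if free_vram_gb >= need["standard"]:
        return []
    if free_vram_gb >= need["reduced"]:
        return list(REDUCED_FLAGS)
    return None

flags_for_6gb_medium = pick_flags("medium", 6.0)
```

With 6 GB free, the `medium` model falls between the two columns, so the helper returns the reduced-memory flags; with less than 2 GB free, even `small` returns `None`.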

Promotion decision

What has to happen next

Keep in the searchable backlog until it intersects a live paper or system.

Why this is not always a full paper yet

Corpus pages are public-safe readers for discovered workspace artifacts. They are not automatically final papers. A corpus item becomes a polished paper only after the editable source, evidence checkpoints, references, figures, render path, and release status are attached through the paper schema.