# Stable Audio 3 Inference Methods

An overview of the different inference modes. The Python interface is shown, but the same controls are available in the Gradio interface.

> New to diffusion/Flow Matching models? See [Model Overview](../guides/model-overview.md)
> for a conceptual overview before diving in.

| Model | Type |
|---|---|
| `medium` | Post-trained |
| `small-music` | Post-trained |
| `small-sfx` | Post-trained |
| `medium-base` | Base |
| `small-music-base` | Base |
| `small-sfx-base` | Base |

> **Note:** `medium` and `medium-base` require a CUDA GPU with Flash Attention support because they use SAME-L as their autoencoder.

- **`prompt`** - Text description of the audio to generate. For help crafting good prompts, see the [Prompt Guide](../guides/prompting.md).
- **`duration`** - Duration of the generated audio in seconds (default: `120`).
- **`steps`** - Number of sampling steps (default: `8`). For faster inference, reduce this number at some cost to quality. Going higher than 8 does not necessarily improve quality, except with a `-base` model, where you should use something like 50.
- **`seed`** - Random seed for reproducible outputs. Use `-1` to select a random seed (default), or pick a fixed number for deterministic results.
- **`batch_size`** - Generate multiple outputs at once, which is useful if you have a GPU and want many variations. The maximum is limited by your GPU's VRAM.
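The defaults described above can be sketched as a small helper that resolves settings before they are handed to the model. This is an illustrative sketch only, not the library's API: `GenerationSettings`, `resolve_settings`, and `needs_cuda_flash_attention` are hypothetical names. The default `batch_size` of 1 and the seed range are assumptions. The step count of 50 for `-base` models follows the "something like 50" guidance above.

```python
import random
from dataclasses import dataclass

# Model names and types as listed in the table above.
MODEL_TYPES = {
    "medium": "post-trained",
    "small-music": "post-trained",
    "small-sfx": "post-trained",
    "medium-base": "base",
    "small-music-base": "base",
    "small-sfx-base": "base",
}


@dataclass
class GenerationSettings:
    """Hypothetical container for the controls described above."""
    model: str
    prompt: str
    duration: float
    steps: int
    seed: int
    batch_size: int


def needs_cuda_flash_attention(model: str) -> bool:
    """Per the note above, only `medium` and `medium-base` have this requirement."""
    return model in {"medium", "medium-base"}


def resolve_settings(model, prompt, duration=120.0, steps=None, seed=-1, batch_size=1):
    """Fill in defaults: 8 steps for post-trained models, 50 for base models."""
    if model not in MODEL_TYPES:
        raise ValueError(f"unknown model: {model!r}")
    if duration <= 0:
        raise ValueError("duration must be positive (seconds)")
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    if steps is None:
        steps = 50 if MODEL_TYPES[model] == "base" else 8
    if seed == -1:
        # -1 means "pick a random seed"; the 31-bit range is an assumption.
        seed = random.randint(0, 2**31 - 1)
    return GenerationSettings(model, prompt, float(duration), steps, seed, batch_size)
```

For example, `resolve_settings("small-music-base", "warm analog synth pad")` would yield 50 steps and a freshly drawn seed, while passing `seed=42` keeps the seed fixed so repeated runs use the same value.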
