research note · experiment writeup candidate · score 24

sa3_mlx — Stable Audio 3 in pure MLX

One line on a fresh Apple Silicon Mac — installs everything and plays back ~2 minutes of "Impending tribal, epic orchestral buildup":


Extracted abstract or opening context

Apple-Silicon-native inference for **Stable Audio 3**, with no PyTorch, transformers, or stable-audio-tools at runtime.

One line on a fresh Apple Silicon Mac installs everything and plays back ~2 minutes of "Impending tribal, epic orchestral buildup".

| `--dit`    | model                     | best for                     |
|------------|---------------------------|------------------------------|
| `sm-music` | sa3-sm-music (50 M block) | fast music generation        |
| `sm-sfx`   | sa3-sm-sfx (50 M block)   | sound effects                |
| `medium`   | sa3-medium-ARC (1.4 B)    | higher-quality music, slower |

| mode           | flags                                                  | example                           |
|----------------|--------------------------------------------------------|-----------------------------------|
| text-to-audio  | `--prompt P`                                           | new clip from a description       |
| audio-to-audio | `--prompt P --init-audio IN.wav --init-noise-level σ`  | variation of an existing clip     |
| inpainting     | `--prompt P --init-audio IN.wav --inpaint-range "S,E"` | regenerate one section, keep rest |
| CFG + negative | `--cfg 3.0 --negative-prompt P_NEG`                    | steer toward / away from prompts  |

1. Install [uv](https://github.com/astral-sh/uv) via the official curl installer if it's missing (prompts y/N; `-y` skips the prompt).
2. Create a project-local `.venv/` with managed Python 3.11.
3. `uv pip install` the runtime deps into the venv (much faster than pip).
4. Ask which DiT bundles to download from HuggingFace (`stabilityai/stable-audio-3-optimized`). Each pick pulls its matching audio codec; T5Gemma (the shared text encoder) is downloaded once. Already-present weights are skipped.
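The flag combinations in the mode table can be made concrete with a small helper that assembles an argument list for each mode. This is only a sketch: `build_flags` is a hypothetical function, not part of sa3_mlx, and the name of the executable that receives these flags is not given in this extract, so it is deliberately left out. The sketch assumes that `--dit` is combined with the mode flags in one invocation, that `--init-noise-level` and `--inpaint-range` both require `--init-audio` (as the table rows suggest), and that `S,E` are plain numbers; the units are not stated here.

```python
from typing import List, Optional, Tuple

# Values of --dit listed in the model table.
KNOWN_DITS = {"sm-music", "sm-sfx", "medium"}


def build_flags(
    dit: str,
    prompt: str,
    init_audio: Optional[str] = None,
    init_noise_level: Optional[float] = None,
    inpaint_range: Optional[Tuple[float, float]] = None,
    cfg: Optional[float] = None,
    negative_prompt: Optional[str] = None,
) -> List[str]:
    """Assemble the flag list for one generation mode (hypothetical helper)."""
    if dit not in KNOWN_DITS:
        raise ValueError(f"unknown --dit value: {dit!r}")

    # Assumption: both audio-conditioned modes need an input clip.
    needs_audio = init_noise_level is not None or inpaint_range is not None
    if needs_audio and init_audio is None:
        raise ValueError("--init-audio is required for audio-to-audio and inpainting")

    flags = ["--dit", dit, "--prompt", prompt]
    if init_audio is not None:
        flags += ["--init-audio", init_audio]
    if init_noise_level is not None:
        flags += ["--init-noise-level", str(init_noise_level)]
    if inpaint_range is not None:
        start, end = inpaint_range
        # The table writes the range as a single "S,E" string.
        flags += ["--inpaint-range", f"{start:g},{end:g}"]
    if cfg is not None:
        flags += ["--cfg", str(cfg)]
    if negative_prompt is not None:
        flags += ["--negative-prompt", negative_prompt]
    return flags


if __name__ == "__main__":
    # Text-to-audio with the small music model.
    print(build_flags("sm-music", "Impending tribal, epic orchestral buildup"))
    # Inpainting: regenerate one section of IN.wav and keep the rest.
    print(build_flags("medium", "soft piano", init_audio="IN.wav", inpaint_range=(10, 20)))
```

Returning a list rather than a single string keeps prompts containing spaces or commas intact when the list is later passed to something like `subprocess.run`, with the actual entry point prepended.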

Promotion decision

What has to happen next

Attach run IDs, datasets, metrics, and reproduction commands.

Why this is not always a full paper yet

Corpus pages are public-safe readers for discovered workspace artifacts. They are not automatically final papers. A corpus item becomes a polished paper only after the editable source, evidence checkpoints, references, figures, render path, and release status are attached through the paper schema.