Mohamed Diomande

Extracted abstract or opening context

This paper presents a retrieval-centric architecture for voice-controlled DJ performance that adapts the Speech-to-Order (S2O) streaming pipeline to the domain of professional DJ software, specifically Rekordbox. Instead of parsing transcribed text into intents via a conventional automatic speech recognition (ASR) and natural language understanding stack, the system learns a direct mapping between spoken commands and a catalog of DJ actions derived from Rekordbox’s performance preset mappings. The design combines a streaming audio front end with 320-millisecond chunking, voice activity detection, denoising, and log-Mel features; a dual-encoder embedding space for audio and text; Gemini Live as an optional streaming front end for transcripts and text embeddings; and a symbolic constraint layer that enforces deck-aware safety and performance rules before triggering Rekordbox commands.

The proposed system operates in both online and offline regimes. In online mode, Gemini Live provides low-latency transcripts and embeddings that feed the retrieval pipeline. In offline mode, a local audio encoder produces command embeddings directly from microphone audio. In both cases, a shared vector index over the Rekordbox command catalog provides fast nearest-neighbor search, and a constraint solver mediates between retrieved candidates and the current DJ state to prevent destructive operations such as loading tracks onto a playing deck.

The paper describes the command catalog derived from Rekordbox mapping files, the streaming and embedding infrastructure, the safety and constraint design, and a data and training strategy that evolves from text-only retrieval to a fully trained audio–text dual encoder. Evaluation considerations and practical deployment notes round out the proposal.
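The retrieve-then-constrain flow described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation, and everything beyond the 320-millisecond chunk length is an assumption made for illustration: the 16 kHz sample rate, the `Command` fields, the catalog entry names, the hand-written three-dimensional embeddings, the `min_score` threshold, and the brute-force cosine search that stands in for the vector index. The only safety rule modeled is the one the abstract names, refusing to load a track onto a deck that is playing.

```python
import math
from dataclasses import dataclass

SAMPLE_RATE = 16_000   # assumption: the abstract does not state a sample rate
CHUNK_MS = 320         # chunk length stated in the abstract
CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_MS // 1000


def chunk_audio(samples):
    """Yield consecutive full 320 ms chunks; a trailing partial chunk is dropped."""
    for start in range(0, len(samples) - CHUNK_SAMPLES + 1, CHUNK_SAMPLES):
        yield samples[start:start + CHUNK_SAMPLES]


@dataclass(frozen=True)
class Command:
    """Hypothetical catalog entry; the field names are illustrative only."""
    name: str
    deck: int
    destructive_if_playing: bool
    embedding: tuple


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, catalog, k=3, min_score=0.5):
    """Brute-force nearest-neighbor search, standing in for a vector index."""
    scored = [(cosine(query, c.embedding), c) for c in catalog]
    scored = [(s, c) for s, c in scored if s >= min_score]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:k]]


def first_safe(candidates, playing_decks):
    """Return the best-ranked candidate that does not violate the deck rule."""
    for cand in candidates:
        if cand.destructive_if_playing and cand.deck in playing_decks:
            continue  # e.g. loading a track onto a playing deck
        return cand
    return None  # nothing safe to trigger


# Toy catalog with made-up embeddings; real ones would come from the encoders.
CATALOG = [
    Command("load_track_deck1", 1, True, (1.0, 0.0, 0.0)),
    Command("load_track_deck2", 2, True, (0.9, 0.1, 0.0)),
    Command("play_pause_deck1", 1, False, (0.0, 1.0, 0.0)),
]
QUERY = (1.0, 0.05, 0.0)  # stands in for the embedding of a spoken "load" command

if __name__ == "__main__":
    candidates = retrieve(QUERY, CATALOG)
    print(first_safe(candidates, playing_decks={1}))
```

In this toy setup, the query ranks the deck 1 load command first, but with deck 1 playing the constraint step falls through to the deck 2 load command. With both decks playing, no candidate survives and nothing is triggered. Keeping the safety check separate from similarity ranking means the rule holds regardless of how confident the retrieval step is.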

Promotion decision

What has to happen next

Convert into the standard paper schema, add citations, and render a draft PDF.

Why this is not always a full paper yet

Corpus pages are public-safe readers for discovered workspace artifacts. They are not automatically final papers. A corpus item becomes a polished paper only after the editable source, evidence checkpoints, references, figures, render path, and release status are attached through the paper schema.