Audiality: Synthesis Engine Design Notes
Audiality is a real-time audio synthesis engine designed around a small set of well-considered constraints: low CPU overhead per voice, deterministic memory behavior during synthesis, a scripting interface that separates composition logic from engine internals, and a threading model that integrates cleanly with existing audio frameworks. These notes cover the engine from a programmer's perspective -- architecture, the synthesis model, the scripting approach, voice management, and the integration patterns that make it work in a Linux audio context. The focus throughout is on the design decisions that distinguish Audiality from more general synthesis frameworks and on what those decisions mean in practice. For the broader context of where this work sits within the Maia project, see the Maia introduction. For the audio quality considerations relevant to synthesis work, see the audio quality section.
Audio synthesis engines for games and interactive applications occupy a difficult middle ground. They need to be lightweight enough to run alongside rendering workloads without consuming a significant portion of the CPU budget, but they need to be capable enough to produce audio that serves the application's actual sound design requirements. They need to accept real-time parameter changes from game logic running on a different thread without locking or glitching the audio output. And they need to do all of this at the low buffer sizes and high consistency requirements of a real-time audio environment. Audiality is designed to operate in this space, and the architecture reflects those constraints at every level.
Engine Architecture Overview
Audiality's architecture separates the audio processing thread from the control interface by design. The synthesis engine runs entirely on the audio thread, reading from a command buffer that is written by the control thread. This is the standard approach for thread-safe real-time audio, but Audiality's implementation is notable for the degree to which it enforces the separation. The audio thread does not allocate memory, does not call into system libraries that might block, and does not perform any operation that is not deterministic in duration. All non-deterministic work -- loading waveforms, compiling scripts, preparing voice parameters -- happens on the control thread before the commands are queued.
The command buffer is a lock-free ring buffer. The control thread writes commands into the buffer; the audio thread reads and executes them at the start of each processing block. Commands are small fixed-size structures, which avoids the memory allocation problem that would arise if commands could reference arbitrarily sized data. References to larger resources (waveforms, compiled script programs) are handled through handles into pre-allocated, pre-loaded resource tables. The audio thread works entirely with handles and pre-loaded data, never with raw pointers into memory that might change under it.
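The single-producer/single-consumer ring described above can be sketched in a few dozen lines. This is an illustrative sketch, not Audiality's actual code: the structure names (`aud_cmd`, `cmd_ring`) and field layout are assumptions, but the mechanism shown (fixed-size commands, atomic head/tail indices, a writer that never blocks and a reader that never waits) matches the design as described.

```c
/* Minimal sketch of a single-producer single-consumer command ring,
 * assuming one control thread (writer) and one audio thread (reader).
 * Names and fields are illustrative, not Audiality's API. */
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE 256               /* power of two: wrap is a mask */

typedef struct {
    uint8_t  opcode;                /* e.g. start voice, set parameter */
    uint8_t  voice;                 /* voice slot index */
    uint16_t param;                 /* parameter id */
    int32_t  value;                 /* fixed-size payload; large resources
                                       are referenced by handle instead */
} aud_cmd;

typedef struct {
    aud_cmd buf[RING_SIZE];
    _Atomic unsigned head;          /* advanced by control thread only */
    _Atomic unsigned tail;          /* advanced by audio thread only */
} cmd_ring;

/* Control thread: returns 0 if the ring is full (command dropped,
 * caller decides how to handle it); never blocks. */
static int ring_push(cmd_ring *r, aud_cmd c)
{
    unsigned h = atomic_load_explicit(&r->head, memory_order_relaxed);
    unsigned t = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (h - t == RING_SIZE)
        return 0;
    r->buf[h & (RING_SIZE - 1)] = c;
    atomic_store_explicit(&r->head, h + 1, memory_order_release);
    return 1;
}

/* Audio thread: drained at the start of each processing block. */
static int ring_pop(cmd_ring *r, aud_cmd *out)
{
    unsigned t = atomic_load_explicit(&r->tail, memory_order_relaxed);
    unsigned h = atomic_load_explicit(&r->head, memory_order_acquire);
    if (t == h)
        return 0;                   /* empty */
    *out = r->buf[t & (RING_SIZE - 1)];
    atomic_store_explicit(&r->tail, t + 1, memory_order_release);
    return 1;
}
```

Note the handling of pool pressure: when the ring is full the writer fails fast rather than waiting, which keeps the control thread from ever blocking on the audio thread's progress.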
The Synthesis Model
Audiality's synthesis model is wavetable-based with an oscillator structure that supports frequency modulation, ring modulation, and additive combinations. The wavetable approach gives it predictable CPU cost per voice -- each voice runs a fixed oscillator graph regardless of the sound being produced, with the sound character controlled by the wavetable contents and the modulation routing. This is a deliberate departure from the more flexible but less predictable approach of synthesis engines that build voice graphs dynamically based on instrument definitions. Audiality trades that flexibility for CPU cost guarantees that make it easier to reason about the system's behavior under load.
The oscillators use band-limited wavetables to avoid aliasing at the synthesis stage. Pre-computed band-limited tables, indexed by playback pitch, are the standard solution to the aliasing problem in wavetable synthesis, but the table storage and indexing overhead is not trivial. Audiality uses a compact table format that keeps the memory footprint of a full pitch-range wavetable set within L2 cache on typical hardware, which matters because cache misses on the audio thread are one of the more reliable ways to introduce latency spikes. The design prioritizes cache behavior alongside CPU cost because on a real-time system, the occasional long-latency operation is more damaging than a consistently elevated CPU cost.
Scripting and Parameter Control
Audiality uses a compiled script language for defining instrument behavior. A script describes the voice in terms of its oscillator parameters, envelope shapes, modulation routing, and the mapping between external control parameters and internal synthesis values. Scripts are compiled ahead of time by the control thread; the audio thread executes the compiled bytecode during synthesis without any interpretation overhead beyond the bytecode dispatch itself.
The script language is intentionally limited. It has no dynamic memory allocation, no recursion, and bounded loop iteration -- all properties that guarantee the execution time of a script is bounded and predictable. This is not an accident of the implementation; it is a design requirement. A script that could loop indefinitely or allocate memory during execution would violate the real-time constraints of the audio thread. The limitations feel restrictive if you are accustomed to general-purpose scripting, but they are precisely what makes the scripting approach safe in a real-time context.
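To make the predictability argument concrete, here is a toy bytecode interpreter with the same properties: a fixed-size stack, no allocation, and a hard step budget so no program can run unbounded. The opcodes and encoding are invented for illustration and bear no relation to Audiality's actual bytecode.

```c
/* Toy sketch of a bounded bytecode interpreter, illustrating why the
 * no-allocation / bounded-execution restrictions make script cost
 * predictable. Opcodes and layout are invented for illustration. */
#include <assert.h>
#include <stdint.h>

enum { OP_PUSH, OP_ADD, OP_MUL, OP_END };

#define STACK_MAX 16
#define STEP_MAX  256   /* hard cap: worst-case cost is known up front */

/* Returns the top of stack at OP_END, or 0 on any error or when the
 * step budget runs out. */
static int32_t vm_run(const int32_t *code)
{
    int32_t stack[STACK_MAX];       /* fixed storage, no allocation */
    int sp = 0, pc = 0;
    for (int steps = 0; steps < STEP_MAX; steps++) {
        switch (code[pc++]) {
        case OP_PUSH:
            if (sp >= STACK_MAX) return 0;
            stack[sp++] = code[pc++];
            break;
        case OP_ADD:
            if (sp < 2) return 0;
            sp--; stack[sp - 1] += stack[sp];
            break;
        case OP_MUL:
            if (sp < 2) return 0;
            sp--; stack[sp - 1] *= stack[sp];
            break;
        case OP_END:
            return sp ? stack[sp - 1] : 0;
        default:
            return 0;
        }
    }
    return 0;   /* budget exhausted: fail safe, never loop forever */
}
```

The step cap is the crucial property: the audio thread can budget for `STEP_MAX` dispatches per script invocation and know that no script, however malformed, can exceed it.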
External parameter control -- changing a voice's pitch, volume, or modulation depth while it is playing -- is handled through the command buffer mechanism described above. The control thread writes a parameter update command; the audio thread applies it at the start of the next processing block. The latency of a parameter update is therefore one block, which at typical game audio buffer sizes is on the order of 5-10 milliseconds. For musical parameter changes that need tighter timing, the scripting system includes provisions for scheduling parameter changes ahead of time with sample-accurate timing; these scheduled changes are processed as part of the voice's script execution rather than through the command buffer.
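The sample-accurate path can be sketched as a render loop that splits the block at event boundaries. The structures and names below (`sched_ev`, `voice`, `render`) are illustrative assumptions; the point is that a pre-queued event carries an absolute sample time and takes effect at exactly that sample, not at the next block edge.

```c
/* Sketch of sample-accurate parameter scheduling: pre-queued events
 * carry an absolute sample time, and the render loop applies each one
 * at its exact sample. Names are illustrative assumptions. */
#include <assert.h>
#include <stdint.h>

typedef struct { uint64_t when; float value; } sched_ev;

typedef struct {
    float    gain;          /* the parameter being automated */
    uint64_t now;           /* absolute sample clock */
} voice;

/* Render `frames` samples, applying each event at its exact sample.
 * Events must be sorted by `when`. The output here is simply the gain
 * per sample, so the split points are easy to inspect. */
static void render(voice *v, const sched_ev *ev, int nev,
                   float *out, int frames)
{
    int e = 0;
    for (int i = 0; i < frames; i++) {
        while (e < nev && ev[e].when == v->now)
            v->gain = ev[e++].value;   /* applied sample-accurately */
        out[i] = v->gain;
        v->now++;
    }
}
```

Contrast this with the command-buffer path: a command takes effect at the next block boundary, while a scheduled event lands on the exact sample, at the cost of having to be queued ahead of time.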
Voice Management
Voice management in Audiality is pool-based. A fixed pool of voice slots is allocated at initialization time. Allocating a new voice takes a slot from the pool; releasing a voice returns it. Pool exhaustion -- trying to allocate a voice when all slots are in use -- is handled gracefully by stealing the oldest or quietest active voice rather than failing the allocation request. The stealing policy is configurable per pool, which is useful when some sounds (UI feedback, ambient) should be stealable and others (player character actions, story audio) should not be interrupted.
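A minimal sketch of the pool with a steal-oldest fallback, assuming a per-voice "stealable" flag of the kind described above. The field names and the specific policy (oldest stealable voice wins) are illustrative choices, not Audiality's exact implementation.

```c
/* Sketch of a fixed voice pool with steal-oldest fallback. Field
 * names and the exact policy are invented for illustration. */
#include <assert.h>
#include <stdint.h>

#define POOL_SIZE 64

typedef struct {
    int      active;
    int      stealable;     /* per-sound policy flag */
    uint64_t started;       /* activation order, for "oldest" */
} vslot;

typedef struct {
    vslot    slots[POOL_SIZE];
    uint64_t clock;
} vpool;

/* Returns a slot index, stealing the oldest stealable voice when the
 * pool is exhausted; returns -1 only if every active voice is
 * protected from stealing. */
static int voice_alloc(vpool *p, int stealable)
{
    int victim = -1;
    for (int i = 0; i < POOL_SIZE; i++) {
        if (!p->slots[i].active) { victim = i; goto take; }
        if (p->slots[i].stealable &&
            (victim < 0 || p->slots[i].started < p->slots[victim].started))
            victim = i;     /* oldest stealable voice so far */
    }
    if (victim < 0)
        return -1;          /* all voices protected */
take:
    p->slots[victim] = (vslot){ 1, stealable, p->clock++ };
    return victim;
}
```

A quietest-voice policy would replace the `started` comparison with a comparison of current output level; the structure of the search is the same either way.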
The fixed pool size means that the maximum CPU cost of voice processing is bounded at initialization time. If the pool has 64 voice slots and each voice costs a known amount of CPU at a given sample rate and buffer size, the total voice processing cost is bounded by 64 times the per-voice cost. This is the design property that makes Audiality usable in latency-constrained environments: the audio thread will never be asked to process more work than the system was sized to handle. In practice, voice activation patterns mean the pool is rarely at full utilization, but the bound is what matters for worst-case reasoning.
Integration Patterns for Linux Audio
Audiality integrates with Linux audio frameworks through a callback interface. The host application registers an audio callback that Audiality calls at each buffer boundary. In a JACK-based system, the JACK process callback calls into Audiality, which processes its voice pool and writes the output into the JACK output buffers. The Audiality audio thread in this scenario is the JACK process thread -- there is no additional thread created by Audiality itself. This is the correct approach for JACK integration: the JACK process callback must complete within the buffer period, and running additional threads under JACK's scheduler creates timing complexity that is difficult to reason about.
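The shape of the callback contract can be shown with a stripped-down sketch. The `engine` type and `process` function below are invented stand-ins: in a real JACK session, `process` would be the function registered with `jack_set_process_callback()`, and `out` would come from `jack_port_get_buffer()`; the sketch keeps only the properties that matter, namely that the callback does bounded work, touches no locks or allocators, and returns 0 on success as JACK expects.

```c
/* Shape of the callback integration: the host (JACK's process thread
 * in a JACK session) calls into the engine once per buffer, and the
 * engine fills the output in place. The engine type is an invented
 * stand-in so the sketch is self-contained. */
#include <assert.h>

typedef struct { float level; } engine;

/* Per-buffer entry point: must complete within the buffer period, so
 * it does bounded work only. The constant fill stands in for mixing
 * the voice pool into the output. */
static int process(engine *e, float *out, int nframes)
{
    for (int i = 0; i < nframes; i++)
        out[i] = e->level;
    return 0;               /* 0 = success, as JACK expects */
}
```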
For applications that use ALSA directly rather than through JACK, Audiality can drive its own ALSA output thread, which it runs at real-time priority using the POSIX real-time scheduling interface. In this mode the integration is closer to a standalone audio application: Audiality manages its own audio thread lifecycle, and the host application interacts with it entirely through the control API. The ALSA integration path is simpler to set up but gives less flexibility for mixing Audiality's output with other audio sources in the same session.
Design Reflections
The constraints that shaped Audiality's design -- bounded CPU cost, deterministic memory behavior, scripted instrument definition, lock-free thread communication -- are not unique to this engine. They represent the consensus approach to real-time audio synthesis that the Linux audio community developed over years of practical experience. The interesting aspect of Audiality is the degree to which it commits to those constraints rather than treating them as guidelines. Other engines make compromises for flexibility or ease of use that Audiality does not make, and those compromises are the source of the unpredictable behavior that causes problems in demanding real-time audio contexts.
Whether the commitment is the right tradeoff depends on the use case. For a game audio engine running on embedded hardware with tight CPU and memory budgets, the constraints are assets. For a desktop music production tool where flexibility and sound quality are the primary requirements, they may be liabilities. Audiality is explicitly designed for the former context, and understanding that context is important for evaluating the design decisions documented here. The audio quality considerations relevant to synthesis-based production work are covered in the audio quality section, which addresses signal integrity independent of the synthesis engine architecture.