Maia: An Audio Engine Introduction

Maia is a collection of experimental work in real-time audio synthesis and audio programming for Linux. It is not a commercial product, not a finished framework, and not a replacement for any existing general-purpose audio system. It is a workspace for exploring what becomes possible when you design an audio engine around a small, explicit set of constraints rather than trying to be general-purpose from the start. These notes cover what Maia is and is not, the synthesis work that sits at its core, the design decisions that shape it, and what the code looks like from an audio programmer’s perspective. For the detailed engine architecture and scripting notes, see Audiality synthesis engine design notes. For the audio quality and configuration context that surrounds any synthesis work on Linux, see the audio quality guide.

What Maia Is

The simplest description is that Maia is a project name for the audio synthesis work documented on this site. It began as a personal research area for exploring whether a highly constrained audio engine could deliver genuinely useful results in interactive and game audio contexts without the overhead that characterises more general synthesis systems. The scope has always been narrow by design: real-time synthesis on Linux, integration with existing audio infrastructure, and a programming model that favours determinism and low overhead over flexibility.

That narrowness is a feature. Audio engines that try to do everything tend to do nothing particularly well. The resource contention problems, memory allocation strategies, threading models, and latency constraints of real-time synthesis are different enough from batch audio processing, broadcast production, and offline rendering that a single architecture struggles to serve all of them without compromising each. Maia makes its compromises upfront and explicitly: it targets real-time, it targets Linux, and it targets relatively low voice counts with high consistency rather than high voice counts with variable timing.

The name Maia is not an acronym, and it does not represent a particular release lineage. It is simply the label attached to the synthesis experimentation documented here, which has evolved over time as the underlying ideas were refined. Do not read a commercial version history or a product roadmap into it.

Experimental Synthesis: What That Means in Practice

The word experimental in this context means that the synthesis approaches documented here were explored, evaluated, and in some cases discarded based on what they delivered in practice, not based on prior commitment to a particular design. That is a distinction worth making because much synthesis documentation describes systems after the design decisions have been finalised, which removes the reasoning behind the decisions and makes the alternatives invisible.

The synthesis work in Maia has included wavetable lookup with interpolation strategies at several quality/cost tradeoffs, additive approaches that build timbres from explicit partial lists, simple subtractive topologies, and combinations of these that behave differently under real-time parameter automation than they do in static patch configurations. The experiments informed which approaches are tractable at the voice counts and buffer sizes relevant to interactive audio, and which approaches consume CPU in ways that are hard to bound.
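To make the quality/cost tradeoff concrete, here is a minimal sketch of two wavetable lookup strategies: plain truncation and linear interpolation. It is an illustration only, not Maia's code; the table size `kTableSize` and the function names are assumptions chosen for the example.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Hypothetical table size; a power of two keeps the wrap-around a cheap mask.
constexpr std::size_t kTableSize = 1024;
using Wavetable = std::array<float, kTableSize>;

// Fill a single-cycle sine table once, outside the audio thread.
inline Wavetable make_sine_table() {
    Wavetable table{};
    const double two_pi = 6.283185307179586;
    for (std::size_t i = 0; i < kTableSize; ++i) {
        table[i] = static_cast<float>(
            std::sin(two_pi * static_cast<double>(i) / static_cast<double>(kTableSize)));
    }
    return table;
}

// Cheapest option: truncate the phase (expected in [0, 1)) to a table index.
inline float lookup_truncate(const Wavetable& table, float phase) {
    const float pos = phase * static_cast<float>(kTableSize);
    return table[static_cast<std::size_t>(pos) & (kTableSize - 1)];
}

// One step up in cost: linear interpolation between adjacent entries.
inline float lookup_linear(const Wavetable& table, float phase) {
    const float pos = phase * static_cast<float>(kTableSize);
    const std::size_t i0 = static_cast<std::size_t>(pos) & (kTableSize - 1);
    const std::size_t i1 = (i0 + 1) & (kTableSize - 1);  // wraps at the table end
    const float frac = pos - std::floor(pos);
    return table[i0] + frac * (table[i1] - table[i0]);
}
```

The interpolated version costs an extra table read and a multiply-add per sample. Whether that is acceptable depends on the voice count and buffer size being targeted.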

One consistent finding across the synthesis experiments is that the cost of parameter smoothing and interpolation is not negligible at the voice granularity that matters for interactive audio. A synthesis model that ignores this produces audible discontinuities when game logic or user interaction changes parameters at the audio-thread boundary. Getting smooth, artefact-free parameter transitions in a real-time context requires designing the synthesis model around that requirement from the start, not adding smoothing as a post-hoc patch. This shapes the Maia design at a fundamental level and distinguishes it from synthesis engines that assume relatively static patch configurations.
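One common way to build smoothing into the model is a one-pole (exponential) smoother evaluated per sample. The sketch below shows that general technique under stated assumptions. The class name `ParamSmoother` and its interface are hypothetical, and it is not presented as the scheme Maia uses.

```cpp
#include <cmath>

// One-pole (exponential) smoother for a single synthesis parameter.
class ParamSmoother {
public:
    // time_constant_s: time taken to cover roughly 63% of a step change.
    ParamSmoother(float sample_rate, float time_constant_s, float initial)
        : coeff_(std::exp(-1.0f / (time_constant_s * sample_rate))),
          current_(initial),
          target_(initial) {}

    // Called when control logic changes the value at the audio-thread boundary.
    void set_target(float target) { target_ = target; }

    // Called once per sample inside the render loop; moves towards the target.
    float next() {
        current_ = target_ + coeff_ * (current_ - target_);
        return current_;
    }

private:
    float coeff_;
    float current_;
    float target_;
};
```

Each smoothed parameter on each voice pays one multiply-add per sample, so the cost scales with voices times parameters.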

A second consistent finding is that the interaction between voice allocation and CPU load is less predictable than most synthesis documentation suggests. Voice stealing policies, release tail lengths, and the granularity at which voices are retired all affect whether the engine stays within its CPU budget under adversarial input sequences. The experiments include stress tests that specifically target the worst-case allocation patterns, and the design of the voice manager reflects what those tests revealed. This is covered in more detail in the Audiality engine notes.
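As an illustration of how a stealing policy can keep allocation bounded, the following sketch uses a fixed-size pool that never allocates and always returns a voice after a linear scan. The policy shown (prefer free voices, then the oldest releasing voice, then the oldest active voice) is an assumption for the example. All names are hypothetical, and Maia's actual voice manager may differ.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Enum order doubles as steal preference: Free < Releasing < Active.
enum class VoiceState : std::uint8_t { Free, Releasing, Active };

struct Voice {
    VoiceState state = VoiceState::Free;
    std::uint64_t started_at = 0;  // allocation counter value at note-on
    int note = -1;
};

template <std::size_t N>
class VoicePool {
    static_assert(N > 0, "pool needs at least one voice");

public:
    // Always yields a voice index in O(N) time with no allocation.
    std::size_t note_on(int note) {
        const std::size_t i = choose();
        voices_[i].state = VoiceState::Active;
        voices_[i].started_at = ++clock_;
        voices_[i].note = note;
        return i;
    }

    // Start the release tail; the voice becomes a preferred steal candidate.
    void note_off(std::size_t i) {
        if (voices_[i].state == VoiceState::Active) {
            voices_[i].state = VoiceState::Releasing;
        }
    }

    // Called when the release tail has finished.
    void retire(std::size_t i) { voices_[i] = Voice{}; }

    const Voice& voice(std::size_t i) const { return voices_[i]; }

private:
    // Pick the lowest-ranked state; break ties by taking the oldest voice.
    std::size_t choose() const {
        std::size_t best = 0;
        for (std::size_t i = 1; i < N; ++i) {
            const Voice& a = voices_[i];
            const Voice& b = voices_[best];
            if (a.state < b.state ||
                (a.state == b.state && a.started_at < b.started_at)) {
                best = i;
            }
        }
        return best;
    }

    std::array<Voice, N> voices_{};
    std::uint64_t clock_ = 0;
};
```

A stress test can feed this kind of pool more note-on events than it has voices and check that the cost per event stays flat.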

Audio Programming Notes: Context and Motivation

The audio programming notes associated with Maia exist because writing real-time audio code on Linux involves a set of constraints and failure modes that are not well covered by general programming resources. The C++ standard library, many common abstractions, and patterns that work perfectly well in non-real-time code are hazardous or outright wrong in a real-time audio thread context. These are not obscure edge cases. They are the normal behaviour of standard allocators, standard synchronisation primitives, and standard logging approaches, all of which can and will introduce unbounded latency on the audio thread if used carelessly.
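A widely used alternative to locking or logging directly on the audio thread is to push small fixed-size records into a single-producer, single-consumer ring buffer and let a non-real-time thread drain it. The sketch below shows that general pattern. The name `SpscQueue` is hypothetical, and nothing here is taken from the Maia codebase.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <type_traits>

// Bounded single-producer/single-consumer queue: no locks, no allocation.
template <typename T, std::size_t Capacity>
class SpscQueue {
    static_assert(Capacity > 0 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");
    static_assert(std::is_trivially_copyable<T>::value,
                  "records should be plain data so copying cannot allocate");

public:
    // Producer side (audio thread): never blocks; drops the record when full.
    bool try_push(const T& item) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == Capacity) return false;
        slots_[head & (Capacity - 1)] = item;
        head_.store(head + 1, std::memory_order_release);
        return true;
    }

    // Consumer side (non-real-time thread): free to log, allocate, or block.
    bool try_pop(T& out) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (head == tail) return false;
        out = slots_[tail & (Capacity - 1)];
        tail_.store(tail + 1, std::memory_order_release);
        return true;
    }

private:
    std::array<T, Capacity> slots_{};
    std::atomic<std::size_t> head_{0};  // written only by the producer
    std::atomic<std::size_t> tail_{0};  // written only by the consumer
};
```

Dropping a record when the queue is full is a deliberate choice here: losing a diagnostic message is preferable to stalling the audio thread.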

The notes are not a tutorial for C++ beginners or a comprehensive survey of audio programming literature. They are specific observations from the Maia development work: the failures that occurred, what caused them, how they were diagnosed, and what the fix or workaround looked like. The intent is to make those specific failure modes visible to other audio programmers who might encounter the same situations, not to build a general-purpose reference.

One recurring theme in the notes is the gap between what POSIX says about real-time scheduling and what Linux actually delivers. POSIX specifies a scheduling model; Linux delivers a specific implementation of that model with particular quirks, tuneable parameters, and configuration dependencies that affect audio workloads significantly. Understanding the difference between the specification and the implementation is essential for doing serious real-time audio work, and it is the kind of understanding that comes only from measuring behaviour, not from reading documentation alone.
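A small, concrete instance of checking rather than assuming: request `SCHED_FIFO`, then read back what the kernel actually granted. The sketch below is Linux-specific and uses the standard `sched_setscheduler`, `sched_getscheduler`, and `sched_getparam` calls. The `RtRequestResult` struct and the `request_fifo` function are hypothetical names for this example. On Linux, passing 0 as the first argument applies to the calling thread, whereas POSIX describes these calls in terms of processes.

```cpp
#include <cerrno>
#include <sched.h>

// What the kernel reports after the request; names are illustrative only.
struct RtRequestResult {
    int error = 0;     // 0 on success, otherwise an errno value such as EPERM
    int policy = -1;   // scheduling policy in effect afterwards
    int priority = 0;  // scheduling priority in effect afterwards
};

// Ask for SCHED_FIFO at the given priority, then verify instead of assuming.
inline RtRequestResult request_fifo(int priority) {
    RtRequestResult result;

    sched_param wanted{};
    wanted.sched_priority = priority;
    if (sched_setscheduler(0, SCHED_FIFO, &wanted) != 0) {
        result.error = errno;  // commonly EPERM without the right privileges
    }

    // Read back the state actually in effect, whatever the call returned.
    result.policy = sched_getscheduler(0);
    sched_param actual{};
    if (sched_getparam(0, &actual) == 0) {
        result.priority = actual.sched_priority;
    }
    return result;
}
```

A caller might pass `sched_get_priority_min(SCHED_FIFO)` as a conservative first request and record the outcome next to any latency measurements, so results from differently configured machines are not mixed up.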

Design Philosophy

The design philosophy behind Maia can be stated briefly: prefer explicit constraints over implicit flexibility, prefer measurable performance over subjective quality claims, and prefer simple correctness over complex optimisation. Each of these has practical implications.

Explicit constraints mean that the engine declares what it will and will not do, and holds to those declarations across all operating conditions. An engine that claims to support real-time operation but allocates memory during voice creation is not a real-time engine; it is a batch engine with a real-time label. Maia’s constraints are tested, not assumed.

Measurable performance means that claims about latency, CPU overhead, and voice capacity are supported by actual measurements under realistic conditions, not by theoretical analysis of the code path. Audio programmers have a responsibility to measure the systems they build and to report what they find rather than what they expect. The Maia benchmarking work is part of this commitment. For the measurement tools and approaches used, see the audio quality guide.
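The simplest measurement of this kind compares the time each render call takes against the time one buffer represents. The sketch below is a minimal illustration, not the Maia benchmarking code. `RenderStats`, `buffer_period_ns`, and `timed_render` are hypothetical names, and the cost of reading `std::chrono::steady_clock` on the target system is assumed to be small enough not to distort the result.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>

// Time available to render one buffer, in nanoseconds (truncated).
inline std::int64_t buffer_period_ns(std::size_t frames, double sample_rate) {
    return static_cast<std::int64_t>(1e9 * static_cast<double>(frames) / sample_rate);
}

// Plain counters so updating them never allocates or locks.
struct RenderStats {
    std::int64_t worst_ns = 0;   // slowest call seen so far
    std::uint64_t calls = 0;
    std::uint64_t overruns = 0;  // calls that took longer than the deadline
};

// Wrap one render call and record how long it took against the deadline.
template <typename RenderFn>
void timed_render(RenderFn&& render, std::int64_t deadline_ns, RenderStats& stats) {
    const auto t0 = std::chrono::steady_clock::now();
    render();
    const auto t1 = std::chrono::steady_clock::now();
    const std::int64_t ns =
        std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count();
    if (ns > stats.worst_ns) stats.worst_ns = ns;
    if (ns > deadline_ns) ++stats.overruns;
    ++stats.calls;
}
```

For example, 64 frames at 48 kHz leaves about 1.33 ms per buffer. The worst-case figure matters more than the average, because a single overrun is audible.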

Simple correctness over complex optimisation means that the first implementation of any component aims to be demonstrably correct before it is optimised. Audio code is difficult to debug because the failure modes are often audible artefacts that point only vaguely at their cause. A correct but slow implementation provides a reference baseline against which an optimised version can be validated. The Maia codebase retains reference implementations alongside optimised ones where the optimisation introduces complexity that makes the code harder to verify.
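The reference-plus-optimised pattern can be illustrated with a sine oscillator: a reference version that calls `std::sin` for every sample, and a faster version that rotates a unit phasor. Both are sketches written for this example with hypothetical names, not excerpts from the Maia codebase. The comparison runs offline, so using `std::vector` is acceptable here.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

constexpr double kTwoPi = 6.283185307179586;

// Reference: easy to verify by inspection, one std::sin call per sample.
inline void render_sine_reference(std::vector<float>& out, double freq, double sample_rate) {
    const double w = kTwoPi * freq / sample_rate;
    for (std::size_t n = 0; n < out.size(); ++n) {
        out[n] = static_cast<float>(std::sin(w * static_cast<double>(n)));
    }
}

// Optimised: rotate a unit phasor by a fixed angle each sample.
inline void render_sine_fast(std::vector<float>& out, double freq, double sample_rate) {
    const double w = kTwoPi * freq / sample_rate;
    const double c = std::cos(w);
    const double s = std::sin(w);
    double re = 1.0;  // cos(0)
    double im = 0.0;  // sin(0)
    for (std::size_t n = 0; n < out.size(); ++n) {
        out[n] = static_cast<float>(im);
        const double next_re = re * c - im * s;
        im = re * s + im * c;
        re = next_re;
    }
}

// Largest per-sample deviation between two renders of equal length.
inline float max_abs_diff(const std::vector<float>& a, const std::vector<float>& b) {
    float worst = 0.0f;
    for (std::size_t n = 0; n < a.size() && n < b.size(); ++n) {
        const float d = std::fabs(a[n] - b[n]);
        if (d > worst) worst = d;
    }
    return worst;
}
```

A validation test renders the same input through both versions and fails if `max_abs_diff` exceeds a stated tolerance. The recurrence accumulates rounding error over long runs, which is exactly the kind of behaviour the reference exposes.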

Linux Audio Context

Maia exists within the broader Linux audio ecosystem, and the design reflects the infrastructure available on Linux rather than treating the platform as a generic POSIX target. ALSA provides the hardware interface; JACK and PipeWire provide the graph and scheduling model; the kernel scheduling infrastructure provides the real-time guarantees. Maia integrates with these rather than replacing them.

This integration is not always straightforward. JACK and PipeWire have different graph models, different latency reporting mechanisms, and different approaches to buffer size negotiation. A synthesis engine that wants to work cleanly with both needs to abstract those differences without losing the specific properties of each. The Maia backend layer handles this abstraction, and the notes describe the specific challenges that arose during that work.
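One way to picture such a layer is a narrow interface that exposes only what the synthesis code needs, with each audio server wrapped behind it. The sketch below is purely hypothetical: every type and method name is invented for illustration, no JACK or PipeWire calls are shown, and it should not be read as the structure of the real Maia backend layer.

```cpp
#include <cstddef>

// Hypothetical summary of what a backend has negotiated and reported.
struct BackendInfo {
    double sample_rate;
    std::size_t buffer_frames;          // current period size
    std::size_t output_latency_frames;  // latency as reported by the backend
};

// Implemented by the synthesis engine.
class RenderClient {
public:
    virtual ~RenderClient() = default;
    virtual void render(float* out, std::size_t frames) = 0;
    // Invoked when the backend reports a new buffer size.
    virtual void buffer_size_changed(std::size_t frames) = 0;
};

// Implemented once per audio server.
class AudioBackend {
public:
    virtual ~AudioBackend() = default;
    virtual BackendInfo info() const = 0;
    virtual void set_client(RenderClient* client) = 0;
};

// Test double: drives the client from the calling thread with a fixed period.
class NullBackend final : public AudioBackend {
public:
    NullBackend(double sample_rate, std::size_t frames)
        : info_{sample_rate, frames, frames} {}
    BackendInfo info() const override { return info_; }
    void set_client(RenderClient* client) override {
        client_ = client;
        if (client_) client_->buffer_size_changed(info_.buffer_frames);
    }
    // Render one period into caller-provided storage of buffer_frames floats.
    void pump(float* scratch) {
        if (client_) client_->render(scratch, info_.buffer_frames);
    }

private:
    BackendInfo info_;
    RenderClient* client_ = nullptr;
};

// Trivial client that outputs silence and records what it was asked to do.
class SilenceClient final : public RenderClient {
public:
    void render(float* out, std::size_t frames) override {
        for (std::size_t i = 0; i < frames; ++i) out[i] = 0.0f;
        frames_rendered += frames;
    }
    void buffer_size_changed(std::size_t frames) override { max_frames = frames; }
    std::size_t frames_rendered = 0;
    std::size_t max_frames = 0;
};
```

A null backend of this kind also lets the engine be exercised in tests without a running audio server.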

The kernel side of real-time audio on Linux has changed significantly over time. The differences in behaviour between standard-scheduling, voluntary-preemption, and full-preemption kernels are quantitative, and they determine which buffer sizes are sustainable. These are covered in the latency-focused sections of this site. Understanding them is a prerequisite for making informed decisions about what the audio engine can promise and under what conditions those promises hold.

Scope of These Notes

These notes are not a user manual, an API reference, or an installation guide. They are technical commentary on a body of experimental work. The intended reader is someone who already understands real-time audio programming at a general level and wants to understand the specific decisions made in this work, the reasoning behind them, and what alternatives were considered and rejected.

The synthesis engine work, covered in depth on the Audiality page, represents the most substantial single component. The audio programming notes are scattered observations rather than a structured curriculum. The benchmarking work connects to the broader measurement infrastructure documented elsewhere on this site.

What ties it together is the commitment to treating real-time audio on Linux as an engineering problem with specific, measurable constraints rather than a creative endeavour with vague quality criteria. The work documented here is grounded in measurement, bounded by explicit constraints, and honest about where the experiments did not produce the results that were hoped for. That honesty is a feature of the documentation, not a limitation to apologise for.