Low-Latency Linux Audio: A Practical Reference
Low latency in Linux audio is not a single setting or a magic kernel flag. It is a system-wide discipline that starts at the interrupt controller, passes through the scheduler, crosses the sound server, and ends at the buffer hand-off to your audio interface. In this reference I cover what low latency actually means when you are tracking or performing on Linux, how PREEMPT_RT and scheduler tuning change real behavior, where IRQ handling and buffer sizing decisions interact, why chasing the absolute smallest buffer is usually counterproductive, and how to read the benchmark traces that tell you what is really happening inside your system. These notes come from years of running latency sweeps on everything from repurposed office desktops to purpose-built audio workstations. For the broader context on Linux audio development and community resources, see the Linux Audio Developers and Users hub.
What Low Latency Actually Means in Audio Work
Latency, in the simplest terms, is the delay between an event and the system's response to it. In audio work, the event is usually a sample arriving from an input or a note trigger from a MIDI controller. The response is the processed audio reaching the output. The total round-trip delay is what musicians feel. Below roughly 10 milliseconds, most players perceive monitoring as effectively instantaneous. Between 10 and 20 ms, it starts to feel like playing in a slightly distant room. Above 25 ms, it becomes distracting for most people, and above 40 ms it is unusable for real-time performance.
But the number on a latency calculator is not the whole picture. What matters equally is consistency. A system running at a steady 8 ms round trip is far more usable than one that averages 4 ms but occasionally spikes to 30 ms. Those spikes manifest as XRUNs, audible glitches, clicks, or momentary silence. The human ear is remarkably sensitive to timing irregularities, even when it cannot articulate what went wrong. A stable, slightly higher latency is always preferable to an unstable low one.
This distinction between average latency and worst-case latency is the central tension of the entire low-latency Linux audio discussion. The kernel is a general-purpose system. It runs disk I/O, network interrupts, GPU compositing, and thousands of other tasks alongside your audio processing. Making the audio path reliably fast means controlling everything else that could delay it.
PREEMPT_RT and What It Changes
The standard Linux kernel uses a preemptive scheduling model, but not everything is preemptible. Certain critical sections, interrupt handlers, and kernel paths run with preemption disabled, meaning no other task can interrupt them regardless of priority. For most workloads this is fine. For audio, it means your carefully configured real-time audio thread can be blocked for unpredictable durations by something as mundane as a filesystem journal commit or a USB enumeration event.
The PREEMPT_RT patch set addresses this by converting most interrupt handlers to threaded form and making nearly all kernel code paths preemptible. The practical effect is that a high-priority audio thread can preempt almost anything, including work that on a standard kernel would hold a spinlock and block everything else. With PREEMPT_RT, the worst-case scheduling latency drops from potentially tens of milliseconds to low hundreds of microseconds on well-configured hardware.
Starting with the 6.x kernel series, significant portions of the PREEMPT_RT work have been merged into mainline. This is a meaningful shift. For years, running a real-time kernel meant maintaining a separate patched build, which created friction for distribution packagers and end users alike. Now, many distributions ship kernels with enough real-time capability built in that dedicated audio workstations no longer need a custom compile in most cases.
That said, PREEMPT_RT is not a guarantee of low latency. It removes one category of obstacle. If your system has a badly behaved driver that holds locks for extended periods, or a firmware-level SMI (System Management Interrupt) that steals CPU cycles invisibly, PREEMPT_RT cannot fix that. It gives the scheduler the ability to preempt. It does not fix hardware that does not cooperate.
Scheduler Behavior and Real-Time Priorities
The Linux scheduler determines which thread runs when. For audio work, two scheduling policies matter: SCHED_FIFO and SCHED_RR. Both are real-time policies, meaning threads using them always run before any normal (SCHED_OTHER) thread, regardless of how long they have been waiting. SCHED_FIFO runs a thread until it voluntarily yields or blocks. SCHED_RR adds time-slicing among threads at the same priority level.
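As a concrete illustration, the request a sound server makes for its processing thread can be sketched with the standard library's scheduling wrappers. This is a simplified sketch: the priority value 70 is an arbitrary example, not what JACK or PipeWire actually request, and the call succeeds only if your rtprio limit allows it.

```python
import os

def request_realtime(priority=70):
    """Attempt to move the calling process to SCHED_FIFO.

    Raises PermissionError internally (caught below) unless the
    user's rtprio limit permits real-time scheduling -- the exact
    misconfiguration that leaves a sound server running as a
    normal thread.
    """
    try:
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
        return "SCHED_FIFO"
    except PermissionError:
        # The rtprio limit denied the request: still a normal thread.
        return "SCHED_OTHER"

print(request_realtime())
```

If this prints SCHED_OTHER on a system that is supposedly configured for real-time audio, the limits are not being applied.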
JACK and PipeWire both request real-time scheduling for their audio processing threads. Whether the system grants it depends on the rtprio limits configured in /etc/security/limits.conf or the equivalent systemd configuration, and on whether the user belongs to the appropriate group (typically audio or pipewire). A common source of latency frustration is a system where the sound server thinks it has real-time priority but the kernel is actually running it as a normal thread because the PAM limits were not applied at login.
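For reference, a typical rtprio configuration looks like the following. The file name and values here are illustrative; the group name must match whatever group your distribution actually uses, and the limits only take effect at the next login.

```
# /etc/security/limits.d/audio.conf -- illustrative values
@audio   -   rtprio    95
@audio   -   memlock   unlimited
```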
You can verify real-time scheduling is active with chrt -p on the audio thread's PID, or by checking pw-top for PipeWire or jack_lsp -l for JACK. If the scheduling policy shows SCHED_OTHER instead of SCHED_FIFO, your real-time configuration is not working, and no amount of buffer tuning will compensate.
Priority values also matter more than people realize. On a system running both JACK and a PipeWire session for desktop audio, the relative priorities determine which one wins when they compete for CPU time. Misconfigured priorities do not cause obvious errors. They cause subtle, intermittent glitches that are difficult to reproduce and maddening to debug.
IRQ Handling and CPU Affinity
Hardware interrupts (IRQs) are how your audio interface tells the CPU that data is ready or that a buffer needs filling. On a default Linux installation, IRQs from all devices are typically handled by whichever CPU core happens to be available, managed by irqbalance. For audio, this creates a problem: your audio interface interrupt might land on a core that is busy handling a network interrupt or a storage controller event, adding jitter to the audio processing path.
The classic tuning approach is to pin the audio interface IRQ to a dedicated CPU core using /proc/irq/<N>/smp_affinity, and then set the audio server's processing threads to prefer the same core or a nearby one in the cache hierarchy. On systems with hyperthreading, this gets more nuanced: you generally want to avoid scheduling audio work on both logical cores of the same physical core, because they share execution resources and can interfere with each other's timing.
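The smp_affinity file takes a hexadecimal CPU bitmask, which is easy to get wrong by hand. A small helper makes the mapping explicit; the IRQ number 128 in the comment is a placeholder, since the real number comes from /proc/interrupts on your system.

```python
def affinity_mask(cpus):
    """Build the hex bitmask that /proc/irq/<N>/smp_affinity expects
    from a list of CPU indices (bit i set means CPU i is allowed)."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return f"{mask:x}"

# Pinning a hypothetical IRQ 128 to core 3 would then be:
#   echo 8 | sudo tee /proc/irq/128/smp_affinity
print(affinity_mask([3]))      # -> "8"
print(affinity_mask([0, 1]))   # -> "3"
```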
In practice, IRQ affinity tuning produces its biggest gains on systems with many active devices. A minimal audio workstation with one USB interface, no WiFi, and a wired Ethernet connection often runs fine without manual affinity settings. A laptop with Bluetooth, WiFi, a touchscreen, and a USB hub full of devices benefits substantially from pinning the audio IRQ away from the busy cores.
With PREEMPT_RT kernels, interrupt handlers run as threads themselves, which means they have schedulable priorities. You can raise the priority of your audio interface's interrupt thread above everything except the audio processing thread itself, creating a tight fast path from hardware to application.
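Finding those handler threads is straightforward, since threaded IRQ handlers are named irq/<number>-<device>. A rough sketch that locates them by scanning /proc (the "usb" pattern is just an example; match your interface's device name):

```python
import os
import re

def find_irq_threads(pattern):
    """Return (pid, name) pairs for kernel IRQ handler threads
    whose 'irq/<n>-<device>' name matches the given pattern."""
    hits = []
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open(f"/proc/{pid}/comm") as f:
                name = f.read().strip()
        except OSError:
            continue  # thread exited while we were scanning
        if name.startswith("irq/") and re.search(pattern, name):
            hits.append((int(pid), name))
    return hits

# Raising a matching handler's priority would then be, e.g.:
#   sudo chrt -f -p 85 <pid>
print(find_irq_threads("usb"))
```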
Buffer Tuning: The Real Trade-Off
Buffer size is the single most discussed latency parameter, and the one most often set incorrectly. The buffer is the chunk of audio samples the system accumulates before passing it to the next stage. Smaller buffers mean less delay. They also mean the CPU has to wake up and process audio more frequently, leaving less margin for anything else to cause a hiccup. The audio quality guide covers buffer mechanics in detail from the quality perspective. Here I focus on the latency implications.
At 48 kHz sample rate, a buffer of 256 samples represents about 5.3 ms of audio. Double that for round trip (input buffer plus output buffer), and you get roughly 10.6 ms before adding any driver or converter overhead. A buffer of 64 samples gives you about 1.3 ms per direction, roughly 2.7 ms round trip on paper, which feels instantaneous but requires a system that can process every callback within that tight window with zero exceptions.
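The arithmetic is worth having at hand. A few lines reproduce the figures above:

```python
def buffer_latency_ms(frames, rate_hz):
    """One-way latency contributed by a single buffer, in milliseconds."""
    return frames / rate_hz * 1000.0

for frames in (64, 128, 256):
    one_way = buffer_latency_ms(frames, 48_000)
    print(f"{frames:4d} frames @ 48 kHz: {one_way:.1f} ms per direction, "
          f"~{2 * one_way:.1f} ms round trip")
```

Remember that these are the on-paper numbers; driver and converter overhead sit on top of them.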
The question is not "can your system handle 64 samples?" on a good day. The question is whether it can handle 64 samples at 3 AM when a background cron job fires, or when the journal daemon flushes, or when a USB device re-enumerates. If it cannot, you get an XRUN. One XRUN during a take is one too many.
My working recommendation after years of testing: start at 256 samples, run your full plugin chain, stress the system with typical background activity, and monitor XRUNs for an extended period. If you see zero XRUNs after an hour of real work, try 128. If that holds, try 64. If 128 is solid but 64 is not, stay at 128. The 2 to 3 ms of additional latency between 64 and 128 is inaudible to most performers, and the stability difference can be enormous.
Interface Considerations
Your audio interface is the physical boundary where latency promises meet hardware reality. USB, Firewire, PCIe, and Thunderbolt all have different bus architectures, and those architectures affect latency in ways that buffer size alone does not capture.
USB audio interfaces, which dominate the consumer and prosumer market, communicate using isochronous transfers with a fixed microframe interval. USB 2.0 High Speed uses 125 microsecond microframes. USB Audio Class 2 devices can achieve very low latency on Linux through the kernel's USB audio driver, but the actual performance varies significantly between chipsets. Some devices add an extra millisecond of internal buffering that no amount of Linux tuning can remove. Others report their buffer capabilities accurately and perform as expected.
PCIe interfaces bypass the USB stack entirely and tend to have the most predictable timing. RME cards, for instance, have long been the reference standard for low-latency Linux audio because their ALSA drivers are well-maintained and their hardware buffer reporting is accurate. If you are building a dedicated recording or performance rig and latency is critical, a PCIe interface simplifies your entire tuning process.
Thunderbolt interfaces on Linux have improved substantially since 2022, but driver support still varies. Some work through ALSA directly, others require firmware-specific handling. Check the ALSA project compatibility notes before buying.
When Lower Latency Stops Helping
There is a common assumption that lower is always better. It is not. Below a certain threshold, reducing latency provides no perceptible benefit while dramatically reducing stability margin.
For live monitoring during tracking, anything below 6 ms round trip is indistinguishable from zero for most musicians. Drummers and percussionists, who have the tightest timing expectations, can sometimes perceive differences down to about 4 ms. Below that, you are optimizing for a specification sheet rather than a musical outcome.
For mixing work, latency is largely irrelevant because you are not monitoring through the processing chain in real time. A mixer working at 512 or even 1024 samples has more plugin headroom, fewer XRUN risks, and no perceptible penalty. Running a mixing session at 64 samples because "low latency is better" spends CPU on scheduling overhead that delivers zero benefit for the task.
For software synthesis and virtual instruments played live, the sweet spot is usually 128 to 256 samples. The instruments themselves often introduce internal latency through their processing chains, and reducing the system buffer below that internal latency changes nothing about what the performer feels. Check the instrument's reported latency, not just the system buffer, before optimizing further.
Reading Benchmark Traces
Numbers without context are meaningless. Saying "I got 3 ms latency" tells you nothing about whether the system is stable, what the worst case looked like, or how the measurement was taken. Useful latency characterization requires traces that show distribution over time.
The HDRBench tool and the Latency Graph utility both produce output that shows scheduling latency over thousands of cycles. What you want to look at is not the average but the tail. A histogram with 99% of values at 100 microseconds and 1% at 5000 microseconds tells you the system is mostly fine but has something causing occasional massive delays. That tail is your XRUN risk.
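The shape of that argument is easy to demonstrate: a mean can look healthy while the tail carries all the XRUN risk. A minimal sketch using the exact distribution from the example above:

```python
def tail_summary(samples_us):
    """Summarize a latency trace: mean, 99th percentile, worst case."""
    s = sorted(samples_us)
    mean = sum(s) / len(s)
    p99 = s[min(len(s) - 1, int(0.99 * len(s)))]
    return mean, p99, s[-1]

# 99% of wakeups at 100 us, 1% at 5000 us
trace = [100] * 990 + [5000] * 10
mean, p99, worst = tail_summary(trace)
print(f"mean={mean:.0f}us  p99={p99}us  max={worst}us")
# -> mean=149us  p99=5000us  max=5000us
```

The mean of 149 microseconds looks fine in isolation; the 5 millisecond tail is what causes the glitches.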
When reading traces, look for periodic spikes. If they occur at regular intervals, they usually indicate a hardware event: an SMI, a USB polling cycle, a thermal throttling event, or a periodic kernel task. Irregular spikes suggest software contention: a process waking up unpredictably, a garbage collector running, or a filesystem operation blocking a core. The shape of the spike pattern tells you where to look for the cause.
Compare traces across kernel configurations. Run the same test on a standard kernel, then on a PREEMPT_RT kernel with the same hardware and background load. The average may barely change. The tail should shrink dramatically. That tail improvement is what PREEMPT_RT actually buys you, and it is visible in the traces even when the average numbers look similar.
A Practical Tuning Order
If you are setting up a system for low-latency audio work, this sequence saves the most time:
- Verify real-time scheduling is working. Check chrt -p on your audio server process. If it shows SCHED_OTHER, fix your PAM limits or systemd configuration before doing anything else. Everything downstream depends on this.
- Disable unnecessary hardware. Turn off Bluetooth, WiFi (if wired is available), webcams, and any USB devices you do not need for the session. Each active device is a potential interrupt source.
- Set the CPU governor to performance. The powersave governor introduces variable frequency scaling that adds latency jitter. Lock the cores at full speed during audio work.
- Start at a conservative buffer size. 256 samples at 48 kHz. Run your full session for at least 30 minutes. Monitor XRUNs.
- Reduce buffer size incrementally. Only drop to a smaller buffer after confirming stability at the current one.
- Run a benchmark trace. Use HDRBench or cyclictest to characterize your worst-case scheduling latency under load. If the tail is clean, your system is well tuned. If the tail shows spikes, identify the cause before reducing buffer size further.
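Several of these checks can be spot-checked programmatically. A rough preflight sketch, assuming the standard sysfs and procfs locations (the cpufreq path may be absent on some systems, and the dictionary keys are just my own labels):

```python
import os
import resource

def preflight():
    """Spot-check scheduling policy, CPU governor, and rtprio limit."""
    checks = {}
    # On Linux: 0 == SCHED_OTHER, 1 == SCHED_FIFO, 2 == SCHED_RR
    checks["policy"] = os.sched_getscheduler(0)
    gov = "/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor"
    if os.path.exists(gov):
        with open(gov) as f:
            checks["governor"] = f.read().strip()
    else:
        checks["governor"] = "unknown (no cpufreq sysfs)"
    # The soft rtprio limit that gates real-time scheduling requests
    checks["rtprio_limit"] = resource.getrlimit(resource.RLIMIT_RTPRIO)[0]
    return checks

print(preflight())
```

A governor other than "performance" or an rtprio limit of 0 points you straight at the first and third steps above.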
This order avoids the common mistake of chasing buffer size reductions before the system fundamentals are sound. A properly configured system at 256 samples will outperform a misconfigured one at 64 samples every time, because the misconfigured one will XRUN under any real load.
Connecting to Other Resources
This page is the latency-focused reference in the LAD resource collection. Related material that builds on these concepts:
- The Linux Audio Quality guide covers buffer mechanics, XRUN diagnosis, and sample rate decisions from the quality and configuration perspective rather than the scheduling perspective.
- The HDRBench page documents the benchmark tool and provides the downloadable archive for running your own scheduling latency measurements.
- The Latency Graph utility helps visualize and compare traces across kernel configurations and hardware setups.
- The LAD hub provides the full navigational map of community resources, FAQ, subscription guides, and event history.
Low-latency audio on Linux is a solved problem in the sense that the tools, kernel features, and configuration knowledge all exist. It is an unsolved problem in the sense that every system is different, every workload has its own demands, and the only way to know your actual performance is to measure it. The reference material above gives you the framework. The traces and benchmarks give you the evidence. Trust the measurements, not the assumptions.