Fixing Linux Audio

This is somewhat technical, and it's aimed at people who use Linux and are comfortable editing files from the command line. I was looking for a quick fix, but discovered that most of the documentation out there is wrong. So, I had to research what it all meant, figure out what the correct settings were, and write it all down.

For the Impatient

If you use Linux, and you're just looking for a quick way to make your sound better, add these lines to /etc/pulse/daemon.conf or $HOME/.config/pulse/daemon.conf:

default-sample-format = s32ne

default-sample-rate = 192000

high-priority = yes

default-fragments = 8

resample-method = speex-float-10

I've tested this on Fedora and Ubuntu with no problems.

Also, if you're using an external DAC (if you're not sure, then you're not using one), be sure that it has sufficient power: plug it either directly into the computer or into a powered USB hub. DO NOT plug it into an unpowered hub or dongle.

After you've changed daemon.conf and checked your power situation, log out and log back in again. Ta-Daa! Better sound.

The Longer Version

What these numbers mean

Most audio that's sent, shared, and stored uses what's called PCM encoding. This looks at the sound wave and captures a single value (the amplitude of the wave) every small fraction of a second, and then reassembles all of these values into a wave. There are other ways to do it (like pulse-density modulation, which is used in SACD), but they're uncommon.

To encode or decode PCM, you need two major bits of information:

The number of bits used to record the amplitude (the bit depth). Amplitude is the size of the sound wave at a given point. 16 bits gives you 65,536 possible amplitudes. 32 bits gives you approximately 4.3 billion possible amplitudes. CDs are encoded with 16 bits of amplitude. 16 bits is plenty to cover the hearing range of a normal human in a normal home environment IF it's used intelligently. However, it's often not used intelligently, so throwing more bits at the problem can give you better sound.
The sampling rate. That is, how many times a second you'll be capturing or playing back the amplitude. The Nyquist theorem says that you need to sample at more than twice the highest frequency you want to capture. CDs are sampled 44,100 times per second, which allows frequencies up to about 22kHz. That covers roughly the full range of human hearing.
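
To put numbers on those two items, here is a small Python sketch. The function names are made up for illustration; the arithmetic is just powers of two and the Nyquist "half the sample rate" rule.

```python
def amplitude_levels(bits: int) -> int:
    """Number of distinct amplitude values a sample with `bits` bits can hold."""
    return 2 ** bits


def nyquist_limit_hz(sample_rate_hz: int) -> float:
    """Highest frequency a given sample rate can represent (half the rate)."""
    return sample_rate_hz / 2


for bits in (16, 24, 32):
    print(f"{bits}-bit: {amplitude_levels(bits):,} levels")

for rate in (44100, 48000, 96000, 192000):
    print(f"{rate} samples/sec: up to {nyquist_limit_hz(rate):,.0f} Hz")
```

This only describes what the format can represent. It says nothing about what a particular DAC or pair of ears can actually do with it.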

As a practical matter, you also need to agree on a few other things in advance like the order in which the bits will be stored, the size of the amplitude steps, etc. But, ultimately, it comes down to bit depth and sampling rate.

The Linux Audio Mess

Linux audio has gone through a bunch of evolution over the years. Generally, every new audio system has included a compatibility module for the system before it. Right now, PulseAudio is the most common audio system. It usually includes compatibility for ALSA and OSS, the two most common systems before it.

Most common distributions today use PulseAudio. That's all that I'm going to talk about. If you have an unusual system, this information might not apply.

CPU Load

You'll find a lot of people worrying about this, but it's not a problem for most people. It used to matter a lot, but computers are much, much faster than they used to be. On my ~8 year old laptop, with absolute maximum settings, htop shows PulseAudio using 2% of the CPU, and the music player using another 2%. At those percentages, I'm not going to worry about making things more complicated to squeeze out a few more CPU cycles.

Figuring out the maximums your device supports

Once upon a time, a sound card that provides a Digital to Analog Converter (DAC) and an amp to power speakers was a pricey extra for your computer. These days, it's common for computers to have several different sound devices. For example, my desktop computer has 4 audio devices:

The sound card that's built in to the motherboard. This actually supports surround sound, and can be split into multiple "virtual" sound devices.
Modern monitor connections (HDMI and DisplayPort) allow you to send sound over the connection. There's probably a DAC and amp in your monitor.
I have a set of USB-powered speakers with their own built-in DAC so that both power and sound can go over the USB cable.
I have a USB DAC/amp that I use with headphones.

Each of these devices supports different PCM bit depths and sample rates. To get that information, with most modern distributions, you can use a command like this:

find /proc/asound -name "codec*" | xargs grep -E "^(Codec| *(rates|bits|format))"

There's a "codec" file for each sound device that will show the PCM information. You want to get the highest number for each device that you actually use.
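
If you'd rather pull the maximums out with a script, something like the following Python sketch might do it. It assumes the codec file lists values on lines that start with "rates" and "bits" followed by a bracketed hex mask, which is what the grep pattern above is looking for. The sample text is made up for the example, not copied from a real device, and the function name is my own.

```python
import re

# Matches lines such as "    rates [0x560]: 44100 48000 96000 192000"
# and "    bits [0xe]: 16 20 24" (this layout is an assumption).
_LINE = re.compile(r"^\s*(rates|bits)\s*\[[^\]]*\]:\s*(.*)$")


def max_rate_and_bits(codec_text: str) -> tuple[int, int]:
    """Return (highest sample rate, highest bit depth) found in the text.

    Returns 0 for a value if no matching line is found.
    """
    found = {"rates": [0], "bits": [0]}
    for line in codec_text.splitlines():
        match = _LINE.match(line)
        if match:
            key, values = match.groups()
            found[key].extend(int(v) for v in values.split() if v.isdigit())
    return max(found["rates"]), max(found["bits"])


# Made-up sample text, not output from a real device.
SAMPLE = """\
Codec: Example Codec
    rates [0x560]: 44100 48000 96000 192000
    bits [0xe]: 16 20 24
    formats [0x1]: PCM
"""

print(max_rate_and_bits(SAMPLE))
```

To try it on a real machine, read each file that the find command turns up and pass its contents to max_rate_and_bits. If your files are laid out differently, the pattern will need adjusting.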

So, if you have two devices that you switch between (say, headphones and speakers) and those devices look like this:

Device 1 supports 24 bits at a rate of 192000
Device 2 supports 32 bits at a rate of 96000

What you want to note down is 32 bits at 192000, even though neither device supports that exact configuration. The only bit depths that are supported currently by PulseAudio are 16, 24, and 32.
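
The rule is "take the biggest bit depth and the biggest rate, independently." As a minimal Python sketch, using the two example devices from above (the function name and the dictionary layout are just for illustration):

```python
def combined_maximum(devices: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Take the highest bit depth and the highest rate across all devices.

    `devices` maps a device name to (bits, rate). The two maximums are
    picked separately, so the result may not match any single device.
    """
    max_bits = max(bits for bits, _rate in devices.values())
    max_rate = max(rate for _bits, rate in devices.values())
    return max_bits, max_rate


devices = {
    "Device 1": (24, 192000),
    "Device 2": (32, 96000),
}

bits, rate = combined_maximum(devices)
print(f"Note down: {bits} bits at {rate}")
```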

Now, you need to figure out which sample format to use. Type this command:

man 5 pulse-daemon.conf

and look for the section on default-sample-format. Here's the secret to sample formats:

For 16 bit, you want s16ne
For 24 bit, you want s24ne
For 32 bit, you want s32ne

ne stands for "native endian", which means "use whatever bit order makes the most sense on this computer". That's a good choice unless you're doing something really weird.
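
Here's a short Python sketch of that mapping. It also shows which concrete byte order "native endian" works out to on the machine running the script, using Python's sys.byteorder. The function names are my own; the s16le/s16be style names are the explicit little-endian and big-endian variants of the same formats.

```python
import sys

SUPPORTED_BITS = (16, 24, 32)


def native_sample_format(bits: int) -> str:
    """Return the native-endian format name for a bit depth, e.g. 's32ne'."""
    if bits not in SUPPORTED_BITS:
        raise ValueError(f"expected one of {SUPPORTED_BITS}, got {bits}")
    return f"s{bits}ne"


def explicit_sample_format(bits: int) -> str:
    """Return the explicit-endian name that 'ne' corresponds to on this machine."""
    if bits not in SUPPORTED_BITS:
        raise ValueError(f"expected one of {SUPPORTED_BITS}, got {bits}")
    suffix = "le" if sys.byteorder == "little" else "be"
    return f"s{bits}{suffix}"


for bits in SUPPORTED_BITS:
    print(bits, native_sample_format(bits), explicit_sample_format(bits))
```

On an ordinary x86 desktop or laptop the explicit name will end in "le". Writing the "ne" form in the config means you never have to think about it.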

Starting your daemon.conf file

The system-wide PulseAudio configuration file (well, the one that we want to modify) is located at /etc/pulse/daemon.conf and your personal configuration file, if it exists, is located at $HOME/.config/pulse/daemon.conf . Make a copy of the original, if it exists, then add lines like this:

default-sample-format = s32ne

default-sample-rate = 192000

Use whatever numbers you wrote down when you looked at your sound devices. Don't bother to save and exit yet. We still have a few more changes to make.

Special note: If your maximum rate is 48000, you should also add this line:

alternate-sample-rate = 44100

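The reason for this: a lot of audio is recorded at 44100, and 44100 is close enough to 48000 that converting from one to the other doesn't resample well. With this line, PulseAudio can run the device at 44100 for that audio when it's able to, rather than resampling it.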

Dealing with drops and delays

Audio is one of the most time-sensitive things that happens on a computer. Drop-outs and stuttering are probably more annoying for most people than poor quality. Fortunately, PulseAudio has some tools to deal with this. Add these lines to your daemon.conf file:

high-priority = yes

default-fragments = 8

The high-priority option causes PulseAudio to run with a higher priority, making it less likely to be interrupted during heavy load. It helps, but I still had problems with stuttering.

Note: The default fragment size is 25 milliseconds, so 8 fragments will add 200ms of delay to your audio. This does mean that when watching video, the audio may be slightly out of sync. However, I found that on some systems, when heavily loaded, anything smaller would cause audio to occasionally drop or pop.

If low latency is a concern (for video calling, for example) decrease the default-fragments to 2. Audio may drop when the system is busy, but it won't be out of sync.
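
The delay arithmetic is simple enough to check yourself. This Python sketch uses the 25 millisecond fragment size quoted above. The buffer size calculation is plain PCM arithmetic (rate times bytes per sample times channels), not a claim about how PulseAudio accounts for memory internally, and the function names are made up.

```python
FRAGMENT_MS = 25  # default fragment size quoted in the text


def added_delay_ms(fragments: int, fragment_ms: int = FRAGMENT_MS) -> int:
    """Total buffering delay in milliseconds for a number of fragments."""
    return fragments * fragment_ms


def buffered_bytes(delay_ms: int, rate: int, bits: int, channels: int = 2) -> int:
    """Bytes of raw PCM audio that fit in `delay_ms` milliseconds."""
    bytes_per_second = rate * (bits // 8) * channels
    return bytes_per_second * delay_ms // 1000


for fragments in (2, 4, 8):
    delay = added_delay_ms(fragments)
    size = buffered_bytes(delay, rate=192000, bits=32)
    print(f"{fragments} fragments: {delay} ms, about {size:,} bytes of stereo audio")
```

With 8 fragments that works out to 200 ms, and with 2 fragments it's 50 ms.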

Finally, a note about external DACs and power. Each USB port can only supply a certain amount of power. Most external DACs want all of that power. If you put the DAC on a hub with other devices, especially power-hungry ones like external disks, you'll have problems. Be sure to plug the DAC directly into your computer, or use a powered hub.

Resampling and remixing

This is where I saw the most bad information. Lots of people pass around the same bad advice.

First, run this command to get a list of available resampling methods:

pulseaudio --dump-resample-methods

You should see a bunch of speex-float methods, numbered 0 through 10. These are the only methods that you need to care about. A few years ago, a guy with access to all of the requisite research and equipment figured out that the speex formats were as good as or better than the competitors. The numbers represent quality and performance, with 0 being the lowest quality and CPU usage and 10 being the highest quality and CPU usage. He found that anything above 4 was better than humans could hear. Since then, most Linux distributions have been phasing out and deprecating almost everything else.

speex-float-5 is better than you can hear, and is a fine choice. speex-float-10 uses a bit more CPU, and is the best possible option if you're recording or mixing. Because the CPU difference is so small, I just use speex-float-10. If you're concerned about CPU or power, use speex-float-5.

So, add this line to your daemon.conf file:

resample-method = speex-float-10

A note about other configurations: It used to be necessary to "remix" extra channels, like the low-frequency effects (LFE) channel, but newer versions of PulseAudio do that for you. So, a lot of sample configurations include remix options, but they're probably not necessary if you're running a distribution from the last year or two.

Putting it all together

Your final daemon.conf file should look something like this:

default-sample-format = s32ne

default-sample-rate = 192000

high-priority = yes

default-fragments = 8

resample-method = speex-float-10
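
If you want to double-check what you typed, a few lines of Python can read the file back and show the settings that are actually active. This is a minimal sketch, assuming the file is simple "key = value" lines and that lines starting with ";" or "#" are comments (the stock file ships with most settings commented out that way). The function name and the example text are my own.

```python
def parse_daemon_conf(text: str) -> dict[str, str]:
    """Return the active key/value settings from daemon.conf-style text.

    Blank lines and lines starting with ';' or '#' are skipped.
    If a key appears twice, the later line wins in this sketch.
    """
    settings: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line[0] in ";#":
            continue
        key, separator, value = line.partition("=")
        if separator:
            settings[key.strip()] = value.strip()
    return settings


# Example text for illustration; the first line mimics a commented-out default.
EXAMPLE = """\
; default-sample-format = s16le
default-sample-format = s32ne
default-sample-rate = 192000
high-priority = yes
default-fragments = 8
resample-method = speex-float-10
"""

for key, value in parse_daemon_conf(EXAMPLE).items():
    print(f"{key} = {value}")
```

To check your real file, read /etc/pulse/daemon.conf or $HOME/.config/pulse/daemon.conf into a string and pass it in instead of EXAMPLE.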

Save the file. Then, log out and log back in. (Yes, I know. You're a Linux user and your command-line-fu is strong. You think you can just restart pulseaudio. Maybe. Unfortunately, with some distributions, that's not as clean as it should be.) You probably don't need to restart the whole computer, but you probably DO need to log out and log back in again for everything to come up cleanly.

Do all of this, and, hopefully, your audio will sound a lot better than it did before.

Whoops...?

I tested this to the best of my ability, and I'm relatively certain that I've isolated the fixes to these lines in the configuration file, but I may have missed something. Please let me know if it works for you, or if it doesn't.
