Thursday, May 21, 2009

The Sound of Fedora 11

An Interview on Fedora 11's enhanced Audio Control with Lennart Poettering

Where would we be without sound? It's the most primitive of communication methods, and yet it has spawned so much technology around it. Whether you're a musician, a DJ, riding a bus to work, or even just stuck in a cubicle listening to the radio somewhere, sound has become an integral part of your daily experiences. When Fedora 11 lands, it will bring a number of enhancements to the sound subsystem, including unified volume control, per-stream and per-device monitoring, and proper Bluetooth audio support. I recently caught up with Lennart Poettering, Red Hat Desktop Team Engineer and resident audio guru. Here's what he had to say about the upcoming improvements and what the future holds:

1. Please introduce yourself and give us a brief intro to how you started working on the upcoming audio improvement in F11.

I am Lennart Poettering and have been working for Red Hat in the Desktop Group for two years now this month. I live in Berlin, Germany.

PulseAudio (PA) has been part of Fedora since F8. Since then we have shipped two volume control applications: the GNOME volume control and a PA-specific tool (pavucontrol). The latter was mostly a showcase of what can be done with PA and I wrote it mostly as a demo, not because I thought it was any good as a UI.

Of course having these two volume control UIs in Fedora was a situation that badly needed fixing, especially since both UIs exposed too many unnecessary options: the GNOME volume control exposed a lot of low-level hardware-specific features that only a tiny minority of people really understood, and the PA volume control exposed a lot of low-level software features that only a slightly larger minority of people really understood.

Now during the last year we reached a point where the feature set of PA for volume controls became very complete (with such things as arbitrary metadata on every stream/device, per-stream and per-device monitoring, hardware volume range extension, "flat" volumes and lots of other stuff) and Jon McCann with help from Bastien Nocera finally took up the work to fix the UI situation.

They basically designed the new UI from scratch with input from usability experts. It implements many of the features the old pavucontrol tool did, but in a much nicer, streamlined way. It also integrates sound theme/event sound control with general audio configuration and volume control in a single UI tool.

2. Can you give us some background on the upcoming changes to the audio subsystem in the Fedora 11 Release?

If you want to know more about the Volume Control, I'd just refer to the Feature page:

https://fedoraproject.org/wiki/Features/VolumeControl

We moved PA 0.9.15 into F11. You can find a nice overview of the new features here:

http://0pointer.de/blog/projects/oh-nine-fifteen.html

However that overview is a bit out-of-date. There are quite a few additional features that went into 0.9.15, most prominently full Bluetooth Audio support: together with Bastien Nocera and the BlueZ guys I worked to make Bluetooth audio easily accessible -- the Bluetooth applet now exposes an easy dialog that allows you to pair and activate a Bluetooth headset. After that is done it will automatically appear in PulseAudio. If you need to reactivate it later, you can do that with a simple click in the applet menu. It works surprisingly well. It even works fine for lip-sync video, which is kind of magic, given that Bluetooth Audio doesn't actually offer any timing interfaces, so syncing up audio with video is not really possible. I spent a lot of time making sure it does work nonetheless, and it seems to work on the majority of headphones although I cannot say for sure if it does for all of them.

3. Where did the ideas to change all this stuff come from? Didn't audio always work in Fedora?

Depends what you mean by 'work'. Sure, basic audio output worked. But in many ways what we had on Linux was not comparable to what MacOS or Windows supported. And it still isn't in many ways. However in other ways we have now surpassed those competitors.

A lot of the changes we introduced with PA are not directly visible to the user. For example the so-called 'glitch-free' logic in PA is very important for a modern audio stack, however the normal user will never notice it -- except maybe because, when we introduced it initially, a lot of driver bugs got exposed that people were not aware of before, since that driver functionality (usually timing related) was not really depended on by any application. In fact even now many of the older drivers expose broken timing that makes usage with PA not as much fun as it could be.

A more detailed explanation of this 'glitch-free' logic you may find here:

http://0pointer.de/blog/projects/pulse-glitch-free.html

Both Windows Vista and MacOS X have similar g-f logic in their audio stacks, however with PA we brought it to the next step. For example, we implemented this logic in a zero-copy fashion and with arbitrary sample types. This allows us to pass PCM data through our pipelines without ever having to copy/convert it unless we really have to.

So yes, as you might have noticed I spend a lot of time getting low-level internals right. And I like to speak about it, even though most people are not aware of all those technical details and how awesome this all is. ;-) That said, this stuff isn't perfect yet and could use more improvements.

But it's not all just in the low-level details. On higher levels, too, we got inspired by how our competitors do things. For example the new "flat" volume logic was pioneered in Vista, and we have now adopted a similar logic in PA. It's a great way to reduce the complexities of volume control by 'merging' a few of the sliders in the pipeline. It thus goes some way towards solving the "So which slider is now causing my volume to be too low?" problem. But also here, there's more work to be done.
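One way to picture the 'merging' of sliders is the following minimal sketch. It is an illustration only, not PA's code: the function name, the linear 0..1 volume scale, and the rule that the device slider follows the loudest stream are all assumptions made for the example.

```python
def flat_volumes(stream_volumes):
    """Merge per-stream sliders and the device slider into one picture.

    stream_volumes maps a (hypothetical) stream name to the linear volume
    the user asked for, in the range 0.0..1.0. Returns the device volume
    and the software gain applied to each stream.
    """
    if not stream_volumes:
        return 0.0, {}
    # Assumed rule: the device runs at the volume of the loudest stream.
    device = max(stream_volumes.values())
    if device == 0.0:
        return 0.0, {name: 0.0 for name in stream_volumes}
    # Every other stream is attenuated in software relative to the device.
    gains = {name: vol / device for name, vol in stream_volumes.items()}
    return device, gains


device, gains = flat_volumes({"music": 0.8, "voip": 0.4})
print(device, gains)
```

In this model the user only ever sees one slider per stream, and the loudest stream's slider effectively is the device slider, so there is no hidden second control that could be turned down by accident.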

It's not all just getting inspired by our competitors. There are a lot of genuinely new features in PA that none of them have (at least to my knowledge). For example, in PA we have 'spatial' event sounds, i.e. if an event sound is triggered by a mouse click/dialog at the left side of the screen the sound is generated more from the left speakers, and similarly for the right side. This is of course mostly a toy. But I think a useful one ;-) .
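As a rough sketch of the idea, the horizontal position of the click can be mapped to a pair of left/right gains. The function name and the constant-power panning law used here are assumptions chosen for illustration; the interview does not say how PA computes the balance.

```python
import math


def pan_gains(x, screen_width):
    """Map a horizontal screen position to (left_gain, right_gain).

    Uses a constant-power pan: the left edge plays only on the left
    speaker, the right edge only on the right, and the centre plays
    on both at equal level.
    """
    if screen_width <= 0:
        raise ValueError("screen_width must be positive")
    # Normalise the position to 0.0 (left edge) .. 1.0 (right edge).
    pos = min(max(x / screen_width, 0.0), 1.0)
    angle = pos * math.pi / 2
    return math.cos(angle), math.sin(angle)


# A click a quarter of the way across a 1920-pixel-wide screen
# comes out louder on the left speaker than on the right.
left, right = pan_gains(480, 1920)
print(left, right)
```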

Listing all the fancy features PA has would certainly be a bit too much for this interview. So I'll leave it with this... ;-)

Generally, we get inspiration from everywhere. And sure, as long as the most basic music playback was enough for you audio did always work in Fedora. But OTOH, when we started with the integration of all of these new audio features into Fedora two years ago the audio stack was still at a point of what was modern in the 90's. With the new features of the new volume control and PA we are working on bringing Linux audio to what is modern today.

4. Can you also give us a comparison of our new audio framework in reference to other audio frameworks and audio subsystem models that are out there?

There are many frameworks out there. On Free Software systems PA doesn't really have any competitor. Some people think that JACK is one, but it actually is not. JACK is clearly focussed on audio production and not very useful on the desktop otherwise. For example, it is strictly designed to provide very low latency at the price of power consumption. This is the right thing to do for audio production but not on the general desktop. Logic like 'glitch-free' (see above) makes a lot of sense for the usual desktop audio since it allows flexible adjusting of the latency to what is needed. If used properly it can decrease the interrupt rate to 1/s, while still allowing instant reaction to user input. Since most PCs these days are laptops, these kinds of power-consumption-related features are very important.
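The link between buffered audio and wakeup rate is simple arithmetic, sketched below. This is a simplified model, not PA's scheduler: the function and its safety-margin parameter are hypothetical, and it assumes the audio thread wakes once per refill of the buffer.

```python
def wakeups_per_second(buffer_seconds, margin_seconds=0.0):
    """Estimate how often the audio thread must wake to keep a buffer full.

    buffer_seconds is how much audio is queued in the hardware buffer;
    margin_seconds is a safety margin left before the buffer would run dry.
    """
    interval = buffer_seconds - margin_seconds
    if interval <= 0:
        raise ValueError("margin must be smaller than the buffer")
    return 1.0 / interval


# Low-latency setup: 10 ms of buffered audio means about 100 wakeups/s.
print(wakeups_per_second(0.010))
# Music playback: 2 s buffered, refilled with 1 s to spare, gives 1 wakeup/s.
print(wakeups_per_second(2.0, margin_seconds=1.0))
```

The trade-off is the one described above: a small buffer gives low latency but keeps the CPU waking up constantly, while a large buffer lets a laptop sleep between refills.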

One of the current weaker points of Audio on Linux is that we have this clear separation of JACK for audio production and PA for desktop/embedded. Other operating systems have managed to make this a bit smoother by having a single stack for both. This however actually has both advantages and disadvantages.

To improve the situation for now we focussed on making PA and JACK cooperate better. In F11 when JACK needs low-level access to an audio device it will tell PA so and PA will comply and release the device.

This should make switching between the two sound systems easier though of course this is no perfect solution. Given the lack of manpower further integration is unlikely to happen anytime soon -- though both the JACK guys and I seem not generally opposed to something like that.

Now, if you compare our audio stacks with those of the other big operating systems (Windows and MacOS X), then besides the fact that they usually integrate desktop audio and audio production better than we do (as mentioned) there are many things we are better at and many they are better at. We certainly have more flexibility: depending on your application you can access audio on a lot of different levels. You can access ALSA directly if you need very low-level control, or go via PA for desktop-level control. You have APIs like GStreamer for media streaming and so on.

This flexibility however translates to more complexity in many ways, and a hodgepodge of API styles. (OTOH Apple's CoreAudio actually isn't as streamlined as many MacOS proponents would like us to think.) The documentation for our APIs is usually much worse than theirs. We really need some improvements in that area. Featurewise, PA usually has better networking-related features than those counterparts. But there are a lot of features they have right now that we lack.

Other Unixes, such as FreeBSD and OpenSolaris, are still stuck with OSS (Open Sound System) audio. In F11 we finally switched OSS off by default (though you can still reenable it via some minor hackery). OSS was the predecessor of ALSA. Thankfully it is now fully obsolete on Linux. OSS is mostly a design from the early nineties. It has received only minor updating since then. It is in no way comparable to what we now have on Linux or even what MacOS or Windows provide. (Although it has some very vocal fans who like to write me hate mail because I say things like this.)

5. This work all started in earlier releases dating all the way to even Fedora 8, if I am correct. How has all this stuff progressed and evolved from then? What was done in previous releases that enabled building upon for this release?

Fedora 8 was the first release where we integrated PA. In Fedora 9 we stabilized PA support. In F10 we integrated the 'glitch-free' logic which turned out to be quite a bumpy ride given that it exposed a lot of timing related driver bugs. In F11 g-f has now been made more robust and most of the more modern audio drivers should now be fixed. Also we have now started to push PA support more into the UI, like with this new volume control.

6. What are the plans for the future, if any, in this particular space in the distro?

I am working on multiple things for F12. Firstly there will be a couple of more low-level changes to PA. The core will be made more threaded. Right now, we run most things in one 'main' thread and do low-level audio I/O in one thread for each audio card. My plan for F12 is to split that one 'main' thread up into as many threads as possible. This should make PA more robust for a couple of operations, and make latencies more reliable.

Then, I am working on considerably beefing up PA's usage of the low-level hardware volume controls. For example, many cards have separate low-level volume sliders for "Speaker", "Master", "PCM" (and more) that are in the line from the PCM data we stream to the speakers. PA currently exposes only one of those sliders (usually "Master"). My plan is to 'multiply' those sliders and create a single 'product' virtual slider from them that has a better granularity and a larger range. This rework will also introduce input/output switching and probably more.
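A minimal sketch of such a 'product' slider follows. Multiplying linear gains is the same as adding decibel values, so the virtual slider's range is the sum of the individual ranges. The slider ranges, function names, and the strategy for spreading a target across the sliders are all assumptions for illustration and are not taken from PA.

```python
def combined_range(sliders):
    """Return the (min_db, max_db) range of the virtual 'product' slider."""
    lo = sum(s_lo for s_lo, _ in sliders.values())
    hi = sum(s_hi for _, s_hi in sliders.values())
    return lo, hi


def split_gain(target_db, sliders):
    """Spread target_db over the hardware sliders, each within its range.

    Assumed strategy: turn earlier sliders down first and keep later
    ones as high as their range allows.
    """
    lo, hi = combined_range(sliders)
    remaining = min(max(target_db, lo), hi)  # clamp to the reachable range
    names = list(sliders)
    settings = {}
    for i, name in enumerate(names):
        s_lo, s_hi = sliders[name]
        # Most gain the sliders still to be set could contribute.
        rest_hi = sum(sliders[n][1] for n in names[i + 1:])
        value = min(max(remaining - rest_hi, s_lo), s_hi)
        settings[name] = value
        remaining -= value
    return settings


# Hypothetical ranges in dB for three sliders in the same signal path.
sliders = {"Master": (-40.0, 0.0), "PCM": (-30.0, 0.0), "Speaker": (-20.0, 0.0)}
print(combined_range(sliders))
print(split_gain(-50.0, sliders))
```

With these example ranges no single slider can reach -50 dB, but the virtual slider can, which is the 'larger range' part of the plan; the finer granularity would come from combining the sliders' individual step sizes, which this sketch does not model.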

What has already landed in PA's git repository is support for UPnP A/V. When used in conjunction with Zeeshan Ali's Rygel UPnP MediaServer implementation this allows streaming any application's audio to any UPnP MediaRenderer (including PS3/Xboxes and all those 'Internet Radio' devices). This is actually pretty neat. Later on we hope to make PA a MediaRenderer as well as a MediaServer. This nicely complements our current Apple RAOP support.

And there's a lot of other things planned. We'll see how much of that will be ready for F12. I don't like to talk too much about upcoming features and planned code if I don't have anything to show yet, so I'll leave it at this.

And then there's always a little project of mine that is called 'libsydney' that is intended to be a portable, modern and friendly PCM API. During the last months I focussed more on PA itself though.

7. Do you feel that work like this helps enhance the desktop experience on Linux in general and strengthens the cause of the Linux Desktop, or is it more all in a day's work?

I think that PA is the way forward for audio on the Linux desktop. It may have its deficiencies -- but everything has. We still have some way to go, but I believe that a modern audio layer is really important for the Linux Desktop to succeed.

And no, it doesn't feel like all in a day's work at all. It is always a great feeling to see how PA got incorporated into so many distributions and how it is now used by so many people. I am pretty sure that you only get this feeling if you hack on Linux software.

8. Speaking of all in a day's work, what things do you usually work on? What do you most enjoy doing outside of work?

Red Hat basically hired me to help improve audio on Linux. So that's what I am doing during work.

Outside of work I spend my time on photography. And I am trying my best to travel to interesting places as much as I can and my time off allows.

Thank you Lennart for an excellent interview, ideas and insight. We look forward to hearing more from you. Get it--hearing more, he works on sound, okay I give up.