[pulseaudio-discuss] User volume vs. Application volume

Thu May 24 08:33:43 PDT 2012

On Thu, 2012-05-24 at 09:28 -0500, Andrew Eikum wrote:
> Hi folks,
> 
> I've been thinking about this issue for a while and I'd like to share
> my thoughts and receive opinions back.
> 
> While developing a PulseAudio driver for Wine, I ran into the issue of
> implementing the volume control interfaces. Microsoft's APIs allow the
> application to make calls that set the stream volume. However, the
> Windows 7 mixer /also/ allows the user to set stream volumes. These
> two volumes are entirely independent of each other.
> 
> I think this makes complete sense. For example, you might have a music
> player which implements a crossfade between tracks. The obvious way to
> implement this is to open two streams, and as the first stream reaches
> its last 5 seconds, you fade its volume down and simultaneously start
> and increase the volume for the second stream. The user can then set
> the music player's volume at 50% relative to their chat program, and
> everything works as expected.
> 
> Crossfading in this manner works fine in PulseAudio, but it conflicts
> with the user's volume settings. When the music player starts setting
> the volumes for the purposes of crossfading, it overrides the user's
> settings, potentially deafening them[1].
> 
> This actually extends beyond volume control. Both of the music
> player's streams will appear in pavucontrol, when they really
> represent a single logical audio channel. I would expect some API to
> combine those into a single user-facing stream, akin to Microsoft's
> AudioSessions[2]. But this is a separate-but-related issue and can be
> resolved later.
> 
> Perhaps this goes beyond the scope of PulseAudio, in that PA shouldn't
> be used as an application mixer. If that's the case, PA's APIs don't
> make it obvious enough, as we have actual applications that use the
> volume APIs to implement mixing (see [1], [3], [4]). That said, I
> think this is a reasonably small service to provide to applications.
> There's already a long history of audio APIs providing
> application-level mixing, and now we have the opportunity to provide
> both application-level /and/ user-facing volume control.
> 
> I don't understand flat volumes, so I haven't accounted for them at
> all in the preceding discussion.
> 
> I also haven't looked much at implementing this. We would need to
> modify PulseAudio's volume APIs to make it clear whether the
> application is trying to set the user volume (for mixer applications
> like pavucontrol) or the application volume. PulseAudio's volume APIs
> are already pretty humongous and confusing, so this might be tricky.
> 
> Do others agree with my analysis? Is this something that should be
> included in PulseAudio? Any tips for how to design the APIs for this?

I agree that PulseAudio should provide facilities for fading. We could
insist that applications do such processing themselves (as is
currently required), but the problem is that it's non-trivial with
large buffers when the fade should begin immediately upon user
input - it requires the application to understand the concept of
"rewinding" (going back in time and rewriting already written data). I
think fading is such a common feature that it makes sense to save the
duplicate work of implementing the complex audio rewriting logic in
every application by doing it in PulseAudio.
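
As a rough illustration of what rewinding involves, here is a toy
Python model. It is not PulseAudio code: the PlaybackBuffer class, its
methods and linear_fade_out are hypothetical names invented for this
sketch, and a real implementation would also have to deal with
latency, threading and sample formats.

```python
class PlaybackBuffer:
    """Toy model of a playback buffer with samples queued ahead of the read position."""

    def __init__(self):
        self.samples = []   # everything the application has written so far
        self.read_pos = 0   # index of the next sample the device will play

    def write(self, data):
        """Queue more samples for playback."""
        self.samples.extend(data)

    def play(self, n):
        """Simulate the device consuming up to n samples."""
        out = self.samples[self.read_pos:self.read_pos + n]
        self.read_pos += len(out)
        return out

    def rewind_and_rewrite(self, transform):
        """Rewrite every queued-but-unplayed sample in place.

        transform(offset, sample) receives the offset from the current
        read position, so a fade can start at the very next sample played.
        """
        for i in range(self.read_pos, len(self.samples)):
            self.samples[i] = transform(i - self.read_pos, self.samples[i])


def linear_fade_out(length):
    """Return a transform that ramps gain from 1.0 to 0.0 over `length` samples."""
    def transform(offset, sample):
        gain = max(0.0, 1.0 - offset / length)
        return sample * gain
    return transform


# Usage: eight samples are queued, two are already played, then the
# user requests a fade that must affect the six samples still queued.
buf = PlaybackBuffer()
buf.write([1.0] * 8)
buf.play(2)
buf.rewind_and_rewrite(linear_fade_out(4))
```

The point of the sketch is that the fade has to be applied to data the
application considers already written, which is the part every
application would otherwise have to implement on its own.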

I don't think using an "application volume" as in Windows is the right
solution for fades, though. If it works so that the application sends
volume change commands at a high rate, it will cause rewinding at the
server end at the same rate, which is unnecessarily wasteful and may
even cause user-visible performance problems. I think applications
should just tell the server when a fade should start and how long it
should last.
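
A minimal sketch of that idea, assuming a linear ramp and plain linear
gain factors: the application sends one fade description, and the
receiving side can compute the gain for any point in time from it.
FadeCommand and crossfade_gains are hypothetical names; no such API
exists in PulseAudio as far as this thread states.

```python
from dataclasses import dataclass


@dataclass
class FadeCommand:
    """A single fade request (hypothetical, not a PulseAudio structure)."""
    start: float      # stream time in seconds at which the fade begins
    duration: float   # fade length in seconds
    from_gain: float  # linear gain before the fade
    to_gain: float    # linear gain after the fade

    def gain_at(self, t):
        """Return the linear gain at stream time t, using a linear ramp."""
        if t <= self.start:
            return self.from_gain
        if t >= self.start + self.duration:
            return self.to_gain
        progress = (t - self.start) / self.duration
        return self.from_gain + (self.to_gain - self.from_gain) * progress


def crossfade_gains(t, start, duration):
    """Gains for the outgoing and incoming stream of a crossfade at time t."""
    out_fade = FadeCommand(start, duration, 1.0, 0.0)
    in_fade = FadeCommand(start, duration, 0.0, 1.0)
    return out_fade.gain_at(t), in_fade.gain_at(t)
```

With this shape, a 5-second crossfade needs two commands in total (one
per stream) rather than a volume update every few milliseconds.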

That said, an "application volume" may be useful for other purposes
(replay gain is the only concrete use case that I have in mind), and I
have already filed a feature request for it myself:
https://bugs.freedesktop.org/show_bug.cgi?id=39556
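
To make the replay gain case concrete, here is a small sketch of two
independent volumes on one stream. It assumes both are plain linear
gain factors that multiply together, which ignores how PulseAudio
actually scales volumes and ignores flat volumes entirely. The Stream
class and its attribute names are hypothetical.

```python
def db_to_linear(db):
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


class Stream:
    """Toy stream with separate user and application volumes (hypothetical)."""

    def __init__(self):
        self.user_volume = 1.0  # set by a mixer such as pavucontrol
        self.app_volume = 1.0   # set by the application, e.g. for replay gain

    def effective_gain(self):
        """Gain applied to the samples: the product of both volumes."""
        return self.user_volume * self.app_volume


# Usage: the user picks 50%, then the player applies -6 dB of replay
# gain to a loud track. The user's setting is left untouched.
stream = Stream()
stream.user_volume = 0.5
stream.app_volume = db_to_linear(-6.0)
```

Because the application only ever writes app_volume, it cannot
override what the user chose in the mixer.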

I haven't heard the idea of merging cross-faded streams into one
user-visible stream before. It sounds pretty sensible to me.

Since you wanted to hear tips about API design for this stuff, are you
perhaps planning to implement this in PulseAudio yourself? That would be
great - I don't think this will otherwise get done in any timely manner.

-- 
Tanu