[pulseaudio-tickets] [Bug 94629] PulseAudio gets reliably killed upon a big number of client connections

Sun Mar 20 19:26:01 UTC 2016

https://bugs.freedesktop.org/show_bug.cgi?id=94629

--- Comment #7 from Ahmed S. Darwish <darwish.07 at gmail.com> ---
After a lot of enlightening discussions with Alex, it seems this
is a well-known problem in Pulse.

For completeness of this bug report, here are the basic points:

1. Linux Audio Conference 2015, "Timing issues in desktop audio
   playback infrastructure", by Alexander
   slides: http://lac.linuxaudio.org/2015/download/rewind-slides.pdf

The issue of unsolicited kills is _clearly_ summarized in slide #13
above: "to process (resample, mix, encode) 2000 ms of sound under
the limited budget of 200 ms of real-time. Not easy: on a weak CPU,
a cpufreq-governed CPU, with software DTS encoder, under valgrind,
etc ... Result: KILLED"

Even more details are in the video conference and paper of the same
topic here: http://lac.linuxaudio.org/2015/video.php?id=8
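
To make the arithmetic on that slide concrete, here is a rough sketch
(not PulseAudio code). The function name and the "speed factor"
figures are made up for illustration: they stand for how many
milliseconds of audio the CPU can process per millisecond of CPU
time, and they are assumptions, not measurements.

```python
def fits_rt_budget(audio_ms, budget_ms, speed_factor):
    """Return (cpu_ms, ok) for rendering audio_ms of sound.

    speed_factor is an assumed figure: ms of audio processed per ms
    of CPU time (resample + mix + encode combined).
    """
    cpu_ms = audio_ms / speed_factor
    return cpu_ms, cpu_ms <= budget_ms


# 2000 ms of sound against the 200 ms budget quoted on the slide,
# for a fast CPU, a borderline one, and a slowed-down one.
for speed in (40.0, 10.0, 5.0):
    cpu_ms, ok = fits_rt_budget(2000, 200, speed)
    verdict = "within budget" if ok else "over budget"
    print(f"{speed:5.1f}x real time: {cpu_ms:6.1f} ms of CPU, {verdict}")
```

In this model, anything slower than 10x real time cannot render the
full 2000 ms inside the 200 ms budget.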

2. A second suggestion is to let PA appropriately program its
   realtime soft limit and install the appropriate SIGXCPU handlers
   in PA. This way, we can be almost sure that the kills are due
   to exceeding our budget.

[ This is also the view favored by kernel developers as they don't
  want to pollute the kernel logs much.
  http://www.gossamer-threads.com/lists/linux/kernel/1513490#1513490 ]

3. A third and final suggestion is to write some abusive clients
   to demonstrate how common the issue is, and that it's not only
   related to the number of connected clients, but to the issue of
   excessive rewinds and abusive clients in general.

"You could write a client that does a lot of rewinds, calls
pa_stream_write with bad timing (e.g. rewinds 990 ms and writes 1s
every 10 ms) and see whether it explodes :) .. I don't expect it to
explode with one client, but two may be enough in your case"
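
A crude model of why such a client is expensive, as a sketch only. It
assumes that every write forces the server to re-render the whole
rewritten span and that the cost simply adds up per client.
rerender_load and the 200x mixing speed are hypothetical values
chosen for illustration, not measurements of PulseAudio.

```python
def rerender_load(rewritten_ms, period_ms, clients, speed_factor):
    """Fraction of one CPU spent re-rendering rewritten audio.

    rewritten_ms: audio re-rendered after each write (assumed to be
                  the full span the client rewrites)
    period_ms:    wall-clock interval between writes
    speed_factor: assumed ms of audio rendered per ms of CPU time
    """
    audio_per_wall_ms = clients * rewritten_ms / period_ms
    return audio_per_wall_ms / speed_factor


# Rewind 990 ms and write 1 s every 10 ms, with 1, 2 and 4 clients.
for n in (1, 2, 4):
    load = rerender_load(1000, 10, n, 200.0)
    print(f"{n} client(s): {load:.0%} of one CPU")
```

Under these assumptions each client asks for 100 ms of rendered audio
per millisecond of wall-clock time, so even a server that mixes at
200x real time is saturated by the second client.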

==> Raw discussion log:

<patrakov> darwish: hello. the "realtime budget" problem that you
           reported is actually a known issue for my DTS encoder.
           There, even one stream is enough on typical hardware if
           PulseAudio is left with its default of mixing 2 seconds
           ahead
<darwish>  patrakov, hi :-) .. oh, I see
<darwish>  patrakov, seems it'll need some deep surgery to solve
           this while keeping interrupts low
<patrakov> indeed
<patrakov> and in fact I am on the fence whether to remove the
           low-interrupts feature, as it never worked correctly with
           processing such as resampling
<patrakov> i.e. it may be that we just have to accept the 0.7w hit
<darwish>  hmm
<patrakov> please see http://lac.linuxaudio.org/2015/video.php?id=8
           (slides are enough)
<darwish>  patrakov, slide #13 summarizes everything really nicely :D
<patrakov> I also encourage you to take a look at CRAS source code -
           it has some efficient client-to-server communication method,
           so that the overhead from going down to 28 ms latency is
           only 0.2w, which is IMHO very tolerable and makes rewinds
           (which, together with speculative mixing ahead, are
           responsible for eating the realtime budget in your case)
           unneeded
<darwish>  hmm
<patrakov> basically the current 2000 ms default for the tsched buffer
           is based on the assumption that mixing is cheap, and that
           mixing 2000 ms of audio should eat no more than 200 ms anyway
<darwish>  patrakov, unless a high amount of clients connect, leading to
           excessive rewinds ..
<patrakov> which is false if the CPU is slowed down by the cpufreq
           framework - it just doesn't see enough load to bump the
           frequency
<darwish>  patrakov, btw thanks a lot! I finally understood the concept
           of rewinding from your slides :D
[...]
<darwish>  hmmm .. "CRAS doesn’t have any of the discussed workarounds"
<patrakov> what was meant is: "CRAS doesn't have any of the discussed
           workarounds and still works fine on hardware found in
           Chromebooks"
<patrakov> no rewinds = no need to guess how much it is possible to
           rewind, no need to deal with non-rewindable ALSA plugins, no
           need to write a rewindable resampler, no correctness issues,
           at the cost of 0.2w of extra power consumed (and if we assume
           that Chrome is the only possible client, then that's 0.0w)
<patrakov> because Chrome never actually uses high latency
<darwish>  just found some slides by the CRAS folks here .. they also
           compare themselves with PA: http://goo.gl/zdmNu4
<patrakov> they indeed share a lot of ideas
[...]
<darwish>  for completeness I'll add excerpts from the discussion above
           to the bug report + links to your slides and video conference
<patrakov> basically, I want you to actually write a client that does a
           lot of rewinds, calls pa_stream_write with bad timing (e.g.
           rewinds 990 ms and writes 1s every 10 ms) and see whether it
           explodes :)
<patrakov> I don't expect it to explode with one client, but two may be
           enough in your case
<darwish>  that client would be a nice discussion entry point :-)
<darwish>  I'm now working on some patches for the kernel to inform us
           when it kills PA.. will develop that client, and hopefully
           see how to fix this, afterwards
<patrakov> why do we need those patches?
<patrakov> doesn't the kernel already send SIGXCPU when the soft-limit
           is exceeded?
<patrakov> shouldn't we just set the soft limit correctly in PulseAudio?
<darwish>  it does .. that was the argument too from tglx
<darwish>  patrakov, http://www.serverphorums.com/read.php?12,450582
<patrakov> oh, ok
<darwish>  patrakov, but yeah .. I've asked myself too if it's better to
           just appropriately handle SIGXCPU
<darwish>  so I'm not sure if the kernel devs will accept the patch,
           honestly
<patrakov> on the other hand, can we handle SIGXCPU properly in the case
           when the CPU hog is a DTS encoder? It is not really
           "actionable" upon, other than logging a message.
<patrakov> you can set a flag that says "stop further mixing", but it is
           useless if we are DTS-encoding, not mixing
<darwish>  I can at least log a message in PulseAudio .. so when a user
           submits a bug report with PA killed, and see that message, we
           are 99% sure we've just exceeded our limits
<patrakov> Fair enough
<darwish>  and in that case we won't need the kernel patch I guess ..
[...]
<darwish>  OK I'll go and have some lunch now (and watch the linux audio
           conference video in the process ;-)) .. thanks a lot for this
           discussion, I've learned a lot :-)
