<html>
<head>
<base href="https://bugs.freedesktop.org/" />
</head>
<body>
<p>
<div>
<b><a class="bz_bug_link
bz_status_NEW "
title="NEW - PulseAudio gets reliably killed upon a big number of client connections"
href="https://bugs.freedesktop.org/show_bug.cgi?id=94629#c7">Comment # 7</a>
on <a class="bz_bug_link
bz_status_NEW "
title="NEW - PulseAudio gets reliably killed upon a big number of client connections"
href="https://bugs.freedesktop.org/show_bug.cgi?id=94629">bug 94629</a>
from <span class="vcard"><a class="email" href="mailto:darwish.07@gmail.com" title="Ahmed S. Darwish <darwish.07@gmail.com>"> <span class="fn">Ahmed S. Darwish</span></a>
</span></b>
<pre>After a lot of enlightening discussions with Alex, it seems this
is a well-known problem in Pulse.
For completeness of this bug report, here are the basic points:
1. Linux Audio Conference 2015, "Timing issues in desktop audio
playback infrastructure", by Alexander
slides: <a href="http://lac.linuxaudio.org/2015/download/rewind-slides.pdf">http://lac.linuxaudio.org/2015/download/rewind-slides.pdf</a>
The issue of unsolicited kills is _clearly_ summarized in slide #13
above: "to process (resample, mix, encode) 2000 ms of sound under
the limited budget of 200ms of real-time. Not easy: on a weak CPU,
a cpufreq-governed CPU, with software DTS encoder, under valgrind,
etc ... Result: KILLED"
Even more details are in the video conference and paper of the same
topic here: <a href="http://lac.linuxaudio.org/2015/video.php?id=8">http://lac.linuxaudio.org/2015/video.php?id=8</a>
2. A second suggestion is to let PA appropriately program its
realtime soft limit and install the appropriate SIGXCPU handlers
in PA. This way, we can be almost sure that the kills are due
to exceeding our budget.
[ This is also the view favored by kernel developers as they don't
want to pollute the kernel logs much.
<a href="http://www.gossamer-threads.com/lists/linux/kernel/1513490#1513490">http://www.gossamer-threads.com/lists/linux/kernel/1513490#1513490</a> ]
3. A third and final suggestion is to write some abusive clients
to demonstrate how common the issue is, and that it's not only
related to the number of connected clients, but to the issue of
excessive rewinds and abusive clients in general:
"You could write a client that does a lot of rewinds, calls
pa_stream_write with bad timing (e.g. rewinds 990 ms and writes 1s
every 10 ms) and see whether it explodes :) .. I don't expect it to
explode with one client, but two may be enough in your case"
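
For suggestion 2 above, here is a minimal sketch in C of what programming
the realtime soft limit and catching SIGXCPU could look like. It is not
PulseAudio's actual code: the function names, the flag and the log message
are hypothetical, and only getrlimit(2), setrlimit(2), sigaction(2) and
write(2) are real interfaces. RLIMIT_RTTIME is Linux-specific and is
counted in microseconds.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>
#include <sys/resource.h>
#include <unistd.h>

/* Hypothetical flag: a real daemon would poll it from its main loop
 * and emit a proper log entry there. */
volatile sig_atomic_t rt_budget_exceeded = 0;

/* Runs when the soft RLIMIT_RTTIME is exceeded. Only async-signal-safe
 * calls (write) are used here. */
void on_sigxcpu(int sig) {
    static const char msg[] = "realtime CPU budget (RLIMIT_RTTIME) exceeded\n";
    ssize_t r;

    (void) sig;
    rt_budget_exceeded = 1;
    r = write(STDERR_FILENO, msg, sizeof(msg) - 1);
    (void) r;
}

/* Install the SIGXCPU handler. Returns 0 on success, -1 on error. */
int install_sigxcpu_handler(void) {
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigxcpu;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGXCPU, &sa, NULL);
}

/* Set the soft RLIMIT_RTTIME to soft_usec microseconds, leaving the hard
 * limit untouched. The value is clamped to the hard limit, because a soft
 * limit above it would be rejected. Returns 0 on success, -1 on error. */
int set_rttime_soft_limit(rlim_t soft_usec) {
    struct rlimit rl;

    if (getrlimit(RLIMIT_RTTIME, &rl) < 0)
        return -1;
    if (rl.rlim_max != RLIM_INFINITY && soft_usec > rl.rlim_max)
        soft_usec = rl.rlim_max;
    rl.rlim_cur = soft_usec;
    return setrlimit(RLIMIT_RTTIME, &rl);
}
```

A daemon would call install_sigxcpu_handler() and then, for example,
set_rttime_soft_limit(150000) at startup. The 150 ms figure is only an
illustration, assuming the hard limit corresponds to the 200 ms budget
quoted above, so that the handler gets a chance to log before the kill.
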
==> Raw discussion log:
<patrakov> darwish: hello. the "realtime budget" problem that you
reported is actually a known issue for my DTS encoder.
There, even one stream is enough on typical hardware if
PulseAudio is left with its default of mixing 2 seconds
ahead
<darwish> patrakov, hi :-) .. oh, I see
<darwish> patrakov, seems it'll need some deep surgery to solve
this while keeping interrupts low
<patrakov> indeed
<patrakov> and in fact I am on the fence whether to remove the
low-interrupts feature, as it never worked correctly with
processing such as resampling
<patrakov> i.e. it may be that we just have to accept the 0.7w hit
<darwish> hmm
<patrakov> please see <a href="http://lac.linuxaudio.org/2015/video.php?id=8">http://lac.linuxaudio.org/2015/video.php?id=8</a>
(slides are enough)
<darwish> patrakov, slide #13 summarizes everything really nicely :D
<patrakov> I also encourage you to take a look at CRAS source code -
it has some efficient client-to-server communication method,
so that the overhead from going down to 28 ms latency is
only 0.2w, which is IMHO very tolerable and makes rewinds
(which, together with speculative mixing ahead, are
responsible for eating the realtime budget in your case)
unneeded
<darwish> hmm
<patrakov> basically the current 2000 ms default for the tsched buffer
is based on the assumption that mixing is cheap, and that
mixing 2000 ms of audio should eat no more than 200 ms anyway
<darwish> patrakov, unless a high amount of clients connect, leading to
excessive rewinds ..
<patrakov> which is false if the CPU is slowed down by the cpufreq
framework - it just doesn't see enough load to bump the
frequency
<darwish> patrakov, btw thanks a lot! I finally understood the concept
of rewinding from your slides :D
[...]
<darwish> hmmm .. "CRAS doesn’t have any of the discussed workarounds"
<patrakov> what was meant is: "CRAS doesn't have any of the discussed
workarounds and still works fine on hardware found in
Chromebooks"
<patrakov> no rewinds = no need to guess how much it is possible to
rewind, no need to deal with non-rewindable ALSA plugins, no
need to write a rewindable resampler, no correctness issues,
at the cost of 0.2w of extra power consumed (and if we assume
that Chrome is the only possible client, then that's 0.0w)
<patrakov> because Chrome never actually uses high latency
<darwish> just found some slides by the CRAS folks here .. they also
compare themselves with PA: <a href="http://goo.gl/zdmNu4">http://goo.gl/zdmNu4</a>
<patrakov> they indeed share a lot of ideas
[...]
<darwish> for completeness I'll add excerpts from the discussion above
to the bug report + links to your slides and video conference
<patrakov> basically, I want you to actually write a client that does a
lot of rewinds, calls pa_stream_write with bad timing (e.g.
rewinds 990 ms and writes 1s every 10 ms) and see whether it
explodes :)
<patrakov> I don't expect it to explode with one client, but two may be
enough in your case
<darwish> that client would be a nice discussion entry point :-)
<darwish> I'm now working on some patches for the kernel to inform us
when it kills PA.. will develop that client, and hopefully
see how to fix this, afterwards
<patrakov> why do we need those patches?
<patrakov> doesn't the kernel already send SIGXCPU when the soft-limit
is exceeded?
<patrakov> shouldn't we just set the soft limit correctly in PulseAudio?
<darwish> it does .. that was the argument too from tglx
<darwish> patrakov, <a href="http://www.serverphorums.com/read.php?12,450582">http://www.serverphorums.com/read.php?12,450582</a>
<patrakov> oh, ok
<darwish> patrakov, but yeah .. I've asked myself too if it's better to
just appropriately handle SIGXCPU
<darwish> so I'm not sure if the kernel devs will accept the patch,
honestly
<patrakov> on the other hand, can we handle SIGXCPU properly in the case
when the CPU hog is a DTS encoder? It is not really
"actionable" upon, other than logging a message.
<patrakov> you can set a flag that says "stop further mixing", but it is
useless if we are DTS-encoding, not mixing
<darwish> I can at least log a message in PulseAudio .. so when a user
submits a bug report with PA killed, and we see that message, we
are 99% sure we've just exceeded our limits
<patrakov> Fair enough
<darwish> and in that case we won't need the kernel patch I guess ..
[...]
<darwish> OK I'll go and have some lunch now (and watch the linux audio
conference video in the process ;-)) .. thanks a lot for this
discussion, I've learned a lot :-)</pre>
</div>
</p>
</body>
</html>