[pulseaudio-discuss] [PATCH] add xrdp sink
Alexander E. Patrakov
patrakov at gmail.com
Sun Jun 1 07:28:57 PDT 2014
31.05.2014 02:05, I wrote:
> The claimed deficiencies of the esound sink are high latency and even
> worse latency estimation, i.e. a/v sync issues. However, there is
> something strange (possible bug, found by code inspection, I have not
> tested anything yet) in module-esound-sink.c. It creates a socket,
> relies on the socket buffer becoming full for throttling the sink
> rendering process, but never sets the SO_SNDBUF option, either directly
> or through the helper from pulsecore/socket-util.c. And the default is
> more than 256 KB! So no wonder that the socket accumulates a lot of
> sound data (and thus latency) before throttling.
>
> As for the bad latency estimation, I think this applies only to
> networked connections. Indeed, the esound protocol has a request for
> querying the server-internal latency, and PulseAudio issues it. The
> total latency consists of the amount of the sound data buffered in the
> esound server, the network, and locally in the client. The only unknown
> here is the network: the server-internal latency can be queried, and the
> amount of locally-buffered data is known via SIOCOUTQ. But for local
> connections, the amount of data buffered by the network is zero, so this
> criticism also seems unfounded in the XRDP case.
Yesterday and today I played with sockets and also with the real esd,
and here is the degree to which the criticisms above are valid. Summary:
even if xrdp implements every aspect of the esound protocol perfectly,
we won't be able to get latency below 25 ms (4480 bytes) for CD-format
(44100 Hz, 16-bit, stereo) samples, and even that would require
PulseAudio to work around a server-side bug in the real esd. As the
original patch submission effectively stated, through its code, that
30 ms of latency is good enough, I guess the 25 ms limitation is not a
showstopper for CD-format samples. But a fixed 4480-byte limit can be
more problematic for lower-quality formats, where fewer bytes per
second translate into more milliseconds of latency per byte.
The esound protocol, as I have already said, relies on the socket
buffers becoming full as the means of synchronization. This means that
the minimum achievable latency is directly related to the minimum socket
buffer size. If I set the buffer to 1 byte, the kernel bumps it to the
real minimum:
SO_RCVBUF -> 2304
SO_SNDBUF -> 4608
OK. So let's create a unix-domain socket, bind it to /tmp/demo.sock, set
these buffer sizes, accept a connection, and read nothing from it. The
expectation is that the client will be able to write some limited amount
of data to the socket before further writes would block. This is very
easy to measure by making the client socket non-blocking and writing
data to it until EAGAIN.
In my experiment, with the minimal buffer sizes both on the client and
on the server, I was able to write 4480 bytes there. I cannot relate
this number to the buffer sizes above, but maybe I shouldn't have to.
In any case, this number (4480 bytes) determines the minimum latency
achievable in any setup that relies on blocking when the unix-domain
socket buffer becomes full. For typical CD-format samples, this means
that the theoretical minimum latency is 25.4 ms.
Then, let's see how PulseAudio's estimation of the queue length works
here. It uses the SIOCOUTQ ioctl, which in my case returns 8704. That
is nonsense (in other words, a kernel bug), especially since the other
end can receive only 4480 bytes.
Just for fun, I repeated this test using regular TCP sockets over a
wi-fi link. The minimum buffer sizes are the same. I was able to send
1152 bytes and then 1152 bytes more before getting EAGAIN. At that
point, SIOCOUTQ said that 1152 bytes were buffered locally. Well, that's
saner than in the unix-domain-socket case (it can be interpreted as
"1152 bytes are buffered locally and 1152 bytes must be buffered
remotely", which matches the traffic dump, 1152 being the TCP window
size), but it still fails to account for the remote buffer, and I don't
know how to explain this value in terms of SO_{SND,RCV}BUF and the
manual pages.
With a bigger SO_SNDBUF value, both in the TCP and in the unix-domain
case, I am able to "send" more before the socket blocks. In the TCP
case, SIOCOUTQ correctly indicates that the bytes actually get queued
locally. In the unix-domain-socket case, its result also increases, but
(with the minimal buffers on the receiving side) remains off by
approximately 4 KB from what I would expect.
Unfortunately, we can't just set the send buffer size to the minimum,
because that would break communication with the real esd. The problem is
in its read_player() function:
if (actual < player->buffer_length - player->actual_length)
break;
I.e., on any partial read (which is going to happen if the sender uses a
small buffer), the contents are simply thrown away instead of being
mixed. The typical read size is 4096 bytes, but in pathological
situations (OSS on a bad card) it can be up to 86016 bytes. By the way,
jesd-0.0.7 (from 2000) does not have this bug. To work around it, we
need to use pa_sink_render_full() together with a compatible send
buffer size, so that the data is written in as few packets as possible.
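The workaround might look roughly like this in the sink's I/O thread (a pseudocode sketch against PulseAudio's internal pa_sink_render_full() API; u->block_size and the surrounding field names are hypothetical, and the real code would size the block from esd's latency report):

```
/* Pseudocode sketch, not compilable standalone: render a full block
 * and push it to esd in a single write, so that esd's read_player()
 * never sees a partial read. */
pa_memchunk chunk;
pa_sink_render_full(u->sink, u->block_size, &chunk); /* always fills block_size bytes */
p = pa_memblock_acquire(chunk.memblock);
/* one write() per rendered block; SO_SNDBUF must be large enough to
 * accept the whole block at once */
pa_write(u->fd, (uint8_t *) p + chunk.index, chunk.length, &u->write_type);
pa_memblock_release(chunk.memblock);
pa_memblock_unref(chunk.memblock);
```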
The minimum send buffer size that doesn't trigger the bug can be
estimated from the latency report provided by esd. Also, we can omit
the workaround for unix-domain sockets, as nobody is going to run the
real esd on the same machine as PulseAudio.
--
Alexander E. Patrakov