[pulseaudio-discuss] [PATCH] add xrdp sink
Alexander E. Patrakov
patrakov at gmail.com
Fri May 30 13:05:10 PDT 2014
30.05.2014 18:43, Tanu Kaskinen wrote:
>> 2 - esound sink and source as Alexander suggests(source not complete).
>> 3 - RTP over unix domain socket(module-rtp-send not complete as
>> Laurentiu Nicola says).
>> I'm ok with 2 or 3, but I want to make sure it's the best decision
>> long term. I think there will be a lot of users using PA this way.
> I don't know the details of any of the three protocols (custom xrdp,
> esound or rtp), so I don't have any opinions like "you really should use
> X" or "you really shouldn't use Y".
OK, here are some bad words about the protocols.
The main reason why I am against the current custom protocol is:
Any custom protocol will likely evolve, and, with the current inability
to build out-of-tree modules, it means that future versions of both xrdp
and PulseAudio will have to deal somehow with any resulting version
mismatch. The current protocol doesn't provide any versioning, though,
and that's a problem _if_ the custom protocol (as opposed to a suitable
but set-in-stone standard protocol) is accepted as the way forward.
The second reason was (see below for factors that amend it):
The current custom protocol is essentially a copy of the esound protocol
with minor variations. All criticisms that apply to module-esound-sink
will also apply to the current module-xrdp-sink. Conversely, if any
current criticisms of module-esound-sink actually don't apply in this
use case to module-xrdp-sink, then they are irrelevant for
...which Tanu has worded in a more positive way:
> If the esound protocol "deficiencies" (that I'm not familiar with) don't
> really matter in case of XRDP, and there's not a lot of mandatory extra
> cruft in the protocol that isn't necessary with XRDP, then reusing the
> esound protocol sounds like a good idea.
Note that I don't propose to implement the whole esound protocol - just
enough to interoperate with PulseAudio and maybe the most common clients.
The claimed deficiencies of the esound sink are high latency and even
worse latency estimation, i.e. a/v sync issues. However, there is
something strange (possible bug, found by code inspection, I have not
tested anything yet) in module-esound-sink.c. It creates a socket,
relies on the socket buffer becoming full for throttling the sink
rendering process, but never sets the SO_SNDBUF option, either directly
or through the helper from pulsecore/socket-util.c. And the default is
more than 256 KB! So no wonder that the socket accumulates a lot of
sound data (and thus latency) before throttling.
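
To put a rough number on this, here is a minimal Python sketch. It assumes 16-bit stereo PCM at 44.1 kHz (the stream format is not stated above) and takes the 256 KB figure from the paragraph as given. make_throttled_pair is a hypothetical helper, not PulseAudio code; it only illustrates the kind of SO_SNDBUF call that appears to be missing, and the kernel is free to round or scale the requested size.

```python
import socket


def buffered_latency_seconds(buffered_bytes, rate_hz=44100, channels=2,
                             bytes_per_sample=2):
    """Seconds of audio represented by buffered_bytes of PCM data."""
    return buffered_bytes / (rate_hz * channels * bytes_per_sample)


def make_throttled_pair(sndbuf_bytes):
    """Hypothetical helper: a local stream socket pair whose sending side
    has a capped send buffer, so that writes start blocking sooner."""
    writer, reader = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    # Request a small send buffer; the kernel treats the value as a hint.
    writer.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, sndbuf_bytes)
    return writer, reader


if __name__ == "__main__":
    # How much audio fits into a 256 KB socket buffer (assumed format)?
    print(buffered_latency_seconds(256 * 1024))
```

Under these assumptions a full 256 KB buffer holds roughly 1.5 seconds of audio, which would be consistent with the high latency being attributed to the esound sink.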
As for the bad latency estimation, I think this applies only to
networked connections. Indeed, the esound protocol has a request for
querying the server-internal latency, and PulseAudio issues it. The
total latency consists of the amount of the sound data buffered in the
esound server, the network, and locally in the client. The only unknown
here is the network: the server-internal latency can be queried, and the
amount of locally-buffered data is known via SIOCOUTQ. But for local
connections, the amount of data buffered by the network is zero, so this
criticism also seems unfounded in the XRDP case.
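
The latency sum described above can be sketched as follows. This is an illustration only: total_latency_seconds and unsent_bytes are hypothetical names, the server-internal latency is taken as an input rather than fetched through the esound request, and the helper relies on SIOCOUTQ having the same value as TIOCOUTQ on Linux (the constant Python exposes). For unix-domain sockets the kernel's accounting may include some overhead, so the byte count should be treated as approximate.

```python
import fcntl
import struct
import termios


def unsent_bytes(sock):
    """Bytes still queued in the local send buffer (Linux-only sketch).

    Assumption: SIOCOUTQ equals TIOCOUTQ on Linux, so termios.TIOCOUTQ
    can be used as the ioctl request number.
    """
    raw = fcntl.ioctl(sock.fileno(), termios.TIOCOUTQ, struct.pack("i", 0))
    return struct.unpack("i", raw)[0]


def total_latency_seconds(server_latency_s, local_bytes, bytes_per_second,
                          network_latency_s=0.0):
    """Server-internal latency + network + locally buffered data.

    network_latency_s defaults to zero, which is the local-connection
    case discussed above.
    """
    return (server_latency_s + network_latency_s
            + local_bytes / bytes_per_second)
```

For example, with 20 ms reported by the server and 17640 bytes still queued locally at 176400 bytes per second, the estimate is 0.12 s.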
Now let's compare the protocols.
As Tanu has already mentioned, there is an important difference between
the custom protocol and the esound protocol. Namely, the clock source.
module-esound-sink uses the remote clock source: it writes to the socket
as quickly as possible until its buffer fills up, and unblocks when
esound (or xrdp) reads some data out. module-xrdp-sink uses the local
clock to move samples to the socket (sleep, write, sleep, write, and so
on), and assumes that xrdp will read the samples out quickly enough so
that the writes never block.
I do not know what provides this guarantee. For it to be true, there
should be "something" somewhere that measures the rate at which the
sound samples are arriving, and compensates for the clock drift between
the local system and the remote sound card. I.e. let's suppose that the
remote system thinks that the fragments being sent out are 29.99 ms
apart, and not 30 ms as the local system thinks. The difference will
accumulate, and, unless some samples are dropped or the stream is
resampled by a factor of 30/29.99, there will be something like a
blocked socket or overfilled buffer. The same "need to have an adaptive
resampler" problem applies to RTP or to any other protocol that relies on
the local clock.
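
The arithmetic behind the 29.99 ms versus 30 ms example can be worked through with a small Python sketch. The function names are made up for illustration, and the choice of "one whole fragment" as the slip threshold is an assumption, not something either implementation defines.

```python
def resample_ratio(local_fragment_ms, remote_fragment_ms):
    """Factor by which the stream would have to be resampled so that
    both sides agree on the fragment duration."""
    return local_fragment_ms / remote_fragment_ms


def seconds_until_slip(local_fragment_ms, remote_fragment_ms, slip_ms):
    """Local-clock seconds until the two clocks disagree by slip_ms."""
    drift_per_fragment_ms = abs(local_fragment_ms - remote_fragment_ms)
    fragments = slip_ms / drift_per_fragment_ms
    return fragments * local_fragment_ms / 1000.0


if __name__ == "__main__":
    print(resample_ratio(30.0, 29.99))
    print(seconds_until_slip(30.0, 29.99, 30.0))
```

With these numbers the mismatch is 0.01 ms per fragment, so the two clocks disagree by a whole 30 ms fragment after about 3000 fragments, i.e. about 90 seconds. The required resampling factor is about 1.00033.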
If the desired semantics is "remote soundcard clock is the master clock",
then the esound protocol will be suitable. If "local clock is the master
clock" is actually wanted, then any of the three protocols would somehow
work (and with esound protocol, the local clock would be inside xrdp
Now let's turn to protocol elements.
The custom protocol has an explicit opcode for pausing the stream. This
was one of the reasons that led to its creation. I don't know yet
whether PulseAudio would suspend the esound-protocol stream, but if
necessary, this could be added. The possible implementation alternatives
are to either disconnect until it has something else to play (which
PulseAudio certainly does not do), or to simply stop the data flow
(which I have yet to test). In the second case, xrdp could detect the
pause by observing that it can read nothing out of the socket for a
sufficiently long time.
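
The second alternative (detecting a pause from an idle socket) could look roughly like this on the reading side. This is a Python sketch for illustration, not xrdp code; stream_paused and the timeout value are hypothetical. Note that a closed connection also makes the socket readable, so a disconnect would not be reported as a pause and has to be told apart by the subsequent read returning no data.

```python
import select
import socket


def stream_paused(sock, idle_timeout_s):
    """Return True if no data became readable within idle_timeout_s."""
    readable, _, _ = select.select([sock], [], [], idle_timeout_s)
    return not readable


if __name__ == "__main__":
    writer, reader = socket.socketpair()
    print(stream_paused(reader, 0.1))   # nothing written yet
    writer.sendall(b"\x00" * 4)
    print(stream_paused(reader, 0.1))   # data is waiting
    writer.close()
    reader.close()
```

What counts as "sufficiently long" would have to be chosen relative to the fragment size, so that ordinary scheduling jitter is not mistaken for a pause.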
The esound protocol has only three protocol elements that one would need
to implement in xrdp: cookie-based authentication, latency request and
audio stream playback. Cookie-based authentication is stupid but easy,
so it should not be a problem. The latency request is actually a good
thing: it allows PulseAudio to report to the client how long it would
take for the last-written sample to reach the playback device. Without
this request
(e.g. with the original custom protocol) or any other way to query or
influence the latency, a/v synchronization is impossible. And audio
stream playback means just taking audio samples from the socket when
they are needed (but not earlier than that). So it should all be quite
RTP is a unidirectional packet-based protocol. As such, it does not have
any way to query the latency. It does not have any useful way to
influence the latency at the receiver, either. As such, PulseAudio does
not have any means for offering accurate latency reports, and a/v
synchronization is impossible.
The RTP protocol elements that are not repeated between packets, besides
the actual audio data, are the packet sequence number and the timestamp.
In the xrdp case the sequence number is probably not interesting, as it
just increases for each packet by one. It can be useful for packet loss
detection, but packets are not lost in a unix-domain socket if they are
read out of the socket in a timely manner. The timestamp starts from 0
and is incremented by 1 for each audio sample. It is useful for
reconstructing the exact duration of silence represented by not
transmitting any packets. Its relation to the wall clock is conveyed in
the SDP announced via the SAP port, by means of the NTP-style timestamp
of the start of the transmission, with one-second precision. So this is
not useful for determining when exactly, according to the wall clock,
this packet should be played.
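
To illustrate what the sequence number and timestamp do provide, here is a small Python sketch that reads the 12-byte fixed RTP header (layout per RFC 3550) and derives the duration of silence between two packets from their timestamps. The function names and the returned dictionary are made up for this example, and the header extension and CSRC list are ignored.

```python
import struct

# RFC 3550 fixed header: V/P/X/CC, M/PT, sequence, timestamp, SSRC.
RTP_FIXED_HEADER = struct.Struct("!BBHII")


def parse_rtp_header(packet):
    """Extract the per-packet fields from the fixed RTP header."""
    flags, marker_pt, seq, timestamp, ssrc = \
        RTP_FIXED_HEADER.unpack_from(packet)
    return {
        "version": flags >> 6,
        "payload_type": marker_pt & 0x7F,
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


def silence_seconds(prev_timestamp, prev_samples, next_timestamp, rate_hz):
    """Silence between two packets, given the number of samples that the
    earlier packet carried. Handles 32-bit timestamp wraparound."""
    elapsed = (next_timestamp - prev_timestamp) % (1 << 32)
    return (elapsed - prev_samples) / rate_hz
```

For example, if a packet with timestamp 0 carries 441 samples at 44100 Hz and the next packet has timestamp 44541, the gap corresponds to exactly one second of silence. Nothing in these fields says when, by the wall clock, either packet should be played.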
Based on the above, I think that among the three protocols discussed,
the esound protocol, if any (this is important!), is the way to go.
Alexander E. Patrakov