[libnice] Using (abusing?) libnice's event loop for outgoing traffic

Lorenzo Miniero lminiero at gmail.com
Fri May 11 17:41:17 UTC 2018


Hi Olivier,

this is what we did originally, but we soon had to change the pattern.
It's not just a nice_agent_send call: each outgoing packet also needs
an SRTP encryption, and the same source can actually send to many
destinations (hundreds) at the same time. Doing all of that inline
turned out to block, or at least severely delay, the media plugins in
Janus, for which sending a packet is supposed to be "asynchronous" no
matter how many recipients there are. It also caused lost packets on
the incoming side: since the original trigger for incoming packets in
Janus is the libnice recv callback, sending to hundreds of recipients
was done on the libnice loop thread, and that could take forever.
Besides, libsrtp is not thread safe, and having a unique point where
SRTP encryption happens avoided the need for a lock.
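
For context, the send thread is conceptually something like the
following sketch. This is not the actual Janus code: pkt_t, session_t
and their fields are just illustrative, and srtp_err_status_ok is the
libsrtp 2.x name (1.x calls it err_status_ok):

    typedef struct {
        char buf[1500];    /* leaves room for the SRTP auth tag */
        int len;
    } pkt_t;

    /* Dedicated send thread: one queue per participant, SRTP
     * encryption serialized here (libsrtp contexts are not thread
     * safe), then the actual send on the agent. */
    static gpointer send_thread(gpointer user_data) {
        session_t *s = user_data;
        while(!s->destroyed) {
            /* Wait up to 500ms for a packet to send */
            pkt_t *pkt = g_async_queue_timeout_pop(s->outq, 500000);
            if(pkt == NULL)
                continue;
            int len = pkt->len;
            /* Encrypt in place */
            if(srtp_protect(s->srtp_out, pkt->buf, &len) == srtp_err_status_ok)
                nice_agent_send(s->agent, s->stream_id, s->component_id,
                        len, pkt->buf);
            g_free(pkt);
        }
        return NULL;
    }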

This is why we ended up with a dedicated thread per participant, which
is fine, but I was investigating a way to only use a single thread for
both in and out. Do you know if there's anything in libnice,
configurable or not, that may contribute to the problems I'm
experiencing? I tried looking at the code, but all I could find was
that g_socket_create_source is apparently used for incoming media, and
the documentation doesn't make clear how it works internally (e.g., in
terms of polling, how long it polls before giving up, etc.).
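
For what it's worth, my reading of the GLib docs (not of the libnice
internals) is that g_socket_create_source just adds the socket's fd to
the context's poll set, so how long it "polls" is decided by the
GMainContext iteration, plus any timeout set on the GSocket itself.
Something like:

    /* The callback fires when the socket becomes readable; the
     * source has no polling loop of its own. */
    static gboolean on_readable(GSocket *socket, GIOCondition cond,
            gpointer user_data) {
        /* read with g_socket_receive() and hand the data upstream */
        return G_SOURCE_CONTINUE;
    }

    GSource *source = g_socket_create_source(socket, G_IO_IN, NULL);
    g_source_set_callback(source, (GSourceFunc) on_readable,
            user_data, NULL);
    g_source_attach(source, agent_context);
    g_source_unref(source);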

Lorenzo


2018-05-11 19:31 GMT+02:00 Olivier Crête <olivier.crete at collabora.com>:
> Hi,
>
> I would just get rid of the send thread and the GAsyncQueue
> completely. Sending in libnice is non-blocking, so you should be able
> to just call nice_agent_send() at the place where you would normally
> put things in a queue.
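>
> Roughly (just a sketch of what I mean, with the obvious variables):
>
>     /* instead of g_async_queue_push(outq, pkt): */
>     gint sent = nice_agent_send(agent, stream_id, component_id,
>             len, buf);
>     if(sent < (gint) len) {
>         /* couldn't send right now: drop the packet or retry later */
>     }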
>
> Olivier
>
> On Wed, 2018-05-09 at 15:50 +0200, Lorenzo Miniero wrote:
>> Hi all,
>>
>> as you may know, I'm using libnice in Janus, a WebRTC server. The way
>> it's used right now is with two different threads per ICE agent: one
>> runs the agent's GMainContext+GMainLoop, and as such is responsible
>> for notifying the application about incoming events and packets via
>> the libnice callback; another thread handles outgoing traffic, using
>> a GAsyncQueue to queue packets, which it then prepares and sends out
>> via nice_agent_send() calls.
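>>
>> Schematically, the loop side is set up like this (simplified; names
>> and the compatibility mode are just placeholders):
>>
>>     static gpointer loop_thread(gpointer loop) {
>>         g_main_loop_run(loop);
>>         return NULL;
>>     }
>>
>>     GMainContext *ctx = g_main_context_new();
>>     GMainLoop *loop = g_main_loop_new(ctx, FALSE);
>>     NiceAgent *agent = nice_agent_new(ctx, NICE_COMPATIBILITY_RFC5245);
>>     /* Incoming packets are delivered on the loop thread */
>>     nice_agent_attach_recv(agent, stream_id, 1, ctx, recv_cb, data);
>>     g_thread_new("ice-loop", loop_thread, loop);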
>>
>> These past few days I've been playing with an attempt to actually put
>> those two activities together in a single thread, which would
>> simplify things (e.g., in terms of locking) and, ideally, optimize
>> resources (we'd only spawn half the threads we do now). To do so, I
>> decided to try and re-use the agent's event loop, and followed an
>> excellent blog post by Philip Withnall to create my own GSource for
>> the purpose:
>> https://tecnocode.co.uk/2015/05/05/a-detailed-look-at-gsource/
>> This was very helpful, and I ended up doing something very similar,
>> since I was already using a GAsyncQueue for outgoing media myself.
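>>
>> The resulting GSource is essentially the "message queue source" from
>> that post (a simplified sketch; send_packet() stands in for the
>> actual SRTP encryption + nice_agent_send code):
>>
>>     typedef struct {
>>         GSource parent;
>>         GAsyncQueue *queue;    /* outgoing packets */
>>         gpointer session;      /* who to send for */
>>     } QueueSource;
>>
>>     static gboolean queue_prepare(GSource *source, gint *timeout) {
>>         *timeout = -1;    /* block in poll until an fd or a wakeup */
>>         return g_async_queue_length(((QueueSource *) source)->queue) > 0;
>>     }
>>
>>     static gboolean queue_check(GSource *source) {
>>         return g_async_queue_length(((QueueSource *) source)->queue) > 0;
>>     }
>>
>>     static gboolean queue_dispatch(GSource *source, GSourceFunc cb,
>>             gpointer user_data) {
>>         QueueSource *qs = (QueueSource *) source;
>>         gpointer pkt;
>>         while((pkt = g_async_queue_try_pop(qs->queue)) != NULL)
>>             send_packet(qs->session, pkt);
>>         return G_SOURCE_CONTINUE;
>>     }
>>
>>     static GSourceFuncs queue_funcs =
>>         { queue_prepare, queue_check, queue_dispatch, NULL };
>>
>>     /* Attached to the same GMainContext the agent runs on */
>>     GSource *src = g_source_new(&queue_funcs, sizeof(QueueSource));
>>     ((QueueSource *) src)->queue = outq;
>>     ((QueueSource *) src)->session = session;
>>     g_source_attach(src, agent_context);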
>>
>> Anyway, while this "works", the outgoing media is quite delayed when
>> rendered by the receiving browser. I verified that media from clients
>> does come in at the correct rate and with no delay, by redirecting
>> the traffic to a monitoring gstreamer pipeline, which means the
>> outgoing path is somehow to blame. I "tracked" packets both using
>> wireshark and log lines (e.g., when they were queued and when they
>> were dispatched by the GSource), and noticed that some packets are
>> handled quickly while others much later, with "holes" during which
>> no packet is sent at all (for ~500ms at times). This shouldn't be
>> the code failing to keep up with the work to do (even though, with
>> SRTP, there's encryption involved for every packet), as the CPU
>> usage is always quite low.
>>
>> At first I thought this could be ascribed to GSource priorities, but
>> even playing with those nothing changed. One thing I noticed, though,
>> was that the delay was much more apparent on sendonly connections
>> than on sendrecv ones. This means that, while in an EchoTest demo it
>> is barely noticeable (even though still worse than with the two
>> threads), in the VideoRoom demo, where you have unidirectional agents
>> (some used to just publish media, others just to subscribe), it is
>> much more evident (~600ms). Considering that the publisher's media is
>> fine (as confirmed by the gstreamer monitor mentioned above), the
>> only explanation I came up with was that, while in a bidirectional
>> communication there's a lot of incoming traffic, on a subscriber's
>> connection you only get occasional RTCP packets. If there's some sort
>> of timed poll deep in the libnice code, then with a single thread
>> processing both directions the event loop may end up stuck waiting
>> for a file descriptor to generate an event, while outgoing packets
>> pile up in the queue.
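>>
>> (If that's indeed the case, one mitigation I can think of would be
>> explicitly waking the context up whenever a packet is queued from
>> another thread, so that the poll returns and prepare() runs again
>> right away; GLib has g_main_context_wakeup() for that:
>>
>>     g_async_queue_push(outq, pkt);
>>     g_main_context_wakeup(agent_context);
>>
>> but I haven't verified yet that this is the actual cause.)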
>>
>> Does this make sense, and might this indeed be what's happening? Is
>> what I'm trying to do actually feasible, or are there better ways to
>> do it properly, e.g., via nice_agent_recv_messages instead of the
>> recv callback? This is my first attempt at using GLib events in a
>> more "conversational" way than for sporadic events, so I realize I
>> may be making some rookie mistakes here.
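>>
>> For reference, the pull-style receive I mean would be something like
>> this (a sketch based on the nice_agent_recv_messages() docs, not
>> code I've actually written yet):
>>
>>     guint8 buf[1500];
>>     GInputVector vec = { buf, sizeof(buf) };
>>     NiceInputMessage msg = { &vec, 1, NULL, 0 };
>>     /* Blocks until a message is received */
>>     gint n = nice_agent_recv_messages(agent, stream_id, 1,
>>             &msg, 1, NULL, NULL);
>>     if(n == 1) {
>>         /* msg.length bytes are now in buf */
>>     }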
>>
>> Thanks!
>> Lorenzo
>> _______________________________________________
>> nice mailing list
>> nice at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/nice
> --
> Olivier Crête
> olivier.crete at collabora.com

