[libnice] Using (abusing?) libnice's event loop for outgoing traffic

Olivier Crête olivier.crete at collabora.com
Fri May 11 17:50:50 UTC 2018


Hi,

I have no idea what is going on with your code. Are you doing any
blocking processing in the code that runs from the recv callback?
You know that you can attach each receive callback to a separate
thread if required?

If there is a deadlock and it blocks forever while calling send from
the recv callback, I'd be very interested in a stack trace, as this is
definitely supposed to work.

All that g_socket_create_source() does is add the fd to the list of FDs
to be given to the blocking poll() system call (which is the heart of
g_main_context_iterate() which is called repeatedly by
g_main_loop_run()).
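
As a rough illustration of that behaviour (plain POSIX, not libnice or
GLib code), the sketch below uses a pipe as a stand-in for the socket.
The names wait_readable() and demo_poll_wakeup() are made up for this
example. It shows that poll() reports nothing until the fd has data,
and that a poll() with no timeout returns as soon as the fd becomes
readable. By the same logic, work queued from another thread would not
by itself wake a loop blocked in poll(); something has to make one of
the polled FDs ready.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: wait for fd to become readable.
 * Returns 1 if readable, 0 if the wait timed out, -1 on error.
 * A negative timeout_ms blocks indefinitely, as a main loop with
 * no pending timers would. */
int wait_readable(int fd, int timeout_ms)
{
    assert(fd >= 0);

    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int n = poll(&pfd, 1, timeout_ms);

    if (n < 0)
        return -1;
    return (n > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}

/* Hypothetical demo: a pipe stands in for the socket.
 * Returns 0 if poll() behaved as described, non-zero otherwise. */
int demo_poll_wakeup(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    /* Nothing has been written yet, so a zero-timeout poll finds
     * no readable fd. */
    int before = wait_readable(fds[0], 0);

    /* Make the read end readable, as an incoming packet would. */
    char byte = 'x';
    ssize_t written = write(fds[1], &byte, 1);

    /* Data is already pending, so even an infinite-timeout poll
     * returns immediately. */
    int after = (written == 1) ? wait_readable(fds[0], -1) : -1;

    close(fds[0]);
    close(fds[1]);

    return (before == 0 && after == 1) ? 0 : 1;
}
```

Calling demo_poll_wakeup() from a small main() and checking that it
returns 0 is enough to try this out.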

Olivier

On Fri, 2018-05-11 at 19:41 +0200, Lorenzo Miniero wrote:
> Hi Olivier,
> 
> this is what we did originally, but we soon had to change the
> pattern.
> In fact, it's not just a nice_agent_send call, as for each outgoing
> packet we have to do an SRTP encryption, and the same source can
> actually send to many destinations (hundreds) at the same time: this
> did end up being blocking or a source of severe delays for the media
> plugins in Janus, for which sending a packet is supposed to be
> "asynchronous" no matter how many recipients there are; it also
> caused lost packets on the incoming side, as the original trigger for
> incoming packets in Janus is the libnice recv callback, which means
> that sending to hundreds of recipients was done on the libnice loop
> thread, and that could take forever. Besides, libsrtp is not thread
> safe, and having a unique point where SRTP encryption would happen
> avoided the need for a lock.
> 
> This is why we ended up with a dedicated thread per participant, which
> is fine, but I was looking for a way to use only a single thread
> for both in and out. Do you know if there's anything in libnice,
> configurable or not, that may contribute to the problems I'm
> experiencing? I tried looking at the code, but all I could find was
> that apparently g_socket_create_source is used for incoming media, and
> it's not clear from the documentation how it works internally (e.g.,
> in terms of polling, how much it polls before giving up, etc.).
> 
> Lorenzo
> 
> 
> 2018-05-11 19:31 GMT+02:00 Olivier Crête <olivier.crete at collabora.com>:
> > Hi,
> > 
> > I would just get rid of the send thread and the GAsyncQueue
> > completely. Sending in libnice is non-blocking, so you should be
> > able to just call nice_agent_send() at the place where you would
> > normally put the things in a queue.
> > 
> > Olivier
> > 
> > On Wed, 2018-05-09 at 15:50 +0200, Lorenzo Miniero wrote:
> > > Hi all,
> > > 
> > > as you may know, I'm using libnice in Janus, a WebRTC server. The
> > > way
> > > it's used right now is with two different threads per ICE agent:
> > > one
> > > runs the agent's GMainContext+GMainLoop, and as such is
> > > responsible
> > > for notifying the application about incoming events and packets
> > > via
> > > the libnice callback; another thread handles outgoing traffic,
> > > with a
> > > GAsyncQueue to queue packets, prepare them and shoot them out via
> > > nice_agent_send() calls.
> > > 
> > > These past few days I've been playing with an attempt to actually
> > > put
> > > those two activities together in a single thread, which would
> > > simplify
> > > things (e.g., in terms of locking and other things) and, ideally,
> > > optimize resources (we'd only spawn half the threads we do now).
> > > > To do so, I decided to try and re-use the agent's event loop for that,
> > > and
> > > followed an excellent blog post by Philip Withnall to create my
> > > own
> > > GSource for the purpose:
> > > https://tecnocode.co.uk/2015/05/05/a-detailed-look-at-gsource/
> > > This was very helpful, as I ended up doing something very
> > > similar,
> > > since I was already using a GAsyncQueue for outgoing media
> > > myself.
> > > 
> > > Anyway, while this "works", the outgoing media is quite delayed
> > > when
> > > rendered by the receiving browser. I verified that media from
> > > clients
> > > does come in at the correct rate and with no delays there, by
> > > redirecting the traffic to a monitoring gstreamer pipeline, which
> > > means somehow the outgoing path is to blame. I "tracked" packets
> > > both
> > > using wireshark and log lines (e.g., when they were queued and
> > > when
> > > they were dispatched by the GSource), and did notice that some
> > > packets
> > > are handled fast, while others much later, and there are "holes"
> > > when
> > > no packet is sent (for ~500ms at times).  This shouldn't be the
> > > code
> > > failing to keep up with the things to do in time (since there's
> > > SRTP,
> > > there's encryption involved for every packet), as the CPU usage
> > > is
> > > always quite low.
> > > 
> > > At first I thought this could be ascribed to GSource priorities,
> > > but
> > > > even after playing with those, nothing changed. One thing I noticed,
> > > > though, was that the delay was considerably more apparent in sendonly
> > > > connections
> > > than in sendrecv ones. This means that, while in an EchoTest demo
> > > it
> > > is barely noticeable (even though still worse than with the two
> > > threads), in the VideoRoom demo, where you have monodirectional
> > > agents
> > > (some used to just publish media, others just to subscribe), it
> > > is
> > > much more evident (~600ms). Considering that the publisher's
> > > media is
> > > fine (as confirmed by the gstreamer monitor mentioned above), the
> > > only
> > > explanation I came up with was that, while in a bidirectional
> > > communication there's a lot of incoming traffic, on a
> > > subscriber's
> > > connection you only get occasional RTCP packets from time to
> > > time. In
> > > case there's some sort of timed poll deep in the libnice code,
> > > due to
> > > a single thread processing both directions this might mean having
> > > the
> > > event loop kept busy waiting for a file descriptor to generate an
> > > event, while outgoing packets pile up in the queue.
> > > 
> > > Does this make sense, and might this indeed be what's happening?
> > > Is
> > > what I'm trying to do indeed feasible or are there better ways to
> > > try
> > > and do it properly, e.g., via nice_agent_recv_messages instead of
> > > the
> > > > recv callback? This is my first attempt at using GLib events in a
> > > > more "conversational" way than for sporadic events, and so I realize
> > > > I may be making some rookie mistakes here.
> > > 
> > > Thanks!
> > > Lorenzo
> > > _______________________________________________
> > > nice mailing list
> > > nice at lists.freedesktop.org
> > > https://lists.freedesktop.org/mailman/listinfo/nice
> > 
> > --
> > Olivier Crête
> > olivier.crete at collabora.com
-- 
Olivier Crête
olivier.crete at collabora.com
