[libnice] Sending packets uses a static, global lock?

Olivier Crête olivier.crete at collabora.com
Thu Oct 13 14:44:28 UTC 2016


Hi,

The lock should just be per agent. The global lock was a workaround for
the crash that happens when the timeout GSource fires off while the
object is being destroyed and then has an invalid agent pointer in the
callback. Possibly we just need to use a GWeakRef for the agent in all
of the callbacks, possibly by doing a wrapper
using agent_timeout_add_with_context(). And then we can also remove the
"if (g_source_is_destroyed (g_main_current_source ())) {" everywhere
and centralize it all in the wrapper...

Olivier

On Thu, 2016-10-13 at 11:10 +0200, Lorenzo Miniero wrote:
> > 2016-10-13 6:46 GMT+02:00 Anand Sivaram <aspnair at gmail.com>:
> > What about having a global flag for the lock type?
> > Set it with g_object_set during the library initialization.
> > It should have 3 values - Off, Per Agent, Global.
> > Modify agent_lock() and agent_unlock() accordingly.
> > 
> > > > That way if the applications using libnice implemented their own
locking
> > mechanism, that could be used seamlessly.
> > 
> > 
> > 
> > 
> 
> > > > > > > That looks like a smart approach, especially as it would be flexible:
the off part is particularly interesting, as I have locks all over
the place myself! :-). I'm still concerned about whether or not this
can result in issues with the glib2 stuff libnice uses internally,
though, and that applications don't have access to? There may be
private methods invoked that need that locking, something the
application from outside may not be able to protect.
> 
> Lorenzo
> 
> 
>  
> > anand
> > 
> > 
> > 
> > 
> > 
> > > > > > On 12 October 2016 at 21:35, Lorenzo Miniero <lminiero at gmail.com>
wrote:
> > > > > > Thanks for the context! (and sorry for the delay in this
response...)
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I'm not familiar enough with glib2, unfortunately: considering
the patch dates back to about 7 years ago, is the problem the
global muxed fixed still an issue with current versions? We did a
quick and dirty test by turning those global agent locks to per-
agent locks, where possible (that is, in methods that actually
provided a pointer to an agent instance) and we did notice
improvements in the performance (about twice as many sessions
before it started to break down), but of course this means little
if it risks causing those race conditions and crashes again. Is
this road worth exploring? I'm willing to help and contribute to
such work, of course, although I admit I'm not an expert on the
libnice internals (but it's never too late to learn! ;-) ).
> > > 
> > > Thanks,
> > > Lorenzo
> > > 
> > > 
> > > 
> > > > > > 2016-10-05 13:09 GMT+02:00 Miguel París Díaz <mparisdiaz at gmail.co
m>:
> > > > Sorry for the copy-paste error.
> > > > 
> > > > 
> > > > > > > > 2016-10-05 13:06 GMT+02:00 Miguel París Díaz <mparisdiaz at gmail.
com>:
> > > > > Good point Lorenzo!!!
> > > > > This change was added in this commit:
> > > > > 
> > > > > commit 2faa76a4840100bebd9532bd47a2755874c5b3a0
> > > > > > > > > > Author: Youness Alaoui <youness.alaoui at collabora.co.uk>
> > > > > Date:   Tue Jun 16 20:40:33 2009 -0400
> > > > > 
> > > > > > > > > > > > > > >     Use a global mutex for all nice agents and use
g_source_is_destroyed to avoid the race condition of a source
being destroyed while we wait for the mutex
> > > > > 
> > > > > And then the RecMutex was changed by a Mutex in:
> > > > > 
> > > > > 
> > > > commit 1deee69325284c726c3a8380a7d5839a51e20c48
> > > > > > > > Author: Olivier Crête <olivier.crete at collabora.com>
> > > > Date:   Mon Feb 24 18:51:31 2014 -0500
> > > > 
> > > >     agent: Replace recursive mutex with non-recursive
> > > >     
> > > > > > > >     This should prevent us from re-adding re-entrancy in the
future
> > > > 
> > > >  
> > > > > 
> > > > > 
> > > > > 
> > > > > > > > > > 2016-10-05 11:45 GMT+02:00 Lorenzo Miniero <lminiero at gmail.co
m>:
> > > > > 
> > > > > > Hi all,
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > we've been working on some stress tests of our Janus WebRTC
server, which uses libnice for all the ICE related stuff.
While it works great in general, we reach a wall much
sooner than we expect, and performance start to degrade
quite a bit. By doing some digging and diagnostic in the
code we wrote and the one we use, we found out something we
thought was weird.
> > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Specifically, we noticed that the internal method that is
used to send packets,
nice_agent_send_messages_nonblocking (which nice_agent_send
itself uses internally) has a lot of code wrapped in an
agent_lock()/agent_unlock(), which locks/unlocks a global
mutex, and this includes the process of physiclly sending
the packet. This means that the whole library is locked
until that code is called, which seems overkill. I guess
the same happens in other parts of the code as well. I'd
understand locking the specific agent if the purpose is
thread-safety, but why lock ALL agents at the same time for
each packet you send? I tried searching the archives for
some info on that, but couldn't find any discussion related
to this choice.
> > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Since I'm already handling thread-safety for most stuff in
my own code, is there any way for preventing this from
happening? I guess that one possible way could be to
retrieve the file descriptor from the agent object and send
the packet manually myself, but while this may work in
simple (host/prflx/srflx) connectivity, this would
definitely not be the case when, for instance, TURN is
involved, as that's one of the things the library would
take care of for me.
> > > > > > 
> > > > > > Looking forward to your thoughts, thanks!
> > > > > > Lorenzo
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > _______________________________________________
> > > > > > 
> > > > > > nice mailing list
> > > > > > 
> > > > > > nice at lists.freedesktop.org
> > > > > > 
> > > > > > https://lists.freedesktop.org/mailman/listinfo/nice
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > -- 
> > > > > Miguel París Díaz
> > > > > > > > > > -----------------------------------------------------------
-------------
> > > > > Computer/Software engineer.
> > > > > Researcher and architect in http://www.kurento.org
> > > > > http://twitter.com/mparisdiaz
> > > > > > > > > > -----------------------------------------------------------
-------------
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > _______________________________________________
> > > > nice mailing list
> > > > nice at lists.freedesktop.org
> > > > https://lists.freedesktop.org/mailman/listinfo/nice
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > 
> > > 
> > > 
> > > 
> > 
> > 
> > 
> > 
> 
> 
-- 
Olivier Crête
olivier.crete at collabora.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/nice/attachments/20161013/806c5976/attachment.html>


More information about the nice mailing list