[libnice] Sending packets uses a static, global lock?

Olivier Crête olivier.crete at collabora.com
Fri Nov 4 13:39:36 UTC 2016


Hi,

The last few weeks have been stupidly busy. That bug (T93) is about
having the same lock for sending and receiving.. I'd open a new one
about the global lock. The make the lock per agent and pass the agent
reference in a weak ref to all of the timeout sources, that should fix
the issues we had earlier.

Olivier


On Fri, 2016-11-04 at 13:12 +0100, fb111 at web.de wrote:
> 
>  
> 
> 
> Hi Olivier,
> 
> 
>  
> 
> 
> > have you had the chance to look into this problem? We are also
prepared to help, but would appreciate your advise.
> 
> 
>  
> 
> 
> Best,
> 
> 
>  
> 
> 
> Frank
> 
> 
>  
> 
> 
>  
> 
> Gesendet: Donnerstag, 13. Oktober 2016 um 20:27 Uhr
> 
> > Von: "Lorenzo Miniero" <lminiero at gmail.com>
> 
> > An: "Olivier Crête" <olivier.crete at collabora.com>
> 
> > > > Cc: "Anand Sivaram" <aspnair at gmail.com>, "Miguel París Díaz" <mparisd
iaz at gmail.com>, nice <nice at lists.freedesktop.org>
> 
> Betreff: Re: [libnice] Sending packets uses a static, global lock?
> 
> 
> 
> 
> 
> > 2016-10-13 16:44 GMT+02:00 Olivier Crête <olivier.crete at collabora.com
>:
> 
> > 
> > Hi,
> > 
> > 
> >  
> > 
> > 
> > > > > > > > > > > > > > > > The lock should just be per agent. The global lock was a workaround
for the crash that happens when the timeout GSource fires off while
the object is being destroyed and then has an invalid agent pointer
in the callback. Possibly we just need to use a GWeakRef for the
agent in all of the callbacks, possibly by doing a wrapper
using agent_timeout_add_with_context(). And then we can also remove
the "if (g_source_is_destroyed (g_main_current_source ())) {"
everywhere and centralize it all in the wrapper...
> > 
> > 
> >  
> > 
> > 
> > 
> > 
> 
>  
> 
> 
>  
> 
> 
> Hi Olivier,
> 
> 
>  
> 
> 
> > > > > thanks for the clarification. From what I understood of how sources
work in glib, this sounds like a reasonable plan. Do you want me to
open an issue somewhere to keep track of these activities? I noticed
there seems to be already one that looks related, but I'm not sure
what its current state it:
> 
> 
>  
> 
> 
> https://phabricator.freedesktop.org/T93
> 
> 
>  
> 
> 
> > > If there's anything I can do to help, just let me know. I'll be
abroad next week for a conference, but when I come back I'm willing
to work on the code to help fix this.
> 
> 
>  
> 
> 
> Thanks,
> 
> 
> Lorenzo
> 
> 
>  
> 
> 
>  
> 
> 
> > 
> >  
> > 
> > 
> > Olivier
> > 
> > 
> > 
> > 
> >  
> > 
> > 
> > On Thu, 2016-10-13 at 11:10 +0200, Lorenzo Miniero wrote:
> > 
> > 
> > 
> > 
> > 
> > > > 2016-10-13 6:46 GMT+02:00 Anand Sivaram <aspnair at gmail.com>:
> > 
> > 
> > 
> > What about having a global flag for the lock type?
> > 
> > Set it with g_object_set during the library initialization.
> > 
> > 
> > It should have 3 values - Off, Per Agent, Global.
> > 
> > 
> > Modify agent_lock() and agent_unlock() accordingly.
> > 
> >  
> > 
> > 
> > > > That way if the applications using libnice implemented their own
locking
> > 
> > 
> > mechanism, that could be used seamlessly.
> > 
> >  
> > 
> > 
> > 
> > 
> > 
> >  
> > 
> > 
> >  
> > 
> > 
> > > > > > > > > > > > > > That looks like a smart approach, especially as it would be
flexible: the off part is particularly interesting, as I have locks
all over the place myself! :-). I'm still concerned about whether
or not this can result in issues with the glib2 stuff libnice uses
internally, though, and that applications don't have access to?
There may be private methods invoked that need that locking,
something the application from outside may not be able to protect.
> > 
> > 
> >  
> > 
> > 
> > Lorenzo
> > 
> > 
> >  
> > 
> > 
> >  
> > 
> > 
> >  
> > 
> > 
> > 
> > 
> >  
> > 
> > 
> > anand
> > 
> > 
> >  
> > 
> > 
> >  
> > 
> > 
> >  
> > 
> > 
> > 
> > 
> > 
> > 
> >  
> > > > > > On 12 October 2016 at 21:35, Lorenzo Miniero <lminiero at gmail.com>
wrote:
> > 
> > 
> > > > Thanks for the context! (and sorry for the delay in this
response...)
> >  
> > 
> > 
> > > > > > > > > > > > > > > > > > > > > > > > I'm not familiar enough with glib2, unfortunately: considering the
patch dates back to about 7 years ago, is the problem the global
muxed fixed still an issue with current versions? We did a quick
and dirty test by turning those global agent locks to per-agent
locks, where possible (that is, in methods that actually provided a
pointer to an agent instance) and we did notice improvements in the
performance (about twice as many sessions before it started to
break down), but of course this means little if it risks causing
those race conditions and crashes again. Is this road worth
exploring? I'm willing to help and contribute to such work, of
course, although I admit I'm not an expert on the libnice internals
(but it's never too late to learn! ;-) ).
> > 
> > 
> >  
> > 
> > 
> > Thanks,
> > 
> > 
> > Lorenzo
> > 
> > 
> >  
> > 
> > 
> > 
> > 
> > 
> > 
> >  
> > > > > > 2016-10-05 13:09 GMT+02:00 Miguel París Díaz <mparisdiaz at gmail.com>
:
> > 
> > 
> > Sorry for the copy-paste error.
> > 
> > 
> > 
> >  
> >  
> > > > > > 2016-10-05 13:06 GMT+02:00 Miguel París Díaz <mparisdiaz at gmail.com>
:
> > 
> > 
> > 
> > 
> > 
> > Good point Lorenzo!!!
> > 
> >  
> > 
> > This change was added in this commit:
> > 
> > 
> > 
> > commit 2faa76a4840100bebd9532bd47a2755874c5b3a0
> > 
> > > > Author: Youness Alaoui <youness.alaoui at collabora.co.uk>
> > 
> > Date:   Tue Jun 16 20:40:33 2009 -0400
> > 
> > 
> > 
> > > > > >     Use a global mutex for all nice agents and use
g_source_is_destroyed to avoid the race condition of a source being
destroyed while we wait for the mutex
> > 
> >  
> > 
> > And then the RecMutex was changed by a Mutex in:
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > commit 1deee69325284c726c3a8380a7d5839a51e20c48
> > 
> > > > Author: Olivier Crête <olivier.crete at collabora.com>
> > 
> > Date:   Mon Feb 24 18:51:31 2014 -0500
> > 
> > 
> > 
> >     agent: Replace recursive mutex with non-recursive
> > 
> >    
> > 
> >     This should prevent us from re-adding re-entrancy in the future
> > 
> >  
> > 
> > 
> > 
> > 
> > 
> > 
> >  
> > 
> > 
> > 
> >  
> > 
> > 
> >  
> > 
> > 
> > > > 2016-10-05 11:45 GMT+02:00 Lorenzo Miniero <lminiero at gmail.com>:
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > Hi all,
> >  
> > 
> > 
> > > > > > > > > > > > we've been working on some stress tests of our Janus WebRTC server,
which uses libnice for all the ICE related stuff. While it works
great in general, we reach a wall much sooner than we expect, and
performance start to degrade quite a bit. By doing some digging and
diagnostic in the code we wrote and the one we use, we found out
something we thought was weird.
> > 
> > 
> >  
> > 
> > 
> > > > > > > > > > > > > > > > > > > > > > > > Specifically, we noticed that the internal method that is used to
send packets, nice_agent_send_messages_nonblocking (which
nice_agent_send itself uses internally) has a lot of code wrapped
in an agent_lock()/agent_unlock(), which locks/unlocks a global
mutex, and this includes the process of physiclly sending the
packet. This means that the whole library is locked until that code
is called, which seems overkill. I guess the same happens in other
parts of the code as well. I'd understand locking the specific
agent if the purpose is thread-safety, but why lock ALL agents at
the same time for each packet you send? I tried searching the
archives for some info on that, but couldn't find any discussion
related to this choice.
> > 
> > 
> >  
> > 
> > 
> > > > > > > > > > > > > > Since I'm already handling thread-safety for most stuff in my own
code, is there any way for preventing this from happening? I guess
that one possible way could be to retrieve the file descriptor from
the agent object and send the packet manually myself, but while
this may work in simple (host/prflx/srflx) connectivity, this would
definitely not be the case when, for instance, TURN is involved, as
that's one of the things the library would take care of for me.
> > 
> > 
> >  
> > 
> > 
> > Looking forward to your thoughts, thanks!
> > 
> > Lorenzo
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > _______________________________________________
> > 
> > nice mailing list
> > 
> > nice at lists.freedesktop.org
> > 
> > https://lists.freedesktop.org/mailman/listinfo/nice
> > 
> > 
> > 
> >  
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > -- 
> > 
> > 
> > Miguel París Díaz
> > 
> > > > -----------------------------------------------------------------
-------
> > 
> > Computer/Software engineer.
> > 
> > Researcher and architect in http://www.kurento.org
> > 
> > http://twitter.com/mparisdiaz
> > 
> > > > -----------------------------------------------------------------
-------
> > 
> > 
> > 
> >  
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> >  
> > _______________________________________________
> > nice mailing list
> > nice at lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/nice
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> >  
> > 
> > -- 
> > > > Olivier Crête olivier.crete at collabora.com 
> > 
> > 
> > 
> > 
> 
> 
> 
> 
> 
> 
> > > _______________________________________________ nice mailing list
nice at lists.freedesktop.org https://lists.freedesktop.org/mailman/list
info/nice
> 
> 
> 
> 
> 
> 
> 
> 
-- 
Olivier Crête
olivier.crete at collabora.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/nice/attachments/20161104/1f1765b5/attachment-0001.html>


More information about the nice mailing list