[libnice] sometimes connectivity checks fail...

Philip Withnall philip at tecnocode.co.uk
Tue Jan 12 01:59:08 PST 2016


Hi,

On Tue, 2016-01-12 at 16:33 +0800, Jack Wang wrote:
> After I tested with the latest libnice from the master branch,
> I found that when the caller (Offer) begins negotiation with the
> callee (Answer),
> the caller program crashes due to an assertion failure in libnice.
> 
> And I found the assertion code was added in the commit below:
> http://cgit.freedesktop.org/libnice/libnice/commit/?id=2eaa8b3277f4f39515ff5dc7b512a44fd79e7275
> 
> For understanding what the old state and new state are at that time,
> I printed them and got  
> 
> old-state = FAILED
> new-state = CONNECTED
> 
> The assertion failed, so the application terminated.
> 
> However, if I do ICE with only the audio or the video channel, that is,
> with only one ICE thread, it works!
> So maybe my threads have some improper code that interferes with the
> others.
> I'll keep tracking!

I can’t really debug that without a libnice log or access to your code
(preferred). It could be a bug in libnice, but I wouldn’t be able to
fix it without one of the above.

> And now I haven't used the relay candidate yet,
> after current version of the application becomes more stable,
> I'll also test with it.
> 
> I think ICE can deal with the case where both endpoints are behind
> different symmetric NATs, can't it?

Yes, ICE’s solution for symmetric NATs is TURN. At a high level, ICE is
STUN plus TURN, and what you’re using at the moment is just the STUN
part. You need to set a relay server in order to use TURN. Without one,
I would expect the connection to fail for some network configurations,
such as when both peers are behind symmetric NATs.

> Furthermore,
> I want to print the debug logs to the syslog file, not to the terminal
> screen.
> Below are the steps I tried, but they don't work.

I would just redirect the output of your program:

my-program-name &> /path/to/log/file.log

If you want to log everything from your program (including libnice
debug messages) to the syslog, you should install a custom GLib log
handler using g_log_set_default_handler() or g_log_set_handler().

> 1. modify the nice_debug() function

If you start modifying libnice, you are going to run into
maintainability problems for your software later on, as you will end up
having to port your changes to each new version of libnice, unless you
get them reviewed and committed upstream. :-)

Philip

> 2016-01-11 17:29 GMT+08:00 Philip Withnall <philip at tecnocode.co.uk>:
> > You can do that without source code modifications by passing
> > --enable-compile-warnings=maximum to the configure script. The
> > default is --enable-compile-warnings=error, which enables -Werror.
> > 
> > Philip
> > 
> > On Mon, 2016-01-11 at 16:30 +0800, Jack Wang wrote:
> > > Well, after I remove -Werror and -Wno-suggest-attribute=format from
> > > LIBNICE_CFLAGS,
> > > `make` works!
> > >
> > > Later I'll report the result back. :P
> > >
> > > 2016-01-11 15:24 GMT+08:00 Jack Wang <antirazin at gmail.com>:
> > > > Hello Philip,
> > > >
> > > > When I try to do `make` after I configured master version of
> > > > libnice,
> > > > error occurred:
> > > >
> > > > [jack at localhost libnice]$ make
> > > > make  all-recursive
> > > > make[1]: Entering directory `/home/jack/Desktop/libnice'
> > > > Making all in stun
> > > > make[2]: Entering directory `/home/jack/Desktop/libnice/stun'
> > > > Making all in .
> > > > make[3]: Entering directory `/home/jack/Desktop/libnice/stun'
> > > >   CC     stunagent.lo
> > > >   CC     stunmessage.lo
> > > > stunmessage.c: In function 'stun_message_append_addr':
> > > > stunmessage.c:437:41: error: cast increases required alignment of target type [-Werror=cast-align]
> > > > stunmessage.c:447:42: error: cast increases required alignment of target type [-Werror=cast-align]
> > > > stunmessage.c: At top level:
> > > > cc1: error: unrecognized command line option "-Wno-suggest-attribute=format" [-Werror]
> > > > cc1: all warnings being treated as errors
> > > >
> > > > make[3]: *** [stunmessage.lo] Error 1
> > > > make[3]: Leaving directory `/home/jack/Desktop/libnice/stun'
> > > > make[2]: *** [all-recursive] Error 1
> > > > make[2]: Leaving directory `/home/jack/Desktop/libnice/stun'
> > > > make[1]: *** [all-recursive] Error 1
> > > > make[1]: Leaving directory `/home/jack/Desktop/libnice'
> > > > make: *** [all] Error 2
> > > >
> > > > However, it never occurred in 0.1.13.
> > > > Any suggestions for this?
> > > > Btw, the gcc used targets the ARM architecture.
> > > >
> > > >
> > > > Thanks.
> > > >
> > > >
> > > > 2016-01-11 5:21 GMT+08:00 Philip Withnall <philip at tecnocode.co.uk>:
> > > > > Hi,
> > > > >
> > > > > It seems like you have several problems here.
> > > > >
> > > > > On Fri, 2016-01-08 at 14:14 +0800, Jack Wang wrote:
> > > > > > I have to print debug logs in syslog,
> > > > > > can you teach me how to achieve this?
> > > > >
> > > > > In your terminal:
> > > > >
> > > > > export G_MESSAGES_DEBUG=all
> > > > > export NICE_DEBUG=all
> > > > >
> > > > > then run your program. This will print the full libnice debug
> > > > > logs to
> > > > > the terminal.
> > > > >
> > > > > > In a normal way, the state flow should be gathering ->
> > > > > > connecting -> connected -> ready,
> > > > > > sometimes may be gathering -> connecting -> failed ->
> > > > > > connected -> ready,
> > > > > > however, it also can be gathering -> connecting -> failed,
> > > > > > which will never be changed to connected state :(
> > > > > >
> > > > > > I use the callback like the one in sample code (ex:
> > > > > > sdp-example.c),
> > > > > > when the state changed,
> > > > > > libnice will signal the callback so that I can know the state
> > > > > > in my application.
> > > > > >
> > > > > > I used version of 0.1.13,
> > > > > > and I will try the master later to see what happened . 
> > > > >
> > > > > I would suggest trying with master. There have been a couple of
> > > > > fixes since 0.1.13 to do with state handling and signalling.
> > > > >
> > > > > > I'm also wondering if the bug is related to network
> > > > > > environment.
> > > > > > If the two ICE endpoints were at the same LAN, the connectivity
> > > > > > checks never fails.
> > > > > > (well.... actually I can't promise this is always right, the
> > > > > > reason why I suppose this because I called over 30 times and
> > > > > > it's always OK)
> > > > > > But it failed more frequent (below 10 times or less) when two
> > > > > > endpoints were at different network areas.
> > > > >
> > > > > Almost everything to do with libnice behavioural differences is
> > > > > to do with network environment! Note that ICE negotiation is not
> > > > > guaranteed to succeed in some network environments (for example,
> > > > > between two peers which are each behind a symmetric NAT).
> > > > >
> > > > > Do you have a TURN relay set up?
> > > > >
> > > > > > Btw, I use an array , which is always reused in next call , to
> > > > > > store ICE agents for several media channels,
> > > > > > so I didn't clear the agent with the g_object_unref in the end
> > > > > > like in examples since I will get an assertion in
> > > > > > nice_agent_new when I make a new call,
> > > > > > I just set the agent to NULL when call hangs up.
> > > > > >
> > > > > > Is this a proper method? or may cause some side effects?
> > > > >
> > > > > If you are setting the NiceAgent pointer to NULL without calling
> > > > > g_object_unref() first, you are leaking the memory from the
> > > > > NiceAgent, plus all the resources (including network ports) which
> > > > > it’s using. This might be contributing to the ICE failures you are
> > > > > seeing, if there are no more forwardable ports left for the new
> > > > > NiceAgent to use.
> > > > >
> > > > > If you are getting an assertion when calling nice_agent_new()
> > > > > after unreffing the old instance, that indicates a bug somewhere –
> > > > > probably somewhere else in your code – which needs investigating.
> > > > >
> > > > > Philip
> > > > >
> > > > > > 2016-01-05 6:05 GMT+08:00 Philip Withnall <philip at tecnocode.co.uk>:
> > > > > > > Can you please provide a debug log from libnice for this? It’s
> > > > > > > hard to work out what the problem is otherwise.
> > > > > > >
> > > > > > > Does the component state change to NICE_COMPONENT_STATE_FAILED?
> > > > > > > If you wait, does it later change to NICE_COMPONENT_STATE_READY
> > > > > > > or *_CONNECTED? What are you waiting for to know when the
> > > > > > > connection is ready?
> > > > > > >
> > > > > > > What version of libnice is this with? 0.1.13, or master? Can you
> > > > > > > try with master?
> > > > > > >
> > > > > > > Philip
> > > > > > >
> > > > > > > On Thu, 2015-12-24 at 21:40 +0800, Jack Wang wrote:
> > > > > > > > I also test by using the random ports , which is used
> > > > > > > > originally in libnice,
> > > > > > > > and found it also fails sometimes, 
> > > > > > > > however,  it still can work in some later calls.
> > > > > > > >
> > > > > > > > Keep tracking and testing....:P
> > > > > > > >
> > > > > > > > 2015-12-24 21:20 GMT+08:00 Jack Wang <antirazin at gmail.com>:
> > > > > > > > > Hi, everyone
> > > > > > > > >
> > > > > > > > > For several media channels (ex: audio,video etc.),
> > > > > > > > > I create ICE agents for each of them,
> > > > > > > > > and each channel I used a fixed port which is a fixed RTP
> > > > > > > > > port.
> > > > > > > > >
> > > > > > > > > Then after I did a SIP call to exchange the ICE SDP with the
> > > > > > > > > callee,
> > > > > > > > > I found the one who sent the offer often failed on
> > > > > > > > > negotiation on some channels (not the same ones every time),
> > > > > > > > > while the answer one is always OK.
> > > > > > > > > And if failed on the first time, it will always fail in the
> > > > > > > > > following calls.
> > > > > > > > >
> > > > > > > > > The Offer one is behind a symmetric NAT, and the Answer one is
> > > > > > > > > on WAN.
> > > > > > > > > I trace the log and found the failed(for negotiation) ones
> > > > > > > > > always discover the prflx candidate very late, and cannot be
> > > > > > > > > READY state in the end.
> > > > > > > > >
> > > > > > > > > I cannot figure out why this happens,
> > > > > > > > > does it is related to the NAT policy for port forwarding??
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Thanks in advance :)
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > _______________________________________________
> > > > > > > > nice mailing list
> > > > > > > > nice at lists.freedesktop.org
> > > > > > > > http://lists.freedesktop.org/mailman/listinfo/nice
> > > > > > >
> > > > >
> > > >
> > 