[libnice] sometimes connectivity checks fail...

Philip Withnall philip at tecnocode.co.uk
Sat Jan 16 01:37:59 PST 2016


The lines which indicate that some of the remote candidates are on UDP
port 9 are a bit suspicious, since that port is reserved (it is the
well-known port of the discard service):

Jan 13 11:50:29 [NICE_DEBUG] Agent 0x4a0014e0 : Adding UDP remote candidate with addr [[Remote public IP address]]:9 for s1/c1. U/P '(null)'/'(null)' prio: 1019216127

However, I don’t think that’s the problem, since the agents see other
remote candidates at more plausible-looking ports (30000 and above).
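
For triaging long debug logs, a check along the following lines could
flag such candidates. This is only a sketch:
candidate_port_looks_suspicious() is a hypothetical helper, not part of
libnice, and treating every port in the well-known range (below 1024)
as suspicious is an assumption which follows from the reasoning above.

```c
#include <stdbool.h>
#include <stdint.h>

/* First port outside the well-known (system) range 0-1023.
 * Port 9, the discard service, falls inside that range. */
#define FIRST_NON_SYSTEM_PORT 1024u

/* Hypothetical triage helper, not part of libnice: returns true if a
 * candidate's port is unlikely to belong to a real media socket. */
static bool
candidate_port_looks_suspicious (uint16_t port)
{
  return port < FIRST_NON_SYSTEM_PORT;
}
```

With this, a candidate on port 9 is flagged, while one on a port such
as 30000 is not.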

Could you provide Wireshark logs from both ends of the connection
please? Please do not redact them, since that will just confuse
matters. If you want to keep them off the list, you could e-mail them
to my address privately.

Philip

On Wed, 2016-01-13 at 14:38 +0800, Jack Wang wrote:
> Hi Philip,
> 
> The link below is the debug log, which only includes output from
> nice_debug and stun_debug.
> It is from the caller, starting at call setup and ending in the
> middle of the call.
> The result is that connectivity checks failed on every channel.
> You may see lots of similar log lines, since I ran four ICE threads
> in total for various kinds of media at the same time.
> 
> For privacy reasons, I replaced some IP address strings in the log
> file with the following placeholders:
> [STUN Server Address]
> [Remote public IP address]
> [NAT public IP address]
> 
> 
> log link:
> https://docs.google.com/document/d/1wIduDrGqj7jKSb8q6K6eECNv0xW-0miW0D6UZriG2SQ/pub
> 
> 
> Thanks for your help.
> 
> 
> 
> 2016-01-12 17:59 GMT+08:00 Philip Withnall <philip at tecnocode.co.uk>:
> > Hi,
> > 
> > On Tue, 2016-01-12 at 16:33 +0800, Jack Wang wrote:
> > > After testing with the latest libnice from the master branch,
> > > I found that when the caller (offerer) begins negotiation with
> > > the callee (answerer), the caller program crashes due to an
> > > assertion failure in libnice.
> > >
> > > I found that the assertion was added in the commit below:
> > > http://cgit.freedesktop.org/libnice/libnice/commit/?id=2eaa8b3277f4f39515ff5dc7b512a44fd79e7275
> > >
> > > To understand what the old state and new state were at that
> > > time, I printed them and got:
> > >
> > > old-state = FAILED
> > > new-state = CONNECTED
> > >
> > > The assertion failed, so the application terminated.
> > >
> > > However, if I do ICE with only an audio or video channel, that
> > > is, with only one ICE thread, it works!
> > > So maybe my threads contain some improper code that interferes
> > > with the others.
> > > I'll keep tracking!
> > 
> > I can’t really debug that without a libnice log or access to your
> > code (preferred). It could be a bug in libnice, but I wouldn’t be
> > able to fix it without either of the above.
> > 
> > > I haven't used relay candidates yet;
> > > once the current version of the application becomes more stable,
> > > I'll test with them as well.
> > >
> > > I think ICE can deal with the case where both endpoints are
> > > behind different symmetric NATs, can't it?
> > 
> > Yes, ICE’s solution for symmetric NATs is TURN. At a high level,
> > ICE is STUN plus TURN, and what you’re using at the moment is just
> > the STUN part. You need to set a relay server in order to use TURN.
> > So for some network configurations, I would expect the connection
> > to fail.
> > 
> > > Furthermore,
> > > I want to print the debug logs to the syslog file, not to the
> > > terminal screen.
> > > Below are the steps I tried, but they don't work.
> > 
> > I would just redirect the output of your program:
> > 
> > my-program-name &> /path/to/log/file.log
> > 
> > If you want to log everything from your program (including libnice
> > debug messages) to the syslog, you should install a custom GLib log
> > handler using g_log_set_default_handler() or g_log_set_handler().
> > 
> > > 1. modify the nice_debug() function
> > 
> > If you start modifying libnice, you are going to run into
> > maintainability problems for your software later on, as you will
> > end up having to port your changes to each new version of libnice,
> > unless you get them reviewed and committed upstream. :-)
> > 
> > Philip
> > 
> > > 2016-01-11 17:29 GMT+08:00 Philip Withnall <philip at tecnocode.co.uk>:
> > > > You can do that without source code modifications by passing
> > > > --enable-compile-warnings=maximum to the configure script. The
> > > > default is --enable-compile-warnings=error, which enables -Werror.
> > > >
> > > > Philip
> > > >
> > > > On Mon, 2016-01-11 at 16:30 +0800, Jack Wang wrote:
> > > > > Well, after I removed -Werror and -Wno-suggest-attribute=format
> > > > > from LIBNICE_CFLAGS, `make` works!
> > > > >
> > > > > Later I'll report the result back. :P
> > > > >
> > > > > 2016-01-11 15:24 GMT+08:00 Jack Wang <antirazin at gmail.com>:
> > > > > > Hello Philip,
> > > > > >
> > > > > > When I try to run `make` after configuring the master
> > > > > > version of libnice, an error occurs:
> > > > > >
> > > > > > [jack at localhost libnice]$ make
> > > > > > make  all-recursive
> > > > > > make[1]: Entering directory `/home/jack/Desktop/libnice'
> > > > > > Making all in stun
> > > > > > make[2]: Entering directory `/home/jack/Desktop/libnice/stun'
> > > > > > Making all in .
> > > > > > make[3]: Entering directory `/home/jack/Desktop/libnice/stun'
> > > > > >   CC     stunagent.lo
> > > > > >   CC     stunmessage.lo
> > > > > > stunmessage.c: In function 'stun_message_append_addr':
> > > > > > stunmessage.c:437:41: error: cast increases required alignment of target type [-Werror=cast-align]
> > > > > > stunmessage.c:447:42: error: cast increases required alignment of target type [-Werror=cast-align]
> > > > > > stunmessage.c: At top level:
> > > > > > cc1: error: unrecognized command line option "-Wno-suggest-attribute=format" [-Werror]
> > > > > > cc1: all warnings being treated as errors
> > > > > >
> > > > > > make[3]: *** [stunmessage.lo] Error 1
> > > > > > make[3]: Leaving directory `/home/jack/Desktop/libnice/stun'
> > > > > > make[2]: *** [all-recursive] Error 1
> > > > > > make[2]: Leaving directory `/home/jack/Desktop/libnice/stun'
> > > > > > make[1]: *** [all-recursive] Error 1
> > > > > > make[1]: Leaving directory `/home/jack/Desktop/libnice'
> > > > > > make: *** [all] Error 2
> > > > > >
> > > > > > However, this never occurred with 0.1.13.
> > > > > > Any suggestions?
> > > > > > By the way, the gcc I use targets the ARM architecture.
> > > > > >
> > > > > >
> > > > > > Thanks.
> > > > > >
> > > > > >
> > > > > > 2016-01-11 5:21 GMT+08:00 Philip Withnall <philip at tecnocode.co.uk>:
> > > > > > > Hi,
> > > > > > >
> > > > > > > It seems like you have several problems here.
> > > > > > >
> > > > > > > On Fri, 2016-01-08 at 14:14 +0800, Jack Wang wrote:
> > > > > > > > I have to print debug logs in syslog,
> > > > > > > > can you teach me how to achieve this?
> > > > > > >
> > > > > > > In your terminal:
> > > > > > >
> > > > > > > export G_MESSAGES_DEBUG=all
> > > > > > > export NICE_DEBUG=all
> > > > > > >
> > > > > > > then run your program. This will print the full libnice
> > > > > > > debug logs to the terminal.
> > > > > > >
> > > > > > > > Normally, the state flow should be gathering ->
> > > > > > > > connecting -> connected -> ready.
> > > > > > > > Sometimes it may be gathering -> connecting -> failed ->
> > > > > > > > connected -> ready.
> > > > > > > > However, it can also be gathering -> connecting ->
> > > > > > > > failed, and then never change to the connected state :(
> > > > > > > >
> > > > > > > > I use a callback like the one in the sample code (e.g.
> > > > > > > > sdp-example.c). When the state changes, libnice signals
> > > > > > > > the callback so that my application knows the state.
> > > > > > > >
> > > > > > > > I used version 0.1.13,
> > > > > > > > and I will try master later to see what happens.
> > > > > > >
> > > > > > > I would suggest trying with master. There have been a
> > > > > > > couple of fixes since 0.1.13 to do with state handling and
> > > > > > > signalling.
> > > > > > >
> > > > > > > > I'm also wondering whether the bug is related to the
> > > > > > > > network environment.
> > > > > > > > If the two ICE endpoints are on the same LAN, the
> > > > > > > > connectivity checks never fail.
> > > > > > > > (Well... actually I can't promise this is always true;
> > > > > > > > I assume so because I made over 30 calls and they were
> > > > > > > > always OK.)
> > > > > > > > But it fails more frequently (within 10 calls or fewer)
> > > > > > > > when the two endpoints are on different networks.
> > > > > > >
> > > > > > > Almost everything to do with libnice behavioural
> > > > > > > differences is to do with network environment! Note that
> > > > > > > ICE negotiation is not guaranteed to succeed in some
> > > > > > > network environments (for example, between two peers which
> > > > > > > are each behind a symmetric NAT).
> > > > > > >
> > > > > > > Do you have a TURN relay set up?
> > > > > > >
> > > > > > > > By the way, I use an array, which is reused for the next
> > > > > > > > call, to store the ICE agents for several media channels.
> > > > > > > > I don't release the agent with g_object_unref at the end
> > > > > > > > as the examples do, because I then get an assertion in
> > > > > > > > nice_agent_new when I make a new call.
> > > > > > > > I just set the agent to NULL when the call hangs up.
> > > > > > > >
> > > > > > > > Is this a proper method, or may it cause side effects?
> > > > > > >
> > > > > > > If you are setting the NiceAgent pointer to NULL without
> > > > > > > calling g_object_unref() first, you are leaking the memory
> > > > > > > from the NiceAgent, plus all the resources (including
> > > > > > > network ports) which it’s using. This might be contributing
> > > > > > > to the ICE failures you are seeing, if there are no more
> > > > > > > forwardable ports left for the new NiceAgent to use.
> > > > > > >
> > > > > > > If you are getting an assertion when calling
> > > > > > > nice_agent_new() after unreffing the old instance, that
> > > > > > > indicates a bug somewhere – probably somewhere else in your
> > > > > > > code – which needs investigating.
> > > > > > >
> > > > > > > Philip
> > > > > > >
> > > > > > > > 2016-01-05 6:05 GMT+08:00 Philip Withnall <philip at tecnocode.co.uk>:
> > > > > > > > > Can you please provide a debug log from libnice for
> > > > > > > > > this? It’s hard to work out what the problem is
> > > > > > > > > otherwise.
> > > > > > > > >
> > > > > > > > > Does the component state change to
> > > > > > > > > NICE_COMPONENT_STATE_FAILED? If you wait, does it later
> > > > > > > > > change to NICE_COMPONENT_STATE_READY or *_CONNECTED?
> > > > > > > > > What are you waiting for to know when the connection is
> > > > > > > > > ready?
> > > > > > > > >
> > > > > > > > > What version of libnice is this with? 0.1.13, or
> > > > > > > > > master? Can you try with master?
> > > > > > > > >
> > > > > > > > > Philip
> > > > > > > > >
> > > > > > > > > On Thu, 2015-12-24 at 21:40 +0800, Jack Wang wrote:
> > > > > > > > > > I also tested using random ports, which is what
> > > > > > > > > > libnice uses originally, and found that it also
> > > > > > > > > > fails sometimes.
> > > > > > > > > > However, it can still work in some later calls.
> > > > > > > > > >
> > > > > > > > > > Keep tracking and testing....:P
> > > > > > > > > >
> > > > > > > > > > 2015-12-24 21:20 GMT+08:00 Jack Wang <antirazin at gmail.com>:
> > > > > > > > > > > Hi, everyone
> > > > > > > > > > >
> > > > > > > > > > > For several media channels (e.g. audio, video,
> > > > > > > > > > > etc.), I create an ICE agent for each of them,
> > > > > > > > > > > and for each channel I use a fixed port, which is
> > > > > > > > > > > a fixed RTP port.
> > > > > > > > > > >
> > > > > > > > > > > Then, after making a SIP call to exchange the ICE
> > > > > > > > > > > SDP with the callee, I found that the side which
> > > > > > > > > > > sent the offer often failed negotiation on some
> > > > > > > > > > > channels (not the same ones every time), while the
> > > > > > > > > > > answering side is always OK.
> > > > > > > > > > > And if it fails the first time, it always fails in
> > > > > > > > > > > the following calls.
> > > > > > > > > > >
> > > > > > > > > > > The offerer is behind a symmetric NAT, and the
> > > > > > > > > > > answerer is on the WAN.
> > > > > > > > > > > I traced the log and found that the channels which
> > > > > > > > > > > fail negotiation always discover the prflx
> > > > > > > > > > > candidate very late, and cannot reach the READY
> > > > > > > > > > > state in the end.
> > > > > > > > > > >
> > > > > > > > > > > I cannot figure out why this happens.
> > > > > > > > > > > Is it related to the NAT policy for port
> > > > > > > > > > > forwarding?
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Thanks in advance :)
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > _______________________________________________
> > > > > > > > > > nice mailing list
> > > > > > > > > > nice at lists.freedesktop.org
> > > > > > > > > > http://lists.freedesktop.org/mailman/listinfo/nice