[libnice] State stuck to 'connected' and no nominated pair

Lorenzo Miniero lminiero at gmail.com
Mon Jan 26 10:24:32 PST 2015


Hi Philip,

no problem, I understand you're busy, we all are :-)

I'm writing to give you an update, as it looks like we may have sorted out
the issue. As I think I mentioned in one of my previous posts, we were
deploying our gateway on Amazon, which has a 1:1 private-public IP mapping.
As such, we let libnice do the gathering on the available interfaces, which
resulted in a private address and port, and then just replaced the private
IP with the public one when advertising it in the SDP as a host candidate.
The port mapping is always the same, and there's no need to actually
pinhole a port since there's no "regular" NAT involved (a packet sent to a
port on the public address is always forwarded to the same port on the
private address, as long as the port is open in the configuration
settings). This seems to work nicely in general. However, when we
configured our gateway to use STUN on the server side as well, that is, to
have the server actually send a STUN request to get the public mapping and
so provide both host and srflx candidates in the SDP, the scenarios
involving Firefox started working as expected in all cases.
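
For readers who want to see the address substitution concretely, here is a
minimal sketch in Python. It is not the gateway's actual code: the function
name and the public address 203.0.113.10 (a documentation-only address) are
assumptions made for illustration, while 10.0.0.239 is the private address
mentioned later in this thread. It rewrites only host candidates and leaves
the port untouched, on the assumption that the 1:1 mapping preserves it.

```python
def rewrite_host_candidate(line: str, private_ip: str, public_ip: str) -> str:
    """Replace the private address with the public one in a host candidate.

    Hypothetical helper: assumes the usual SDP layout
    a=candidate:<foundation> <component> <transport> <priority>
                <address> <port> typ <type> ...
    """
    prefix = "a=candidate:"
    if not line.startswith(prefix):
        return line  # not a candidate attribute, leave untouched
    fields = line[len(prefix):].split(" ")
    if len(fields) < 8 or fields[6] != "typ":
        return line  # unexpected layout, leave untouched
    if fields[7] == "host" and fields[4] == private_ip:
        # Only the address changes; the port (fields[5]) is kept as gathered.
        fields[4] = public_ip
    return prefix + " ".join(fields)


gathered = "a=candidate:1 1 udp 2013266431 10.0.0.239 40000 typ host"
advertised = rewrite_host_candidate(gathered, "10.0.0.239", "203.0.113.10")
print(advertised)
```

Candidates of any other type, or with a different address, pass through
unchanged, so the same function can be applied to every candidate line.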

As such, I really don't know what the exact cause of the issue was, why
it worked with Chrome but not Firefox, or why, when it did work, it worked
for the first stream but not the others. The only fact I can provide is
that it works this way. This does put a bit of additional (theoretically
unnecessary) burden on the server side, as the gateway now needs to do a
STUN transaction for every stream of every new call, but as long as it
works it's OK. I guess this can be blamed on the unusual topology Amazon
uses.
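
To give an idea of what that per-stream STUN transaction involves on the
wire, here is a rough Python sketch of a Binding request and of decoding the
XOR-MAPPED-ADDRESS attribute from the response, following RFC 5389. It is
only an illustration: the function names are made up, it handles IPv4 only,
and it skips retransmissions, authentication and the actual UDP exchange.
With libnice none of this is written by hand, since the agent performs the
transaction itself once its "stun-server" and "stun-server-port" properties
are set.

```python
import os
import socket
import struct
from typing import Optional, Tuple

MAGIC_COOKIE = 0x2112A442       # fixed value defined by RFC 5389
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request(transaction_id: Optional[bytes] = None) -> bytes:
    """Build a 20-byte STUN Binding request with no attributes."""
    tid = transaction_id if transaction_id is not None else os.urandom(12)
    if len(tid) != 12:
        raise ValueError("transaction ID must be 12 bytes")
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + tid


def parse_xor_mapped_address(response: bytes,
                             transaction_id: bytes) -> Optional[Tuple[str, int]]:
    """Return (ip, port) from a Binding success response, or None."""
    if len(response) < 20:
        return None
    msg_type, msg_len, cookie = struct.unpack("!HHI", response[:8])
    if (msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE
            or response[8:20] != transaction_id):
        return None
    pos, end = 20, min(len(response), 20 + msg_len)
    while pos + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", response[pos:pos + 4])
        value = response[pos + 4:pos + 4 + attr_len]
        # value[1] is the address family; 0x01 means IPv4.
        if attr_type == XOR_MAPPED_ADDRESS and len(value) >= 8 and value[1] == 0x01:
            port = struct.unpack("!H", value[2:4])[0] ^ (MAGIC_COOKIE >> 16)
            addr = struct.unpack("!I", value[4:8])[0] ^ MAGIC_COOKIE
            return socket.inet_ntoa(struct.pack("!I", addr)), port
        pos += 4 + attr_len + (-attr_len % 4)  # attributes are 32-bit aligned
    return None
```

The address returned this way is what would be advertised as the srflx
candidate, next to the host candidate gathered on the private interface.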

Thanks for your time,
Lorenzo

2015-01-22 16:31 GMT+01:00 Philip Withnall <philip at tecnocode.co.uk>:

> Hi Lorenzo,
>
> Sorry for the slow reply. I did take a look at the trace a little while
> ago, but I couldn’t figure out what was going on. I’m afraid the best
> advice I can give is for you to use gdb to trace through libnice’s
> handling of the candidate pairs, and work out why it’s nominating pairs
> but then never selecting them.
>
> One other approach you can try is to examine the packet logs in
> Wireshark, and make sure that all the transmitted STUN checks get the
> responses you expect them to, given the network configuration.
>
> Sorry I can’t be more helpful, but this is a difficult problem to fix
> without being able to reproduce it!
>
> Philip
>
> On Fri, 2015-01-09 at 11:44 +0100, Lorenzo Miniero wrote:
> > Hi,
> >
> >
> > sorry to bump, but is there any light you can shed on this? I'm still
> > wondering whether it's a bug or something I did wrong, but if it's the
> > latter I'd appreciate some hints on what this could be.
> >
> >
> > Thanks!
> > Lorenzo
> >
> >
> >
> > 2015-01-02 12:58 GMT+01:00 Lorenzo Miniero <lminiero at gmail.com>:
> >         As promised, here is the same debug info on 0.1.8.1. The same
> >         issue is happening there as well, while everything works fine
> >         with Chrome:
> >
> >
> >         http://pastebin.com/5CUWpZ5M
> >
> >
> >
> >         Let me know what else I can do to help you understand what's
> >         going wrong, and if it may be my fault somewhere.
> >
> >
> >         Thanks,
> >         Lorenzo
> >
> >         2015-01-02 11:53 GMT+01:00 Lorenzo Miniero
> >         <lminiero at gmail.com>:
> >                 Hi Philip,
> >
> >
> >                 sorry this took so long, but I've only just managed to
> >                 get back to working on the code.
> >
> >
> >                 I've captured a log using the environment variables
> >                 you asked for using libnice-0.1.4:
> >
> >
> >                 http://pastebin.com/Rv48zi2k
> >
> >
> >
> >                 Later today I'll do the same with 0.1.8.1: I wanted to
> >                 start with this since, as far as I know, I need to
> >                 update the code to get it working with 0.1.8.1 due to
> >                 some deprecated callbacks (or am I mistaken?).
> >                 Besides, having a view of what's happening in this
> >                 version as well may help us identify the actual issue
> >                 beforehand.
> >
> >
> >                 This dump refers to a simple echo test call to our
> >                 gateway, which negotiates audio, video and data
> >                 channels (3 streams) with rtcp-muxing (so one
> >                 component per stream) using Firefox stable. The offer
> >                 is coming from the browser, the gateway answers. I
> >                 tried to clean the log as much as possible with
> >                 respect to the possibly overly verbose output my own
> >                 gateway adds, and just kept the parts of it that could
> >                 be relevant to the libnice debugging: the libnice
> >                 debugging, instead, is all there and unmodified. Just
> >                 for the sake of completeness, the server hosting the
> >                 gateway is on Amazon: this means that it has a 1-1 NAT
> >                 mapping, and so, while the gathering is done on a
> >                 private address (10.0.0.239) it is then modified when
> >                 reporting it back to the browser replacing it with its
> >                 public address counterpart (missing in the log). This
> >                 works because the 1-1 mapping Amazon does keeps the
> >                 port mapping as well.
> >
> >
> >                 As you can see from the log, there are multiple checks
> >                 that succeed (the failed ones are related to the
> >                 private addresses being trickled by the browser) but
> >                 no pair ever gets nominated. Looking at the log, I can
> >                 see that apparently some pairs are indeed nominated
> >                 for each stream:
> >
> >
> >                 (process:4945): libnice-DEBUG: Agent 0x7f1ee800b9e0 :
> >                 marking pair 0x7f1ef4016000 (1:1) as nominated
> >                 (process:4945): libnice-DEBUG: Agent 0x7f1ee800b9e0 :
> >                 marking pair 0x7f1ef4026110 (2:1) as nominated
> >
> >                 (process:4945): libnice-DEBUG: Agent 0x7f1ee800b9e0 :
> >                 marking pair 0x7f1ee80a6e60 (3:1) as nominated
> >
> >
> >                 but for some reason the nominated count is always 0
> >                 anyway in the timer-tick debug lines. All those pairs
> >                 are "discovered", and from what I can understand, if
> >                 we look at 0x7f1ef4016000, for instance, this seems to
> >                 be happening because libnice receives a connectivity
> >                 check from the public address of the browser before it
> >                 is notified of the related trickle candidate: this
> >                 happens because the browser already notified its
> >                 private, host trickle candidate which is mapped to the
> >                 server reflective address it will get from STUN later,
> >                 and is already sending checks from there; libnice will
> >                 of course receive it from the related public address
> >                 and consider it a new candidate it was not aware of.
> >                 Not sure if what's causing the issue is actually
> >                 receiving the info on the candidate only later: it may
> >                 be that, when my application adds the new (already
> >                 discovered) candidate to libnice, it "resets" its
> >                 nominated status and there are no following
> >                 connectivity checks to "renew" this (which is why
> >                 maybe with Chrome it works instead? Chrome sends CC on
> >                 a regular basis even during the call), but I may be
> >                 wrong.
> >
> >
> >                 Anyway, whatever the cause, eventually, I gave up and
> >                 closed the session in the browser, which destroyed the
> >                 agent in the gateway.
> >
> >
> >                 Let me know if you need additional info from me, I'll
> >                 keep you posted on 0.1.8.1 debugging as well.
> >
> >
> >                 Lorenzo
> >
> >                 2014-12-27 13:01 GMT+01:00 Lorenzo Miniero
> >                 <lminiero at gmail.com>:
> >                         Thanks for the response. I'm currently abroad
> >                         with no pc so I'll only be able to test this
> >                         when I get back. I'll keep you posted,
> >
> >                         Lorenzo
> >
> >                         Il 27/dic/2014 10:59 "Philip Withnall"
> >                         <philip at tecnocode.co.uk> ha scritto:
> >
> >                                 Hi,
> >
> >                                 On Tue, 2014-12-23 at 17:49 +0100,
> >                                 Lorenzo Miniero wrote:
> >
> >
> >                                 > I'm encountering some issues using
> >                                 > libnice within my open source WebRTC
> >                                 > gateway. It mostly works great, but
> >                                 > lately I've had some weird issues
> >                                 > that I can't seem to be able to
> >                                 > figure out. First of all, apologies
> >                                 > if this may have been addressed in
> >                                 > the past already, but I couldn't
> >                                 > find any related discussion in the
> >                                 > archive. I'm using libnice 0.1.4 as
> >                                 > it's the one available on most
> >                                 > distributions I tried it on (Ubuntu,
> >                                 > Fedora), so I'm not sure if this may
> >                                 > be a bug fixed in the meantime, but
> >                                 > I looked at the changesets and I
> >                                 > couldn't find anything related there
> >                                 > either.
> >
> >                                 I would strongly suggest trying to
> >                                 reproduce this with 0.1.8.1. There
> >                                 were a number of fixes relating to
> >                                 connection checking which made it
> >                                 into 0.1.8.
> >
> >                                 > As to what I'm getting, when I'm
> >                                 > using it with Firefox stable, which
> >                                 > does not bundle media (that is, in
> >                                 > WebRTC terms, there are different
> >                                 > ICE components for audio and video,
> >                                 > while when bundling you instead have
> >                                 > a single ICE component for all of
> >                                 > them, RTCP too if you're
> >                                 > rtcp-muxing), I'm often in a
> >                                 > situation where the ICE state gets
> >                                 > to 'connected' but does not move
> >                                 > from there. I never get the
> >                                 > 'new-selected-pair' callback, which
> >                                 > means that, although there should be
> >                                 > at least one working pair (the
> >                                 > 'connected' suggests so), none of
> >                                 > them ever gets nominated/selected. I
> >                                 > can confirm, looking at Wireshark,
> >                                 > that connectivity checks work in
> >                                 > both directions, and in fact from
> >                                 > the browser's perspective the ICE
> >                                 > setup has been completed (candidate
> >                                 > pair nominated and selected), but
> >                                 > libnice doesn't seem to think so,
> >                                 > and so nothing works anymore. The
> >                                 > state never gets to 'failed' after
> >                                 > that; it just stays 'connected'
> >                                 > until I have to assume it's over and
> >                                 > I destroy the agent.
> >
> >                                 It might be fixed by this:
> >
> >                                 http://cgit.freedesktop.org/libnice/libnice/commit/?id=fcb1b84fd81f7db7dfe25bad824cb7fcfb254469
> >                                 or one of the other 0.1.8 fixes.
> >
> >                                 *snip*
> >                                 > What may be the issue, and in
> >                                 > particular, what may lead to a case
> >                                 > where no pair is ever nominated no
> >                                 > matter what? Shouldn't the first
> >                                 > pair that gets you a 'connected'
> >                                 > state be selected right away, to be
> >                                 > replaced by a different pair later
> >                                 > on if its priority is higher, or is
> >                                 > this not the behaviour to expect?
> >
> >                                 If it is still reproducible with
> >                                 0.1.8, please send us a debug log
> >                                 gathered with NICE_DEBUG=all
> >                                 G_MESSAGES_DEBUG=all, otherwise it
> >                                 will be impossible to debug the
> >                                 problem.
> >
> >                                 Thanks,
> >                                 Philip