[libnice] State stuck to 'connected' and no nominated pair

Philip Withnall philip at tecnocode.co.uk
Sat Jan 31 10:23:45 PST 2015


Hi Lorenzo,

Thanks for the update, I’m glad you’ve solved your issue. Hopefully
these instructions will be helpful to others using Amazon EC2 in future.
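
For anyone finding this thread later, here is a rough sketch of the address substitution Lorenzo describes below: rewriting the private address in a host candidate line to the public one before the SDP is sent to the browser. This is not libnice API and not the gateway's actual code. The function names are made up for illustration, and the public address 203.0.113.10 is a placeholder from the documentation range (only 10.0.0.239 appears in the thread). Note that, per Lorenzo's report, this substitution alone was not enough for Firefox; adding srflx candidates gathered via STUN is what made it work.

```python
CANDIDATE_PREFIX = "a=candidate:"


def rewrite_host_candidate(line: str, private_ip: str, public_ip: str) -> str:
    """Replace private_ip with public_ip in a host candidate line.

    Hypothetical helper. Assumes the usual ICE candidate layout:
    foundation component transport priority address port "typ" type ...
    Lines that are not host candidates for private_ip are returned as-is.
    """
    if not line.startswith(CANDIDATE_PREFIX):
        return line
    fields = line[len(CANDIDATE_PREFIX):].split(" ")
    if len(fields) < 8 or fields[6] != "typ":
        return line  # not a layout we recognise, leave it alone
    if fields[7] == "host" and fields[4] == private_ip:
        fields[4] = public_ip  # the port stays the same with a 1-1 mapping
    return CANDIDATE_PREFIX + " ".join(fields)


def rewrite_sdp(sdp: str, private_ip: str, public_ip: str) -> str:
    """Apply rewrite_host_candidate to every line of an SDP blob."""
    return "\r\n".join(
        rewrite_host_candidate(line, private_ip, public_ip)
        for line in sdp.split("\r\n")
    )


if __name__ == "__main__":
    example = "a=candidate:1 1 udp 2013266431 10.0.0.239 40000 typ host"
    print(rewrite_host_candidate(example, "10.0.0.239", "203.0.113.10"))
```

The srflx candidates themselves would still come from the agent's own STUN gathering; the sketch only covers the string rewrite.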

Philip

On Mon, 2015-01-26 at 19:24 +0100, Lorenzo Miniero wrote:
> Hi Philip,
> 
> 
> no problem, I understand you're busy, we all are :-)
> 
> 
> I'm writing to give you an update, as it looks like we may have sorted
> out the issue. As I think I anticipated in one of my previous posts,
> we were deploying our gateway on Amazon, which has a 1-1
> private-public IP mapping. As such, we let libnice do the gathering on
> the available interfaces, which resulted in a private address and
> port, and then just replaced the private IP with the public one when
> advertising it in the SDP as a host candidate, since the port mapping
> is always the same and there's no need to actually pin-hole a port as
> there's no "regular" NAT involved (a packet sent on a port of the
> public address is always related to the same port on the private
> address, if the port is open in the configuration settings). This
> seems to work nicely in general. Anyway, when we configured our
> gateway to use STUN on the server side as well, that is, to have the
> server actually send a STUN request to get the public mapping and so
> provide both host and srflx candidates in the SDP, the scenarios
> involving Firefox started working as expected in all cases.
> 
> 
> As such, I really don't know what the exact cause of the issue was,
> and why it worked with Chrome but not Firefox, or why when it worked
> it did for the first stream but not the others. The only fact I can
> provide is that this way it works. This does require a bit of an
> additional (theoretically unnecessary) burden on the server side, as
> the gateway now needs to do a STUN transaction for every stream of
> every new call, but as long as it works it's ok. I guess this can be
> blamed on the unusual topology Amazon makes use of.
> 
> 
> Thanks for your time,
> Lorenzo
> 
> 2015-01-22 16:31 GMT+01:00 Philip Withnall <philip at tecnocode.co.uk>:
>         Hi Lorenzo,
>         
>         Sorry for the slow reply. I did take a look at the trace a little
>         while ago, but I couldn’t figure out what was going on. I’m afraid
>         the best advice I can give is for you to use gdb to trace through
>         libnice’s handling of the candidate pairs, and work out why it’s
>         nominating pairs but then never selecting them.
>         
>         One other approach you can try is to examine the packet logs in
>         Wireshark, and make sure that all the transmitted STUN checks get
>         the responses you expect them to, given the network configuration.
>         
>         Sorry I can’t be more helpful, but this is a difficult problem to
>         fix without being able to reproduce it!
>         
>         Philip
>         
>         On Fri, 2015-01-09 at 11:44 +0100, Lorenzo Miniero wrote:
>         > Hi,
>         >
>         >
>         > sorry to bump, but is there any light you can shed on this? I'm
>         > still wondering whether it's a bug or something I did wrong, but
>         > if it's the latter I'd appreciate some hints on what this could be.
>         >
>         >
>         > Thanks!
>         > Lorenzo
>         >
>         >
>         >
>         > 2015-01-02 12:58 GMT+01:00 Lorenzo Miniero <lminiero at gmail.com>:
>         >         As promised, the same debug info on 0.1.8.1 instead. The
>         >         same issue is happening there as well, everything fine
>         >         with Chrome instead:
>         >
>         >
>         >         http://pastebin.com/5CUWpZ5M
>         >
>         >
>         >
>         >         Let me know what else I can do to help you understand
>         >         what's going wrong, and if it may be my fault somewhere.
>         >
>         >
>         >         Thanks,
>         >         Lorenzo
>         >
>         >         2015-01-02 11:53 GMT+01:00 Lorenzo Miniero
>         >         <lminiero at gmail.com>:
>         >                 Hi Philip,
>         >
>         >
>         >                 sorry if this took so long but I've just managed
>         >                 to get back working on the code.
>         >
>         >
>         >                 I've captured a log using the environment
>         >                 variables you asked for using libnice-0.1.4:
>         >
>         >
>         >                 http://pastebin.com/Rv48zi2k
>         >
>         >
>         >
>         >                 Later today I'll do the same with 0.1.8.1: I
>         >                 wanted to start with this since, as far as I know,
>         >                 I need to update the code to get it working with
>         >                 0.1.8.1 due to some deprecated callbacks (or am I
>         >                 mistaken?). Besides, having a view of what's
>         >                 happening in this version as well may help us
>         >                 identify the actual issue beforehand.
>         >
>         >
>         >                 This dump refers to a simple echo test call to our
>         >                 gateway, which negotiates audio, video and data
>         >                 channels (3 streams) with rtcp-muxing (so one
>         >                 component per stream) using Firefox stable. The
>         >                 offer is coming from the browser, the gateway
>         >                 answers. I tried to clean the log as much as
>         >                 possible with respect to the possibly overly
>         >                 verbose output my own gateway adds, and just kept
>         >                 the parts of it that could be relevant to the
>         >                 libnice debugging: the libnice debugging, instead,
>         >                 is all there and unmodified. Just for the sake of
>         >                 completeness, the server hosting the gateway is on
>         >                 Amazon: this means that it has a 1-1 NAT mapping,
>         >                 and so, while the gathering is done on a private
>         >                 address (10.0.0.239) it is then modified when
>         >                 reporting it back to the browser replacing it with
>         >                 its public address counterpart (missing in the
>         >                 log). This works because the 1-1 mapping Amazon
>         >                 does keeps the port mapping as well.
>         >
>         >
>         >                 As you can see from the log, there are multiple
>         >                 checks that succeed (the failed ones are related
>         >                 to the private addresses being trickled by the
>         >                 browser) but none ever gets nominated. Looking at
>         >                 the log, I can see that apparently some pairs are
>         >                 indeed nominated for each stream:
>         >
>         >
>         >                 (process:4945): libnice-DEBUG: Agent 0x7f1ee800b9e0 : marking pair 0x7f1ef4016000 (1:1) as nominated
>         >                 (process:4945): libnice-DEBUG: Agent 0x7f1ee800b9e0 : marking pair 0x7f1ef4026110 (2:1) as nominated
>         >                 (process:4945): libnice-DEBUG: Agent 0x7f1ee800b9e0 : marking pair 0x7f1ee80a6e60 (3:1) as nominated
>         >
>         >
>         >                 but for some reason the nominated count is always
>         >                 0 anyway in the timer-tick debug lines. All those
>         >                 pairs are "discovered", and from what I can
>         >                 understand, if we look at 0x7f1ef4016000, for
>         >                 instance, this seems to be happening because
>         >                 libnice receives a connectivity check from the
>         >                 public address of the browser before it is
>         >                 notified of the related trickle candidate: this
>         >                 happens because the browser already notified its
>         >                 private, host trickle candidate which is mapped to
>         >                 the server reflective address it will get from
>         >                 STUN later, and is already sending checks from
>         >                 there; libnice will of course receive it from the
>         >                 related public address and consider it a new
>         >                 candidate it was not aware of. Not sure if what's
>         >                 causing the issue is actually receiving the info
>         >                 on the candidate only later: it may be that, when
>         >                 my application adds the new (already discovered)
>         >                 candidate to libnice, it "resets" its nominated
>         >                 status and there are no following connectivity
>         >                 checks to "renew" this (which is why maybe with
>         >                 Chrome it works instead? Chrome sends CC on a
>         >                 regular basis even during the call), but I may be
>         >                 wrong.
>         >
>         >
>         >                 Anyway, whatever the cause, eventually, I gave up
>         >                 and closed the session in the browser, which
>         >                 destroyed the agent in the gateway.
>         >
>         >
>         >                 Let me know if you need additional info from me,
>         >                 I'll keep you posted on 0.1.8.1 debugging as well.
>         >
>         >
>         >                 Lorenzo
>         >
>         >                 2014-12-27 13:01 GMT+01:00 Lorenzo Miniero
>         >                 <lminiero at gmail.com>:
>         >                         Thanks for the response. I'm currently
>         >                         abroad with no pc so I'll only be able to
>         >                         test this when I get back. I'll keep you
>         >                         posted,
>         >
>         >                         Lorenzo
>         >
>         >                         On 27 Dec 2014 10:59, "Philip Withnall"
>         >                         <philip at tecnocode.co.uk> wrote:
>         >
>         >                                 Hi,
>         >
>         >                                 On Tue, 2014-12-23 at 17:49 +0100,
>         >                                 Lorenzo Miniero wrote:
>         >
>         >
>         >                                 > I'm encountering some issues using
>         >                                 > libnice within my open source WebRTC
>         >                                 > gateway. It mostly works great, but
>         >                                 > lately I've had some weird issues
>         >                                 > that I can't seem to be able to
>         >                                 > figure out. First of all, apologies
>         >                                 > if this may have been addressed in
>         >                                 > the past already, but I couldn't
>         >                                 > find any related discussion in the
>         >                                 > archive. I'm using libnice 0.1.4 as
>         >                                 > it's the one available on most
>         >                                 > distributions I tried it on (Ubuntu,
>         >                                 > Fedora), so not sure if this may be
>         >                                 > a bug fixed in the meantime, but I
>         >                                 > looked at the changesets and I
>         >                                 > couldn't find anything related there
>         >                                 > either.
>         >
>         >                                 I would strongly suggest trying to
>         >                                 reproduce this with 0.1.8.1. There
>         >                                 were a number of fixes relating to
>         >                                 connection checking which made it
>         >                                 into 0.1.8.
>         >
>         >                                 > As to what I'm getting, when I'm
>         >                                 > using it with Firefox stable, which
>         >                                 > does not bundle media (that is, in
>         >                                 > WebRTC terms, there are different
>         >                                 > ICE components for audio and video,
>         >                                 > while when bundling instead you have
>         >                                 > a single ICE component for all of
>         >                                 > them, RTCP too if you're
>         >                                 > rtcp-muxing), I'm often in a
>         >                                 > situation where the ICE state gets
>         >                                 > to 'connected' but it does not move
>         >                                 > from there. I never get the
>         >                                 > 'new-selected-pair' callback, which
>         >                                 > means that, although there should be
>         >                                 > at least a working pair (the
>         >                                 > 'connected' suggests so), none of
>         >                                 > them ever gets nominated/selected. I
>         >                                 > can confirm, looking at Wireshark,
>         >                                 > that connectivity checks work in
>         >                                 > both directions, and in fact from
>         >                                 > the browser's perspective the ICE
>         >                                 > setup has been completed (candidate
>         >                                 > pair nominated and selected), but
>         >                                 > libnice doesn't seem to think so and
>         >                                 > so nothing works anymore. The state
>         >                                 > never gets to 'failed' after that,
>         >                                 > it just stays 'connected' until I
>         >                                 > have to assume it's over and I
>         >                                 > destroy the agent.
>         >
>         >                                 It might be fixed by this:
>         >                                 http://cgit.freedesktop.org/libnice/libnice/commit/?id=fcb1b84fd81f7db7dfe25bad824cb7fcfb254469
>         >                                 or one of the other 0.1.8 fixes.
>         >
>         >                                 *snip*
>         >                                 > What may be the issue, and in
>         >                                 > particular, what may lead to a case
>         >                                 > where no pair is ever nominated no
>         >                                 > matter what? Shouldn't the first
>         >                                 > pair that gets you a 'connected'
>         >                                 > state be selected right away, to be
>         >                                 > replaced by a different pair later
>         >                                 > on if its priority is higher, or is
>         >                                 > this not the behaviour to expect?
>         >
>         >                                 If it is still reproducible with
>         >                                 0.1.8, please send us a debug log
>         >                                 gathered with NICE_DEBUG=all
>         >                                 G_MESSAGES_DEBUG=all, otherwise it
>         >                                 will be impossible to debug the
>         >                                 problem.
>         >
>         >                                 Thanks,
>         >                                 Philip
>         >
>         >
>         >
>         >
>         >
>         >
>         >
>         
>         
>         
>         _______________________________________________
>         nice mailing list
>         nice at lists.freedesktop.org
>         http://lists.freedesktop.org/mailman/listinfo/nice
>         
> 
> 
