[libnice] Unreliable negotiations via TURN server and general conncheck timing questions

Fri Aug 10 16:34:32 UTC 2018

Hi everyone,

I've been trying to integrate libnice into my P2P application, but am
running into some complications (and have a few questions) that I can't
seem to find answers to.

The main problem I have right now is that scenarios where a TURN server is
required (due to problematic NATs) have a low success rate (roughly 15%).
For simplicity's sake, I modified the simple-example.c program that comes
with libnice to use a TURN server (I'm using Twilio's TURN service for
this), and both sides generate relay candidates.  I then copy and paste
each side's credentials and candidates into the other, as per the example
instructions, then hit enter on both to begin negotiation.  In this
example, a connection is only sometimes established (~15% of the time);
most of the time it is not.  Timing
seems to be a factor, where if I press enter on one terminal a certain
amount of time after the other, it seems to have a higher chance of
successfully negotiating.  I introduced a signaling server that sends each
candidate to the peer (which processes it immediately) as soon as it is
discovered, and automatically sends the full candidate list once
candidate gathering is complete.  This seemed to
yield even worse results.  Should this be sufficient to reliably establish
a connection via a TURN server?  Or am I missing some steps here?

Looking at the debug messages, I noticed that it took some time to receive
the CreatePermission responses back from the TURN servers.  If I understand
this correctly, peer1 receives remote candidates from peer2, then peer1
sends these addresses to the TURN server to tell it to allow messages from
these addresses through to peer1.  Is it possible that peer2 is sending
STUN requests to the TURN servers before the permissions have been created,
so that the messages are discarded by the TURN server instead of being
forwarded to peer1?  Looking through the code, I couldn't find any way to
attach a callback that fires once the permissions have been created, or to
tell a peer to delay sending STUN requests to relay candidates.
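
To make the race I'm describing concrete, here is a toy model in C of how I
understand TURN permissions to behave.  It is only a sketch and not libnice
code: the names (toy_allocation, toy_create_permission, toy_relay_inbound)
are made up for illustration, and it assumes that permissions are keyed by
the peer's IP address alone, which is my reading of the TURN spec.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_PERMISSIONS 8
#define ADDR_LEN 46  /* enough for a textual IPv6 address */

/* Toy stand-in for one TURN allocation (hypothetical, not a libnice type). */
typedef struct {
    char permitted[MAX_PERMISSIONS][ADDR_LEN];
    size_t count;
} toy_allocation;

/* Models the moment the server has processed a CreatePermission request
 * for peer_ip.  Returns false if the table is full or the address is too
 * long to store. */
bool toy_create_permission(toy_allocation *alloc, const char *peer_ip)
{
    assert(alloc != NULL && peer_ip != NULL);
    if (alloc->count == MAX_PERMISSIONS || strlen(peer_ip) >= ADDR_LEN)
        return false;
    strcpy(alloc->permitted[alloc->count++], peer_ip);
    return true;
}

/* Returns true if a packet arriving from peer_ip would be relayed to the
 * client, false if the server would drop it for lack of a permission. */
bool toy_relay_inbound(const toy_allocation *alloc, const char *peer_ip)
{
    assert(alloc != NULL && peer_ip != NULL);
    for (size_t i = 0; i < alloc->count; i++) {
        if (strcmp(alloc->permitted[i], peer_ip) == 0)
            return true;
    }
    return false;
}
```

In this model, a check from some peer address (say 203.0.113.7, a
documentation-range address) that reaches toy_relay_inbound before
toy_create_permission has run for that address is dropped, while the same
check arriving afterwards is relayed.  That ordering is what I suspect I'm
hitting.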

As a more general question, I noticed that one peer usually finishes its
connchecks faster than the other (which may contribute to the problem
detailed above).  Is there a way to control this?  I can see this posing a
problem in other scenarios as well, such as when one peer is behind a
port-restricted NAT.  If I understand this correctly, a port-restricted
NAT won't let STUN requests through to the host unless the host has first
sent packets to the remote address.  But if the peer runs through its
connchecks much faster than the user behind the port-restricted NAT, and
before that user can send packets to the peer, then none of the STUN
requests will reach the user behind the NAT, correct?  Is there a
solution that covers these scenarios?
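
The port-restricted case can be sketched the same way.  Again this is a toy
model of my understanding rather than anything from libnice: toy_nat,
toy_nat_send and toy_nat_accepts are hypothetical names, and the model
assumes the NAT only admits inbound packets from an exact IP and port that
the inside host has already sent to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_MAPPINGS 8
#define NAT_ADDR_LEN 46  /* enough for a textual IPv6 address */

/* One remote endpoint the inside host has sent a packet to. */
typedef struct {
    char ip[NAT_ADDR_LEN];
    uint16_t port;
} toy_endpoint;

/* Toy port-restricted NAT: remembers every remote ip:port contacted. */
typedef struct {
    toy_endpoint contacted[MAX_MAPPINGS];
    size_t count;
} toy_nat;

/* Would an inbound packet from ip:port be let through to the host? */
bool toy_nat_accepts(const toy_nat *nat, const char *ip, uint16_t port)
{
    assert(nat != NULL && ip != NULL);
    for (size_t i = 0; i < nat->count; i++) {
        if (nat->contacted[i].port == port &&
            strcmp(nat->contacted[i].ip, ip) == 0)
            return true;
    }
    return false;
}

/* The inside host sends a packet to ip:port, which opens the filter for
 * that exact endpoint.  Returns false if the table is full or the address
 * is too long to store. */
bool toy_nat_send(toy_nat *nat, const char *ip, uint16_t port)
{
    assert(nat != NULL && ip != NULL);
    if (toy_nat_accepts(nat, ip, port))
        return true;  /* filter entry already exists */
    if (nat->count == MAX_MAPPINGS || strlen(ip) >= NAT_ADDR_LEN)
        return false;
    strcpy(nat->contacted[nat->count].ip, ip);
    nat->contacted[nat->count].port = port;
    nat->count++;
    return true;
}
```

Here any STUN request from the fast peer that arrives before the host
behind the NAT has called toy_nat_send for that peer's address and port is
rejected by toy_nat_accepts, which is the situation I'm worried about.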

Thanks in advance!

Anthony