[Nice] Measuring NICE STUN performance

Mon Oct 8 06:47:19 PDT 2007

> Clients are not expected to generate a large number of transactions.
> Also, proper client-side handling requires some bookkeeping that the
> server side does not need, and it needs unpredictable transaction IDs
> to protect against injection of spoofed STUN responses.
> 
> So, sending/receiving STUN packets should actually be pretty fast, but
> initializing and destroying a STUN client transaction is pretty slow
> (it involves hashing, memory allocation, and quite a few system calls...).

Right, that is why I wrote a client that does not perform any client transaction handling at all.
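
A minimal sketch of what such a transaction-free client might send, assuming the RFC 5389 header layout (16-bit type, 16-bit length, 32-bit magic cookie, 96-bit transaction ID); the original RFC 3489 header uses a 128-bit transaction ID without a cookie. The function names are hypothetical and not taken from the NICE sources.

```python
import struct

MAGIC_COOKIE = 0x2112A442   # fixed value defined by RFC 5389 (assumed layout)
BINDING_REQUEST = 0x0001    # STUN Binding Request message type


def build_binding_request(seq):
    """Return a 20-byte Binding Request with no attributes.

    The transaction ID is simply the sequence number, so it is predictable.
    That is fine for load generation but not for a real client.
    """
    header = struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE)
    return header + seq.to_bytes(12, "big")


def transaction_seq(message):
    """Recover the sequence number from a request or its response."""
    return int.from_bytes(message[8:20], "big")
```

Because the server echoes the transaction ID, the receiving process could use transaction_seq() to check which requests were answered rather than only counting packets.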

> > Thus, I decided to write a very simple STUN client with a hardcoded
> > Binding Request (initially with no attributes at all). I then just
> > modify the transaction ID in place (not randomly, more like a
> > sequence number) and send the message as often as I need. A forked
> > process receives the responses and does nothing but count them, to
> > make sure all requests have been answered. In fact, this client
> > consumes very little CPU, so it seems to be perfect.
> 
> This is not safe. The client UDP socket receive queue might overflow.

Well, I read the packets from the queue as fast as possible. And in fact, my client tells me that all requests have been answered (one incoming packet for every outgoing packet).
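
One way to reduce the overflow risk on the client side is to ask the kernel for a larger receive buffer and check what was actually granted. This is only a sketch; the helper name is hypothetical, and the granted size depends on operating system limits.

```python
import socket


def enlarge_receive_buffer(sock, requested_bytes):
    """Request a larger UDP receive buffer and return the size in effect.

    The kernel may cap or adjust the requested value, so the result
    should be read back rather than assumed.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested_bytes)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
```

Comparing the number of requests sent with the number of responses counted remains the actual check; a larger buffer only makes drops less likely.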

> > However, I ran into a very strange problem. The CPU consumption of
> > the STUN server is not at all stable, even though I confirmed that
> > requests arrive at a fairly stable rate (using tcpdump, or even by
> > counting received packets in the STUN server dgram_process() routine).
> 
> I have not tried to stress test the server code, but from personal
> experience, tcpdump is very bad at stress testing. Its receive queue
> tends to overflow quite fast (especially, but not only, if DNS lookups
> are not disabled).

Of course I disabled DNS lookups. And to be sure, I did not rely on tcpdump but counted packets directly in dgram_process(), and saw them coming in at a steady rate.

> > The CPU consumption of the STUN server is not just a little bit
> > unstable, but drops to 0% for seconds, even though I am sending
> > 20,000 Binding Requests per second. Then, for a very short period of
> > time, the CPU load jumps to 50 or 80% or so, just to drop back to 0%.
> 
> This is really weird. Re-reading the STUN binding discovery server
> code, the fast path really only does:
> 
> loop:
> recvmsg()
> parse request...
> format response...
> sendmsg()
> goto loop
> 
> without any memory allocation, additional system call or anything.

Do you directly send the message or do you maintain a queue of some sort?
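
For reference, a fast path of the quoted shape, with each response sent directly and no queue, could look like the sketch below. It assumes RFC 5389 framing with an IPv4 XOR-MAPPED-ADDRESS attribute and is not taken from the actual server code; all names are hypothetical.

```python
import socket
import struct

MAGIC_COOKIE = 0x2112A442
BINDING_REQUEST = 0x0001
BINDING_RESPONSE = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def format_binding_response(request, source):
    """Build a Binding success response for an IPv4 source, or None."""
    if len(request) < 20:
        return None
    msg_type, _length, cookie = struct.unpack("!HHI", request[:8])
    if msg_type != BINDING_REQUEST or cookie != MAGIC_COOKIE:
        return None
    host, port = source
    # Port and address are XORed with the magic cookie (RFC 5389).
    x_port = port ^ (MAGIC_COOKIE >> 16)
    x_addr = struct.unpack("!I", socket.inet_aton(host))[0] ^ MAGIC_COOKIE
    attr = struct.pack("!HHBBHI", XOR_MAPPED_ADDRESS, 8, 0, 0x01, x_port, x_addr)
    header = struct.pack("!HHI", BINDING_RESPONSE, len(attr), MAGIC_COOKIE)
    # The transaction ID is copied unchanged from the request.
    return header + request[8:20] + attr


def serve(sock):
    """Receive, parse, format, send, repeat. Nothing is queued."""
    while True:
        request, source = sock.recvfrom(2048)
        response = format_binding_response(request, source)
        if response is not None:
            sock.sendto(response, source)
```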

> Random ideas:
> - Is the CPU time wasted in user or kernel land? The kernel UDP receive
>   path is not so optimized...

Userland, if any CPU time is spent at all. As I said, for several seconds I see no CPU consumption whatsoever (99.x% idle).
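
The user/kernel split can also be measured from inside a process instead of being read off a system monitor. A small sketch, with a hypothetical helper name, using the process times reported by the operating system:

```python
import os


def cpu_split(work, *args):
    """Run work(*args) and return (user_seconds, system_seconds) it cost.

    os.times() reports CPU time consumed by this process, split into
    time spent in user code and time spent in the kernel on its behalf.
    """
    before = os.times()
    work(*args)
    after = os.times()
    return after.user - before.user, after.system - before.system
```

In a C server the same figures would come from getrusage(), sampled for example once per second around the receive loop.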

> - Do recvmsg()/sendmsg() return errors?

recvmsg() is fine. I will check sendmsg(), but since I receive the packets on the client side, I guess there are no errors.
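
Checking the send side amounts to counting failures per error code instead of ignoring the return value. A sketch of that idea, with a hypothetical helper name:

```python
import errno
import socket
from collections import Counter


def send_counting_errors(sock, payload, dest, errors):
    """Send one datagram; on failure, count the error by errno name."""
    try:
        sock.sendto(payload, dest)
        return True
    except OSError as exc:
        name = errno.errorcode.get(exc.errno, str(exc.errno))
        errors[name] += 1
        return False
```

Printing the Counter every few seconds would show whether send failures line up with the idle periods.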

> - Link-layer problems (I tend to favor loopback testing to avoid these).

Good idea. I will do loopback just to see what's happening then.
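
A loopback check can be sketched in a single process: send a datagram to 127.0.0.1, answer it, and count the round trips. The sketch below uses a plain echo in place of a real STUN server, and the function name is hypothetical; it only shows the counting.

```python
import socket


def loopback_round_trips(count, payload=b"\x00" * 20, timeout=1.0):
    """Send `count` datagrams over loopback and return how many came back."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))      # let the kernel pick a free port
    server.settimeout(timeout)
    client.settimeout(timeout)
    answered = 0
    try:
        for _ in range(count):
            client.sendto(payload, server.getsockname())
            data, peer = server.recvfrom(2048)
            server.sendto(data, peer)  # stand-in for the STUN response
            client.recvfrom(2048)
            answered += 1
    except socket.timeout:
        pass                           # a lost datagram ends the run early
    finally:
        server.close()
        client.close()
    return answered
```

If the count matches over loopback while the real link shows gaps, the link layer becomes the more likely suspect.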

Thanks so far.

Christian