[avahi] Resolving many ( > 200 ) items seems to lead to apparent congestion in the dbus

Lennart Poettering lennart at poettering.net
Tue May 26 19:12:23 PDT 2009


On Tue, 19.05.09 08:10, Daniel Wynne (daniel.wynne at mobotix.com) wrote:

> Hi Lennart!
> 
> Though it's not the standard case, we deal with more than 250 hosts for 
> testing purposes. We want to resolve all of them without spamming 
> the D-Bus system or running into any Avahi limitation.
> First, we tried resolving everything right away, but this led to 
> apparent D-Bus congestion, which could only be solved by restarting the 
> whole service. After that we tried a queued system which allows only a 
> few concurrent resolvers, but this did not lead to any perceptible 
> improvement. The D-Bus still seems to be congested after a short
> while. 

D-Bus "congestion"? I don't think that exists. What exactly makes you
think D-Bus could be 'congested'?

> Only a restart of the service solves the problem. Without deep insight I 
> would assume that the D-Bus objects created are not freed up cleanly, or 
> let's say immediately. They seem to remain, and so the bus gets filled up.
> To give you a more detailed overview, here is what exactly happens:
> - We browse the .local domain for HTTP and HTTPS services with two 
> concurrent browsers
> - For every resolver we create, we create another one to receive 
> additional address records
> - The first resolver is freed immediately after resolving, the second 
> after the last address has been resolved
> - This works fine for about 100 hosts, then it starts to stumble. The 
> remaining resolvers run into timeouts. If a timeout occurs we try to 
> resolve again.
> - After about 1 min we receive roughly another 50 hosts before the next 
> stumbling
> - Then about every 1-2 minutes we receive a variable number of hosts
> We also tried a queued algorithm with only a certain number of 
> concurrent resolvers, without noticeable improvement.

This might have something to do with the internal limits Avahi applies
to almost everything: in this case, possibly the size of the cache?

Also note that if you issue a lot of requests, the local IP stack's
packet queueing might already drop packets. Lost packets will most
likely result in timeouts.
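One way to keep the request rate down is to cap the number of resolvers
in flight and queue the rest. A minimal sketch of such a throttle,
assuming the avahi-client C API (MAX_ACTIVE, struct pending and the
callback names are illustrative, not Avahi API; client/browser setup
via avahi_client_new()/avahi_service_browser_new() is omitted):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include <avahi-client/client.h>
#include <avahi-client/lookup.h>
#include <avahi-common/error.h>

#define MAX_ACTIVE 8                 /* arbitrary cap on in-flight resolvers */

struct pending {                     /* one queued resolve request */
    char *name, *type, *domain;
    AvahiIfIndex iface;
    AvahiProtocol proto;
    struct pending *next;
};

static struct pending *queue = NULL;
static int active = 0;
static AvahiClient *client = NULL;   /* set up elsewhere via avahi_client_new() */

static void pump(void);

static void resolve_cb(AvahiServiceResolver *r, AvahiIfIndex iface,
                       AvahiProtocol proto, AvahiResolverEvent event,
                       const char *name, const char *type, const char *domain,
                       const char *host, const AvahiAddress *addr,
                       uint16_t port, AvahiStringList *txt,
                       AvahiLookupResultFlags flags, void *userdata) {
    if (event == AVAHI_RESOLVER_FOUND)
        printf("%s -> %s:%u\n", name, host, port);
    else
        fprintf(stderr, "resolve of %s failed: %s\n", name,
                avahi_strerror(avahi_client_errno(client)));

    avahi_service_resolver_free(r);  /* free immediately, as described above */
    active--;
    pump();                          /* kick off the next queued request */
}

static void pump(void) {
    while (active < MAX_ACTIVE && queue) {
        struct pending *p = queue;
        queue = p->next;
        if (avahi_service_resolver_new(client, p->iface, p->proto,
                                       p->name, p->type, p->domain,
                                       AVAHI_PROTO_UNSPEC, 0,
                                       resolve_cb, NULL))
            active++;
        free(p->name); free(p->type); free(p->domain); free(p);
    }
}

static void browse_cb(AvahiServiceBrowser *b, AvahiIfIndex iface,
                      AvahiProtocol proto, AvahiBrowserEvent event,
                      const char *name, const char *type, const char *domain,
                      AvahiLookupResultFlags flags, void *userdata) {
    if (event != AVAHI_BROWSER_NEW)
        return;
    struct pending *p = malloc(sizeof *p);  /* enqueue instead of resolving now */
    p->name = strdup(name);
    p->type = strdup(type);
    p->domain = strdup(domain);
    p->iface = iface;
    p->proto = proto;
    p->next = queue;
    queue = p;
    pump();
}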

> Now that you have a little more detailed view of what we do, could you 
> please give us some hints on how to improve our use of Avahi?

Dunno. This really depends on the problem. I don't have such a setup
here, so I cannot debug this myself.

If you get timeouts then either the query or the response packets got
lost. Try Wireshark to find out which; mDNS traffic is multicast UDP on
port 5353, so it is easy to filter for. 

> Maybe it's not a D-Bus but an Avahi issue?

mDNS/DNS-SD is not a reliable protocol. It wasn't designed to be
one. Don't make assumptions about its reliability that you shouldn't make.

> What are the exact limits of Avahi concerning cache, timeouts, queries, 
> etc.?

The maximum cache size is controlled via AVAHI_CACHE_ENTRIES_MAX in
avahi-core/cache. It is set to 500. Given the number of hosts, this
might actually be way too low for your use case. Try increasing it to
5000.
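That limit is a compile-time constant, so the daemon has to be rebuilt
after changing it. A sketch of the change (assuming the define lives in
avahi-core/cache.h in your tree; grep for it if not):

/* avahi-core/cache.h -- exact location is an assumption, check your tree */
#define AVAHI_CACHE_ENTRIES_MAX 500   /* default; raise to e.g. 5000, rebuild */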

> How does the free-mechanism work exactly, especially on the
> DBus-side?

Resolvers/browsers are ref-counted by the D-Bus clients, i.e. if two
clients issue the same queries they end up with the same internal
browser/resolver. Also, a browser/resolver stub will be kept around
after the client-side browser/resolver dies, until the next time we'd
have to reissue a request for it, in the hope that we might be able to
reuse it later on for another client. That allows us to reuse the same
browser/resolver for repeated queries of the same data.
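For illustration, the client-side lifecycle this maps onto (a sketch
assuming the avahi-client C API; the service name is made up, and the
sharing/reuse described above happens inside the daemon, invisible to
this code):

/* Each avahi_service_resolver_new() takes a reference on a matching
 * internal resolver in the daemon, creating it only if none exists. */
AvahiServiceResolver *r =
    avahi_service_resolver_new(client, AVAHI_IF_UNSPEC, AVAHI_PROTO_UNSPEC,
                               "Some Printer",      /* hypothetical service */
                               "_http._tcp", "local",
                               AVAHI_PROTO_UNSPEC, 0, resolve_cb, NULL);

/* Per the explanation above, this only drops our reference; the daemon
 * may keep the internal browser/resolver stub around for reuse. */
avahi_service_resolver_free(r);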

Lennart

-- 
Lennart Poettering                        Red Hat, Inc.
lennart [at] poettering [dot] net
http://0pointer.net/lennart/           GnuPG 0x1A015CC4

