[avahi] Resolving many ( > 200 ) items seems to lead to apparent congestion in the dbus

Wed May 27 00:49:03 PDT 2009

On Wed, 2009-05-27 at 04:12 +0200, Lennart Poettering wrote:
> On Tue, 19.05.09 08:10, Daniel Wynne (daniel.wynne at mobotix.com) wrote:
> 
> > Hi Lennart!
> > 
> > Though it's not the standard case, we deal with more than 250 hosts for
> > testing purposes. We want to resolve all of them without either spamming
> > the D-Bus system or running into any Avahi limitation.
> > First, we tried resolving everything right away, but this led to
> > apparent D-Bus congestion, which could only be solved by restarting the
> > whole service. After that we tried a queued system which allows only a
> > few resolvers to coexist, but this did not lead to any perceptible
> > improvement. D-Bus still seems to be congested after a short
> > while.
> 
> D-Bus "congestion"? I don't think that exists. What exactly makes you
> think D-Bus could be 'congested'?

So this was a misunderstanding, sorry. In a previous thread somebody
mentioned that every D-Bus client can have at most 250 objects. I
assumed that every resolver or browser counts as one of the client's
objects. If this is correct, the limit is enforced by Avahi and not by
the D-Bus system.

> > Only a restart of the service solves the problem. Without deep insight I
> > would assume that the D-Bus objects created are not freed cleanly or,
> > let's say, immediately. They seem to remain, and so the bus fills up.
> > To give you a more detailed overview, here is what exactly happens:
> > - We browse the .local domain for HTTP and HTTPS services with two 
> > coexistent browsers
> > - For every resolver we create, we create another one to receive 
> > additional address records
> > - The first resolver is freed immediately after resolving, the second 
> > after the last address has been resolved
> > - This works fine for about 100 hosts, then it starts to stumble. The
> > other resolvers time out. If a timeout occurs we try to resolve
> > again.
> > - After about 1min we receive another circa 50 hosts before the next 
> > stumbling
> > - Then about every 1-2 minutes we receive a variable amount of hosts
> > We also tried a queued algorithm with only a certain amount of 
> > coexistent resolvers without a noticeable improvement.
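
A minimal sketch of the queued approach described above, written in Python for illustration only. The names `resolve_all`, `max_active`, `retries`, and `timeout` are hypothetical, and `resolve` stands in for whatever coroutine wraps the real resolver call; nothing here is Avahi API.

```python
import asyncio


async def resolve_all(hosts, resolve, max_active=8, retries=2, timeout=1.0):
    """Resolve every host with at most `max_active` resolvers alive at once.

    `resolve` is a hypothetical coroutine function taking a host name; it
    stands in for the real resolver call. Hosts that still time out after
    `retries` extra attempts map to None.
    """
    slots = asyncio.Semaphore(max_active)
    results = {}

    async def worker(host):
        async with slots:  # wait for a free resolver slot
            for _ in range(retries + 1):
                try:
                    results[host] = await asyncio.wait_for(resolve(host), timeout)
                    return
                except asyncio.TimeoutError:
                    continue  # timed out: try this host again
            results[host] = None  # gave up after the last retry

    await asyncio.gather(*(worker(h) for h in hosts))
    return results
```

Note that a cap like this only bounds the number of simultaneous resolvers. It does not help if the timeouts come from lost packets or from a limit inside the daemon.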
> 
> This might have something to do with the internal limits Avahi applies
> on almost everything: in this case possibly the size of the cache?
> 
> Also note that if you issue a lot of requests the local IP stack
> packet queueing might already drop packets. Lost packets will most
> likely result in timeouts.

Is there an easy way to verify this? Could not find any proper logfile.
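
One possible check, sketched here under the assumption of a Linux host: the kernel publishes UDP counters in /proc/net/snmp, and a growing RcvbufErrors or InErrors value would hint at datagrams being dropped on the receiving side. The helper name `parse_udp_counters` is made up for this example, and the counters are system-wide, so they do not say which socket lost the packets.

```python
def parse_udp_counters(snmp_text):
    """Return the Udp counters from /proc/net/snmp-style text as a dict."""
    # The file holds pairs of lines per protocol: a header line with the
    # counter names and a value line, both prefixed with "Udp:".
    udp_lines = [line.split()[1:] for line in snmp_text.splitlines()
                 if line.startswith("Udp:")]
    if len(udp_lines) < 2:
        raise ValueError("no Udp section found")
    names, values = udp_lines[0], udp_lines[1]
    return dict(zip(names, map(int, values)))
```

Reading the file before and after a resolve run and comparing the two dicts, for example `parse_udp_counters(open("/proc/net/snmp").read())`, shows whether the error counters moved during the test.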

> > As you now have a little more detailed view of what we do, could you
> > please give us some hints on how to improve our use of Avahi?
> 
> Dunno. This really depends on the problem. I don't have such a setup
> here, so I cannot debug this myself.
> 
> If you get timeouts then either the query or the response packets got
> lost. Try wireshark to find that out. 
> 
> > Maybe it's not a D-Bus but an Avahi issue?
> 
> mDNS/DNS-SD is not a reliable protocol. It wasn't designed to be
> one. Don't make assumptions on its reliability that you shouldn't make.
> 
> > What are the exact limits of Avahi concerning cache, timeouts, queries,
> > etc.?
> 
> The max cache size is controlled via AVAHI_CACHE_ENTRIES_MAX in
> avahi-core/cache. It is set to 500. Given the number of hosts this
> might actually be way too low for your use case. Try to increase it to
> 5000.

This is not really an option for us, since it requires recompiling
Avahi from source.

> > How does the free-mechanism work exactly, especially on the
> > DBus-side?
> 
> Resolvers/browsers are ref counted by the D-Bus clients. I.e. if two
> clients issue the same queries they end up with the same internal
> browser/resolver. Also, a browser/resolver stub will be kept around
> after the browser/resolver died until the next time we'd have to
> reissue a request for it in the hope we might be able to reuse it
> later on for another client. That allows us to reuse the same
> browser/resolver for repeated queries of the same data.
> 
> Lennart
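
As a reading aid, the sharing and stub reuse described above can be modelled roughly like this. It is a toy Python model under simplifying assumptions, not Avahi's code: the class `ResolverPool` and its methods are invented for illustration, and `expire` merely stands for the moment a request would have to be reissued.

```python
class ResolverPool:
    """Toy model of shared, ref-counted resolvers with stub reuse."""

    def __init__(self):
        self.active = {}  # query key -> [resolver, number of clients using it]
        self.stubs = {}   # query key -> resolver kept after its last release

    def acquire(self, key):
        if key in self.active:                # same query from another client
            self.active[key][1] += 1
            return self.active[key][0]
        resolver = self.stubs.pop(key, None)  # reuse a stub if one is left
        if resolver is None:
            resolver = object()               # stands in for a new resolver
        self.active[key] = [resolver, 1]
        return resolver

    def release(self, key):
        entry = self.active[key]
        entry[1] -= 1
        if entry[1] == 0:                     # last client gone: keep a stub
            del self.active[key]
            self.stubs[key] = entry[0]

    def expire(self, key):
        """Drop the stub at the point a request would have to be reissued."""
        self.stubs.pop(key, None)
```

In this model, two clients asking for the same key share one resolver, and a client that asks again before `expire` runs gets the old one back instead of a new one.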

So this means that every created browser/resolver remains as a
D-Bus object and therefore might be responsible for the congestion
mentioned before.
Is there a way to cleanly free the browsers/resolvers immediately?

Thanx again and again ! :-)