[patch] break glib bindings by adding timeout_milliseconds

Wed Mar 1 08:37:13 PST 2006

On Wed, 2006-03-01 at 01:50 -0500, David Zeuthen wrote:
> On Wed, 2006-03-01 at 00:31 -0500, Havoc Pennington wrote:
> > Two more things you'd have to change:
> >  - in dbus-pending-call.c the timeout gets clamped to a max of 6 hours. 
> >    The comment there mentions some math that needs fixing for overflow
> >    issues if changing this.
> 
> This should be fixable, yes?
> 

Sure, but I still don't think timeouts between about 5 minutes and
INT64_MAX are useful. Either it needs to time out, or it doesn't...

> >  - the bus has a reply timeout of 5 minutes (configurable in the conf 
> >    file)
> 
> Hmm.. we should be able to change this from 5 minutes to "forever" for
> the system bus at least.. clients pretty much needs to trust apps like
> HAL or NetworkManager on the system bus. Yes?

The purpose of this timeout is the same as the other resource limits -
to prevent a client from forcing excessive resource usage on the bus.
There's a limit on outstanding pending replies the bus will keep track
of for that reason, and each outstanding pending reply has a timeout for
how long the bus will track it. This applies to both system daemons and
apps connected to the system bus (the bus doesn't know the difference).
In other words this timeout is for the benefit of the bus, not the app.

Apps and daemons don't have to trust each other or the bus in this
regard - the timeout in libdbus is for the benefit of the app, to keep
it from hanging if something goes wrong with the network or remote app.

> > So when testing I think you probably just were exceeding the default
> > timeout of 25 seconds or so when suspending, 
> 
> When passing timeout = 49 days to dbus-send it worked great for
> Hibernate which took something like 2 minutes to return (swsusp in the
> upstream kernel is sloow!).. 

That should have been the 6 hour timeout in libdbus and the 5 minute
timeout inside the bus....

> > but not the 5 minutes.
> > Either that or the 5 minute timeout doesn't work, which is another issue
> > if so...
> 
> Right, so I just hibernated my box for 10 minutes and it worked fine:
> 
>  $ dbus-send --system --print-reply \
>              --reply-timeout=2147483648 \
>              --dest=org.freedesktop.Hal \
>              /org/freedesktop/Hal/devices/computer \
>              org.freedesktop.Hal.Device.SystemPowerManagement.Hibernate
>  method return sender=:1.2 -> dest=:1.230
>  uint32 0
> 
> So dbus-send did return as expected even though it took HAL about ten
> minutes to return. So this needs some investigation. I'll try to
> hibernate it overnight too to see about whether the six hour clamp is
> broken too.

Thanks

>  a. Add another sentinel, e.g. make -2 mean forever
> 
>  b. Break protocol, ABI and API of entire dbus and go for 64-bit
>     timeouts
> 
> I don't really like magic numbers such as -1 or -2 so I would prefer b.

b _is_ a magic number solution ;-) INT64_MAX is just as magic as -2,
it's not like you really, really need to wait a zillion seconds instead
of a zillion and 1.

The non-magic-number solution would be to add multiple functions:
 send_with_timeout()
 send()

This might be what I'd do on the glib API level. It seems kinda overkill
on the libdbus level though.

> [1]. Realistically, though, we probably want to go for a. and also add
> symbolic constants
> 
>  #define DBUS_TIMEOUT_DEFAULT -1
>  #define DBUS_TIMEOUT_FOREVER -2
> 

I think this is fine, at least for the lowlevel API. Though 0 might be
better than -2 as Thiago suggested

> or something. Sure, it's more typing but, eh, the -1 in most uses of
> dbus_connection_send_with_reply just seems to magical.

"-1 means default/unset" is a glib/gtk-wide convention at least.

> > Whether it's forever or 49 days, keep in mind that if you have no short
> > timeout you're putting your app in the hands of the remote app; if the
> > remote app never replies, you will be locked up for good. 
> 
>  1. Your remote app is broken if it never replies. This needs to be
>     fixed in the remote app and papering it over with timeouts on the
>     bus side is only going to confuse both users and developers.

What you're saying is fine _if_ the remote app is in the same "security
domain" / trusted. The HAL daemon for example can't have this attitude
toward the apps that are using it, though the apps using the HAL daemon
probably can have this attitude toward HAL.

There's also a robustness issue; while I would personally design a
desktop from scratch to be one big managed code process with threads,
instead of a cluster of processes, if we're going to have the cluster of
processes I guess we should get the one advantage of it (they can't
crash each other)

Also, in principle we want to support desktop apps going to the desktop
session bus via TCP. In that case it really is important to be expecting
network errors.

IPC = painful error handling... c'est la vie.

>  2. Your shiny UI client app would have to pass DBUS_TIMEOUT_FOREVER
>     which is not the default behavior. If said client app doesn't
>     gracefully allow the user to somehow cancel this request to a broken
>     remote app... then by definition the client app is broken too.

Not sure what you're saying here.

> I think it comes down to whether we want to support methods that can
> take minutes, hours, even months (for e.g. Hibernate) to return. If we
> don't we may as well hide the concept of timeouts from the application
> developers completely. Perhaps this is why it was omitted in the glib
> bindings.

I don't think there's a huge point in changing the timeout, no. That's
why there's the -1 thing, and the glib bindings don't have it. In the
end it's mostly a boolean choice of "short" or "forever" that's
interesting.

> The patch I posted simply fixes the glib dbus API to allow callers to
> specify timeouts. This is also useful in weird and bizarre peer-to-peer
> cases where timeouts needs to be controlled. So can I at least commit
> this while we continue to argue whether the bus should have time outs?

There are enough open questions and bugs here, we may as well sort them
out:

 - should we add functions to the glib API instead of using magic values
   (call_with_timeout(), where call() defaults to forever maybe?)
 - infinite timeout requires writing a bit of code to implement it in
   libdbus, and then a way to configure the session bus to support it
   also
 - the overflow issues in the time calculations in dbus-connection.c
 - why does your patch appear to work despite the 6-hour and 5-minute
   limits elsewhere - need to understand this

Breaking the API is kind of painful so we should just do it once if
possible...

Havoc