[PATCH] do not call _dbus_warn_check_failed on checks

Tue Nov 14 08:14:22 PST 2006

On Tue, Nov 14, 2006 at 09:41:46AM -0500, Havoc Pennington wrote:
> Daniel Stone wrote:
> >Note the 'critical paths' thing here: if I fail to deal with input
> >hotplug properly, I take all the reasonable precautions to ensure I
> >don't segfault in my _own_ code.  If libdbus takes precautions to ensure
> >it doesn't segfault in _its_ code, then that's fantastic.  I know it
> >can't extend to all situations, such as an invalid pointer.  But I still
> >want it to have a defined 'you screwed up' return value, for non-obvious
> >situations like 'the message is invalid for this bus', 'one of your
> >arguments is NULL, and this was fine in previous versions but not
> >anymore', etc.
> 
> OK, so you *are* asking for what I described in my point #4 - a 
> windows-style "invalid args" HRESULT. And I'll repeat again as I already 
> said, that would be a huge API change. And we already shipped 1.0.

Yes, but that doesn't mean it's correct.  So consider this a request for
all functions which currently have fatal checks to be able to return
errors.

> It would also *not* be a return_if_fail check. It would be *defining* 
> what a NULL arg does. So it's completely irrelevant to whether checks 
> are fatal. You are arguing for an API change, not for a 
> results-of-invalid-use-of-API behavior change.

Yes, and you're arguing semantics.  I honestly don't care.

> >So, say I somehow fail at adding an input device.  Why should that bring
> >down my server in such an unclean way that the next attempt to switch
> >VTs or start X again, will result in a completely wedged system (no
> >SysRq for you, hope everything was saved and your disk synced)?  Why
> >should it bring down a word processor so hard it doesn't have a chance
> >to save?
> 
> I don't see how failing to add an input device has anything to do with 
> anything. That would be (as I said in my point #7) a recoverable error, 
> not a programming error.

The D-Bus use in the X server is currently limited to dealing with
requests to add and remove input devices.  Right now, passing in
something which triggers a D-Bus check results in abort() being
triggered for the _entire server_.

This ... is suboptimal.

> >If libdbus returns NULL you
> >weren't expecting, and you crash, then it's provably your bug.
> 
> It provably is not. If the function is defined to never return NULL, you 
> should not have to check for it (and are wrong if you do imo). If the 
> function is defined to return NULL for out-of-memory and you treat the 
> NULL as something else, then *that* is provably your bug.

'your', as in the same 'you'.  Assuming the function is defined to
possibly return NULL.

> > If
> >libdbus returns NULL and you catch it, then you can decide exactly what
> >to do with this error. 
> 
> In most cases in libdbus, a NULL return means out-of-memory. That's it. 
> If it's the result of a return-if-fail, it still means out of memory. 
> However, when I say a check failing means undefined behavior, I mean it. 
> In that case the undefined behavior was to falsely report out-of-memory.

That almost sounds like useful undefined behaviour.  The sort of
undefined behaviour you want to enshrine in your API contract or
documentation or whatever.

> Your code is broken, though, if it treats that return as anything but 
> out-of-memory.
> 
> If a dbus function can fail in two defined ways, then it won't just 
> return NULL. It will have a DBusError indicating the reason for failure.

Okay, so add them.

> >Kind of like open returning a negative value,
> >rather than assert()ing because the programmer was too stupid to call
> >access().
> 
> Again, that is a runtime error not a programming error. It is impossible 
> to write your app such that a failed open() can't happen. It *is* 
> possible (and even not that hard, imo) to write your app such that no 
> dbus precondition check will fail.

Can you guarantee that no-one half-competent will ever slip up such that
a D-Bus assert() will ever trigger?

> >I'm not arguing for these checks to be removed entirely -- not even
> >remotely.  What I'm suggesting, is that the checks turn from 'your
> >entire machine suffers a fiery assert() death', to 'define an error
> >return value in the API, use it'.
> 
> Great, you want to add an HRESULT to every function, post-1.0. Even if I 
> thought it was a good idea, it's pretty academic at this point.

So I don't see why you're getting all pissy, instead of attempting to
stay constructive on the issue?

(If we go back to the strdup thing, it doesn't check _at all_,
 presumably for performance reasons.  It's not that it checks, but
 someone decided that he knew better than all his possible users, to
 decided to implement his own behaviour to outwit them.)

> You are welcome to create a "wrapper header" with a bunch of inline 
> stuff like this:
> 
> #define my_dbus_foobar(x,y,z) \
>   do {if (x && y && z)        \
>         dbus_foobar(x,y,z);   \
>       else                    \
>         ;                     \
>       } while (0)

Yeah, I floated the idea of libebus while having coffee today, so people
don't have to deal with the grotesque pain that is libdbus.  It's
definitely worth doing, with the caveat that doing it admits the API is
horribly broken, and upstream is refusing to be useful about it.  In
that case, I usually just look for other, less broken, projects.

> >(Think of corner cases where you can't reproduce it normally.  If you
> > can log something useful, maybe even throw an exception and get a
> > backtrace, then the good coder will be able to find out what's wrong
> > despite having no way to reproduce it, and maybe fix it.  assert()
> > gives you no useful information at all, ever, and enforces its own
> > worldview on the app.)
> 
> When dbus aborts it prints a backtrace, btw. If you think you can get 
> more info without aborting (I don't know how), then feel free to set 
> fatal warnings to 0.

The callee knows context.  If something fails, I can tell you exactly
why it failed.  Also, backtraces are meaningless with static functions
if you don't disable optimisation.

> >I can guarantee you that this will lead to _loss_ of information.  When
> >the X server goes down that hard, nothing will be usefully logged, you
> >will just end up losing very badly.  So all people know is that the X
> >server came down hard, gdm restarted and then hey, IERR.  As opposed to
> >useful logged information about what happened and possibly why.
> 
> When dbus aborts, it first prints the same message it would have printed 
> if it didn't, and then a backtrace (on glibc systems), and then aborts.
> 
> You are not going to print anything better yourself, unless we change 
> the API to make passing in NULL have a defined result of returning a 
> detectable error, because you will not be able to know that the check 
> failed.
> 
> If you have a better thing to do than print to stderr, we can add a 
> set_log_handler() type of thing in 1.2. Propose the API.

static void set_log_handler((*handler)(DBusError *));
static void set_verbosity(enum { I_DO_NOT_CARE, DO_NOT_EVER_KILL_MY_APP,
BE_CHATTY_IF_YOU_LIKE,
I_AM_A_DBUS_DEVELOPER_AND_ACTUALLY_CARE_ABOUT_ITS_INTERNALS });

> >Make your undefined behaviour less obnoxious.  At every single point
> >where you have an assert(), wonder what a useful way to signal an error
> >back to the _application_ consumer of your _library_ is, such that they
> >can signal the user, who should obviously be forced to care about D-Bus
> >errors.  Then change your asserts, to returning errors.
> 
> That's the whole point I'm making. You are arguing for an API change 
> that defines programming errors as detectable errors. This has nothing 
> to do with what dbus_return_if_fail does. Any error that gets reported 
> will not be what dbus calls a "check"

I don't care what you call a check.

Pick a word for 'evaluating whether an expression tests true as a part
of determining whether to continue execution of the function', let me
know what it is, and we can accept that as axiomatic.

> >If the app is buggy, then the app is buggy, and it'll crash.  If the app
> >is careful, with a subtle typo or race somewhere, but the people who
> >wrote it were careful enough to check for errors, then they'll decide
> >what's appropriate.
> 
> *There is no way to check for failed checks* - you can't do it. There is 
> nothing in the API to allow it.

So fix your API to not be broken by design.

> >>init_error(&error);
> >>dbus_frobate(myptr, &error);
> >>if (null_arg_error_set(&error)) {
> >>   // well, what are you going to put in here? I sure don't know
> >>   // and that's why the error isn't returned
> >>}
> >
> >No, you write this:
> >if (!myptr)
> >    return;
> >dbus_frobate(myptr, &error);
> >if (dbus_error_is_set(&error)) {
> >    /* some incredibly obscure check triggered, no-one cares */
> >    g_error("loss");
> 
> There are no "no one cares" checks in libdbus afaik, if there are then 
> file a bug. The checks are all important preconditions to the function 
> in question working at all.

It depends on your level of care.  I know you wrote D-Bus, and well
done, that's great, but as a user of D-Bus, its internals are not of
interest to me.  I use it as a black box to get messages from A to B,
and have already had to get my hands dirtier than I would like.

> >No, they're not, but your behaviour when they're obviously bad
> >(explicitly initialising all your pointers rather than using
> >uninitialised memory), is obnoxious.  But this is not just about NULL
> >pointers, surely.
> 
> AFAICT it pretty much is. The vast majority of checks are for null 
> pointers; there are a few other things, but all are equally programming 
> bugs, not recoverable errors.

Distinguishing between 'bugs' and 'errors' is mental masturbation.

> >>5. Comparison to "wrapper libraries" like Xlib, GNU libc
> >>===
> >
> >Before we start here, let me say that Xlib is a) provably not even close
> >to the API you want, b) provably close to the API you have.
> >
> >If you don't believe me, look at X's error handling, and the I/O error
> >handling (a grossly overloaded term) in particular.
> 
> X's error handling is for recoverable errors - it is analogous to 
> DBusError. X *has* no equivalent of what dbus calls a "check"

This statement is a flat-out lie.

> >>However, for most NULL pointers dbus will do a) because of points 3) and 
> >>4) I've already covered. Also, there's no way to pass NULL over the bus, 
> >>and many of the pointers you're passing in have no protocol equivalent, 
> >>so b) doesn't even make sense.
> >
> >Right, so we're down to 'crash', or 'gracefully return an error to the
> >callee'.
> 
> For a recoverable/avoidable error dbus will give you an error, and for a 
> programming error it will at least warn about it and (by default) exit.

'exit' being 'crash', warning or no.  Either way, it is _not_
'gracefully return an error to the callee'.

> >>Remember that return_if_fail() is a subset of a). When libdbus would 
> >>have done return_if_fail with a handy warning, glibc and Xlib would 
> >>typically just crash or something else bad (like --disable-checks).
> >
> >Um, this is not actually the case.  But maybe I just have strange ideas
> >about what APIs I use should do.
> 
> How is it not the case? You are telling me that nothing in those APIs 
> just dereferences the pointer you passed in without checking it? Because 
> I think they extensively do so.

I make no blanket statements about every single function in the Xlib
API, but I can tell you that a great deal of them make sanity checks
before you start, and return an appropriate value if the parameters are
somehow incorrect.

> >I haven't used the Windows API since I was tooling around with ISAPI
> >filters when I was 10.  But this seems like a decent idea to me.  If you
> >can check, and it won't provably cost the application in terms of
> >performance for you to do so, why not?
> 
> 1) Because it adds an error return that nobody will check anyway to 
> every function in the API

I don't see the problem here.

> 2) printing an explanatory message to stderr 
> (and optionally aborting) is more helpful

'More helpful' == 'ungracefully terminates the user's session,
potentially causing IERR on restart, documents lost as both libdbus-1
and Xlib race to kill the app due to exit of session bus and server'.

I guess we also have different definitions of 'helpful'.

> 3) the API is already frozen 
> and does not have an error return in every function in the API.

So fix it for later versions, unless you're never planning to release
another version?

> >Um.  assert() brings your entire program down roughly because you had
> >something invalid in that struct somewhere.  return NULL brings your
> >entire program down gracefully (or allows you to continue, at the app's
> >discretion: again, poor apps will be poor apps regardless, whereas good
> >apps can work out the criticality and isolation, and continue on if they
> >feel it's safe) because you had something invalid in that struct
> >somewhere.
> 
> Stop mixing up the issues. If there is an error return, there is no 
> abort. You can either argue to change the behavior on a failed check, or 
> you can argue for an error return. But if there's an error return there 
> is no check, so the behavior of the check need not change. If there's no 
> error return, then the error return can't be returned to you 
> "gracefully" and you can't do anything at your discretion.
> 
> In most cases right now the API has no provision to return an error 
> code, if the function can't fail or can only fail due to programming error.

'No memory' is an error, is it not?  Any function that can return a
pointer has an implicit error return of NULL.  If you want to explicitly
disclaim this possibility, then great, I'm interested in your function
with no possibility of failure.  If you want to explicitly disclaim this
possibility and call abort() from within the _very same function_, then
that's incredibly poor.

> >I find it odd that all the functions you're claiming have no way to ever
> >possibly return an error under any circumstances known to mankind,
> >already have defined returns for having run out of memory.
> 
> If a function can only fail due to OOM, the API will usually have only a 
> boolean or NULL/not-NULL return. If it can (conceptually) fail for two 
> or more recoverable reasons, there will be a DBusError return.

Yes, this is more useful than returning NULL with no further
information, which is in turn more useful than abort().

> > Instead of,
> >y'know, assert(foo = malloc(...));
> >
> >Since you're pretty screwed if malloc() is returning NULL, surely.  (I
> >think even the X server, which is pretty careful about these sorts of
> >things, inadvertently does five or ten malloc()s in its fatal error
> >path.)
> 
> The dbus daemon does recover successfully from OOM. I know because there 
> is near-100% test coverage of that codepath.

This is useful.  However, you also say that apps shouldn't bother with
the low-level bindings, because everyone uses GLib anyway.

I think you can see where I'm going here.

> >You say thankful, I say flawed, shortsighted, and irritatingly lazy API
> >design.
> 
> Well, I say I've designed more APIs than you've used.

Oh, a pissing match.  How original.

> >Go read kernel code some day, and then come back and tell me that no-one
> >bothers performing error checking, and that every single error ever
> >produced by the kernel results in an oops.
> 
> Please distinguish recoverable errors and programming errors.

Programming errors in the kernel also cause BUG_ON() rather than OOPS(),
the former being fatal.

The kernel differentiates between 'the system cannot possibly be allowed
to continue' (panic), 'the callee cannot possibly be allowed to
continue' (oops), and 'dang, shouldn't happen' (warning in dmesg).

When I say 'shouldn't happen', I mean both recoverable and programming
errors, although as my mother running an app, honestly I don't care in
the slightest about the distinction.

> >If I call dbus_message_init() or something, and it asserts on errors,
> >rather than returning NULL using the already-defined API to signal an
> >error to the calling application, I will be incredibly irritated.
> 
> Obviously you are just looking for ways to be incredibly irritated. If 
> you don't want to learn something then fine, that's not my problem.

No, I'm just someone who's been using the low-level D-Bus library, and
attempting to point out what's going on from a D-Bus user's point of
view; the most salient point being that calling abort() is the most
anti-social thing anyone can do, let alone _a library_.

If you don't care about this because you've designed nine million APIs,
then that's good for you, honestly.  But this is something that came up
to me while I was implementing it, and jogged by this thread.

It is something I consider completely unacceptable.  I don't know how
familiar you are with the X server, but adding a dependency on libdbus-1
is not something that could be called a 'conservative' move, so
obviously I'm not wildly pre-set against D-Bus.

> >I honestly don't understand your aversion to signalling errors to the
> >app.
> 
> We are not talking about errors here, we're talking about bugs. I don't 
> write code where every line is an attempt to recover from the previous 
> line not working due to my own bug. Such code is not maintainable or 
> writable (and nobody writes it - not even the examples on MSDN check all 
> the HRESULT for functions that can only fail due to programming error).
> 
> If you want to change the API, argue to change the API. Nothing to do 
> with what happens on failed checks.

Okay:
Change the API.

I don't care about the distinction between checks, tests, bugs, errors,
and whatever.  It's a D-Bus implementation detail.  Repeat: do not care.
If you ever kill the X server because
dbus_connection_send_with_reply_and_block or whatever it is fails for an
invalid parameter, both myself and the users will be annoyed, and your
handwaves won't be too satisfying.

Nor do I care how you fix it.  Really.

It's a shame, because D-Bus the protocol seems quite nice, and in
general things seem to work, but libdbus-1 seems to have taken the worst
lessons from the G* libraries, and the worst lessons from Xlib.

Cheers,
Daniel
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
Url : http://lists.freedesktop.org/archives/dbus/attachments/20061114/e5b60078/attachment-0001.pgp