[PATCH] do not call _dbus_warn_check_failed on checks

Tue Nov 14 06:41:46 PST 2006

Daniel Stone wrote:
> libdbus is not, it must be said, a simple library.  What you're talking
> about is not even checking against NULL pointers being passed in (which
> I'm careful to guard against), but what if I pass in a slightly invalid
> message due to a bug on my side?  What if I'm dealing with an incoming
> message and only read half the args before I decide I don't want to know
> any more about it -- because it's a treacherous message trying to poison
> the server -- and throw it away?  It's entirely conceivable that someone
> could assert() on unref because all the args weren't inspected.

That is not conceivable, because it would be stupid. The behavior in 
that case is defined, not undefined. (Specifically it is defined that 
DBusMessageIter does not care if you actually iterate.)

The checks are not for arbitrary things. They are for things that will 
cause a problem.

> Note the 'critical paths' thing here: if I fail to deal with input
> hotplug properly, I take all the reasonable precautions to ensure I
> don't segfault in my _own_ code.  If libdbus takes precautions to ensure
> it doesn't segfault in _its_ code, then that's fantastic.  I know it
> can't extend to all situations, such as an invalid pointer.  But I still
> want it to have a defined 'you screwed up' return value, for non-obvious
> situations like 'the message is invalid for this bus', 'one of your
> arguments is NULL, and this was fine in previous versions but not
> anymore', etc.

OK, so you *are* asking for what I described in my point #4 - a 
windows-style "invalid args" HRESULT. And I'll repeat again as I already 
said, that would be a huge API change. And we already shipped 1.0.

It would also *not* be a return_if_fail check. It would be *defining* 
what a NULL arg does. So it's completely irrelevant to whether checks 
are fatal. You are arguing for an API change, not for a 
results-of-invalid-use-of-API behavior change.

> So, say I somehow fail at adding an input device.  Why should that bring
> down my server in such an unclean way that the next attempt to switch
> VTs or start X again, will result in a completely wedged system (no
> SysRq for you, hope everything was saved and your disk synced)?  Why
> should it bring down a word processor so hard it doesn't have a chance
> to save?

I don't see how failing to add an input device has anything to do with 
anything. That would be (as I said in my point #7) a recoverable error, 
not a programming error.

> If libdbus returns NULL you
> weren't expecting, and you crash, then it's provably your bug.

It provably is not. If the function is defined to never return NULL, you 
should not have to check for it (and are wrong if you do imo). If the 
function is defined to return NULL for out-of-memory and you treat the 
NULL as something else, then *that* is provably your bug.

>  If
> libdbus returns NULL and you catch it, then you can decide exactly what
> to do with this error. 

In most cases in libdbus, a NULL return means out-of-memory. That's it. 
If it's the result of a return-if-fail, it still means out of memory. 
However, when I say a check failing means undefined behavior, I mean it. 
In that case the undefined behavior was to falsely report out-of-memory.

Your code is broken, though, if it treats that return as anything but 
out-of-memory.

If a dbus function can fail in two defined ways, then it won't just 
return NULL. It will have a DBusError indicating the reason for failure.

> Kind of like open returning a negative value,
> rather than assert()ing because the programmer was too stupid to call
> access().

Again, that is a runtime error not a programming error. It is impossible 
to write your app such that a failed open() can't happen. It *is* 
possible (and even not that hard, imo) to write your app such that no 
dbus precondition check will fail.

> I'm not arguing for these checks to be removed entirely -- not even
> remotely.  What I'm suggesting, is that the checks turn from 'your
> entire machine suffers a fiery assert() death', to 'define an error
> return value in the API, use it'.

Great, you want to add an HRESULT to every function, post-1.0. Even if I 
thought it was a good idea, it's pretty academic at this point.

You are welcome to create a "wrapper header" with a bunch of inline 
stuff like this:

#define my_dbus_foobar(x,y,z) \
   do {if (x && y && z)        \
         dbus_foobar(x,y,z);   \
       else                    \
         ;                     \
       } while (0)

> (Think of corner cases where you can't reproduce it normally.  If you
>  can log something useful, maybe even throw an exception and get a
>  backtrace, then the good coder will be able to find out what's wrong
>  despite having no way to reproduce it, and maybe fix it.  assert()
>  gives you no useful information at all, ever, and enforces its own
>  worldview on the app.)

When dbus aborts it prints a backtrace, btw. If you think you can get 
more info without aborting (I don't know how), then feel free to set 
fatal warnings to 0.

> I can guarantee you that this will lead to _loss_ of information.  When
> the X server goes down that hard, nothing will be usefully logged, you
> will just end up losing very badly.  So all people know is that the X
> server came down hard, gdm restarted and then hey, IERR.  As opposed to
> useful logged information about what happened and possibly why.

When dbus aborts, it first prints the same message it would have printed 
if it didn't, and then a backtrace (on glibc systems), and then aborts.

You are not going to print anything better yourself, unless we change 
the API to make passing in NULL have a defined result of returning a 
detectable error, because you will not be able to know that the check 
failed.

If you have a better thing to do than print to stderr, we can add a 
set_log_handler() type of thing in 1.2. Propose the API.

> Awesome, a strawman.  You can check Xorg and KDrive's segfault handlers
> if you like, and note that this is provably untrue.

So whatever you do in response to SEGV is probably right in response to 
a check failing. Add SIGABRT to your handled signals.

> Make your undefined behaviour less obnoxious.  At every single point
> where you have an assert(), wonder what a useful way to signal an error
> back to the _application_ consumer of your _library_ is, such that they
> can signal the user, who should obviously be forced to care about D-Bus
> errors.  Then change your asserts, to returning errors.

That's the whole point I'm making. You are arguing for an API change 
that defines programming errors as detectable errors. This has nothing 
to do with what dbus_return_if_fail does. Any error that gets reported 
will not be what dbus calls a "check"

> If the app is buggy, then the app is buggy, and it'll crash.  If the app
> is careful, with a subtle typo or race somewhere, but the people who
> wrote it were careful enough to check for errors, then they'll decide
> what's appropriate.

*There is no way to check for failed checks* - you can't do it. There is 
nothing in the API to allow it.

>> init_error(&error);
>> dbus_frobate(myptr, &error);
>> if (null_arg_error_set(&error)) {
>>    // well, what are you going to put in here? I sure don't know
>>    // and that's why the error isn't returned
>> }
> 
> No, you write this:
> if (!myptr)
>     return;
> dbus_frobate(myptr, &error);
> if (dbus_error_is_set(&error)) {
>     /* some incredibly obscure check triggered, no-one cares */
>     g_error("loss");

There are no "no one cares" checks in libdbus afaik, if there are then 
file a bug. The checks are all important preconditions to the function 
in question working at all.

If you really check every pointer for NULL before using it, there are 
very few checks for anything other than "ptr != NULL"

>> If you are masochistic enough to want to write that, you are welcome to 
>> write this instead:
>>
>> if (!myptr)
>>   // whatever
>> else
>>   dbus_frobate(myptr);
> 
> Yes, and people do.

Good! So what are you complaining about?

>> If you don't know what's in your pointers you have bigger problems. They 
>> could just as easily be bad pointers instead of NULL, and no check is 
>> going to keep those from crashing.
> 
> No, they're not, but your behaviour when they're obviously bad
> (explicitly initialising all your pointers rather than using
> uninitialised memory), is obnoxious.  But this is not just about NULL
> pointers, surely.

AFAICT it pretty much is. The vast majority of checks are for null 
pointers; there are a few other things, but all are equally programming 
bugs, not recoverable errors.

>> 5. Comparison to "wrapper libraries" like Xlib, GNU libc
>> ===
> 
> Before we start here, let me say that Xlib is a) provably not even close
> to the API you want, b) provably close to the API you have.
> 
> If you don't believe me, look at X's error handling, and the I/O error
> handling (a grossly overloaded term) in particular.

X's error handling is for recoverable errors - it is analogous to 
DBusError. X *has* no equivalent of what dbus calls a "check"

>> However, for most NULL pointers dbus will do a) because of points 3) and 
>> 4) I've already covered. Also, there's no way to pass NULL over the bus, 
>> and many of the pointers you're passing in have no protocol equivalent, 
>> so b) doesn't even make sense.
> 
> Right, so we're down to 'crash', or 'gracefully return an error to the
> callee'.

For a recoverable/avoidable error dbus will give you an error, and for a 
programming error it will at least warn about it and (by default) exit.

>> Remember that return_if_fail() is a subset of a). When libdbus would 
>> have done return_if_fail with a handy warning, glibc and Xlib would 
>> typically just crash or something else bad (like --disable-checks).
> 
> Um, this is not actually the case.  But maybe I just have strange ideas
> about what APIs I use should do.

How is it not the case? You are telling me that nothing in those APIs 
just dereferences the pointer you passed in without checking it? Because 
I think they extensively do so.

> I haven't used the Windows API since I was tooling around with ISAPI
> filters when I was 10.  But this seems like a decent idea to me.  If you
> can check, and it won't provably cost the application in terms of
> performance for you to do so, why not?

1) Because it adds an error return that nobody will check anyway to 
every function in the API 2) printing an explanatory message to stderr 
(and optionally aborting) is more helpful 3) the API is already frozen 
and does not have an error return in every function in the API.

> Um.  assert() brings your entire program down roughly because you had
> something invalid in that struct somewhere.  return NULL brings your
> entire program down gracefully (or allows you to continue, at the app's
> discretion: again, poor apps will be poor apps regardless, whereas good
> apps can work out the criticality and isolation, and continue on if they
> feel it's safe) because you had something invalid in that struct
> somewhere.

Stop mixing up the issues. If there is an error return, there is no 
abort. You can either argue to change the behavior on a failed check, or 
you can argue for an error return. But if there's an error return there 
is no check, so the behavior of the check need not change. If there's no 
error return, then the error return can't be returned to you 
"gracefully" and you can't do anything at your discretion.

In most cases right now the API has no provision to return an error 
code, if the function can't fail or can only fail due to programming error.

> I find it odd that all the functions you're claiming have no way to ever
> possibly return an error under any circumstances known to mankind,
> already have defined returns for having run out of memory.

If a function can only fail due to OOM, the API will usually have only a 
boolean or NULL/not-NULL return. If it can (conceptually) fail for two 
or more recoverable reasons, there will be a DBusError return.

>  Instead of,
> y'know, assert(foo = malloc(...));
> 
> Since you're pretty screwed if malloc() is returning NULL, surely.  (I
> think even the X server, which is pretty careful about these sorts of
> things, inadvertently does five or ten malloc()s in its fatal error
> path.)

The dbus daemon does recover successfully from OOM. I know because there 
is near-100% test coverage of that codepath.

> You say thankful, I say flawed, shortsighted, and irritatingly lazy API
> design.

Well, I say I've designed more APIs than you've used.

> D-Bus conflates the two cases here into assert().

It does not. "check failed" means "bug" - not recoverable error.

> Go read kernel code some day, and then come back and tell me that no-one
> bothers performing error checking, and that every single error ever
> produced by the kernel results in an oops.

Please distinguish recoverable errors and programming errors.

> If I call dbus_message_init() or something, and it asserts on errors,
> rather than returning NULL using the already-defined API to signal an
> error to the calling application, I will be incredibly irritated.

Obviously you are just looking for ways to be incredibly irritated. If 
you don't want to learn something then fine, that's not my problem.

> I honestly don't understand your aversion to signalling errors to the
> app.

We are not talking about errors here, we're talking about bugs. I don't 
write code where every line is an attempt to recover from the previous 
line not working due to my own bug. Such code is not maintainable or 
writable (and nobody writes it - not even the examples on MSDN check all 
the HRESULT for functions that can only fail due to programming error).

If you want to change the API, argue to change the API. Nothing to do 
with what happens on failed checks.

Havoc