Fatal warnings in client code

Mon Mar 6 13:16:11 UTC 2023

On Mon, 06 Mar 2023 at 12:20:05 +0000, Thomas Kluyver wrote:
> I helped someone work around VLC crashing when it tried to create a
> D-Bus message with a string that was not UTF-8 (probably loaded from
> a file without checking the encoding). The workaround was to set the
> environment variable DBUS_FATAL_WARNINGS=0 when running VLC.

Any time that DBUS_FATAL_WARNINGS=0 is helpful, it means there is a
programming error in client code, for which the only solution is to fix
some aspect of the client code.

libdbus doesn't use GLib, but its general philosophy for handling
errors is based on the design of GLib: there's a distinction between
recoverable runtime errors which might be caused by some external
situation and can potentially be fixed by rectifying that external
situation (these are usually reported via DBusError), and programming
errors for which the only solution is a code change (these are undefined
behaviour, but as a courtesy to developers they are often reported via
"checks" like _dbus_return_val_if_fail() if they can be detected easily).

> In terms of implementation, the public function
> calls _dbus_return_val_if_fail, which is a macro wrapping
> _dbus_warn_return_if_fail, which calls _dbus_warn_check_failed. The
> comment for that last function mentions 'Calling this means "there is
> a bug"' - should that mean only bugs in dbus itself, or also bugs in
> application code using it?

That means bugs in anything. These "checks" are primarily for detecting
bugs in application code, but libdbus does not have enough context to know
whether the bug it has detected is in application code, libdbus itself,
or a different library in the same process: all it knows is that there
is a bug.

In contexts where the bug that was detected is known (or at least
strongly believed) to be in libdbus, it should be using _dbus_assert,
which is unaffected by DBUS_FATAL_WARNINGS=0 but is normally disabled
in production builds.

C/C++ are not memory-safe languages, so any library in the same process
space can cause memory corruption that triggers a "check" failure.

> I disagree with his reasoning (he says that the sender shouldn't
> validate messages), but it seems drastic for library code to call abort()
> and crash the entire application. Is this the intended behaviour, or has
> the application or the distro misconfigured something somewhere along
> the way?

Attempting to send non-UTF-8 in a D-Bus STRING is a programming error,
because the spec for the D-Bus STRING type says it is a sequence of
Unicode codepoints encoded in UTF-8, without any U+0000. If the caller has
a string that might or might not be valid UTF-8, then they are responsible
for checking whether it's valid before passing it into libdbus.

At the protocol design level, D-Bus APIs that want to accept or return
non-UTF-8 in a message need to use a byte array instead, like the standard
org.freedesktop.DBus.GetConnectionSELinuxSecurityContext method does
<https://dbus.freedesktop.org/doc/dbus-specification.html#bus-messages-get-connection-selinux-security-context>.

Other common failure modes for building a D-Bus message include trying to
send BOOLEANs with value greater than 1, STRINGs that are null pointers,
any dangling pointer (use-after-free, memory corruption or similar), and
stack overruns caused by a missing varargs termination.

The result of a programming error is undefined behaviour, after
which literally anything can happen, including a crash or a security
vulnerability. The default build-time configuration for libdbus is to
include "checks" that diagnose these programming errors when it is
straightforward to do so, by causing a deterministic crash that can
be investigated like any other. Typically these "checks" will detect
non-Unicode STRINGs, out-of-range BOOLEANs and null pointers, because
those are easy to detect, but will not detect use-after-free or memory
corruption, because it's not generally possible to detect those reliably
or efficiently.

With DBUS_FATAL_WARNINGS=0 or if "checks" were disabled at build time,
the typical result of attempting to send invalid D-Bus messages is that
message validation will fail in the message bus instead of in the client,
resulting in the client being unceremoniously disconnected.

> The docs for the function in question say that it can return false if
> there's not enough memory, but don't mention any other failure conditions:
> 
> https://dbus.freedesktop.org/doc/api/html/group__DBusMessage.html#ga17491f3b75b3203f6fc47dcc2e3b221b

The general philosophy is that the documentation describes the behaviour
when the API is used correctly. When the API is used incorrectly,
undefined behaviour occurs and there are no longer any guarantees.

In principle the API documentation could say things like:

    DBUS_EXPORT dbus_bool_t dbus_message_iter_append_basic (... etc. ...)

    Appends a basic-typed value to the message.

    ...

    If @type is not a basic type, the behaviour is undefined.

    If @value is NULL or does not point to a valid object of type @type,
    the behaviour is undefined.

    If a DBUS_TYPE_STRING parameter is not valid UTF-8, the behaviour is
    undefined.

    If a DBUS_TYPE_BOOLEAN parameter is neither 0 nor 1, the behaviour is
    undefined.

    If a DBUS_TYPE_SIGNATURE parameter is not a syntactically valid
    signature, the behaviour is undefined.

    Returns: TRUE on success, FALSE if not enough memory, an unspecified
        value if undefined behaviour was caused.

but that would make the API documentation much larger, and would probably
still not be comprehensive.

Strictly speaking, the documentation for dbus_message_iter_append_basic()
doesn't actually say that there aren't other situations where FALSE can
be returned, it only says that OOM causes FALSE to be returned. It also
doesn't explicitly say that TRUE is returned on success (I think that's
probably a bug, merge requests welcome).

    smcv