Floating-point and mixed-endianness in D-Bus (was: dbus mini-summit)

Mon Aug 8 08:59:19 PDT 2011

(Deliberately breaking the thread, this is only tangentially related)

On Mon, 08 Aug 2011 at 10:50:29 -0400, Colin Walters wrote:
> * Talks of DBUS 2.0
>   - What do we need to break
>     - floats

This reminds me of something I'd been intending to bring up on this list.
Recent discussion of multiarch on the debian-devel list brought up an
interesting point: D-Bus' world-view is "there are only two native
encodings, LE and BE", but the truth is more complicated, particularly
for floating-point.
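On the wire, the "only two encodings" view is very concrete: the first byte of every D-Bus message header is 'l' for little-endian or 'B' for big-endian. The following is a minimal sketch of decoding a 32-bit integer under that flag. The function name read_uint32 is hypothetical, not libdbus API. Because it uses shifts rather than reinterpreting memory, it works for integers whatever the host layout is. Doubles cannot be handled this way without relying on the assumptions listed below.

```c
#include <stdint.h>

/* Hypothetical helper: decode a uint32 stored in the byte order named by
 * the message's endianness flag ('l' = little-endian, 'B' = big-endian). */
static uint32_t
read_uint32 (char endian_flag, const unsigned char *p)
{
    if (endian_flag == 'l')
        return (uint32_t) p[0] | (uint32_t) p[1] << 8 |
               (uint32_t) p[2] << 16 | (uint32_t) p[3] << 24;

    /* Otherwise assume 'B'; a real parser would reject any other flag. */
    return (uint32_t) p[0] << 24 | (uint32_t) p[1] << 16 |
           (uint32_t) p[2] << 8 | (uint32_t) p[3];
}
```

For example, the bytes { 0x78, 0x56, 0x34, 0x12 } decode to 0x12345678 under 'l' and to 0x78563412 under 'B'.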

D-Bus currently makes the following implicit assumptions:

(0) char has 8 bits

    C standards explicitly do not require this, to allow for oddities like
    36-bit words with 9-bit bytes (PDP-10). I don't think we really care...

(1) Every useful ABI encodes negative integers in "two's complement",
    like x86 does

    C standards explicitly do not require this - they also allow ones'
    complement and sign-and-magnitude - but I'm not aware of any useful CPU
    where it isn't true.
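Assumption (1) can be checked mechanically. The sketch below relies on a standard trick: the two lowest bits of -1 differ between the three representations C99 permits. The enum and function names are made up for illustration.

```c
/* -1 has its two lowest bits set to 11 in two's complement, 10 in ones'
 * complement and 01 in sign-and-magnitude, so (-1 & 3) tells the three
 * representations apart. */
typedef enum {
    REPR_TWOS_COMPLEMENT,
    REPR_ONES_COMPLEMENT,
    REPR_SIGN_MAGNITUDE
} int_representation;

static int_representation
signed_int_representation (void)
{
    switch (-1 & 3)
    {
        case 3:
            return REPR_TWOS_COMPLEMENT;
        case 2:
            return REPR_ONES_COMPLEMENT;
        default:
            return REPR_SIGN_MAGNITUDE;
    }
}

/* The same test as a compile-time guard: the build fails if the
 * assumption does not hold. */
_Static_assert ((-1 & 3) == 3, "signed integers are not two's complement");
```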

(2) Every useful ABI has 8-, 16- and 32-bit integers, and optionally also
    64-bit integers.

    C standards do not require this; they only require that *if such types
    exist*, int32_t (etc.) are defined. Again, I'm not aware of any useful CPU
    where this isn't true. Currently, configure will fail unless 1-, 2- and
    4-byte types can be found.

    Recent versions of D-Bus will refuse to compile on platforms without a
    64-bit integer type unless explicitly told to do so (my attitude is that
    any modern libc/compiler ought to emulate 64-bit if not natively
    supported, as done on 32-bit GNU platforms).
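This sketch shows a preprocessor equivalent of those configure checks. It does not reproduce what D-Bus' build system actually does. The ALLOW_MISSING_INT64 macro is a hypothetical stand-in for "unless explicitly told to do so".

```c
#include <stdint.h>

/* <stdint.h> defines INTn_MAX only if the exact-width type intn_t exists,
 * so these checks fail the build on platforms lacking the types that the
 * wire format needs. If int8_t exists, CHAR_BIT must be 8, which also
 * covers assumption (0). */
#if !defined(INT8_MAX) || !defined(INT16_MAX) || !defined(INT32_MAX)
# error "need 8-, 16- and 32-bit integer types"
#endif

/* ALLOW_MISSING_INT64 is a hypothetical opt-out macro. */
#if !defined(INT64_MAX) && !defined(ALLOW_MISSING_INT64)
# error "no 64-bit integer type; define ALLOW_MISSING_INT64 to build anyway"
#endif
```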

(3) Native integers are always either little- or big-endian (more precisely:
    are in the same encoding as either x86 or SPARC)

    It's perfectly valid for a C implementation to store 0x12345678 in memory
    as { 0x56, 0x78, 0x12, 0x34 } ("mixed-endian"). The PDP-11 did something
    similar. However, I'm not aware of any modern platform that actually does
    this, and I'm inclined to say that D-Bus does not and will not support
    such platforms.
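A minimal sketch of how a platform could be classified for assumption (3), assuming we only care about 32-bit integers. It inspects the in-memory layout of 0x12345678. The enum and function names are illustrative only.

```c
#include <stdint.h>
#include <string.h>

typedef enum { ORDER_LITTLE, ORDER_BIG, ORDER_OTHER } byte_order;

/* Compare the bytes of 0x12345678 against the x86 (LE) and SPARC (BE)
 * layouts; anything else is some mixed-endian arrangement. */
static byte_order
uint32_byte_order (void)
{
    static const unsigned char le[4] = { 0x78, 0x56, 0x34, 0x12 };
    static const unsigned char be[4] = { 0x12, 0x34, 0x56, 0x78 };
    uint32_t value = 0x12345678u;
    unsigned char bytes[4];

    memcpy (bytes, &value, sizeof bytes);
    if (memcmp (bytes, le, sizeof bytes) == 0)
        return ORDER_LITTLE;
    if (memcmp (bytes, be, sizeof bytes) == 0)
        return ORDER_BIG;
    /* e.g. { 0x56, 0x78, 0x12, 0x34 }, or the PDP-11's { 0x34, 0x12, 0x78, 0x56 } */
    return ORDER_OTHER;
}
```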

(4) Native integers are all in the same byte-order

    I believe it's also valid (although ill-advised) for a platform to have
    big-endian 16-bit integers but little-endian 32-bit. I don't propose to
    pretend we support such platforms.
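Assumption (4) can be smoke-tested by checking where the least significant byte lands for each width. This is a sketch with hypothetical names. It only looks at one byte per width, so it is not a complete proof that the layouts agree.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Each helper reports whether the least significant byte of a value of
 * that width is stored first in memory. */
static bool
u16_lsb_first (void)
{
    uint16_t v = 0x0102;
    unsigned char b[sizeof v];
    memcpy (b, &v, sizeof v);
    return b[0] == 0x02;
}

static bool
u32_lsb_first (void)
{
    uint32_t v = 0x01020304u;
    unsigned char b[sizeof v];
    memcpy (b, &v, sizeof v);
    return b[0] == 0x04;
}

static bool
u64_lsb_first (void)
{
    uint64_t v = UINT64_C (0x0102030405060708);
    unsigned char b[sizeof v];
    memcpy (b, &v, sizeof v);
    return b[0] == 0x08;
}

/* Assumption (4): every integer width uses the same byte order. */
static bool
integer_widths_agree (void)
{
    bool lsb16 = u16_lsb_first ();
    return lsb16 == u32_lsb_first () && lsb16 == u64_lsb_first ();
}
```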

(5) The C "double" type is an IEEE 754-2008 64-bit double

    I believe this is pretty much universal.
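Assumption (5) can at least be checked at compile time. The following is a sketch using only <float.h> parameters. C99 also lets an implementation promise IEEE 754 semantics by defining __STDC_IEC_559__, but not every compiler defines it, so this checks the binary64 format parameters directly instead. Note that it says nothing about byte order, which is assumption (6).

```c
#include <float.h>

/* IEEE 754 binary64: radix 2, 53-bit significand (including the implicit
 * bit), exponent range such that DBL_MIN is 2**-1022. */
#if FLT_RADIX != 2 || DBL_MANT_DIG != 53 || \
    DBL_MAX_EXP != 1024 || DBL_MIN_EXP != -1021
# error "double does not look like IEEE 754 binary64"
#endif

_Static_assert (sizeof (double) == 8, "double is not 8 bytes");
```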

(6) The C "double" type is in the same byte-order as integers

    (In other words, the sign bit and the most significant 7 bits of the
    exponent are in the first byte on BE platforms or the last byte on LE,
    and so on)

    ARM FPA instructions (as used in the ABI of Debian's historical 'arm'
    architecture) are a rare example of an ABI where this isn't true - they
    expect mixed-endian doubles (the two 32-bit words stored most-significant
    word first, with each word in the CPU's native byte order), on a CPU
    that's otherwise either LE or BE (Debian used LE). Debian's current ARM
    ports, 'armel' and 'armhf', have moved to using IEEE doubles in the same
    order as x86.
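A sketch of how the three double layouts discussed here could be told apart, plus the word swap an old-ABI 'arm' (FPA, little-endian) marshaller would need. It assumes assumption (5) holds, and all names are hypothetical. It uses 1.0, whose binary64 encoding is 0x3FF0000000000000: the byte holding the sign bit and the top 7 exponent bits is 0x3F, followed by 0xF0.

```c
#include <string.h>

typedef enum {
    DOUBLE_LITTLE,       /* x86, armel, armhf */
    DOUBLE_BIG,          /* SPARC and other big-endian CPUs */
    DOUBLE_WORD_SWAPPED, /* FPA on LE: high word first, LE bytes within words */
    DOUBLE_UNKNOWN
} double_order;

/* Locate the 0x3F, 0xF0 bytes of 1.0; their position is enough to tell
 * these three layouts apart. */
static double_order
native_double_order (void)
{
    double one = 1.0;
    unsigned char b[8];

    memcpy (b, &one, sizeof b);
    if (b[7] == 0x3F && b[6] == 0xF0)
        return DOUBLE_LITTLE;
    if (b[0] == 0x3F && b[1] == 0xF0)
        return DOUBLE_BIG;
    if (b[3] == 0x3F && b[2] == 0xF0)
        return DOUBLE_WORD_SWAPPED;
    return DOUBLE_UNKNOWN;
}

/* Convert FPA-ordered bytes to little-endian binary64 bytes: the bytes
 * within each 32-bit word are already LE, only the two words swap. */
static void
fpa_to_little_endian (const unsigned char in[8], unsigned char out[8])
{
    memcpy (out, in + 4, 4);     /* low-order word is stored second in FPA */
    memcpy (out + 4, in, 4);     /* high-order word is stored first in FPA */
}
```

For example, 1.0 in FPA order is { 0x00, 0x00, 0xF0, 0x3F, 0, 0, 0, 0 }, which fpa_to_little_endian turns into the x86 layout { 0, 0, 0, 0, 0x00, 0x00, 0xF0, 0x3F }.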

(7) Every numeric bit-pattern means the same thing on every CPU, after
    endianness conversion

    I believe this is guaranteed as a consequence of (5) and (6). This covers
    normalized and denormalized numbers, +0 and -0.
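Given (5) and (6), unmarshalling a double amounts to byte-swapping to native order and then reinterpreting the 64 bits. The following is a minimal sketch using memcpy, which avoids aliasing problems. The helper names are hypothetical.

```c
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a 64-bit pattern (already in native byte order) as a
 * double. Relies on (6): double and uint64_t share a byte order. */
static double
double_from_bits (uint64_t bits)
{
    double d;
    memcpy (&d, &bits, sizeof d);
    return d;
}

/* The inverse, as a marshaller would use before byte-swapping. */
static uint64_t
bits_from_double (double d)
{
    uint64_t bits;
    memcpy (&bits, &d, sizeof bits);
    return bits;
}
```

Some patterns that mean the same thing everywhere: 0x3FF0000000000000 is 1.0, 0x0010000000000000 is DBL_MIN (the smallest normalized value, 2**-1022), 0x0000000000000001 is the smallest denormal (2**-1074), and 0x8000000000000000 is -0, which differs from +0 only in the sign bit and compares equal to it.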

(8) Every non-numeric bit-pattern means the same thing on every CPU,
    after endianness conversion

    This is not guaranteed by IEEE 754, as far as I can see: +infinity and
    -infinity have unique, defined encodings, but there are 2**53 - 2
    bit-patterns for not-a-number (either sign, all-ones exponent, any
    non-zero 52-bit significand), each of which could be either a "quiet NaN"
    or a "signalling NaN" depending on the platform's convention.

    One way to deal with this would be: during unmarshalling, replace
    every NaN with the current platform's usual "quiet NaN" bit-pattern.

    Another would be to declare transmission of NaNs across D-Bus 2.0 to be
    an error, like invalid UTF-8. I personally think that's not an option
    because it's excessively cruel to application developers.
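The first option (replacing every NaN during unmarshalling) could look roughly like this sketch, applied to each double's bits after byte-swapping. For simplicity it uses one fixed canonical pattern rather than querying the platform's usual quiet NaN: positive sign, top significand bit set, zero payload. That is the quiet NaN under the IEEE 754-2008 recommendation. Platforms using the older opposite convention, such as legacy MIPS, would need a different constant. The names are hypothetical. Infinities are left alone because their significand is zero.

```c
#include <stdbool.h>
#include <stdint.h>

#define EXPONENT_MASK UINT64_C (0x7FF0000000000000)
#define FRACTION_MASK UINT64_C (0x000FFFFFFFFFFFFF)
/* Hypothetical canonical NaN: positive, quiet bit set, zero payload. */
#define CANONICAL_NAN UINT64_C (0x7FF8000000000000)

/* NaN: all-ones exponent and a non-zero significand (either sign). */
static bool
bits_are_nan (uint64_t bits)
{
    return (bits & EXPONENT_MASK) == EXPONENT_MASK &&
           (bits & FRACTION_MASK) != 0;
}

/* Replace any of the 2**53 - 2 NaN patterns with the canonical one;
 * every other pattern, including +/-infinity, passes through unchanged. */
static uint64_t
canonicalize_nan (uint64_t bits)
{
    return bits_are_nan (bits) ? CANONICAL_NAN : bits;
}
```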

Thoughts?
    S