Floating-point and mixed-endianness in D-Bus (was: dbus mini-summit)
Simon McVittie
simon.mcvittie at collabora.co.uk
Mon Aug 8 08:59:19 PDT 2011
(Deliberately breaking the thread, this is only tangentially related)
On Mon, 08 Aug 2011 at 10:50:29 -0400, Colin Walters wrote:
> * Talks of DBUS 2.0
> - What do we need to break
> - floats
This reminds me of something I'd been intending to bring up on this list.
Recent discussion of multiarch on the debian-devel list brought up an
interesting point: D-Bus' world-view is "there are only two native
encodings, LE and BE", but the truth is more complicated, particularly
for floating-point.
D-Bus currently makes the following implicit assumptions:
(0) char has 8 bits
C standards explicitly do not require this, to allow for oddities like
36-bit words with 9-bit bytes (PDP-10). I don't think we really care...
(1) Every useful ABI encodes negative integers in "two's complement",
like x86 does
C standards explicitly do not require this - they allow sign-and-magnitude
and ones' complement too - but I'm not aware of any useful CPU where it
isn't true.
(2) Every useful ABI has 8-, 16- and 32-bit integers, and optionally also
64-bit integers.
C standards do not require this; they only require that *if such types
exist*, int32_t (etc.) are defined. Again, I'm not aware of any useful CPU
where this isn't true. Currently, configure will fail unless 1-, 2- and
4-byte types can be found.
Recent versions of D-Bus will refuse to compile on platforms without a
64-bit integer type unless explicitly told to do so (my attitude is that
any modern libc/compiler ought to emulate 64-bit if not natively
supported, as done on 32-bit GNU platforms).
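For illustration, assumptions (0) to (2) can be written down as compile-time checks. This is only a sketch assuming a C11 compiler, not what D-Bus' configure script actually does, and have_int64() is a made-up helper name:

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>   /* CHAR_BIT */
#include <stdint.h>   /* int32_t and friends, INTn_MAX */

/* (0) char has exactly 8 bits */
static_assert(CHAR_BIT == 8, "need 8-bit bytes");

/* (1) two's complement: -1 has all bits set, so its low two bits are 11.
 * Sign-and-magnitude would give 01, ones' complement would give 10. */
static_assert((-1 & 3) == 3, "need two's complement integers");

/* (2) exact-width types: INTn_MAX is only defined if intN_t exists */
#if !defined(INT8_MAX) || !defined(INT16_MAX) || !defined(INT32_MAX)
#error "need 8-, 16- and 32-bit integer types"
#endif

/* Hypothetical helper: 64-bit integers are optional, so just report them */
int have_int64(void)
{
#ifdef INT64_MAX
    return 1;
#else
    return 0;
#endif
}
```

On a platform that violates (0), (1) or (2) this fails to compile, which is roughly the behaviour described above for configure.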
(3) Native integers are always either little- or big-endian (more precisely:
are in the same encoding as either x86 or SPARC)
It's perfectly valid for a C implementation to store 0x12345678 in memory
as { 0x56, 0x78, 0x12, 0x34 } ("mixed-endian"). The PDP-11 did something
similar. However, I'm not aware of any modern platform that actually does
this, and I'm inclined to say that D-Bus does not and will not support
such platforms.
(4) Native integers are all in the same byte-order
I believe it's also valid (although ill-advised) for a platform to have
big-endian 16-bit integers but little-endian 32-bit. I don't propose to
pretend we support such platforms.
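Assumptions (3) and (4) can only really be probed at run time. A minimal sketch, with invented names (byte_order, order_of_u32 and so on are not D-Bus API):

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <string.h>

/* The probes below compare byte by byte, so they rely on assumption (0) */
static_assert(CHAR_BIT == 8, "need 8-bit bytes");

enum byte_order { ORDER_LITTLE, ORDER_BIG, ORDER_OTHER };

/* Classify how this platform lays out a 32-bit integer in memory */
enum byte_order order_of_u32(void)
{
    static const unsigned char le[4] = { 0x78, 0x56, 0x34, 0x12 };
    static const unsigned char be[4] = { 0x12, 0x34, 0x56, 0x78 };
    const uint32_t probe = UINT32_C(0x12345678);

    if (memcmp(&probe, le, 4) == 0)
        return ORDER_LITTLE;
    if (memcmp(&probe, be, 4) == 0)
        return ORDER_BIG;
    return ORDER_OTHER;   /* e.g. a PDP-11-style mixed-endian layout */
}

/* Same question for a 16-bit integer */
enum byte_order order_of_u16(void)
{
    const uint16_t probe = 0x1234;
    unsigned char bytes[2];

    memcpy(bytes, &probe, 2);
    if (bytes[0] == 0x34 && bytes[1] == 0x12)
        return ORDER_LITTLE;
    if (bytes[0] == 0x12 && bytes[1] == 0x34)
        return ORDER_BIG;
    return ORDER_OTHER;
}

/* (3) 32-bit integers are LE or BE, and (4) 16-bit integers agree with them */
int integers_look_sane(void)
{
    enum byte_order o32 = order_of_u32();

    return o32 != ORDER_OTHER && o32 == order_of_u16();
}
```

A platform where integers_look_sane() returns 0 is one of the platforms we would be declaring unsupported.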
(5) The C "double" type is an IEEE 754-2008 64-bit double
I believe this is pretty much universal.
(6) The C "double" type is in the same byte-order as integers
(In other words, the sign bit and the most significant 7 bits of the
exponent are in the first byte on BE platforms or the last byte on LE,
and so on)
ARM FPA instructions (as used in the ABI of Debian's historical 'arm'
architecture) are a rare example of an ABI where this isn't true - they
expect mixed-endian doubles, on a CPU that's otherwise either LE or BE
(Debian used LE). Debian's current ARM ports, 'armel' and
'armhf', have moved to using IEEE doubles in the same order as x86.
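To make (5) and (6) concrete, here is a sketch of how a program could check them. It assumes the FPA layout is "the two 32-bit halves swapped", which is my understanding of it; the names are invented for illustration:

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>
#include <string.h>

/* (5) binary64: radix 2, 53-bit significand, maximum exponent 1024 */
static_assert(FLT_RADIX == 2 && DBL_MANT_DIG == 53 && DBL_MAX_EXP == 1024,
              "double does not look like IEEE 754 binary64");
static_assert(sizeof(double) == sizeof(uint64_t), "double is not 64 bits wide");

/* 1.0 as binary64: sign 0, biased exponent 0x3FF, mantissa 0 */
#define ONE_AS_BINARY64 UINT64_C(0x3FF0000000000000)

/* Reinterpret the bytes of a double as a 64-bit integer */
uint64_t double_bits(double d)
{
    uint64_t u;

    memcpy(&u, &d, sizeof u);
    return u;
}

/* Swap the two 32-bit halves of a 64-bit value */
uint64_t swap_halves(uint64_t u)
{
    return (u << 32) | (u >> 32);
}

enum double_layout {
    DOUBLE_LIKE_INTEGERS,   /* assumption (6) holds */
    DOUBLE_HALVES_SWAPPED,  /* FPA-style mixed-endian (assumed layout) */
    DOUBLE_UNKNOWN
};

/* (6) does a double use the same byte order as a 64-bit integer? */
enum double_layout detect_double_layout(void)
{
    uint64_t u = double_bits(1.0);

    if (u == ONE_AS_BINARY64)
        return DOUBLE_LIKE_INTEGERS;
    if (swap_halves(u) == ONE_AS_BINARY64)
        return DOUBLE_HALVES_SWAPPED;
    return DOUBLE_UNKNOWN;
}
```

On x86, armel and armhf I would expect DOUBLE_LIKE_INTEGERS; only something like the old 'arm' port should report DOUBLE_HALVES_SWAPPED.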
(7) Every numeric bit-pattern means the same thing on every CPU, after
endianness conversion
I believe this is guaranteed as a consequence of (5) and (6). This covers
normalized and denormalized numbers, +0 and -0.
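As a sketch of what "after endianness conversion" means in practice, the following writes a double as 8 little-endian bytes and reads it back, whatever the host byte order. It relies on (5) and (6), and put_double_le/get_double_le are hypothetical names rather than libdbus functions:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(double) == sizeof(uint64_t), "double is not 64 bits wide");

/* Write a double as 8 little-endian bytes, independent of host byte order.
 * Assumes binary64 stored in the same byte order as a 64-bit integer. */
void put_double_le(unsigned char out[8], double d)
{
    uint64_t u;

    memcpy(&u, &d, sizeof u);
    for (int i = 0; i < 8; i++)
        out[i] = (unsigned char) (u >> (8 * i));
}

/* Inverse of put_double_le() */
double get_double_le(const unsigned char in[8])
{
    uint64_t u = 0;
    double d;

    for (int i = 0; i < 8; i++)
        u |= (uint64_t) in[i] << (8 * i);
    memcpy(&d, &u, sizeof d);
    return d;
}
```

Because only shifts on the integer value are used, the same code gives the same bytes on LE and BE hosts, and the sign of zero survives the round trip since it is just the top bit of the last byte.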
(8) Every non-numeric bit-pattern means the same thing on every CPU,
after endianness conversion
This is not guaranteed by IEEE-754, as far as I can see - +infinity and
-infinity have unique, defined encodings, but there are 2**53 - 2
bit-patterns for not-a-number (any non-zero 52-bit mantissa, with either
sign bit), each of which could either be "quiet NaN" or "signalling NaN".
One way to deal with this would be: during unmarshalling, replace
every NaN with the current platform's usual "quiet NaN" bit-pattern.
Another would be to declare transmission of NaNs across D-Bus 2.0 to be
an error, like invalid UTF-8. I personally think that's not an option
because it's excessively cruel to application developers.
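The first option (canonicalizing NaNs while unmarshalling) might look roughly like this. It is a sketch only: it assumes the wire value has already been converted to host byte order, and it uses the C99 NAN macro as "the platform's usual quiet NaN"; the function names are made up:

```c
#include <assert.h>
#include <math.h>     /* NAN: a quiet NaN, where the platform has one */
#include <stdint.h>
#include <string.h>

static_assert(sizeof(double) == sizeof(uint64_t), "double is not 64 bits wide");

#define EXPONENT_MASK UINT64_C(0x7FF0000000000000)
#define MANTISSA_MASK UINT64_C(0x000FFFFFFFFFFFFF)

/* NaN: exponent all ones and mantissa non-zero (zero mantissa is infinity) */
int bits_are_nan(uint64_t u)
{
    return (u & EXPONENT_MASK) == EXPONENT_MASK && (u & MANTISSA_MASK) != 0;
}

/* Turn host-order wire bits into a double, replacing any NaN payload
 * with the local quiet NaN. The test is done on the integer, so a
 * foreign signalling NaN is never loaded as a floating-point value. */
double unmarshal_double(uint64_t wire_bits)
{
    double d;

    if (bits_are_nan(wire_bits))
        return (double) NAN;
    memcpy(&d, &wire_bits, sizeof d);
    return d;
}
```

The second option would use the same bits_are_nan() test, but fail validation instead of substituting a value.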
Thoughts?
S