Dict parsing question

Simon McVittie smcv at collabora.com
Wed Jan 3 14:11:25 UTC 2024


On Wed, 03 Jan 2024 at 10:50:30 +1300, Lawrence D'Oliveiro wrote:
> I wrote:
> > But you don’t know on what boundary the array length is positioned: if
> > it’s already at an address that is a multiple of 8 bytes (which could
> > happen, say, 50% of the time), then the addition of 4 bytes of padding
> > means the element field is no longer 8-byte-aligned.
> 
> Sorry, wrong way round: if the initial length field begins on an
> address of the form 4n + 4 (where n is an integer), then if you add 4
> bytes of padding after that, the first element will also be on an
> address of the form 4n + 4, which means it will not be 8-byte-aligned.

Addresses of the form 4n + 4 are 8-byte aligned half the time (whenever
n is odd), but I know what you mean. (I think you meant to write 8n + 4.)

The rule is that you must add exactly the amount of padding that is
required to achieve the desired alignment, and no more. Adding padding
where it is not needed is just as serious a bug as not adding padding
where it is needed, and will cause the message to be misinterpreted by
spec-compliant clients. So if you are writing out an array of
8-byte-aligned elements (such as dict entries) and its length field is
at offset 8n, you will have to add 4 bytes of padding after the length;
but if you are writing out the same array with its length field at
offset 8n+4, then you must not.
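
As a rough illustration of that rule, here is a minimal C sketch. The
function names padding_needed() and padding_after_array_length() are
made up for this example and are not part of libdbus or any other
D-Bus implementation; it only assumes that the array length is a 4-byte
quantity and that alignments are powers of two.

```c
#include <assert.h>
#include <stddef.h>

/* Number of zero bytes needed to advance `offset` to the next multiple
 * of `alignment`. Returns 0 if `offset` is already aligned.
 * `alignment` must be a power of two (1, 2, 4 or 8 in D-Bus). */
static size_t padding_needed(size_t offset, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (alignment - (offset % alignment)) % alignment;
}

/* Padding that must follow a 4-byte array length field that starts at
 * `length_offset`, so that the first element has `element_alignment`. */
static size_t padding_after_array_length(size_t length_offset,
                                         size_t element_alignment)
{
    assert(length_offset % 4 == 0);   /* the length itself is 4-byte aligned */
    return padding_needed(length_offset + 4, element_alignment);
}
```

With 8-byte-aligned elements, a length field at offset 8 is followed by
4 bytes of padding, and a length field at offset 12 is followed by none.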

D-Bus generally uses "natural alignment" (4-byte quantities are 4-byte
aligned, etc.), which is reasonably space-efficient, and allows efficient
direct access by every CPU architecture I'm aware of. Some CPUs have
*less* strict alignment requirements, like m68k only needing 4-byte
quantities to have 2-byte alignment, and x86 allowing any unaligned
pointer to be dereferenced with only a performance penalty. The design
of D-Bus does not optimize for specific CPUs like x86 and m68k, but
instead makes the pessimistic assumption that it is wrong to dereference
an unaligned pointer (which is correct for e.g. ARM or ia64, and
unnecessarily strict for x86 or m68k).
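
For code that cannot rely on the buffer being suitably aligned (for
example when inspecting bytes at an arbitrary offset), the usual
portable alternative to dereferencing a cast pointer is memcpy(). This
is only a sketch: read_uint32_unaligned() is a name invented here, and
it reads the value in host byte order without any bounds checking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value from buf + offset without assuming that the
 * address is 4-byte aligned. The caller must ensure that at least
 * 4 bytes are available at that offset. */
static uint32_t read_uint32_unaligned(const uint8_t *buf, size_t offset)
{
    uint32_t value;

    assert(buf != NULL);
    /* memcpy has no alignment requirement, unlike *(const uint32_t *)p */
    memcpy(&value, buf + offset, sizeof value);
    return value;
}
```

Because a well-formed D-Bus message already has every item on its
natural boundary, a reader that trusts the message can skip the copy;
the helper above is what you would need if that guarantee were absent.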

In principle there might be CPUs with *more* strict alignment requirements
than "natural" alignment, but it would be difficult to implement Standard
C on such a thing, because Standard C requires arrays to be packed (no
padding between items); so there is a strong disincentive for hardware
manufacturers to require larger-than-natural alignment, and therefore a
low cost to assuming that natural alignment is enough.

The exception to the general rule of natural alignment is that D-Bus dict
entries and structs are always 8-byte aligned, sacrificing some
space-efficiency for a somewhat simpler implementation (no need to look
ahead at the first type in the struct to find out what its alignment will
need to be).
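
The alignment rules can be summarized as a lookup on the first
character of a type signature. The sketch below uses the type codes
from the D-Bus Specification; the function names dbus_alignment_of()
and dbus_padding_before() are hypothetical and not taken from any real
library.

```c
#include <assert.h>
#include <stddef.h>

/* Alignment in bytes for a value whose signature starts with
 * `type_code`, or 0 if the code is not recognized. */
static size_t dbus_alignment_of(char type_code)
{
    switch (type_code) {
    case 'y':                       /* byte */
    case 'g':                       /* signature: 1-byte length first */
    case 'v':                       /* variant: starts with a signature */
        return 1;
    case 'n': case 'q':             /* 16-bit integers */
        return 2;
    case 'b': case 'i': case 'u':   /* boolean, 32-bit integers */
    case 'h':                       /* Unix fd index */
    case 's': case 'o':             /* string, object path: uint32 length first */
    case 'a':                       /* array: uint32 length first */
        return 4;
    case 'x': case 't': case 'd':   /* 64-bit integers, double */
    case '(': case '{':             /* struct, dict entry: always 8 */
        return 8;
    default:
        return 0;
    }
}

/* Padding required before writing a value of this type at `offset`. */
static size_t dbus_padding_before(size_t offset, char type_code)
{
    size_t alignment = dbus_alignment_of(type_code);

    assert(alignment != 0);         /* unknown type code */
    return (alignment - (offset % alignment)) % alignment;
}
```

Note that the struct and dict entry cases return 8 without looking at
the member types, so even a struct of two bytes, (yy), starts on an
8-byte boundary.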

> > This means that the array will sometimes pack more efficiently than
> > how you might have imagined that it worked.
> 
> That also means that the array can no longer be laid out as a
> self-contained structure, you have to look at how it is located within
> the entire message structure.

Yes. You could store it in RAM as

    struct {
        uint32_t len_bytes;       /* elements only, not the padding after the length */
        int element_alignment;    /* 8 in the case of dict entries */
        void *elements;
    };

and write it into a message like this pseudocode:

    pad_to_alignment(4);      /* i.e. append 0 <= n < 4 bytes */
    write_uint32(arr.len_bytes);
    pad_to_alignment(arr.element_alignment);
    write_bytes(arr.elements, arr.len_bytes);

but there is no way to store a pre-serialized D-Bus array as a single block
of bytes that can be memcpy'd into a message in a single operation without
already knowing the alignment of the "cursor".
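
To make the dependence on the cursor concrete, here is a small
self-contained C version of the pseudocode above. Everything in it is
an assumption made for illustration: the struct writer type, its fixed
64-byte buffer, the helper names and array_size_at() are invented here,
and the length is written in host byte order for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct dbus_array {
    uint32_t len_bytes;         /* elements only, not the padding */
    size_t element_alignment;   /* 8 in the case of dict entries */
    const void *elements;
};

struct writer {
    uint8_t buf[64];            /* toy fixed-size message buffer */
    size_t pos;                 /* the "cursor" */
};

/* Append zero bytes until the cursor is a multiple of `alignment`. */
static void pad_to_alignment(struct writer *w, size_t alignment)
{
    while (w->pos % alignment != 0) {
        assert(w->pos < sizeof w->buf);
        w->buf[w->pos++] = 0;
    }
}

static void write_bytes(struct writer *w, const void *data, size_t len)
{
    assert(len <= sizeof w->buf - w->pos);
    if (len > 0)
        memcpy(w->buf + w->pos, data, len);
    w->pos += len;
}

static void write_uint32(struct writer *w, uint32_t value)
{
    pad_to_alignment(w, 4);
    write_bytes(w, &value, sizeof value);   /* host byte order */
}

static void write_array(struct writer *w, const struct dbus_array *arr)
{
    write_uint32(w, arr->len_bytes);
    pad_to_alignment(w, arr->element_alignment);
    write_bytes(w, arr->elements, arr->len_bytes);
}

/* Number of bytes the array occupies when the cursor starts at `start`. */
static size_t array_size_at(size_t start, const struct dbus_array *arr)
{
    struct writer w;

    memset(&w, 0, sizeof w);
    assert(start <= sizeof w.buf);
    w.pos = start;
    write_array(&w, arr);
    return w.pos - start;
}
```

For an array holding 16 bytes of 8-byte-aligned elements, this yields
24 bytes when the cursor starts at offset 0 or 8, but 20 bytes when it
starts at offset 4, so the serialized form is not one fixed block.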

If you were designing a new message encoding, then depending on your
priorities you might sacrifice some other property to avoid this: for
example, making small top-level elements require more padding than
"natural" alignment; or conversely, leaving message items unaligned, so
that implementations have to memcpy() them out of the message and cannot
just dereference a pointer. But, as I said, D-Bus is not a new message
encoding, so at this point it doesn't actually matter which way is
better or what the priorities are/were, because interoperability with
the last 20 years of D-Bus is the most important property now.

    smcv

