recursive types work

Havoc Pennington hp at redhat.com
Tue Dec 28 23:47:54 PST 2004


Hi,

I've done most of the core marshaling code for roughly this proposal:
http://lists.freedesktop.org/pipermail/dbus/2004-June/001169.html

It's in cvs as dbus-marshal-recursive.[hc] though not included in the
build yet. (Because the hard part of modifying the rest of dbus to use 
the new types/marshaling hasn't been done.)

Some of the major decision points:

 - CUSTOM goes away
 - NIL goes away
 - VARIANT and STRUCT are added
 - struct names are only in the introspection data, not the type 
   signature
 - no implicit type conversions, as just discussed
 - DICT remains for now

DICT as it is in today's D-BUS is just an
ARRAY of STRUCT { string, variant } with a special-case typecode.
i.e. :
 - it is always string,variant not parameterized
 - the distinction between data structures like 
   array of pairs vs. binary tree vs. hash table 
   is irrelevant to D-BUS and only applies to bindings

I suppose DICT might as well be parameterized, it wouldn't really be
harder at all for dbus and most languages (perl, python, C++) should be
OK with it. In fact we could implement DICT exactly like ARRAY, except
that the type parameter is validated to be a STRUCT with two fields.

To create a message the function calls are like this:

   write (INT32)
   recurse (STRUCT)
     write (STRING)
     write (DOUBLE)
     recurse (ARRAY of ARRAY of BOOL)
       recurse (ARRAY of BOOL)
         write (BOOL)
         write (BOOL)
       unrecurse ()
       recurse (ARRAY of BOOL)
         write (BOOL)
       unrecurse ()
     unrecurse ()
     write (UINT32)
   unrecurse ()
   write (OBJECT_PATH)
   recurse (VARIANT)
     write (STRING)
   unrecurse ()
   recurse (DICT)
     recurse (STRUCT)  // we could make this automatic for DICT perhaps
       write (STRING)
       recurse (VARIANT)
         write (DOUBLE)
       unrecurse ()
     unrecurse ()
     recurse (STRUCT)
       write (STRING)
       recurse (VARIANT)
         write (INT32)
       unrecurse ()
     unrecurse ()
   unrecurse ()

To read this message (I won't show all of it again):

   read (INT32)
   next ()
   recurse (STRUCT)
     read (STRING)
     next ()
     read (DOUBLE)
     next ()
     if (array is not empty)
       recurse (ARRAY of ARRAY of BOOL)
         if (array is not empty)
           recurse (ARRAY of BOOL)
           read (BOOL)
           next ()
           read (BOOL)
         if (array is not empty)
           recurse (ARRAY of BOOL)
           read (BOOL)
     next ()
     read (UINT32)

Note the "if (array is not empty)" - that's because the iterator is over
values and not types, so recursing into an empty array isn't possible
(there's nowhere for the iterator to "point"). At least, I got confused
trying to figure out how to allow recursing into empty arrays.

This implies that some sort of separate type signature iterator may be
needed in the API contrasting with the value iterator. To avoid this, a
possible interpretation of recursing into an empty array is that the
value iterator becomes a type iterator. If we do it that way then you
can recurse into an empty array. However the iterator code will have a
kind of funky special case throughout, 
"if (iter->types_only)"

The advantage of doing this though is that the app or bindings
programmer will get what they expect.

For both reading and writing recursion is implemented like this:
 recurse (&iter, &subiter)
   read (&subiter)

Which means that recursion chews up stack instead of heap with the naive
application usage. The alternative would be to keep 
a dynamically-allocated stack inside a single global iter:
  recurse (&iter)
    read (&iter)

You can still get that effect with the iter/subiter API design if you
want to (just alloc iters on the heap and keep them in a stack).

There will be some sort of recursion depth limit built in to dbus of
course.

It turns out that the primary cause of complexity is not structs but
recursive arrays, because arrays give the type signature only once for
many (possibly recursive) elements.

Along with the marshaling changes I'm planning to change the accessors
for values in the following ways:

 - they will always return const data, byteswapping on the fly if 
   required.

 - the typesafe accessors will be dropped at least everywhere internal
   in favor of:
       uint32 value;
       read (&iter, DBUS_TYPE_UINT32, &value)
   the type DBUS_TYPE_UINT32 is specified only for error checking, 
   since &iter knows what type it's looking at.

   Open question is whether to keep get_uint32() type of functions
   for dbus-message.h, even while dropping them all internally.
   They are strictly speaking convenience functions and thus should 
   live in the bindings.

 - the "array of primitive type" accessors such as get_array_of_uint32()
   become strictly optimizations; you could also recurse into the array
   and get each uint32 individually.

One bug I discovered in current dbus is that arrays of 64-bit integers
are not guaranteed to be properly aligned. So that will be fixed by
inserting padding between the array length and the array values.

Structs will always be aligned on 8-byte boundaries regardless of the
struct fields; this will cause struct { int32; } to use double the space
of simply int32, but avoiding that seems more complicated than it's
worth.

That's everything I can think of re: design decisions so far.

Havoc




More information about the dbus mailing list