recursive types, struct, custom, dict, etc.

Olivier Andrieu oliv__a at users.sourceforge.net
Tue Jun 1 21:04:38 PDT 2004


Hi,

I was also writing a mail summarizing some of these issues, so I'll
just put my draft here and then comment on your mail.




Right now, the type representation in dbus knows about a bunch of
primitive types (various flavours of integers, floats, boolean,
string, ...) and two aggregate types (ARRAY and DICT). ARRAY is a
parametrized type (we have to specify the type of the values in the
array) whereas DICT is not (it contains values of any type).

Compared to type systems found in high-level languages, it lacks two
other kinds of complex types:

 - product types (aka tuples, or structs, or records): a collection of
a fixed number of values from various types (possibly with field
names).

 - sum types (aka variants, discriminated unions): one value from a
set of possible types.

Sum types are needed when they are used as parameters to other types.
Right now, if you want to transmit for instance the sum type
INT32_OR_FLOAT, you call either dbus_message_iter_append_int32 or
dbus_message_iter_append_float.  But you cannot encode an ARRAY of
INT32_OR_FLOAT. I guess one can work around the lack of such a complex
type, but I think union types can be really worthwhile, for instance
for handling optional values (think optional arguments in a method
call) with a type like STRING_OR_NIL.

One solution is to introduce a non-parametrized ANY type which is the
sum of all DBus types. This ANY type is more or less already implicit,
since DICT contains ANY values and a DBusMessage itself is basically
an ARRAY of ANY.  Another solution is to introduce proper parametrized
union types where one could specify ARRAY of (INT32 or FLOAT).


Concerning product types, again two options. The first is to introduce
a non-parametrized type, like what this patch implemented:
http://freedesktop.org/pipermail/dbus/2004-March/000856.html . That's
a simple typecode TUPLE which starts a sub-DBusMessage, using the same
layout and encoding as the global message. The second option is to
use a parametrized type, say STRUCT of (INT32 and BOOLEAN and
STRING), this is what Havoc described in his mail.
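
To make the options above concrete, here is a minimal Python sketch.
It is not dbus code: every class and function name in it (Prim,
ArrayOf, UnionOf, StructOf, AnyType, conforms) is made up for
illustration, and the mapping of DBus types onto Python values is an
assumption. It models parametrized ARRAY, union and STRUCT types next
to a non-parametrized ANY, and checks whether a value fits a type.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Prim:
    name: str                      # "INT32", "FLOAT", "BOOLEAN", "STRING", "NIL"


@dataclass(frozen=True)
class ArrayOf:
    elem: object                   # parametrized: the element type


@dataclass(frozen=True)
class UnionOf:
    alternatives: Tuple[object, ...]   # parametrized sum type


@dataclass(frozen=True)
class StructOf:
    fields: Tuple[object, ...]     # parametrized product type


@dataclass(frozen=True)
class AnyType:
    pass                           # non-parametrized sum of all types


INT32, FLOAT, BOOLEAN, STRING, NIL = (
    Prim(n) for n in ("INT32", "FLOAT", "BOOLEAN", "STRING", "NIL")
)


def conforms(value, ty):
    """Return True if the Python value fits the (hypothetical) type ty."""
    if isinstance(ty, AnyType):
        return True
    if isinstance(ty, Prim):
        if ty.name == "NIL":
            return value is None
        if ty.name == "BOOLEAN":
            return isinstance(value, bool)
        if ty.name == "INT32":
            # bool is a subclass of int in Python, so exclude it explicitly
            return (isinstance(value, int) and not isinstance(value, bool)
                    and -2**31 <= value < 2**31)
        if ty.name == "FLOAT":
            return isinstance(value, float)
        if ty.name == "STRING":
            return isinstance(value, str)
        return False
    if isinstance(ty, ArrayOf):
        return isinstance(value, list) and all(conforms(v, ty.elem) for v in value)
    if isinstance(ty, UnionOf):
        return any(conforms(value, alt) for alt in ty.alternatives)
    if isinstance(ty, StructOf):
        return (isinstance(value, tuple) and len(value) == len(ty.fields)
                and all(conforms(v, f) for v, f in zip(value, ty.fields)))
    raise TypeError("unknown type description: %r" % (ty,))
```

With this, conforms([1, 2.5], ArrayOf(UnionOf((INT32, FLOAT)))) holds
while conforms([1, 2.5], ArrayOf(INT32)) does not, and STRING_OR_NIL
is simply UnionOf((STRING, NIL)). ArrayOf(AnyType()) accepts
everything, which is the price of the non-parametrized option.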





 Havoc Pennington [Tue, 01 Jun 2004]:
 > Hi,
 > 
 > This is I think the most difficult remaining implementation task to be
 > ready for 1.0, and the remaining protocol change. I could be wrong. See
 > previous threads e.g. 
 > http://freedesktop.org/pipermail/dbus/2004-March/000840.html
 > http://freedesktop.org/pipermail/dbus/2004-March/000919.html
 > and I'm pretty sure it's come up a few other times.
 > 
 > I'll append some preliminary notes on the subject proposing how we
 > address this stuff. In essence add STRUCT and fully recursify the type
 > system, or back down again to a limited set of primitive types.
 > 
 > Havoc
 > 
 > Current Situation
 > ===

<snip>

 > Proposed Situation
 > ===

 > For structs we basically introduce grouping, so we could represent by
 > parens. Say we have foo (int, struct { double, double }) that could
 > have type signature "i(dd)"
 > If we have foo (int, array of struct { double, double}) that is 
 > "ia(dd)" and so forth.
 > 
 > In this case structs are almost the same as CUSTOM but there's no name
 > for the struct. If we wanted we could name structs, maybe just insert
 > that into the type signature in some conventional way:
 >  "ia('MyStruct'dd)"
 > 
 > The problem with this is that it puts one bit of introspection
 > annotation in the protocol, while most introspection annotation is in
 > the Introspect() return value. More discussion later in these notes,
 > see below.
 > 
 > If we introduced a variant type (pretend its code is "v") we could
 > replace DICT with something like:
 >  "('StringVariantMap'asav)"

Rather something like "a('NamedPair'sv)", no? (And 'v' is already
taken by NIL; it stands for void, I guess.)

 > i.e. struct StringVariantMap { array<string>; array<variant>; }

array<struct {string, variant}> ?
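
To compare the two encodings, here is a rough Python sketch of a
parser for the signature strings quoted above. The type codes are
assumptions taken from the examples in this thread ('i', 'd', 's',
'a' for array, parentheses for struct, an optional quoted struct
name, and 'v' pretended to be the variant code even though it is
currently taken by NIL). The function names and the tuple
representation are made up for illustration.

```python
# Hypothetical one-letter codes, following the examples in this thread.
PRIMITIVES = {"i": "int32", "d": "double", "s": "string", "v": "variant"}


def parse_one(sig, pos=0):
    """Parse one complete type starting at sig[pos]; return (tree, next_pos)."""
    if pos >= len(sig):
        raise ValueError("unexpected end of signature")
    c = sig[pos]
    if c == "a":
        # ARRAY is parametrized: exactly one element type follows.
        elem, pos = parse_one(sig, pos + 1)
        return ("array", elem), pos
    if c == "(":
        pos += 1
        name = None
        if pos < len(sig) and sig[pos] == "'":
            end = sig.index("'", pos + 1)   # ValueError if the quote is unterminated
            name, pos = sig[pos + 1:end], end + 1
        fields = []
        while pos < len(sig) and sig[pos] != ")":
            field, pos = parse_one(sig, pos)
            fields.append(field)
        if pos >= len(sig):
            raise ValueError("unterminated struct")
        return ("struct", name, tuple(fields)), pos + 1
    if c in PRIMITIVES:
        return PRIMITIVES[c], pos + 1
    raise ValueError("unknown type code %r at position %d" % (c, pos))


def parse_signature(sig):
    """Parse a whole signature into a list of type trees."""
    types, pos = [], 0
    while pos < len(sig):
        tree, pos = parse_one(sig, pos)
        types.append(tree)
    return types
```

parse_signature("a('NamedPair'sv)") gives a single array whose
element is a struct (string, variant), whereas
parse_signature("('StringVariantMap'asav)") gives a single struct
holding two independent arrays, so nothing in the type ties the
number of keys to the number of values.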

 > API implications
 > ===

<snip>

 > I would propose that whenever D-BUS implements a get_int() (or
 > equivalent via get_args()) that the wire protocol may contain a
 > VARIANT, which would be automatically converted to int if the
 > variant indeed contains an int. This would allow language bindings
 > such as python to always return variant types over the wire (often
 > the real type is unknown), and still interoperate with other
 > bindings. e.g.  if in python I return an empty list, that would go
 > back as a method reply with argument ARRAY of VARIANT, and then if
 > some C code asks for an ARRAY of INT it would successfully get an
 > empty ARRAY of INT.

The problem with this approach is that it goes against the idea of
having a separate type signature. If you send a message of type "isi"
for instance, the receiver should be able to understand messages of
signature "vvv", "ivv", "vsv", "vvi", etc. So much for the quick
dispatching using the signature :) By the way, the API
dbus_message_has_signature(message, signature) would not be very
useful in this case; something like
dbus_message_signature_compatible() with a more lenient check than
strcmp would probably be necessary.
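
Such a lenient check could look roughly like the Python sketch
below. It is only an illustration of the idea, not a proposal for
the real C API: the function names are made up, the signature syntax
('a' for array, parentheses for struct, 'v' for variant) is assumed
from the examples above, and signatures are assumed to be well
formed.

```python
def _end_of_type(sig, pos):
    """Return the index just past the complete type starting at sig[pos]."""
    c = sig[pos]
    if c == "a":                       # array: skip the element type
        return _end_of_type(sig, pos + 1)
    if c == "(":                       # struct: skip fields up to ')'
        pos += 1
        while sig[pos] != ")":
            pos = _end_of_type(sig, pos)
        return pos + 1
    return pos + 1                     # primitive or variant: one character


def split_types(sig):
    """Split a signature into the substrings of its complete types."""
    out, pos = [], 0
    while pos < len(sig):
        end = _end_of_type(sig, pos)
        out.append(sig[pos:end])
        pos = end
    return out


def type_compatible(expected, actual):
    """A variant on either side matches any single complete type."""
    if expected == "v" or actual == "v":
        return True
    if expected[0] == "a" and actual[0] == "a":
        return type_compatible(expected[1:], actual[1:])
    if expected[0] == "(" and actual[0] == "(":
        return signature_compatible(expected[1:-1], actual[1:-1])
    return expected == actual


def signature_compatible(expected, actual):
    """More lenient than strcmp: same shape, with variants as wildcards."""
    e, a = split_types(expected), split_types(actual)
    return len(e) == len(a) and all(
        type_compatible(x, y) for x, y in zip(e, a)
    )
```

Note that a match on 'v' only postpones the real check: the receiver
still has to look inside each variant at unmarshaling time, so the
cost moves from one string comparison to a walk over the values.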

 > Discussion
 > ===
 > 
 > Reasons to make this change:
 > 
 >  - it's all elegant and stuff
 > 
 >  - it should clean up the code a bit, the code is currently doing
 >    things both ways (for arrays and for everything else), though
 >    keeping a variant type preserves the two ways to some extent
 
That's the crux of the problem, I think: having both a non-parametrized
union type (ANY) and a parametrized product type (STRUCT). ANY is a
bit of a pain to handle in a statically typed language (you have to
dispatch on it in unmarshaling functions) and having it practically
eliminates the benefits of knowing the types appearing in the struct,
since any of them could be a variant. On the other hand, not having
ANY may be a pain for dynamically typed languages (in marshaling
functions).

 >  - we can typecheck incoming messages with a single strcmp();
 >    also overloaded methods could be more quickly routed

except when there are variants in the signature

 >  - maps more naturally to statically typed languages

not really (because of the variants)

 > Reasons not to make this change:
 > 
 >  - structs are probably sort of annoying to deal with in language
 >    bindings

Why? I rather had the impression they were not a problem. Either
handle them generically using introspection data and a big unmarshaling
function, or handle them statically using some sort of code generation
(preprocessor, IDL compiler, etc.).

 > So the summary I would say is that we should either drop array of
 > array and go back to a straightforward hardcoded type list, plus an
 > escape hatch of CUSTOM. Or we should go all the way and get the
 > benefits of adding STRUCT and breaking type signatures apart from type
 > codes.

I'd say: add a non-parametrized STRUCT type, a non-parametrized ANY
type, and forget about separating type signatures. I think we need ANY
because of DICTs, and since having an ANY type constitutes an escape
hatch that can make type signatures almost useless in some cases, I'm
not sure separating type signatures is worth the effort.

 > Odds and Ends
 > ===
 > 
 > The NIL type:
 > 
 >   NIL doesn't make a hell of a lot of sense as a *type*, really it's a 
 >   value that's allowed in *some* languages to replace a value of any
 >   type. I think we need to get rid of DBUS_TYPE_NIL since I can't make 
 >   any sense out of it.

Well, I think it can be useful (see above, about optional arguments).

 > Struct names:
 > 
 >   I think there's a good argument to be made that struct names
 >   should not be in the type signature or protocol, but instead be
 >   in the introspection data (where we also have arg names already,
 >   and could add struct field names in addition to the name of the
 >   struct itself).

I agree with this.

-- 
   Olivier


