Object serialization

Simon McVittie simon.mcvittie at collabora.co.uk
Tue Feb 26 07:20:27 PST 2008


On Tue, 26 Feb 2008 at 15:32:03 +0100, Luigi Paioro wrote:
> So, for instance, this could be an example of mapping (pseudo-python 
> example):
> 
> 
> {"class_name": "Foo",
>   "class_namespace": "org.mapping.example",
>   "name": "My foolish Foo name",
>   "width": 50,
>   "height": 250,
>   "colours": (255, 150, 25),
>   "bar": {"class_name": "Bar",
>           "class_namespace": "org.mapping.example",
>           "name": "My foolish Bar name",
>           ...
>          },
>    "foobar": {"class_name": "MyFooBar",
>               "class_namespace": "org.mapping.example.myimpl",
>               "id": 125300,
>               "flag": True,
>               ...
>              }
> }

Whoa! No, that isn't what I meant at all. For what use-case is it useful
to pass arbitrary "objects" over D-Bus? What do these objects represent?
How will you avoid arbitrary code execution caused by naive clients
receiving such a "serialized object"? Or is arbitrary code execution
indeed exactly what you're after?

The object model usually used on D-Bus is that objects exist inside a
particular service's process space. Clients interact with them by calling their
methods and listening for their signals over D-Bus; in most bindings, the
client will have a proxy object in *its* process space, which stands in for
the remote object and forwards method calls to it.

When passing data between clients and services, it's not meaningful to
talk about "objects" in an object-oriented sense, because the client and
the service should never be required to share any particular behaviour
(code) - remember that the client and service may be written in
different languages. The closest thing to an "object" that can meaningfully
pass between client and service is a bundle of related data (an
"object", but only in the sense that JSON uses the word).

There are two basic ways you can represent such a bundle of related
data. If you want future extensibility or optional fields, you can use a
dictionary, interpreted as a mapping from field names to values. If you want
fixed fields, you can use a struct, interpreted as a tuple of fields.

For instance, in Telepathy we represent an IPv4 address:port as a struct
containing a string and an unsigned 16-bit integer (the fields are
obvious), but we plan to represent geolocation information as a
dictionary (so fields can be made optional and new fields can be added).
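
A rough sketch of those two shapes in plain Python, with no D-Bus binding
involved: the struct becomes a fixed-length tuple (D-Bus signature "(sq)")
and the dictionary a string-keyed mapping (signature "a{sv}"). The helper
names and the geolocation keys ("lat", "lon", "alt") are invented for
illustration and are not Telepathy's actual field names.

```python
# Struct: fixed fields, identified by position. D-Bus signature: (sq)
ipv4_endpoint = ("192.0.2.1", 8080)


def check_endpoint(value):
    """Validate a (string, unsigned 16-bit integer) struct from a peer."""
    if not isinstance(value, tuple) or len(value) != 2:
        raise ValueError("expected a 2-field struct")
    host, port = value
    if not isinstance(host, str):
        raise ValueError("host must be a string")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(port, bool) or not isinstance(port, int):
        raise ValueError("port must be an integer")
    if not 0 <= port <= 0xFFFF:
        raise ValueError("port must fit in 16 unsigned bits")
    return host, port


# Dictionary: fields identified by name, so they can be optional and
# new ones can be added later. D-Bus signature: a{sv}
KNOWN_LOCATION_KEYS = ("lat", "lon", "alt")  # hypothetical field names


def read_location(loc):
    """Keep the fields we understand; skip unknown keys, allow missing ones."""
    return {key: loc[key] for key in KNOWN_LOCATION_KEYS if key in loc}
```

A receiver written against an older version of the dictionary layout keeps
working when a sender adds keys, which is the extensibility the struct form
gives up in exchange for a fixed, self-evident layout.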

Encoding things for D-Bus in this way involves the same sort of design
decisions you'd make when choosing how to encode something in JSON or XML.

Now, if what you want is some sort of generic object storage/retrieval system,
you have exactly the same issues as if you were doing generic object
storage and retrieval using files on disk. You'll have to define a way to
construct an object instance from a dictionary, struct or byte-array,
enforcing use of a "safe" constructor that cannot crash or
cause unwanted code to execute when it is fed invalid data, then have
objects provide a method that dumps their state into such a format.
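
One way this might look in Python, as a minimal sketch: a class with a
validating constructor that builds an instance from a dictionary, plus a
method that dumps its state back out. The class and its fields (name,
width, height) are borrowed from the quoted example purely for
illustration; from_dict and to_dict are names chosen here, not part of any
existing API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Foo:
    name: str
    width: int
    height: int

    @classmethod
    def from_dict(cls, data):
        """Build a Foo from untrusted data.

        Only copies plain values after checking them; never looks up
        classes or calls anything named by the input. Invalid input
        produces a ValueError rather than a half-built object.
        """
        if not isinstance(data, dict):
            raise ValueError("expected a dictionary")
        try:
            name = data["name"]
            width = data["width"]
            height = data["height"]
        except KeyError as exc:
            raise ValueError(f"missing field: {exc.args[0]}") from None
        if not isinstance(name, str):
            raise ValueError("name must be a string")
        for label, value in (("width", width), ("height", height)):
            if isinstance(value, bool) or not isinstance(value, int):
                raise ValueError(f"{label} must be an integer")
            if value < 0:
                raise ValueError(f"{label} must not be negative")
        return cls(name, width, height)

    def to_dict(self):
        """Dump this object's state as a plain dictionary."""
        return {"name": self.name, "width": self.width, "height": self.height}
```

The point of the design is that the receiver decides which class to
construct; the data only supplies field values.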

There are two possible approaches to this. One is to be
language- and runtime-specific, and live with the fact that you won't be
able to deserialize an object unless you've implemented a very similar
API using the same language and libraries. The other is to define your
object serialization in a domain-specific way, write a
language-independent specification for how to serialize and deserialize
particular types of object, and then implement that specification in one
or more languages.

Python's 'pickle' module is an example of a language-specific solution
(the first approach I described above). Note that unpickling can cause
arbitrary code execution, so pickling is not suitable in many environments, and
in particular should never be used over networks, unless the network itself
*and* all participants in the networked protocol are absolutely trusted.
(It should be possible to make a pickle-like protocol which does not have
arbitrary code execution, though.) Doing a straight memory-dump to disk,
like early MS Word versions did, is also an example of this approach,
albeit not a very good one :-)
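
To make the risk concrete, here is a small and deliberately harmless
sketch. The class name Malicious and the helper restricted_loads are made
up for this example; the second half follows the "restricting globals"
technique described in the Python pickle documentation, and is only one
possible way to get a pickle-like format without code execution.

```python
import io
import pickle


class Malicious:
    # pickle calls __reduce__ to learn how to rebuild the object. The
    # callable it returns is invoked by the *receiver* while unpickling.
    def __reduce__(self):
        return (eval, ("6 * 7",))


payload = pickle.dumps(Malicious())

# The receiver never sees a Malicious instance: it just runs eval().
result = pickle.loads(payload)


class NoGlobalsUnpickler(pickle.Unpickler):
    """Unpickler that refuses every reference to a class or function."""

    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"global {module}.{name} is forbidden")


def restricted_loads(data):
    """Load a pickle containing only built-in containers and scalars."""
    return NoGlobalsUnpickler(io.BytesIO(data)).load()
```

With this, restricted_loads(payload) raises pickle.UnpicklingError, while
plain data such as pickle.dumps({"a": [1, 2]}) still loads. Replace the
expression passed to eval with something less polite and the unrestricted
pickle.loads call would run that instead.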

Many data structures in Telepathy are examples of a domain-specific
solution (the second approach I described). All interoperable file
formats (e.g. PNG, HTML, OpenDocument) are also examples of this approach.

I believe that attempting to "solve" this problem in a generic way, without
having any particular use-case in mind, is likely to lead to a system that's
not particularly useful for *any* use case.

    Simon

