[Libreoffice] minor idl fixes

Mon Dec 12 12:30:24 PST 2011

On 12/10/2011 02:57 PM, Tomas Hlavaty wrote:
> There seems to be agreement that the RDB type database should go away.
> There are several LO projects that would be affected by this and they
> seem rather complex with dependencies.  Also, for example the conversion
> from uno idl to java and cli goes directly to the binary format (class
> file, assembly) and it's hard to see what they actually generate.
>
> As a proof of concept, I have created a unoidl2 project:
>
>     git clone http://logand.com/git/unoidl2.git
>
> Note that it doesn't work yet but already has the parsing part in place,
> i.e. can generate ast and has an incomplete unoidl2java generation.
>
> The idea is that there would be a collection of small programs that
> would convert uno idl files to different languages, e.g. unoidl2java,
> unoidl2cs, unoidl2vala, unoidl2xml, unoidl2py etc.  These programs
> should be as simple as possible with no dependencies so that anybody can
> write a new uno idl converter for his programming language easily.

Historically, the situation is as follows:

On the one hand, there is a types.rdb (or split across a handful of 
such) that stores all the UNOIDL type information in a binary format. 
Its information content is thus, in a sense, isomorphic to the 
collection of the .idl files.  In the following, I'll call this "the 
complete type information."

On the other hand, the UNO binding for a given language in some cases 
needs part of that UNOIDL type information mapped to constructs of the 
given language:

The C++ UNO binding requires C++ class definitions for the UNOIDL 
interface, struct, exception, etc. types.  For this, cppumaker extracts 
some of the data from the complete type information and generates the 
relevant C++ class definitions from it, in the form of C++ header files. 
Note that typically only part of the complete type information is encoded in 
those header files (though there are switches to cppumaker to control 
the degree of included information, mainly for purposes of bootstrapping 
a UNO environment at runtime).  At runtime, the C++ UNO binding queries 
the types.rdb for certain data (e.g., when (un-)packing data as ANY).

The Java UNO binding requires Java class files for the UNOIDL interface, 
struct, exception, etc. types.  For this, javamaker extracts (almost) 
the complete type information and generates Java class files from it. 
(It does not go via .java source files so it could simultaneously 
support Java 4 and new Java 5 features at a time when that was still 
relevant.)  At runtime, the Java UNO binding never needs to query the 
types.rdb (and there's no Java code that can read that format) -- the 
information encoded in the .class files is rich enough for all the 
binding's demands.

Bindings of UNO to dynamic languages like Python take another approach.  
Such a binding obtains any necessary information purely at runtime, from 
the types.rdb, via UNO services that make that information available.  It 
does not require any *maker tool to generate language-specific artefacts 
from the complete type information upfront.

So, I think there is still demand for making the complete type 
information available at runtime.  It is just the binary format of the 
.rdb files that is rather inconvenient.

My suggestion is to replace the binary .rdb format with a simpler, 
most likely textual format (for which easy reading can be implemented in 
any language, if need be).  One viable approach seems to be to 
streamline the current .idl file format and directly use that syntax for 
any new-style ".rdb" files (concatenating the individual .idl files into 
one large .rdb file, say).
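
As a rough sketch of how little code reading such a textual file could 
take, the following Python concatenates two simplified .idl fragments and 
lists the types they define.  It assumes today's .idl syntax rather than 
any streamlined one, handles only module, interface, struct, exception 
and enum blocks, and is no replacement for a real parser; the function 
names concatenate and list_types are made up for this example.

```python
import re

# Line comments and block comments are dropped before scanning.
_COMMENT = re.compile(r"//[^\n]*|/\*.*?\*/", re.DOTALL)

# Either the start of a named block (optionally with a base type),
# or a bare opening or closing brace.
_TOKEN = re.compile(
    r"\b(module|interface|struct|exception|enum)\s+(\w+)\s*(?::\s*[\w:]+\s*)?\{"
    r"|(\{)|(\})"
)


def concatenate(idl_texts):
    """Join the contents of several .idl files into one text blob."""
    return "\n".join(idl_texts)


def list_types(text):
    """Return (kind, fully.qualified.name) for each type defined in text."""
    text = _COMMENT.sub(" ", text)
    scope = []  # names of enclosing named blocks; None for other braces
    found = []
    for match in _TOKEN.finditer(text):
        kind, name, opening, closing = match.groups()
        if kind:
            if kind != "module":
                qualified = ".".join([n for n in scope if n] + [name])
                found.append((kind, qualified))
            scope.append(name)
        elif opening:
            scope.append(None)
        elif closing and scope:
            scope.pop()
    return found


# Simplified fragments in the style of the existing .idl files.
XINTERFACE_IDL = """
module com { module sun { module star { module uno {
    published interface XInterface {
        any queryInterface([in] type aType);
        void acquire();
        void release();
    };
}; }; }; };
"""

EXCEPTION_IDL = """
module com { module sun { module star { module uno {
    // base of all UNO exceptions
    published exception Exception {
        string Message;
        com::sun::star::uno::XInterface Context;
    };
}; }; }; };
"""

combined = concatenate([XINTERFACE_IDL, EXCEPTION_IDL])
types = list_types(combined)
```

Forward declarations such as "interface XFoo;" carry no brace and are 
therefore skipped, which is the intended behaviour for a listing of 
definitions.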

Another point to re-evaluate is how much of that complete type 
information to duplicate in the artefacts generated for the various 
language bindings.

On the one hand, it might be advantageous to encode more information in 
the generated C++ artefacts (potentially even generating a dedicated 
dynamic library containing the relevant data, instead of spreading it 
inlined across header files, where the linker can recombine part of it 
again).  That could help ensure that the (new-style) types.rdb need not 
be read early on during startup (so if loading it were somewhat costly, 
that would not be as much of a problem as it is today).

On the other hand, an easily parsable format would allow Java to reduce 
the amount of information currently stored in generated .class files, 
and instead rely on the complete type information available as a 
(new-style) types.rdb.  Java .class files could even be generated on the 
fly from the types.rdb information using a dedicated Java class loader. 
  (That would remove the ugly requirement that .oxt extensions need to 
bring their additional UNO types as both .rdb and .class files.)

> LO projects like registry, rdbmaker, regview, regmerge, idl, idlc,
> climaker, javamaker, codemaker would be deprecated.

Though idlc, climaker, javamaker, codemaker would "just" be replaced 
with your new code (that conceptually does the same thing), if I get you 
right?

(Note that module idl is completely unrelated to UNO.)

> The other affected LO projects would likely be:
>
> - binaryurp
> - bridges
> - cli_ure
> - cppu
> - cppuhelper
> - javaunohelper
> - pyuno (native?)
> - ridljar
> - scripting
> - unodevtools
> - unoil
>
> although there is little information on what some projects are supposed
> to do and how people use them.

As Michael already pointed out, changes to the .rdb format would best be 
kept behind the existing interfaces abstracting from it, so that those 
"client" modules would hardly notice.

> Do we have a better documentation on type mappings then for example
> <http://wiki.services.openoffice.org/wiki/Documentation/DevGuide/ProUNO/Java/Type_Mappings>?

For Java I once detailed that at 
<http://wiki.services.openoffice.org/wiki/Uno/Java/Specifications/Type_Mapping> 
(and the UNO type system itself at 
<http://udk.openoffice.org/common/man/typesystem.html>).

Stephan