[Libreoffice] minor idl fixes

Michael Meeks michael.meeks at suse.com
Tue Dec 13 02:15:09 PST 2011


Hi Tomas,

On Tue, 2011-12-13 at 00:52 +0100, Tomas Hlavaty wrote:
> > that they are tiny,
> 
> What does "tiny" mean?

	Well - you're going to find it hard to make it bigger than the existing
rdb files ;-) but by tiny I really mean fast to read from disk and fast
to parse.

> Currently, rdb files are giant.

	Sure; they are a disaster :-)

> I'm not sure why.  If I simply concatenate all idl definitions for
> udkapi and offapi into one preprocessed file I get smaller file while
> still being a valid idl file containing all the information:

	Yep; this is well known. It is all done re-using some code not intended
for this purpose, which has been tweaked to the maximum to try to make
it suit it better, but it still doesn't ;-)

> Is 200kB considered tiny?

	Sounds fine :-)

> And this is just original concatenated idl files.

	Sure - sounds fine; if we can parse it fast. 

> How long does reading the type information take at the moment?

	That's quite hard to say; access to it is extremely scattered across
the code. callgrind gives 1.5% in libreg, 0.6% in libstore and some
lowish proportion of the 32% in libuno_sal; say perhaps 2.5%. That IMHO
hides its true cost - we have to force pagein all that data before
startup to avoid the horrible I/O patterns mmap gives us as we seek about
in those big files.
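
	A minimal sketch of the forced page-in idea, for illustration only:
read the file once sequentially so it lands in the OS page cache, then do
the scattered lookups through mmap. The function names, the chunk size and
the lookup width are assumptions made up for this example; none of it is
taken from the libreg/libstore code.

```python
import mmap


def pagein(path, chunk=1 << 20):
    """Read the whole file front to back so the OS caches its pages.

    One sequential pass is cheap for the disk; the data is discarded,
    only the side effect on the page cache matters. Returns bytes read.
    """
    total = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(chunk)
            if not block:
                break
            total += len(block)
    return total


def scattered_lookups(path, offsets, width=16):
    """Simulate seeking about in a big mmap'd file (hypothetical access pattern).

    Without a prior pagein() each cold offset can cost a separate disk seek.
    The file must be non-empty, since an empty file cannot be mapped.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return [m[o:o + width] for o in offsets]
```

	Calling pagein(path) before scattered_lookups(path, offsets) trades one
linear read for many random ones; whether that wins depends on how much of
the file actually gets touched.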

> What do we get to do a lot at startup?  I thought we simply load it and
> that's it.

	Sure; we load it & that is it, *but* we would really like to start in
under a second in total. At the very least, choices that hurt that goal on
a fast PC are almost certain to also hurt the goal of working well on
mobile devices etc. :-)

> If the new format is a text format (I would prefer text format over
> another binary one), there needs to be some parsing.  unoidl2 can parse
> the allpp.idl file (containing all type information) and print the
> syntax tree in about 200ms:
> 
>    $ rm allpp.ast 
>    $ time make allpp.ast
>    cat allpp.idl | ./unoidl2ast >allpp.ast
> 
>    real  0m0.247s
>    user  0m0.170s
>    sys   0m0.100s

	250ms is a -really- long time IMHO; particularly since we have to parse
the entire file before startup. As Stephan says, perhaps we can overcome
this by inlining more in the generated C++ which may make that
acceptable later (after all bootstrapping python takes a good long time
itself anyway).

> If 200ms is slow, we could split the allpp.idl file into something
> smaller required at startup and the rest loaded lazily.

	Possibly; or we could invent yet another format for this type
information. Personally, I'd like to keep the number of representations
of the same information as low as possible: we already have IDL, we have
the binaryurp format [ used for IPC on the wire ] (potentially we could
re-use that?), do we have an XML/text IPC protocol ? I suspect we will
want that for the remote Javascript/websockets magic - possibly we could
use a condensed XML format for this that'd be quicker to parse ?
unclear. Stephan - do you have some ideas ? as soon as I see a yacc
parser, I see "slow" and "busts the branch predictor" - but perhaps I'm
paranoid ;-)

> We could have a binary format, something like a mmap dump.  That would
> be instant but rather ugly.

	Sure - that'd be bad :-) I like the 'concatenate text files' approach
for building the database (personally).
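
	A rough sketch of what 'concatenate text files' could look like as a
build step. It assumes plain .idl files under one directory and does no
preprocessing (no #include handling), unlike the preprocessed allpp.idl
mentioned above; build_database and the marker comment are hypothetical
names for illustration.

```python
from pathlib import Path


def build_database(idl_root, out_path):
    """Concatenate every .idl file under idl_root into one text file.

    Files are written in sorted path order so the output is reproducible.
    A '// ----' comment line records where each piece came from.
    out_path should live outside idl_root (or not end in .idl), otherwise
    a rebuild would pick up the previous output as an input.
    Returns the number of files concatenated.
    """
    root = Path(idl_root)
    sources = sorted(root.rglob("*.idl"))
    with open(out_path, "w", encoding="utf-8") as out:
        for src in sources:
            out.write(f"// ---- {src.relative_to(root).as_posix()}\n")
            text = src.read_text(encoding="utf-8")
            out.write(text)
            if not text.endswith("\n"):
                out.write("\n")  # keep the next marker on its own line
    return len(sources)
```

	The result stays a readable text file that can be diffed and grepped,
which is most of the appeal over a binary dump.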

> Are there any other requirements?  Like functionality related to
> rdbmerge and how extensibility works?  Or is that not relevant anymore?

	rdbmerge is/was IIRC just a compile-time tool. Clearly we need to
continue to be able to read old types.rdb files for some time to come,
but that can be de-coupled and removed later I think.

> I was under the impression that these projects somehow depend on the rdb
> code, but if they depend on the typedescription api, then it is better
> than I hoped (if that typedescription api is somehow separate from the
> rdb file code).

	Sure - there is only one place that we go grubbing with that nasty rdb
format - and it's at the bottom of the stack :-) if we can hot plug that
out with something else, life is good :-)
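
	To make the 'hot plug' idea concrete, here is a small sketch of a
provider that tries a new text backend first and falls back to a legacy
one, so old types.rdb data stays readable until it can be dropped. Every
class and method name here is hypothetical, as is the toy "name = kind"
text format; this is not the real typedescription API, and the legacy
backend is a dictionary stand-in rather than an rdb reader.

```python
from abc import ABC, abstractmethod


class TypeBackend(ABC):
    """Hypothetical bottom-of-the-stack interface: name -> description."""

    @abstractmethod
    def lookup(self, name):
        """Return a description for name, or None if unknown."""


class LegacyRdbBackend(TypeBackend):
    """Stand-in for the old rdb reader; here just a dict of entries."""

    def __init__(self, entries):
        self._entries = dict(entries)

    def lookup(self, name):
        return self._entries.get(name)


class TextBackend(TypeBackend):
    """Reads a toy text format: one 'name = kind' per line, '#' comments."""

    def __init__(self, text):
        self._entries = {}
        for line in text.splitlines():
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            name, _, kind = line.partition("=")
            self._entries[name.strip()] = kind.strip()

    def lookup(self, name):
        return self._entries.get(name)


class TypeProvider:
    """What the upper layers talk to; backends are tried in order."""

    def __init__(self, backends):
        self._backends = list(backends)

    def describe(self, name):
        for backend in self._backends:
            found = backend.lookup(name)
            if found is not None:
                return found
        raise KeyError(name)
```

	With something like TypeProvider([TextBackend(...), LegacyRdbBackend(...)])
the callers never see which file format answered, and removing the legacy
entry from the list later is a one-line change.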

	Thanks,

		Michael.

-- 
michael.meeks at suse.com  <><, Pseudo Engineer, itinerant idiot


