[Libreoffice] minor idl fixes
Michael Meeks
michael.meeks at suse.com
Tue Dec 13 02:15:09 PST 2011
Hi Tomas,
On Tue, 2011-12-13 at 00:52 +0100, Tomas Hlavaty wrote:
> > that they are tiny,
>
> What does "tiny" mean?
Well - you're going to find it hard to make it bigger than the existing
rdb files ;-) but by tiny I really mean fast to read from disk and fast
to parse.
> Currently, rdb files are giant.
Sure; they are a disaster :-)
> I'm not sure why. If I simply concatenate all idl definitions for
> udkapi and offapi into one preprocessed file I get smaller file while
> still being a valid idl file containing all the information:
Yep; this is well known. It is all done by re-using some code that was
never intended for this purpose; it has been tweaked to the maximum to
make it fit better, but it still doesn't ;-)
> Is 200kB considered tiny?
Sounds fine :-)
> And this is just original concatenated idl files.
Sure - sounds fine; if we can parse it fast.
> How long does reading the type information take at the moment?
That's quite hard to say; access to it is scattered all over the
code. callgrind attributes 1.5% to libreg, 0.6% to libstore and some
smallish share of the 32% in libuno_sal; say perhaps 2.5% in total. That
IMHO hides its true cost - we have to force all that data to be paged in
before startup, to avoid the horrible I/O patterns mmap gives us as we
seek about in those big files.
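To make the page-in point concrete, here is a minimal Python sketch (not
LibreOffice code) of mapping a file read-only and faulting every page in
sequentially up front, so that later scattered lookups hit memory rather
than seeking on disk. `map_and_prefault` is a hypothetical helper;
`mmap.madvise` / `MADV_WILLNEED` are only used where the running Python
and OS provide them.

```python
import mmap


def map_and_prefault(path):
    """Map a non-empty file read-only and read every page in, in file order.

    Hypothetical helper for illustration only.
    """
    with open(path, "rb") as f:
        # The mapping stays valid after the file object is closed.
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # Optional hint: ask the kernel to start read-ahead (Python 3.8+, and
    # only on systems that have madvise()).
    if hasattr(mm, "madvise") and hasattr(mmap, "MADV_WILLNEED"):
        mm.madvise(mmap.MADV_WILLNEED)
    # Touch one byte per page so every page is actually faulted in now,
    # sequentially, instead of later in random order.
    for offset in range(0, len(mm), mmap.PAGESIZE):
        _ = mm[offset]
    return mm
```

The sequential touch is what turns a later random-seek pattern into a
single linear read; the madvise call is only a hint and may be a no-op.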
> What do we get to do a lot at startup? I thought we simply load it and
> that's it.
Sure; we load it & that is it, *but* we would really like the whole
start-up to take under a second. At the very least, choices that hurt
that goal on a fast PC are almost certain to also hurt the goal of
working well on mobile devices etc. :-)
> If the new format is a text format (I would prefer text format over
> another binary one), there needs to be some parsing. unoidl2 can parse
> the allpp.idl file (containing all type information) and print the
> syntax tree in about 200ms:
>
> $ rm allpp.ast
> $ time make allpp.ast
> cat allpp.idl | ./unoidl2ast >allpp.ast
>
> real 0m0.247s
> user 0m0.170s
> sys 0m0.100s
250ms is a -really- long time IMHO, particularly since we have to parse
the entire file before startup. As Stephan says, perhaps we can overcome
this by inlining more in the generated C++, which may make that
acceptable later (after all, bootstrapping Python takes a good long time
itself anyway).
> If 200ms is slow, we could split the allpp.idl file into something
> smaller required at startup and the rest loaded lazily.
Possibly; or we could invent yet another format for this type
information. Personally, I'd like to keep the number of representations
of the same information as low as possible: we already have IDL, and we
have the binaryurp format [ used for IPC on the wire ] (could we
potentially re-use that?). Do we have an XML/text IPC protocol? I
suspect we will want one for the remote Javascript/websockets magic;
possibly we could use a condensed XML format for this that'd be quicker
to parse? Unclear. Stephan - do you have some ideas? As soon as I see a
yacc parser, I think "slow" and "busts the branch predictor" - but
perhaps I'm paranoid ;-)
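One way to combine "load the rest lazily" with "a condensed format that
is quicker to parse" is sketched below in Python: a single cheap scan at
startup records only each type's name and position, and an entry's
members are split out the first time it is looked up, with no yacc-style
grammar involved. The one-line-per-type format and `LazyTypeIndex` are
hypothetical, invented for illustration; this is not an existing
LibreOffice format.

```python
from typing import Dict, List, NamedTuple, Tuple


class TypeEntry(NamedTuple):
    kind: str
    name: str
    members: List[Tuple[str, str]]  # (member name, member type)


class LazyTypeIndex:
    """Index a condensed type dump at startup; parse entries on demand.

    Hypothetical line format (one type per line, '#' starts a comment):
        <kind> <qualified.name> <member>:<type> <member>:<type> ...
    """

    def __init__(self, text: str):
        self._text = text
        self._spans: Dict[str, Tuple[int, int]] = {}
        self._cache: Dict[str, TypeEntry] = {}
        pos = 0
        for line in text.splitlines(keepends=True):
            parts = line.split(maxsplit=2)
            if len(parts) >= 2 and not parts[0].startswith("#"):
                # Startup cost: remember only the name and where its line is.
                self._spans[parts[1]] = (pos, pos + len(line))
            pos += len(line)

    def __contains__(self, name: str) -> bool:
        return name in self._spans

    def get(self, name: str) -> TypeEntry:
        if name not in self._cache:
            start, end = self._spans[name]  # KeyError for unknown types
            kind, qname, *members = self._text[start:end].split()
            pairs = [(n, t) for n, _, t in (m.partition(":") for m in members)]
            self._cache[name] = TypeEntry(kind, qname, pairs)
        return self._cache[name]
```

For example, with the dump
`struct com.sun.star.awt.Point X:long Y:long`, only the name is touched
at load time; `get("com.sun.star.awt.Point")` splits out `X` and `Y` on
first use and caches the result.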
> We could have a binary format, something like a mmap dump. That would
> be instant but rather ugly.
Sure - that'd be bad :-) I like the 'concatenate text files' approach
for building the database (personally).
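The 'concatenate text files' build step can be as simple as the Python
sketch below. The function name and directory layout are assumptions;
unlike the preprocessed allpp.idl Tomas describes, it does no
preprocessing, so any `#include` lines and include guards pass through
unchanged.

```python
from pathlib import Path
from typing import Iterable


def concatenate_idl(roots: Iterable[str], out_path: str) -> int:
    """Concatenate every *.idl file under the given roots into one file.

    Hypothetical build helper. Files are visited in sorted path order so
    the output is reproducible; out_path should not live under a root.
    Returns the number of files written.
    """
    paths = sorted(p for root in roots for p in Path(root).rglob("*.idl"))
    with open(out_path, "w", encoding="utf-8") as out:
        for path in paths:
            # A comment marker records where each chunk came from.
            out.write(f"// --- {path.as_posix()}\n")
            text = path.read_text(encoding="utf-8")
            out.write(text)
            if not text.endswith("\n"):
                out.write("\n")  # keep the next marker on its own line
    return len(paths)
```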
> Are there any other requirements? Like functionality related to
> rdbmerge and how extensibility works? Or is that not relevant anymore?
rdbmerge is/was IIRC just a compile-time tool. Clearly we need to
continue to be able to read old types.rdb files for some time to come,
but that can be de-coupled and removed later I think.
> I was under the impression that these projects somehow depend on the rdb
> code, but if they depend on the typedescription api, then it is better
> than I hoped (if that typedescription api is somehow separate from the
> rdb file code).
Sure - there is only one place where we go grubbing about in that nasty
rdb format - and it's at the bottom of the stack :-) If we can hot-plug
that out with something else, life is good :-)
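Hot-plugging the bottom of the stack amounts to putting one narrow
lookup interface in front of the storage, so a new backend can sit ahead
of a legacy types.rdb reader until the latter can be removed. The Python
sketch below shows only the shape of that; `TypeProvider`,
`DictProvider` and `ChainedProvider` are hypothetical names, not the UNO
type description API, and the dict backend stands in for real readers.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional, Sequence


class TypeProvider(ABC):
    """Hypothetical bottom-of-stack interface: look up a type by name."""

    @abstractmethod
    def lookup(self, name: str) -> Optional[str]:
        """Return a description of the type, or None if unknown here."""


class DictProvider(TypeProvider):
    """Stand-in backend; a real one might read a new format or types.rdb."""

    def __init__(self, types: Dict[str, str]):
        self._types = types

    def lookup(self, name: str) -> Optional[str]:
        return self._types.get(name)


class ChainedProvider(TypeProvider):
    """Ask each backend in order; the first non-None answer wins."""

    def __init__(self, providers: Sequence[TypeProvider]):
        self._providers = list(providers)

    def lookup(self, name: str) -> Optional[str]:
        for provider in self._providers:
            found = provider.lookup(name)
            if found is not None:
                return found
        return None
```

Putting the new backend first and the legacy one second keeps old
types.rdb files readable; dropping the legacy provider later touches
only the chain's construction, not its callers.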
Thanks,
Michael.
--
michael.meeks at suse.com <><, Pseudo Engineer, itinerant idiot