XFastParser - next steps ...
michael.meeks at collabora.com
Mon Aug 1 14:51:15 UTC 2016
On Mon, 2016-08-01 at 12:09 +0530, Mohammed Abdul Azeem wrote:
> A. optimize clearing the pending events - unlikely to be
> a big win, but nice.
> This is done.
> B. merge the legacyfastparser pieces into SvXMLImport
> If we do this, it will be
> XParser -> XFastParser -> unknown elements -> callbackDocumentHandler
> -> SvXMLImport -> tokenize (SvXMLNamespaceMap) -> FastContexts
	As a first cut, then yes: we will tokenize, de-tokenize and
re-tokenize ;-) but the de-tokenize is just a lookup in an array,
which is quick - the rest is done in another thread. Also (obviously)
we want to tokenize only once, in the thread, and share that moving
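	To illustrate why the de-tokenize step is cheap, here is a minimal
C++ sketch assuming a hypothetical TokenMap with dense integer ids (the
real generated token lists and namespace handling differ): tokenizing a
name costs a hash lookup, while turning a token back into its name is
only an index into an array.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical token table: a stand-in for the generated element and
// attribute token lists, not LibreOffice's actual API.
class TokenMap {
public:
    explicit TokenMap(std::vector<std::string> names)
        : names_(std::move(names)) {
        for (std::size_t i = 0; i < names_.size(); ++i)
            ids_.emplace(names_[i], static_cast<std::int32_t>(i));
    }

    // Tokenize: one hash lookup, ideally done once on the parser thread.
    std::int32_t tokenize(std::string_view name) const {
        auto it = ids_.find(std::string(name));
        return it == ids_.end() ? kUnknown : it->second;
    }

    // De-tokenize: a bounds-checked array index, no hashing at all.
    const std::string& detokenize(std::int32_t token) const {
        return names_.at(static_cast<std::size_t>(token));
    }

    static constexpr std::int32_t kUnknown = -1;

private:
    std::vector<std::string> names_;                  // token -> name
    std::unordered_map<std::string, std::int32_t> ids_; // name -> token
};
```

	In this sketch the repeated cost in the tokenize / de-tokenize /
re-tokenize chain is the second hash lookup on the SvXMLImport side,
which is what tokenizing only once would remove.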
> C. consider how to allow XFastParser tokenization
> just for the elements eg. ScXMLTableRowCellContext
> can get the maximum benefit in the short-run.
> So, then we will somehow selectively tokenize elements and attributes
> belonging to ScXMLTableRowCellContext, so as to avoid
> SvXMLNamespaceMap pieces.
	Hmm - I think we need to tokenize them all - but let's get there first.
Let's get all of the tokens mapped to and fro; and then let's see if we
can't look at the profile, and work out how to tunnel through a few of
these contexts to use the fast-parser directly =)
	So - e.g. currently we have:
which calls CreateContext(...) - to create the handler for the next
element down the tree.
	We could have a virtual FastContext *CreateFastContext(...) method that
returned a distinct FastParser context, and if a context does not provide
one we fall back to the old / dummy methods =) Using that we could convert
the XML tree context handlers from the leaves upwards, which would save a
lot of bother. There is already a partial attempt to integrate the
FastParser into xmloff/ that (to me) looks unlikely to do anything at
all =) It's worth not getting confused / tangled up in that, though it is
possibly worth reading through it with 'git grep -5 startFastElement'
inside xmloff.
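	A minimal sketch of that leaves-upward fallback, assuming heavily
simplified hypothetical classes (Context, LegacyContext, FastChildContext,
ConvertedRowContext and createChild are illustrative names, not xmloff's
API; real contexts take prefixes, local names and attribute lists): the
parent is asked for CreateFastContext() first, and a null result sends
the element down the existing CreateContext() path.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Common base so both kinds of child can be returned from one call.
struct Context {
    virtual ~Context() = default;
    virtual std::string kind() const = 0;
};

// Stands in for a context already ported to the fast-parser interface.
struct FastChildContext : Context {
    std::string kind() const override { return "fast"; }
};

struct LegacyContext : Context {
    std::string kind() const override { return "legacy"; }

    // New hook: default returns nullptr, i.e. "not converted yet".
    virtual std::unique_ptr<Context> CreateFastContext(int /*token*/) {
        return nullptr;
    }
    // Existing string-based path.
    virtual std::unique_ptr<Context> CreateContext(const std::string& /*name*/) {
        return std::make_unique<LegacyContext>();
    }
};

// A context near the leaves that has been converted for one child token.
struct ConvertedRowContext : LegacyContext {
    static constexpr int kCellToken = 42; // hypothetical token value

    std::unique_ptr<Context> CreateFastContext(int token) override {
        if (token == kCellToken)
            return std::make_unique<FastChildContext>();
        return nullptr; // anything else still uses the legacy path
    }
};

// Dispatcher: prefer the fast path, fall back to the legacy one.
std::unique_ptr<Context> createChild(LegacyContext& parent, int token,
                                     const std::string& name) {
    if (auto fast = parent.CreateFastContext(token))
        return fast;
    return parent.CreateContext(name);
}
```

	Unconverted contexts inherit the null default, so they keep working
unchanged while individual contexts are ported one at a time.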
> For this I still have some questions in mind. Are we going to tokenize
> everything from FastParser and then de-tokenize in the callback
> handler (token based startElement)? That's the idea here?
Initially - yes; it sounds mad, but then allocation is also expensive -
and there is no 'free' for integer tokens ;-)
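	To make the allocation point concrete, a minimal sketch assuming
hypothetical PendingEvent / EventBatch types (not the fast parser's
actual event buffer): when events carry only integer tokens, each record
owns no memory, so clearing a batch frees nothing per event and the
buffer's capacity is reused for the next batch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical event record: names travel as integer tokens, not strings.
struct PendingEvent {
    enum class Type : std::uint8_t { StartElement, EndElement } type;
    std::int32_t element;   // element token
    std::int32_t firstAttr; // index into the shared attribute-token pool
    std::int32_t attrCount; // number of attribute tokens for this event
};

// The record owns no memory, so destroying it is a no-op.
static_assert(std::is_trivially_destructible<PendingEvent>::value,
              "token-only events need no per-event free");

class EventBatch {
public:
    void start(std::int32_t element, const std::vector<std::int32_t>& attrs) {
        events_.push_back({PendingEvent::Type::StartElement, element,
                           static_cast<std::int32_t>(attrTokens_.size()),
                           static_cast<std::int32_t>(attrs.size())});
        attrTokens_.insert(attrTokens_.end(), attrs.begin(), attrs.end());
    }
    void end(std::int32_t element) {
        events_.push_back({PendingEvent::Type::EndElement, element, 0, 0});
    }
    // Drops all events but keeps the allocated storage for reuse.
    void clear() {
        events_.clear();
        attrTokens_.clear();
    }
    std::size_t size() const { return events_.size(); }
    std::size_t capacity() const { return events_.capacity(); }

private:
    std::vector<PendingEvent> events_;
    std::vector<std::int32_t> attrTokens_;
};
```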
> If I've got anything wrong or you have some insights, please share it
> here. :)
michael.meeks at collabora.com <><, GM Collabora Productivity
Skype: mmeeks, Google Hangout: mejmeeks at gmail.com
(M) +44 7795 666 147 - timezone usually UK / Europe