XFastParser - next steps ...

Mon Aug 1 14:51:15 UTC 2016

Hi Mohammed,

On Mon, 2016-08-01 at 12:09 +0530, Mohammed Abdul Azeem wrote:

>                 A. optimize clearing the pending events - unlikely to
>                    give a big win, but nice.
>
> This is done.

	Great.

>                 B. merge the legacyfastparser pieces into SvXMLImport
>
> If we do this, it will be
> XParser -> XFastParser -> unknown elements -> callbackDocumentHandler
> -> SvXMLImport -> tokenize (SvXMLNamespaceMap) -> FastContexts

	As a first cut, yes: we will tokenize, de-tokenize and
re-tokenize ;-) but the de-tokenize is just a lookup in an array:

	OUString aTokens[128];     // token index -> element/attribute name

	aTokens[nTokenIndex]       // de-tokenize: a single array index

	which is quick - the rest is done in another thread. Also (obviously)
we want to tokenize only once, in that thread, and share the result
going forward.
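To make the cost argument concrete, here is a minimal C++ sketch of a two-way token table. It is a sketch under stated assumptions, not LibreOffice code: `TokenTable`, `tokenize` and `detokenize` are hypothetical names, and `std::string` stands in for `OUString`. Tokenizing is a hash lookup (the step meant to happen once, in the parser thread); de-tokenizing is a plain array index, like `aTokens[nTokenIndex]` above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical two-way token table; std::string stands in for OUString.
class TokenTable {
public:
    static constexpr int INVALID = -1;

    explicit TokenTable(std::vector<std::string> aNames)
        : m_aNames(std::move(aNames)) {
        for (std::size_t i = 0; i < m_aNames.size(); ++i)
            m_aIndex.emplace(m_aNames[i], static_cast<int>(i));
    }

    // Name -> token: a hash lookup. This is the step we would like to do
    // only once, in the parser thread.
    int tokenize(const std::string& rName) const {
        auto it = m_aIndex.find(rName);
        return it == m_aIndex.end() ? INVALID : it->second;
    }

    // Token -> name: a plain array index, like aTokens[nTokenIndex].
    const std::string& detokenize(int nToken) const {
        assert(nToken >= 0 && static_cast<std::size_t>(nToken) < m_aNames.size());
        return m_aNames[static_cast<std::size_t>(nToken)];
    }

private:
    std::vector<std::string> m_aNames;             // index == token value
    std::unordered_map<std::string, int> m_aIndex; // reverse mapping
};
```

In a first cut, `tokenize(detokenize(nTok))` is the wasteful re-tokenize round trip; once the consumers accept tokens directly, that extra step goes away.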

>                 C. consider how to allow XFastParser tokenization
>                    selectively just for the elements eg.
>                    ScXMLTableRowCellContext that can get the maximum
>                    benefit in the short-run.
...
> So, then we will somehow selectively tokenize elements and attributes
> belonging to ScXMLTableRowCellContext, so as to avoid
> SvXMLNamespaceMap pieces.

	Hmm; I think we need to tokenize them all - but let's get there first.
Let's get all of the tokens mapped to and fro, and then see if we
can't look at the profile and work out how to tunnel through a few of
these contexts to use the fast-parser directly =)

	So - e.g. currently we have:

SvXMLImport::startElement

	which calls CreateContext(...) - to create the handler for the next
element down the tree.

	We could have a virtual FastContext *CreateFastContext(...) method that
returns a distinct FastParser context, and if it is not there we fall
back to the old / dummy methods =) Using that, we could convert the
XML tree context handlers from the leaves upwards, which would save a
lot of bother. There is already a partial attempt to integrate the
FastParser into xmloff/ that looks (to me) unlikely to do anything at
all =) It is worth not getting confused / tangled up in that, though it
is possibly worth reading through it with 'git grep -5 startFastElement'
inside xmloff.
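A minimal C++ sketch of that fallback scheme, assuming simplified stand-in classes: `ImportContext`, `LegacyContext`, `FastCellContext`, `RowContext`, `createChildContext` and `TOKEN_TABLE_CELL` are all hypothetical, and the signatures are not the real xmloff ones. The base-class `CreateFastContext` returns nothing by default, so an unconverted context automatically takes the old string-based `CreateContext` path, while a converted parent can hand out a converted leaf directly.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins for xmloff context classes; not the real API.
class ImportContext {
public:
    virtual ~ImportContext() = default;
    virtual std::string kind() const = 0;

    // New token-based hook. The default returns nullptr, meaning
    // "this context has not been converted yet".
    virtual std::unique_ptr<ImportContext> CreateFastContext(int /*nToken*/) {
        return nullptr;
    }

    // Old string-based hook, standing in for the existing CreateContext(...).
    virtual std::unique_ptr<ImportContext> CreateContext(const std::string& rLocalName) = 0;
};

// An unconverted context: only knows the legacy path.
class LegacyContext : public ImportContext {
public:
    std::string kind() const override { return "legacy"; }
    std::unique_ptr<ImportContext> CreateContext(const std::string&) override {
        return std::make_unique<LegacyContext>();
    }
};

// A leaf, converted first; any unexpected children go down the old path.
class FastCellContext : public ImportContext {
public:
    std::string kind() const override { return "fast-cell"; }
    std::unique_ptr<ImportContext> CreateContext(const std::string&) override {
        return std::make_unique<LegacyContext>();
    }
};

const int TOKEN_TABLE_CELL = 2; // hypothetical token value

// One level up: creates the fast leaf directly, anything else falls back.
class RowContext : public ImportContext {
public:
    std::string kind() const override { return "row"; }
    std::unique_ptr<ImportContext> CreateFastContext(int nToken) override {
        if (nToken == TOKEN_TABLE_CELL)
            return std::make_unique<FastCellContext>();
        return nullptr;
    }
    std::unique_ptr<ImportContext> CreateContext(const std::string&) override {
        return std::make_unique<LegacyContext>();
    }
};

// What a startElement-style callback would do for each new element:
// try the fast path first, otherwise de-tokenize and use the old path.
std::unique_ptr<ImportContext> createChildContext(ImportContext& rParent, int nToken,
                                                  const std::vector<std::string>& rNames) {
    if (auto pFast = rParent.CreateFastContext(nToken))
        return pFast;
    assert(nToken >= 0 && static_cast<std::size_t>(nToken) < rNames.size());
    return rParent.CreateContext(rNames[static_cast<std::size_t>(nToken)]);
}
```

The design choice is that converting a context never breaks its neighbours: each class opts in by overriding one method, which is what makes a leaves-upwards migration possible.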

> For this I still have some questions in mind. Are we going to tokenize
> everything from FastParser and then de-tokenize in the callback
> handler (token based startElement)? That's the idea here?

	Initially - yes; it sounds mad, but then allocation is also expensive -
and there is no 'free' for integer tokens ;-)
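To illustrate the point about integer tokens, here is a hedged C++ sketch of how a converted cell context might consume attributes. `Token`, `FastAttribute`, `CellData` and `parseCellAttributes` are hypothetical names, and the token values are made up; a real list would be generated. The attribute name arrives as a plain `int`, so dispatch is a `switch` rather than string comparisons, and there is no per-attribute name string to allocate and later free.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical token values; a real list would be generated.
enum Token : int { TOKEN_VALUE_TYPE = 10, TOKEN_VALUE = 11, TOKEN_STYLE_NAME = 12 };

// Hypothetical attribute as a fast parser might hand it over:
// the name is already an int; only the value is text.
struct FastAttribute {
    int nToken;
    std::string aValue;
};

struct CellData {
    std::string aType;
    double fValue = 0.0;
    std::string aStyle;
};

// Dispatch on integer tokens: a switch instead of string comparisons,
// and no per-attribute name strings to allocate and free.
CellData parseCellAttributes(const std::vector<FastAttribute>& rAttribs) {
    CellData aCell;
    for (const FastAttribute& rAttr : rAttribs) {
        switch (rAttr.nToken) {
        case TOKEN_VALUE_TYPE: aCell.aType = rAttr.aValue; break;
        case TOKEN_VALUE:      aCell.fValue = std::stod(rAttr.aValue); break;
        case TOKEN_STYLE_NAME: aCell.aStyle = rAttr.aValue; break;
        default: break; // unknown attribute: ignore it
        }
    }
    return aCell;
}
```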

> If I've got anything wrong or you have some insights, please share it
> here. :)

	HTH,

		Michael.

-- 
michael.meeks at collabora.com <><, GM Collabora Productivity
 Skype: mmeeks, Google Hangout: mejmeeks at gmail.com
 (M) +44 7795 666 147 - timezone usually UK / Europe