[PATCH 3-5] fdo#47644 performance regression on largish .doc #2 ...

Michael Meeks michael.meeks at suse.com
Sat May 12 16:32:07 PDT 2012


On Fri, 2012-05-11 at 22:22 +0100, Michael Meeks wrote:
> 	Argh - and it's entirely possible that this breaks the
> CVE-2010-3454-1.doc test on -3-5 - but it's rather too late to double
> check that now; seems to pass on master though; most odd. Will poke
> Monday.

	Wow - this was a -really- 'fun' problem to nail ;-) It turns out that
simply walking the FAT chain pollutes the state of the streams in a way
that is extraordinarily hard to unwind; i.e. even just calling the
original makePageChainCache method (or similar) would seek beyond the end
of the stream, putting it in some unrecoverable state (or some such).

	Anyhow - after spending a big chunk of life working that out, it dawned
on me that there is no need for building the page chain cache to be slow,
and that we should always do it, incrementally as we read. Hopefully
that'll still allow us to recover parts of word documents that are not
seekable.
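
	A minimal sketch of the "build it incrementally as we read" idea, for
illustration only; this is not the code from the patch. The class name,
method names and the plain std::vector FAT are all assumptions made up for
the example. The only thing borrowed from the file format is the OLE2
end-of-chain marker (0xFFFFFFFE, i.e. -2 as a signed 32-bit value). The
cache only ever walks as far along the chain as the caller has asked for,
and stops quietly at a broken link rather than following it.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical names throughout; this is not the sot/ implementation.
constexpr std::int32_t kEndOfChain = -2; // OLE2 ENDOFCHAIN (0xFFFFFFFE)

class IncrementalPageChain
{
public:
    // rFat[page] holds the next page in the chain; rFat must outlive this object.
    IncrementalPageChain(const std::vector<std::int32_t>& rFat, std::int32_t nStart)
        : m_rFat(rFat), m_nNext(nStart) {}

    // Returns the physical page for logical page nIndex, or -1 if the
    // chain ends or is broken before reaching it.
    std::int32_t pageAt(std::size_t nIndex)
    {
        // Extend the cache only as far as this request needs.
        while (m_aCache.size() <= nIndex && extendByOne()) {}
        return nIndex < m_aCache.size() ? m_aCache[nIndex] : -1;
    }

    // Number of chain entries resolved so far.
    std::size_t cachedPages() const { return m_aCache.size(); }

private:
    bool extendByOne()
    {
        // Stop at end of chain, at an out-of-range link, or once the chain
        // is as long as the FAT itself (any longer implies a cycle).
        if (m_nNext < 0
            || static_cast<std::size_t>(m_nNext) >= m_rFat.size()
            || m_aCache.size() >= m_rFat.size())
            return false;
        m_aCache.push_back(m_nNext);
        m_nNext = m_rFat[static_cast<std::size_t>(m_nNext)];
        return true;
    }

    const std::vector<std::int32_t>& m_rFat;
    std::vector<std::int32_t> m_aCache;
    std::int32_t m_nNext;
};
```

	With a FAT of {1, 2, kEndOfChain, 0} and a start page of 0, asking for
pageAt(0) resolves one entry only; asking for pageAt(2) later extends the
cache to three entries. Pages before a broken link stay readable, which is
the property that matters for recovering the start of a damaged document.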

	So - I re-worked this to simplify it and to build the page chain cache
incrementally, which might help performance in nasty corner cases, and
also wrote some regression tests [ which are hairy; the sot/ 'pass'
documents have some nice instances of broken FAT chains ;-].

	The 'slow.doc' parses in ~4 seconds for me now; though there is still
some hyper-long and painfully incomprehensible 15 second thrash after that
(with no progress bar) ;-)

	I'd like to get this reviewed, more widely tested and into -3-5 (as yet
it's not in master either, this is vs. -3-5 ;-) It'd be worth testing
any documents we know of, where previously we could recover some of the
document content from the beginning of the stream, where the end was
corrupt (I guess).

	Thanks !

		Michael.

-- 
michael.meeks at suse.com  <><, Pseudo Engineer, itinerant idiot

