Moving unzipping into a new thread

Michael Meeks michael.meeks at collabora.com
Fri May 12 10:09:04 UTC 2017


Hi Mohammed,

On 12/05/17 07:05, Mohammed Abdul Azeem wrote:
> I've implemented swing-buffer solution using two buffers, like Michael
> suggested. I'm attaching both patch and the profiles. Let me know what
> you think.

	Thanks - this is really interesting =) please do keep the dev. list
included. Loading the same large ODS spreadsheet, we see a
super-linear speedup from threading, at least under cachegrind.
Admittedly it is hard to get good %ages here - since we're parallelizing
the parse & tokenize of XML in the fast-parser; but this is fun.

	The overall pcycle count goes from 40.1bn to 38.4bn for the load - when
splitting out the unzip (and rtl_crc32) of the ZIP file - the CRC being
much of the cost.

* before
	+ inflate    93m
	+ rtl_crc32  101m
* after
	+ inflate    96m
	+ rtl_crc32  93m

	Curious that inflate gets slower; really not clear why that is - then
again the 5200 calls to 'inflate' seem quite inflated ;-) I guess with a
larger swing-buffer size we could reduce that, and (presumably) be more
efficient.

	Since this is a fairly bite-sized / separate task - it'd be good to
write-up some slideware around numbers to present on this and/or blog
about it - so we have the data around when it comes to conference time
etc. =)

> I still have to address a situation where the reader thread throws
> exception and stops unexpectedly, which would cause the producer to be
> stuck in an indefinite wait loop. Any suggestion as to how to
> communicate the error state back to producer thread?

	Would be great to have a broken zip file to test that against as a unit
test.

	I guess we just need to signal the producer in this case and (ideally)
wait for it to complete & join the thread to improve determinism. Will
read the patch =)
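
	A rough sketch of what I mean, not taken from the patch: the class
SwingBuffer, its put / get / finish / fail methods and runDemo are all
hypothetical names, and the 4096-byte chunk size is arbitrary. The idea is
that whichever side fails records the exception under the mutex and wakes
the other side, so nobody waits forever and the thread can still be joined.

```cpp
#include <array>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <exception>
#include <mutex>
#include <stdexcept>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical two-slot "swing" buffer: the producer fills one slot while
// the consumer drains the other. Illustrative only, not the patch's class.
class SwingBuffer {
public:
    // Producer side. Returns false if the other side has reported a failure.
    bool put(std::vector<char> data) {
        std::unique_lock<std::mutex> lock(mutex_);
        cond_.wait(lock, [&] { return !slots_[writeIdx_].full || failed_; });
        if (failed_) return false;
        assert(!slots_[writeIdx_].full);
        slots_[writeIdx_].data = std::move(data);
        slots_[writeIdx_].full = true;
        writeIdx_ ^= 1;                 // swing to the other slot
        cond_.notify_all();
        return true;
    }

    // Consumer side. Returns false at end of stream; rethrows a stored error.
    bool get(std::vector<char>& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        cond_.wait(lock, [&] { return slots_[readIdx_].full || done_ || failed_; });
        if (failed_ && error_) std::rethrow_exception(error_);
        if (!slots_[readIdx_].full) return false;   // done_ and nothing left
        out = std::move(slots_[readIdx_].data);
        slots_[readIdx_].full = false;
        readIdx_ ^= 1;
        cond_.notify_all();
        return true;
    }

    // Producer calls this once all data has been handed over.
    void finish() {
        std::lock_guard<std::mutex> lock(mutex_);
        done_ = true;
        cond_.notify_all();
    }

    // Either side calls this on error; it wakes whoever is waiting.
    void fail(std::exception_ptr e) {
        std::lock_guard<std::mutex> lock(mutex_);
        failed_ = true;
        error_ = e;
        cond_.notify_all();
    }

private:
    struct Slot { std::vector<char> data; bool full = false; };
    std::array<Slot, 2> slots_;
    std::size_t writeIdx_ = 0, readIdx_ = 0;
    bool done_ = false, failed_ = false;
    std::exception_ptr error_;
    std::mutex mutex_;
    std::condition_variable cond_;
};

// Runs a producer thread against a consumer that throws once it has
// consumed `failAfter` chunks (0 = never fails). Returns chunks consumed.
std::size_t runDemo(std::size_t chunks, std::size_t failAfter) {
    SwingBuffer buf;
    std::thread producer([&] {
        for (std::size_t i = 0; i < chunks; ++i)
            if (!buf.put(std::vector<char>(4096, 'x'))) return;  // consumer failed
        buf.finish();
    });
    std::size_t consumed = 0;
    try {
        std::vector<char> chunk;
        while (buf.get(chunk)) {
            if (failAfter && consumed == failAfter)
                throw std::runtime_error("simulated parse error");
            ++consumed;
        }
    } catch (...) {
        buf.fail(std::current_exception());  // signal the producer
    }
    producer.join();                         // always join, for determinism
    return consumed;
}
```

	With something like this, runDemo(1000, 3) returns instead of hanging:
the producer sees put() return false and exits, so the join completes. A
broken zip would exercise the opposite direction, with the unzip side
calling fail() and the parser seeing the exception from get().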

	Good work,

		Michael.

-- 
michael.meeks at collabora.com <><, Pseudo Engineer, itinerant idiot

