Moving unzipping into a new thread
Michael Meeks
michael.meeks at collabora.com
Fri May 12 10:09:04 UTC 2017
Hi Mohammed,
On 12/05/17 07:05, Mohammed Abdul Azeem wrote:
> I've implemented swing-buffer solution using two buffers, like Michael
> suggested. I'm attaching both patch and the profiles. Let me know what
> you think.
Thanks - this is really interesting =) please do keep the dev. list
included. Loading the same large ODS spreadsheet - we see a
super-linear speedup at least under cachegrind from threading.
Admittedly it is hard to get good %ages here - since we're parallelizing
the parse & tokenize of XML in the fast-parser; but this is fun.
The overall pcycle count goes from 40.1bn to 38.4bn for the load - when
splitting out the unzip (and rtl_crc32) of the ZIP file - the CRC being
much of the cost.
* before
+ inflate 93m
+ rtl_crc32 101m
* after
+ inflate 96m
+ rtl_crc32 93m
Curious that inflate gets slower; really not clear why that is - then
again the 5200 calls to 'inflate' seem quite inflated ;-) I guess with a
larger swing-buffer size we could reduce that, and (presumably) be more
efficient.
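
To illustrate the buffer-size point, here is a minimal sketch of a two-buffer swing scheme in plain C++11. It is not the code from the patch: the names (runSwingBuffer, SwingStats, Slot) are hypothetical, a plain copy stands in for inflate, and a byte count stands in for the parser. It only shows that the number of fills falls as the buffer size grows.

```cpp
#include <algorithm>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical result type: bytes seen by the consumer and number of fills.
struct SwingStats {
    std::size_t bytesConsumed = 0;
    std::size_t fills = 0;
};

// Stream `input` through two buffers of `bufSize` bytes each: a filler
// thread fills one buffer while the calling thread drains the other.
SwingStats runSwingBuffer(const std::vector<unsigned char>& input,
                          std::size_t bufSize)
{
    assert(bufSize > 0); // a zero-sized buffer could never make progress

    struct Slot {
        std::vector<unsigned char> data;
        std::size_t used = 0;
        bool full = false;
    };
    Slot slots[2];
    slots[0].data.resize(bufSize);
    slots[1].data.resize(bufSize);

    std::mutex m;
    std::condition_variable cv;
    bool done = false;
    SwingStats stats;

    std::thread filler([&] {
        std::size_t pos = 0;
        int cur = 0;
        while (pos < input.size()) {
            Slot& s = slots[cur];
            {
                // Wait until the consumer has drained this slot.
                std::unique_lock<std::mutex> lock(m);
                cv.wait(lock, [&] { return !s.full; });
            }
            // Stand-in for one inflate call filling the buffer.
            std::size_t n = std::min(bufSize, input.size() - pos);
            std::copy(input.begin() + pos, input.begin() + pos + n,
                      s.data.begin());
            pos += n;
            {
                std::lock_guard<std::mutex> lock(m);
                s.used = n;
                s.full = true;
                ++stats.fills;
            }
            cv.notify_all();
            cur ^= 1; // swing to the other buffer
        }
        {
            std::lock_guard<std::mutex> lock(m);
            done = true;
        }
        cv.notify_all();
    });

    int cur = 0;
    for (;;) {
        Slot& s = slots[cur];
        {
            std::unique_lock<std::mutex> lock(m);
            cv.wait(lock, [&] { return s.full || done; });
            if (!s.full)
                break; // filler finished and nothing is left to drain
        }
        stats.bytesConsumed += s.used; // stand-in for parsing the data
        {
            std::lock_guard<std::mutex> lock(m);
            s.full = false;
        }
        cv.notify_all();
        cur ^= 1;
    }
    filler.join();
    return stats;
}
```

With a 1000-byte input, runSwingBuffer(input, 100) reports 10 fills and runSwingBuffer(input, 400) reports 3. The real inflate call count also depends on how the caller loops over input and output, so treat this as a model of the trend only.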
Since this is a fairly bite-sized / separate task - it'd be good to
write up some slideware around the numbers to present on this and/or blog
about it - so we have the data around when it comes to conference time
etc. =)
> I still have to address a situation where the reader thread throws
> exception and stops unexpectedly, which would cause the producer to be
> stuck in an indefinite wait loop. Any suggestion as to how to
> communicate the error state back to producer thread?
Would be great to have a broken zip file to test that against as a unit
test.
I guess we just need to signal the producer in this case and (ideally)
wait for it to complete & join the thread to improve determinism. Will
read the patch =)
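
One possible shape for that, as a sketch only: the worker catches whatever it throws, stores it in a std::exception_ptr under the shared mutex, and notifies the condition variable, so the waiting side wakes up instead of blocking forever. The waiter then joins the thread before looking at the error. All names here (SharedState, unzipWorker, loadWithWorker) are hypothetical and not taken from the patch, and the "corrupt" flag merely stands in for a broken zip file.

```cpp
#include <cassert>
#include <condition_variable>
#include <exception>
#include <mutex>
#include <stdexcept>
#include <string>
#include <thread>

// Hypothetical state shared between the unzip thread and its waiter.
struct SharedState {
    std::mutex m;
    std::condition_variable cv;
    bool finished = false;
    std::exception_ptr error; // set only if the worker threw
};

// Stand-in for the unzip work; `corrupt` simulates a broken zip stream.
void unzipWorker(SharedState& st, bool corrupt)
{
    try {
        if (corrupt)
            throw std::runtime_error("corrupt zip stream");
        // ... real work would fill buffers here ...
    } catch (...) {
        // Record the failure so the other thread can see it.
        std::lock_guard<std::mutex> lock(st.m);
        st.error = std::current_exception();
    }
    {
        std::lock_guard<std::mutex> lock(st.m);
        st.finished = true;
    }
    // Always wake the waiter, on success and on failure alike.
    st.cv.notify_all();
}

// Runs the worker, waits for it, joins it, then reports the outcome.
std::string loadWithWorker(bool corrupt)
{
    SharedState st;
    std::thread worker([&st, corrupt] { unzipWorker(st, corrupt); });
    {
        std::unique_lock<std::mutex> lock(st.m);
        st.cv.wait(lock, [&st] {
            return st.finished || st.error != nullptr;
        });
    }
    worker.join(); // join before inspecting the result, for determinism
    if (st.error != nullptr) {
        try {
            std::rethrow_exception(st.error);
        } catch (const std::exception& e) {
            return e.what();
        }
    }
    return "ok";
}
```

Here loadWithWorker(false) returns "ok" and loadWithWorker(true) returns "corrupt zip stream". In real code the waiter would more likely rethrow the exception on its own thread than turn it into a string.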
Good work,
Michael.
--
michael.meeks at collabora.com <><, Pseudo Engineer, itinerant idiot