a few more comments ...

Michael Meeks michael.meeks at collabora.com
Mon Jun 5 18:40:29 UTC 2017


Hi Mohammed,

	Just a few more aesthetic comments =) on a nice patch.

+    size_t mnPos;
+    size_t mnStreamSize;
+
+    Buffer maInUseBuffer;
+    int mnOffset;

	I'd love to see some doxygen comments:

	size_t mnPos; /// position in stream
	int mnOffset; /// position in maInUseBuffer

	etc. added to this header =) makes it easier to read.

	Also - since we wrote all this code ourselves - we don't need to use
the ALv2 variant header - let's just use the MPLv2 header from
TEMPLATE.SOURCECODE.HEADER in the top-level for the two new files.

	Now I read it again, I'm rather skeptical that:

+        if( !maUsedBuffers.empty() )
+        {
+            pProducedBuffer = maUsedBuffers.front();
+            maUsedBuffers.pop();
+        }

	is helpful. I'm not sure that we can re-use these buffers efficiently
here - perhaps it's better just to allocate new ones in the stream read
method; can you profile with and without that? (I'd love to get rid of
the extra complexity if possible.)

+class UnzippingThread: public salhelper::Thread
+{
+    XBufferedThreadedStream *mxStream;

	I'd use a reference - to clarify lifecycle: so "&mrStream" - but of
course, with std::thread we could use a lambda for this thing I guess
and capture it ...

	Gotcha - I think I found the bad bit which is this:

commit 4ae705d02df0ddf75b97d0e94add6994626f487e
Author: Kohei Yoshida <kohei.yoshida at collabora.com>
Date:   Fri Jan 13 20:47:46 2017 -0500

    tdf#97597: Ensure that each parsing thread has its own buffer.

	The package/ code is (I think) thread-safe, but is not safe wrt. the
file pointer underlying it. I -think- it assumes that only one
thread is reading at a time.

	Kohei's change here - I think - is to make each stream fully
de-compress itself into memory before parsing it - which fixed some
horrible Calc problem.

	Hmm - so we're going to need to fix the root cause here I think, not
least because de-compressing the whole stream can be really very
expensive: ODF e.g. is -incredibly- verbose, with the XML taking up far
more memory than the equivalent Calc data structures.

	Sooo ... I think your code is golden; but we're going to need to fix
the package/ code to avoid the bug.

https://bugs.documentfoundation.org/show_bug.cgi?id=97597#c28

	Suggests (input from Kohei appreciated) - that we may need to do a new
'seek' each time we come to reading some bytes from the underlying
package/ stream - to ensure that no other thread is messing with our
underlying stream position while we are reading compressed data from it =)

	I guess we should also write a unit test for this - that is heavily
threaded - 4-8 threads doing back-to-back reads for quite some time
should probably catch this.
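	Roughly this shape (names invented; in-tree the lambda body would
open a real package stream and checksum the bytes rather than bump a
counter):

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical stress-test skeleton: N threads hammering back-to-back
// reads concurrently; any file-pointer race should surface as a bad
// checksum / wrong count in the real version.
long hammerReads(int nThreads, int nReadsPerThread)
{
    std::atomic<long> nTotal{0};
    std::vector<std::thread> aThreads;
    for (int i = 0; i < nThreads; ++i)
        aThreads.emplace_back([&nTotal, nReadsPerThread]() {
            for (int j = 0; j < nReadsPerThread; ++j)
                ++nTotal; // stand-in for: read a block + verify contents
        });
    for (auto& rThread : aThreads)
        rThread.join();
    return nTotal;
}
```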

	So - could you look into the package/ impl ?

	Thanks !

		Michael.

-- 
michael.meeks at collabora.com <><, Pseudo Engineer, itinerant idiot
