Saving documents with broken zip streams (Re: minutes of ESC call ...)

Jan-Marek Glogowski glogow at fbihome.de
Mon Jun 3 20:21:35 UTC 2019


Am 03.06.19 um 21:52 schrieb Luboš Luňák:
> On Monday 03 of June 2019, Caolán McNamara wrote:
>> On Mon, 2019-06-03 at 12:23 +0200, Luboš Luňák wrote:
>>> On Thursday 30 of May 2019, Michael Meeks wrote:
>>>>       + some in the zip area - assuming they are threading related.
>>>
>>>  Is this about those documents such
>>> as /srv/crashtestdata/files/caolan/opendocument_stack_overflow_2.odt
>>> ? How
>>> can I reproduce that problem? If I try to fetch
>>> buildslave at vm138.documentfoundation.org:
>>> an/opendocument_stack_overflow_2.odt ,
>>> it doesn't exist.
>>
>> I attach two of the examples here. The input name was foo.sample, the
>> output to odt name appears higher up in the bt during the export.
>>
>> ./instdir/program/soffice.bin --headless --convert-to odt
>> opendocument_stack_overflow.sample
> 
> 
>  Ok, so it's not a problem with my code, my changes just happened to show the 
> problem, and the problem is that those documents are broken. If you try to 
> unzip the documents, it will complain about incorrect CRC (although it still 
> will uncompress them). And what happens is that when we try to save the file, 
> apparently only by that point we'll read those zip streams, there will be a 
> ZipException about that, and the code in package/ is not exception-safe. So 
> ZipOutputStream::writeLOC() gets called but not the matching 
> ZipOutputStream::rawCloseEntry().
> 
>  But this is actually broken on several levels. If I make the code to catch 
> the exception better, I'll need to make it somehow handle the fact that 
> writeLOC() prepared for writing en entry, but then there's nothing to write. 
> But that's actually not important, since ZipPackageStream::saveChild() will 
> still return failure, so ZipPackageFolder::saveContents() will throw an 
> exception, making the whole document saving fail. Which in turn means this 
> whole save business is irrelevant, as there's just no way to save the 
> document, even though we can load it and we can edit it. Which seems rather 
> lame.
> 
>  Any idea what to do about that? Is it really ok that we just refuse to save 
> it? Or should we save it even though the contents may be broken?

IMHO the only sane solution would be to detect the broken CRCs on read and
report a broken file to the user. Eventually we could offer some recovery
option: mark the broken CRCs to be recalculated and keep the stuff or drop the
broken ZIP entries. I guess most users can't make this decision, so I would opt
for optional recovery of the entries, ignoring the CRCs.

Easier solution: we just deny / abort loading the file and tell the user babout
the broken file.

It really strange that the broken CRCs are just detected on write and the
document loads without a problem. Or do we ignore all ZIP entries, which we
don't know, which would be strange too?


More information about the LibreOffice mailing list