Saving documents with broken zip streams (Re: minutes of ESC call ...)
glogow at fbihome.de
Mon Jun 3 20:21:35 UTC 2019
Am 03.06.19 um 21:52 schrieb Luboš Luňák:
> On Monday 03 of June 2019, Caolán McNamara wrote:
>> On Mon, 2019-06-03 at 12:23 +0200, Luboš Luňák wrote:
>>> On Thursday 30 of May 2019, Michael Meeks wrote:
>>>> + some in the zip area - assuming they are threading related.
>>> Is this about those documents such
>>> as /srv/crashtestdata/files/caolan/opendocument_stack_overflow_2.odt
>>> ? How
>>> can I reproduce that problem? If I try to fetch
>>> buildslave at vm138.documentfoundation.org:
>>> an/opendocument_stack_overflow_2.odt ,
>>> it doesn't exist.
>> I attach two of the examples here. The input name was foo.sample, the
>> output to odt name appears higher up in the bt during the export.
>> ./instdir/program/soffice.bin --headless --convert-to odt
> Ok, so it's not a problem with my code, my changes just happened to show the
> problem, and the problem is that those documents are broken. If you try to
> unzip the documents, it will complain about incorrect CRC (although it still
> will uncompress them). And what happens is that when we try to save the file,
> apparently only by that point we'll read those zip streams, there will be a
> ZipException about that, and the code in package/ is not exception-safe. So
> ZipOutputStream::writeLOC() gets called but not the matching
> But this is actually broken on several levels. If I make the code to catch
> the exception better, I'll need to make it somehow handle the fact that
> writeLOC() prepared for writing en entry, but then there's nothing to write.
> But that's actually not important, since ZipPackageStream::saveChild() will
> still return failure, so ZipPackageFolder::saveContents() will throw an
> exception, making the whole document saving fail. Which in turn means this
> whole save business is irrelevant, as there's just no way to save the
> document, even though we can load it and we can edit it. Which seems rather
> Any idea what to do about that? Is it really ok that we just refuse to save
> it? Or should we save it even though the contents may be broken?
IMHO the only sane solution would be to detect the broken CRCs on read and
report a broken file to the user. Eventually we could offer some recovery
option: mark the broken CRCs to be recalculated and keep the stuff or drop the
broken ZIP entries. I guess most users can't make this decision, so I would opt
for optional recovery of the entries, ignoring the CRCs.
Easier solution: we just deny / abort loading the file and tell the user babout
the broken file.
It really strange that the broken CRCs are just detected on write and the
document loads without a problem. Or do we ignore all ZIP entries, which we
don't know, which would be strange too?
More information about the LibreOffice