[poppler] pdftops -optimizecolorspace and ImageStream::reset()

William Bader williambader at hotmail.com
Fri Sep 11 18:09:07 UTC 2020


>Everything should be able to reset + re-read, I mean it's just data on disk/memory so I don't see why it wouldn't work (without bugs).

The image is in-line, and Gfx::buildImageStream() wraps it in an EmbedStream created with EmbedStream(parser->getStream(), std::move(dict), false, 0, true);
EmbedStream doesn't implement its own reset().
I tried adding an EmbedStream::reset() that calls str->reset() and clears record and replay. With that change, when DCTStream::DCTStream() gets the stream after the reset, before the second pass for -optimizecolorspace, it starts reading at the beginning of the object instead of at the start of the in-line DCT image. It eventually finds the image, which is better than before (the read position stayed at the end of the image and the second pass found nothing), but it doesn't seem like a good solution.
EmbedStream::getStart(), setPos(), and moveStart() all print messages like error(errInternal, -1, "Internal: called getStart() on EmbedStream");
Are you sure that EmbedStream() can be reset in this context without messing up the parser?
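
To make the difference concrete, here is a minimal toy model of the two behaviours described above: forwarding reset() to the shared parser stream versus replaying bytes that were recorded on the first pass. It does not use poppler's classes. ParentStream, ToyEmbedStream, naiveReset(), rewindToRecorded(), and readN() are hypothetical names invented for this sketch, and the real EmbedStream record/replay logic may differ.

```cpp
#include <cstddef>
#include <string>

// Toy stand-in for the parser's stream: a byte buffer with one read cursor.
struct ParentStream {
    std::string data;
    std::size_t pos = 0;
    int getChar() { return pos < data.size() ? static_cast<unsigned char>(data[pos++]) : -1; }
    void reset() { pos = 0; } // rewinds to the start of the whole object
};

// Toy stand-in for an in-line (embedded) stream that shares the parent's cursor.
// Hypothetical: this only illustrates the idea, it is not poppler's EmbedStream.
class ToyEmbedStream {
public:
    explicit ToyEmbedStream(ParentStream &p) : parent(p) {}

    int getChar() {
        if (replaying) {
            if (replayPos < recorded.size()) {
                return static_cast<unsigned char>(recorded[replayPos++]);
            }
            replaying = false; // recorded bytes exhausted, continue from the parent
        }
        int c = parent.getChar();
        if (c != -1) {
            recorded.push_back(static_cast<char>(c)); // remember bytes for a later pass
        }
        return c;
    }

    // Naive reset: forwards to the parent, so the next read starts at the
    // beginning of the object rather than at the start of the embedded data.
    void naiveReset() {
        parent.reset();
        recorded.clear();
        replaying = false;
        replayPos = 0;
    }

    // Replay-based rewind: serves the recorded bytes again and leaves the
    // parent's cursor where the parser expects it.
    void rewindToRecorded() {
        replaying = true;
        replayPos = 0;
    }

private:
    ParentStream &parent;
    std::string recorded;
    bool replaying = false;
    std::size_t replayPos = 0;
};

// Reads up to n bytes from the toy embedded stream into a string.
std::string readN(ToyEmbedStream &s, int n) {
    std::string out;
    for (int i = 0; i < n; ++i) {
        int c = s.getChar();
        if (c == -1) break;
        out.push_back(static_cast<char>(c));
    }
    return out;
}
```

In this model, a second pass after rewindToRecorded() sees the same image bytes without moving the parent cursor, while a second pass after naiveReset() starts reading whatever precedes the image in the object.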
I have been using -optimizecolorspace for about 5 years, and this is the first time that it hasn't worked. If in-line images can't be reset without messing up the parser, I could have PSOutputDev::doImageL1Sep() skip the scan when it detects that the image stream is an EmbedStream.
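
The fallback of skipping the scan could look roughly like the following sketch. It uses toy classes only; ToyStream, ToyFileStream, ToyInlineStream, ScanDecision, and decideScan() are hypothetical names, and how the real doImageL1Sep() would detect an EmbedStream is left open here.

```cpp
// Toy stream hierarchy, standing in for poppler's stream classes.
struct ToyStream {
    virtual ~ToyStream() = default;
};
struct ToyFileStream : ToyStream {};   // can be re-read from the start
struct ToyInlineStream : ToyStream {}; // shares the parser's cursor

enum class ScanDecision { Prescan, SkipPrescan };

// Decide whether the prescan pass is safe to run.
// Assumption: an in-line stream cannot be re-read, so the scan is skipped for it.
ScanDecision decideScan(const ToyStream &str, bool optimizeColorSpace) {
    if (!optimizeColorSpace) {
        return ScanDecision::SkipPrescan; // option not requested
    }
    if (dynamic_cast<const ToyInlineStream *>(&str) != nullptr) {
        return ScanDecision::SkipPrescan; // avoid disturbing the parser
    }
    return ScanDecision::Prescan;
}
```

The cost of this design is that in-line images would be output without the optimization, in exchange for never reading the stream twice.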
William


________________________________
From: poppler <poppler-bounces at lists.freedesktop.org> on behalf of Albert Astals Cid <aacid at kde.org>
Sent: Friday, September 11, 2020 5:57 AM
To: poppler at lists.freedesktop.org <poppler at lists.freedesktop.org>
Subject: Re: [poppler] pdftops -optimizecolorspace and ImageStream::reset()

On Thursday, 10 September 2020, at 23:00:57 CEST, William Bader wrote:
> I have a PDF where 'pdftops -level1sep -optimizecolorspace' gets 'Syntax Error: Could not find start of jpeg data' and drops part of the image.
> The problem happens in PSOutputDev::doImageL1Sep() where it prescans the image by making a new ImageStream(str, width, colorMap->getNumPixelComps(), colorMap->getBits()).
> When I made the patch to add -optimizecolorspace, I had first tried scanning the original stream and then using imgStr->reset(), but it didn't work for some types of streams, so I switched to creating a new stream, which is the code currently in poppler.
> But even that doesn't seem to work for DCTStream.
>
> Is the problem that some types of streams can never be reread (which means that -optimizecolorspace can't work as written) or that rereading streams isn't well tested and I might be able to fix it by reviewing the initialization?

Everything should be able to reset + re-read, I mean it's just data on disk/memory so I don't see why it wouldn't work (without bugs).

Cheers,
  Albert

>
> I can post a bug report, including the PDF, which is only 350KB, if it would help.
>
> Thanks, William
>



