[poppler] patch for a progress callback

Leonard Rosenthol lrosenth at adobe.com
Mon Jan 7 04:54:22 PST 2013


Yes, in the case of that "stroke" PDF each line segment has been put into
a separate Form XObject, which adds TREMENDOUS overhead to rendering.  If
you view it in Adobe Acrobat/Reader, you will see that we end up going
into "progressive mode" where each line is rendered individually - cool to
look at but VERY painful.

I would also like to remind you that Adobe Illustrator is _NOT_ a PDF
editor.  So opening up PDFs (that were not originally created by
Illustrator) and modifying them is not the intent of the product and not
recommended by Adobe.

Leonard

On 1/7/13 5:39 AM, "lists at ds.com" <lists at damianstewart.com> wrote:

>
>On 06 Jan 2013, at 21:01, poppler-request at lists.freedesktop.org wrote:
>> From: Albert Astals Cid <aacid at kde.org>
>> Subject: Re: [poppler] patch for a progress callback
>> 
>> On Friday, 4 January 2013, at 09:16:25, Damian Stewart wrote:
>> Some comments:
>> * The cpp facing api seems quite "un-C++-ish", i think that a class with a
>> pure virtual progressDone() function and a Page::setProgressHandler method
>> would make it more C++ish
>
>In the core Poppler source, the existing callbacks follow the same
>pattern:
>
>void display(OutputDev *out, double hDPI, double vDPI,
>...
>GBool (*abortCheckCbk)(void *data) = NULL,
>		  void *abortCheckCbkData = NULL,
>		  GBool (*annotDisplayDecideCbk)(Annot *annot, void *user_data) = NULL,
>		  void *annotDisplayDecideCbkData = NULL,
>
>With this precedent, I would have assumed
>
>		  void (*progressCbk)(int pageNum, float progressPct, void *user_data) = NULL,
>		  void *progressCbkData = NULL );
>
>makes the most sense.. or?
>
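The two API styles discussed above are not mutually exclusive. Below is a minimal sketch, assuming the callback signature proposed in this message, of how a function-pointer-plus-user-data callback can carry a C++ handler object through a small trampoline. renderPageSketch, ProgressHandler, progressTrampoline and RecordingHandler are hypothetical names made up for illustration; none of them is Poppler API.

```cpp
#include <cassert>
#include <vector>

// Hypothetical callback type, modelled on the signature proposed above.
typedef void (*ProgressCbk)(int pageNum, float progressPct, void *user_data);

// Stand-in for a render loop. This is NOT Poppler's Page::display.
void renderPageSketch(int pageNum, int totalOps,
                      ProgressCbk progressCbk = nullptr,
                      void *progressCbkData = nullptr) {
    assert(totalOps > 0);
    for (int op = 1; op <= totalOps; ++op) {
        // ... the real rendering of operation `op` would happen here ...
        if (progressCbk) {  // a null callback costs one pointer test per operation
            progressCbk(pageNum, static_cast<float>(op) / totalOps,
                        progressCbkData);
        }
    }
}

// Hypothetical C++-style interface along the lines suggested in the review.
class ProgressHandler {
public:
    virtual ~ProgressHandler() {}
    virtual void progressDone(int pageNum, float progressPct) = 0;
};

// Trampoline: forwards the C-style callback to a ProgressHandler object
// that was passed through the user_data pointer.
void progressTrampoline(int pageNum, float progressPct, void *user_data) {
    static_cast<ProgressHandler *>(user_data)->progressDone(pageNum, progressPct);
}

// Example handler that simply records every reported fraction.
class RecordingHandler : public ProgressHandler {
public:
    std::vector<float> seen;
    void progressDone(int /*pageNum*/, float progressPct) override {
        seen.push_back(progressPct);
    }
};
```

A caller would create a RecordingHandler h and call renderPageSketch(1, 4, progressTrampoline, &h); calling renderPageSketch(1, 4) with no callback skips the progress code entirely.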
>
>> * How useful is this? I.e. i predict that there are operations that are
>> really fast and others really slow, does it really help?
>
>With UX improvements, often 'useful' is not the adjective to use. Try
>'less frustrating'.
>
>Yes, some operations are slow and some are fast, but having a progress
>indicator that gradually increases from 0 to 1 over 2 hours is much, much
>less frustrating than a blinking cursor for the same 2 hours, especially
>if you don't know up front whether it's going to take 1 minute, 20
>minutes or 2 hours.
>
>
>> * The DummyOutputDev should take the real outputdev and return the values of
>> the real outputdev for the "querying" functions
>
>Good point, yes, it should probably do that.
>
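A rough sketch of that delegation idea, assuming a heavily reduced, made-up device interface: the counting device forwards "querying" calls to the real device so that both passes take the same code paths, and only counts the drawing calls. OutputDevSketch, RealDevSketch and CountingDevSketch are hypothetical stand-ins, not the classes from the patch or from Poppler.

```cpp
#include <cassert>

// Hypothetical, heavily reduced stand-in for an output device interface.
class OutputDevSketch {
public:
    virtual ~OutputDevSketch() {}
    // A "querying" function: the interpreter asks the device what it supports.
    virtual bool useTilingPatternFill() { return false; }
    // A drawing function: one call per rendered operation.
    virtual void drawOp() = 0;
};

// Stand-in for a real rasterising device that supports tiling pattern fills.
class RealDevSketch : public OutputDevSketch {
public:
    bool useTilingPatternFill() override { return true; }
    void drawOp() override { /* expensive rendering would happen here */ }
};

// Counting device: answers queries exactly as the real device would,
// but does no rendering work and only counts operations.
class CountingDevSketch : public OutputDevSketch {
public:
    explicit CountingDevSketch(OutputDevSketch *realDev)
        : real(realDev), opCount(0) {
        assert(realDev != nullptr);
    }
    bool useTilingPatternFill() override { return real->useTilingPatternFill(); }
    void drawOp() override { ++opCount; }
    int count() const { return opCount; }

private:
    OutputDevSketch *real;  // not owned
    int opCount;
};
```

With this shape, the counting pass and the real pass see the same answers to every query, so the operation count from the first pass should match what the second pass actually executes.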
>
>> * Have you measured the performance hit?
>
>I have not, however:
>0. Any progress indicator naturally involves reduced 'performance'
>(because of GUI redraw calls, mutex grab/release, etc). However from a UX
>perspective, 10 seconds with no progress indicator is way more
>frustrating for the user than 10.1 seconds with a progress indicator.
>1. If progressCbk==NULL in Page::displaySlice then there is no performance
>hit, as the progress code is never called.
>2. If the page has few operations then the performance hit from
>precalculating the operation count will be minuscule.
>3. If the page has many operations then the time to perform those
>operations 'for real' will dwarf the time to count operations, probably by
>several orders of magnitude.
>4. The time taken to call the progressCbk per operation will depend
>entirely on the third party's implementation of the progressCbk; i.e., it's
>not Poppler's responsibility.
>
>
>
>
>On 06/01/13 18:35, Adrian Johnson <ajohnson at redneon.com> wrote:
>
>> This is the biggest problem I see with this patch. There are PDFs with
>> one huge image. In this case the progress will sit on 0% for a long time
>> then jump to 100% when rendering is finished.
>
>Granted. However it's unlikely that a pdf with embedded images will be
>rendered at 300DPI, as that would mean that the embedded images would
>also need to be 300DPI; and if someone is emailing you a PDF with 300DPI
>images you can email them back and say 'please just send me the images'.
>
>Also: Have you seen how other progress indicators out there function?
>This is a universal issue with progress indicators.
>
>> It also doesn't take into
>> account XObjects and patterns which may also take a long time to render
>> with no progress available. There are PDFs with a single XObject painted
>> in the content stream. All the work is done in the XObject.
>> 
>> If we merge this feature what do we do when users start filing bugs
>> about these cases that don't provide useful progress feedback?
>
>I don't know. I'm also not sure what you're saying. So no-one should have
>a progress bar, because the maintainers don't want to deal with users
>complaining about the edge cases?
>
>>> * The DummyOutputDev should take the real outputdev and return the values of
>>> the real outputdev for the "querying" functions, e.g. your dummy outputdev
>>> returns false for useTilingPatternFill, but SplashOutputDev returns true; this
>>> probably means that you never get to 100% on files that have patterns, no?
>>> * Have you measured the performance hit?
>> 
>> I would also like to know the performance cost of rendering to
>> DummyOutputDev.
>
>This point was addressed above.
>
>> Have you got a sample PDF we can test with? It would be interesting to
>> see if there is anything that can be done to speed up the rendering.
>
>https://dl.dropbox.com/u/16106653/pdfToImage-progress-example.zip
>
>Note: the PDFs in this zip will probably take a very, very long time to
>render with standard desktop PDF viewers. The first one
>2012_12_28_13_6_22.pdf (generated by Cairo) is reasonably well behaved.
>The second one 2012_12_28_13_6_22_0,5stroke-1.pdf (generated by loading
>2012_12_28_13_6_22.pdf into Adobe Illustrator and modifying the stroke
>width for all of the lines) seems to have multiple embedded streams, but
>is typical of the documents I was working with in making this patch.
>
>After compiling pdfToImage-progress.cpp, try:
> ./a.out -i 2012_12_28_13_6_22.pdf -o test.png --width 12000
>(12000 pixels is the image size necessary to produce a ~100cm wide print
>at 300DPI)
>(This is a direct output from an openFrameworks application that draws
>several thousand line segments every frame, running at 60fps for a minute
>or two)
>
>For a poor performance example, try
> ./a.out -i 2012_12_28_13_6_22_0,5stroke-1.pdf -o test.png --width 3508
>(3508 pixels wide is a DIN-A4 print at 300DPI)
>(This is generated from the same data file as above, after loading it
>into Adobe Illustrator, selecting all the lines and setting their stroke
>width to 0.5 -- yes, there are better ways to achieve this than via a GUI
>application, but this is the use case I am working from.)
>
>
>> I don't think calling the progress callback on every operation is
>> efficient or necessary. It should be sufficient to call the callback
>> every n operations. I suggest making n = TotalOperations/100.
>
>Try with the 2012_12_28_13_6_22_0,5stroke-1.pdf example at 300dpi, as in
>the paragraph above. With this one, some single operations can take
>several seconds on my (2GHz Intel) machine. If n=TotalOperations/100, the
>progress bar would stall on this for tens of seconds to a minute, which
>is annoying. IMO the implementor of the callback function should be able
>to decide how frequently they want to respond to events.
>
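One way the callback implementor could make that decision is to throttle by elapsed time rather than by operation count, so that a slow single operation still produces an update as soon as it finishes. The following is only a sketch under that assumption; ThrottledProgress, shouldRedraw and throttledProgressCbk are hypothetical names, and the callback signature is the one proposed earlier in this thread.

```cpp
#include <cassert>
#include <chrono>

typedef std::chrono::steady_clock Clock;

// Hypothetical state handed to the callback through its user_data pointer.
struct ThrottledProgress {
    Clock::duration minInterval;  // minimum time between two redraws
    Clock::time_point lastShown;  // time of the most recent redraw
    bool shownOnce;               // false until the first redraw
    int redraws;                  // number of redraws performed
    float lastPct;                // last fraction that was displayed
};

// Decides whether this report should trigger a redraw. `now` is a parameter
// so that the logic can be exercised without real waiting.
bool shouldRedraw(ThrottledProgress &s, Clock::time_point now, float pct) {
    assert(pct >= 0.0f && pct <= 1.0f);
    // Always show the first report and the final one; otherwise skip
    // reports that arrive too soon after the previous redraw.
    if (s.shownOnce && pct < 1.0f && now - s.lastShown < s.minInterval) {
        return false;
    }
    s.shownOnce = true;
    s.lastShown = now;
    s.lastPct = pct;
    ++s.redraws;
    return true;
}

// Callback with the proposed signature; it may be invoked on every operation.
void throttledProgressCbk(int /*pageNum*/, float progressPct, void *user_data) {
    ThrottledProgress *s = static_cast<ThrottledProgress *>(user_data);
    if (shouldRedraw(*s, Clock::now(), progressPct)) {
        // ... update the progress bar here ...
    }
}
```

The library then stays free to call the callback per operation, and the cost of each skipped report is a clock read and a comparison on the application side.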
>> Instead of parsing the content twice to get the operation count you
>> could use the current position in the content stream to report progress.
>> You would have to use the compressed stream position
>> (getBaseStream()->getLength()) since the uncompressed length of a stream
>> is not stored in the pdf file.
>
>I initially attempted that. With the 2012_12_28_13_6_22_0,5stroke-1.pdf
>from the zip linked above, it seems there are multiple streams embedded
>in the PDF, as the stream position pointer keeps on jumping back to zero.
>I'm assuming there are multiple streams, which means you'd need to do a
>preprocessing step which involves looping through the PDF, adding up all
>the stream lengths, and then actually rendering .... which is exactly
>what I've done with this patch, but counting operations rather than
>stream lengths.
>
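For comparison, the stream-length variant described above could look roughly like this once the lengths of all streams have been collected in a preprocessing pass. It is a sketch only: MultiStreamProgress is a hypothetical helper, the lengths are assumed to be supplied by the caller, and bytes consumed are at best a rough proxy for rendering work.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: overall progress across several content streams,
// given the length of each stream as gathered in a first pass.
class MultiStreamProgress {
public:
    explicit MultiStreamProgress(const std::vector<std::size_t> &lengths)
        : lengths_(lengths), total_(0) {
        for (std::size_t i = 0; i < lengths_.size(); ++i) {
            total_ += lengths_[i];
        }
    }

    // Fraction done when the interpreter is at byte offset `pos`
    // inside stream number `index`.
    double fraction(std::size_t index, std::size_t pos) const {
        assert(index < lengths_.size());
        assert(pos <= lengths_[index]);
        if (total_ == 0) {
            return 1.0;  // nothing to process
        }
        std::size_t done = pos;
        for (std::size_t i = 0; i < index; ++i) {
            done += lengths_[i];  // streams that are already finished
        }
        return static_cast<double>(done) / static_cast<double>(total_);
    }

private:
    std::vector<std::size_t> lengths_;
    std::size_t total_;
};
```

With two streams of 100 and 300 bytes, the position pointer resetting to zero at the start of the second stream maps to a fraction of 0.25 instead of dropping back to 0.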
>Cheers
>Damian
>
>--
>damian stewart . interdisciplinary creative engineer . vienna, austria
>http://damianstewart.com . twitter @damian0815
>
>
>
>
>_______________________________________________
>poppler mailing list
>poppler at lists.freedesktop.org
>http://lists.freedesktop.org/mailman/listinfo/poppler


