[poppler] Analysing 3 pages at a time with one function

Josh Richardson jric at chegg.com
Mon Oct 24 20:08:59 PDT 2011


That depends on what you're trying to do.  Since you haven't answered
that question, I'll make a wild guess: you'll need to derive from
SplashOutputDev and override functions such as startPage() and/or
endPage(), and maybe others.  Note that the only record SplashOutputDev
keeps of the document processing is a bitmap rendering.  If you use or
override SplashOutputDevHtmlImages or HtmlOutputDev, you'll have access to
a little more metadata, such as the positions of elements or the font list.

Also note that with this approach you're rendering each page of the PDF
up to three times, which is likely much less efficient than a second pass
that reads and operates on the XML files.  Don't take my word for it,
though; try it for yourself.

--josh

On 10/24/11 6:52 PM, "Alec Taylor" <alec.taylor6 at gmail.com> wrote:

>Using the second-pass approach is still a possibility; I just wanted
>to see if I could utilise a one-pass approach.
>
>Do you have any ideas how I can get a function like this working?
>
>On Tue, Oct 25, 2011 at 9:21 AM, Josh Richardson <jric at chegg.com> wrote:
>> What are you trying to do?  What's your algorithm?  Did you decide
>> against the second-pass approach on the XML files?
>>
>> --josh
>>
>> On 10/24/11 1:57 PM, "Alec Taylor" <alec.taylor6 at gmail.com> wrote:
>>
>>>Good morning,
>>>
>>>How do I analyse three pages at a time using a single function?
>>>
>>>Thanks for all suggestions,
>>>
>>>Alec Taylor
>>>
>>>FYI: You'll find my attempt below
>>>
>>>[utils/pdftohtml.cc]
>>>
>>><snipped>
>>>
>>>GBool analyseThree(PDFDoc*, SplashOutputDev*, int, int, int);
>>>
>>>int main(int argc, char *argv[]) {
>>>
>>><snipped>
>>>
>>>      for (int pg = firstPage; pg <= lastPage; ++pg) {
>>>        if (!generateHeaderFooter) {
>>>          doc->displayPage(splashOut, pg,
>>>                           72 * scale, 72 * scale,
>>>                           0, gTrue, gFalse, gFalse);
>>>        } else if (pg + 2 <= lastPage) {
>>>          // doc and splashOut are already pointers: don't dereference them
>>>          analyseThree(doc, splashOut, pg, pg + 1, pg + 2);
>>>        }
>>>
>>>        // After analyseThree(), the bitmap holds page pg + 2, not page pg
>>>        SplashBitmap *bitmap = splashOut->getBitmap();
>>>
>>>        imgFileName = GooString::format("{0:s}{1:03d}.{2:s}",
>>>            htmlFileName->getCString(), pg, extension);
>>>
>>>        bitmap->writeImgFile(format, imgFileName->getCString(),
>>>                             72 * scale, 72 * scale);
>>>
>>>        delete imgFileName;
>>>      }
>>>
>>><snipped>
>>>
>>>GBool analyseThree(PDFDoc *doc, SplashOutputDev *splashOut, int first,
>>>                   int second, int third) {
>>>    // Each call re-renders into the same splashOut bitmap, so only the
>>>    // rendering of 'third' is left when this function returns.
>>>    doc->displayPage(splashOut, first, 72 * scale, 72 * scale, 0,
>>>                     gTrue, gFalse, gFalse);
>>>    doc->displayPage(splashOut, second, 72 * scale, 72 * scale, 0,
>>>                     gTrue, gFalse, gFalse);
>>>    doc->displayPage(splashOut, third, 72 * scale, 72 * scale, 0,
>>>                     gTrue, gFalse, gFalse);
>>>
>>>    return gTrue;
>>>}
>>>_______________________________________________
>>>poppler mailing list
>>>poppler at lists.freedesktop.org
>>>http://lists.freedesktop.org/mailman/listinfo/poppler
>>>
>>
>>
>


