[poppler] Page::display function performance

Albert Astals Cid aacid at kde.org
Mon Mar 9 13:50:03 PDT 2009


On Monday, 9 March 2009, Ilya Gorenbein wrote:
> Hello,
>
> I need to extract the text out of the document/page.
>
> I tried the
>
>   void Page::display(OutputDev *out, double hDPI, double vDPI,
>                      int rotate, GBool useMediaBox, GBool crop,
>                      GBool printing, Catalog *catalog,
>                      GBool (*abortCheckCbk)(void *data),
>                      void *abortCheckCbkData,
>                      GBool (*annotDisplayDecideCbk)(Annot *annot, void *user_data),
>                      void *annotDisplayDecideCbkData);
>
> function (poppler version 0.10.4). When I measured the performance of this
> function, I got ~1.5 Mb/sec on a dual-core 2.33 GHz CPU with 2 GB of RAM,
> running kernel 2.6.24-17 on Debian lenny.

I hope you are passing a TextOutputDev there and not a rendering output
device such as Splash or Cairo.

> Please advise me on how the performance of this function could be
> improved.

Run it under a profiler such as callgrind and send us patches for the hot
spots in the code.
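
Alongside profiling, it helps to have a repeatable throughput number so you
can tell whether a patch actually changed anything. Below is a minimal sketch
in C++ using only the standard library. The names time_seconds and
megabytes_per_second are hypothetical helpers, not poppler API, and the
callable passed in stands in for whatever loop calls Page::display with a
TextOutputDev. It also assumes the quoted figure means megabytes (10^6 bytes)
per second.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>

// Throughput in megabytes per second, taking 1 MB = 1,000,000 bytes
// (an assumption; the original report only says "Mb/sec").
double megabytes_per_second(std::size_t bytes, double seconds) {
    assert(seconds > 0.0);  // a zero or negative interval has no meaningful rate
    return static_cast<double>(bytes) / 1e6 / seconds;
}

// Runs `work` exactly once and returns the elapsed wall-clock time in seconds.
// `work` is a hypothetical stand-in for the code under measurement,
// for example a loop that displays every page to a text output device.
double time_seconds(const std::function<void()>& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

For example, wrap the page loop in a lambda, pass it to time_seconds, and feed
the result together with the input file size to megabytes_per_second. Several
runs are more trustworthy than a single measurement, since the first run also
pays for reading the file from disk.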

> Is there another (cheaper) way to extract text out of the
> document/page?

I would say not; that is what pdftotext uses.

Albert


