[poppler] recent defect with page.get_text

Albert Astals Cid aacid at kde.org
Mon Sep 26 05:26:14 PDT 2011


On Saturday, 24 September 2011, alex bodnaru wrote:
> hello albert, friends,

Hi

> about the "recent" defect, it got to me since the *recent* debian upgrade of
> libpoppler* from 0.12.4 straight to 0.16.7.
> 
> after comparing the releases in between, i found the problem appeared between
> 0.13.2 and 0.13.3. the diff between these 2 is attached.

That is not recent at all, that is older than a year ;-)

> i see it's a great update. any advice would be welcome.

In that release the following commits touched TextOutputDev. I do not know
if you know how to compile from git, but if you do it would be great if you
could try going back to each of
9c5612f6e013a8698eff6531ec388a7e6c1fb89a
db014ffb357e760d9397544c5a8fe747cdb497ab
b1d43fa052d9160c4f319a67415ecf3ebf2cf9b3
f83b677a8eb44d65698b77edb13a5c7de3a72c0f
a2191a4d45e0abaec97c19aacae37c4c5824bd36
345ed51af9b9e7ea53af42727b91ed68dcc52370
12d83931ae1b899b70c7ea5c01f03f123b1bb9a8

Then compile at each of them and see in which of those the bug is present and
in which it is not.
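
A rough sketch of that loop in Python, in case it helps. None of this comes from the poppler tree: the checkout directory name `poppler`, the plain `make` build command, and the script `./check-get-text.sh` (expected to exit non-zero when text is missing) are all placeholders you would need to adapt.

```python
import subprocess

# The commits listed above, in the order given in this mail.
COMMITS = [
    "9c5612f6e013a8698eff6531ec388a7e6c1fb89a",
    "db014ffb357e760d9397544c5a8fe747cdb497ab",
    "b1d43fa052d9160c4f319a67415ecf3ebf2cf9b3",
    "f83b677a8eb44d65698b77edb13a5c7de3a72c0f",
    "a2191a4d45e0abaec97c19aacae37c4c5824bd36",
    "345ed51af9b9e7ea53af42727b91ed68dcc52370",
    "12d83931ae1b899b70c7ea5c01f03f123b1bb9a8",
]


def classify_commits(commits, has_bug):
    """Map each commit to True (bug present) or False (bug absent)."""
    return {commit: bool(has_bug(commit)) for commit in commits}


def build_and_check(commit, repo_dir="poppler", build_cmd=("make",),
                    check_cmd=("./check-get-text.sh",)):
    """Check out `commit`, rebuild, and run a user-supplied check.

    Assumptions: `repo_dir` is an already configured git checkout,
    `build_cmd` rebuilds it, and `check_cmd` is a hypothetical script
    that exits non-zero when the get_text output is incomplete.
    """
    subprocess.run(["git", "checkout", "--quiet", commit],
                   cwd=repo_dir, check=True)
    subprocess.run(list(build_cmd), cwd=repo_dir, check=True)
    return subprocess.run(list(check_cmd), cwd=repo_dir).returncode != 0
```

Calling `classify_commits(COMMITS, build_and_check)` would then give one good/bad answer per commit.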

Albert

P.S.: You are still sending HTML email ;-)

> 
> just please look at the glib/demo/poppler-glib-demo get text output from the
> attached pdf, even for the first page.
> 
> On 09/18/2011 05:09 PM, Albert Astals Cid wrote:
> 
> Please do not email me, email the list.
> 
> On Sunday, 18 September 2011, you wrote:
> On 09/18/2011 02:41 PM, Albert Astals Cid wrote:
> On Sunday, 18 September 2011, alex bodnaru wrote:
> 
> hello friends,
> 
> Hi
> thanks a lot albert for considering my problem.
> I am not considering your problem, I am complaining about the lack of
> information in your original mail ;-)
> 
> i'm using poppler through python (that invokes glib interface).
> 
> a recent change (probably together with get_text separation) broke the glib
> interface.
> 
> what does recent mean? 0.16.7? 0.17.x? git master?
> 0.16.7.
> So 0.16.7 does not work. Which version do you know works?
> 
> Albert
> 
> P.S: Would it be possible for you not to send HTML email?
> 
> Albert
> thanks again,
> alex
> 
> i can't load the entire page text with get_text (see the glib demo) of one
> pdf i have, but pdftotext does output the entire text.
> 
> my pdf is attached. i apologize for the language, but i promise it's an
> inoffensive cadastre report. please see that not all text lines are being
> output by get_text.
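
One way to make "not all text lines are being output" concrete is to list the lines that pdftotext produces but the glib binding does not. The sketch below is only an illustration: it assumes the PyGObject introspection bindings (`gi.repository.Poppler`), which may not be the Python binding used in this report, and a `pdftotext` binary on the PATH. The helper names are made up for this example.

```python
import subprocess


def pdftotext_first_page(pdf_path):
    """Text of page 1 as printed by the pdftotext command-line tool."""
    out = subprocess.run(
        ["pdftotext", "-f", "1", "-l", "1", pdf_path, "-"],
        check=True, capture_output=True)
    return out.stdout.decode("utf-8", errors="replace")


def glib_first_page(pdf_path):
    """Text of page 1 via the glib interface (assumes PyGObject bindings)."""
    import gi
    gi.require_version("Poppler", "0.18")
    from gi.repository import Gio, Poppler

    uri = Gio.File.new_for_path(pdf_path).get_uri()
    document = Poppler.Document.new_from_file(uri, None)
    return document.get_page(0).get_text()


def missing_lines(reference_text, candidate_text):
    """Non-empty lines of reference_text that never appear in candidate_text."""
    candidate = {line.strip() for line in candidate_text.splitlines()}
    return [line.strip() for line in reference_text.splitlines()
            if line.strip() and line.strip() not in candidate]
```

With the attached file saved locally, `missing_lines(pdftotext_first_page(path), glib_first_page(path))` would list the lines that get_text drops. Lines are compared after stripping whitespace, so layout differences between the two tools can still show up as false positives.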
> 
> could you help?
> 
> thanks in advance,
> 
> alex
> 
> _______________________________________________
> poppler mailing list
> poppler at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/poppler

