[poppler] Poppler 0.8.2 released
Albert Astals Cid
aacid at kde.org
Wed Apr 30 10:51:47 PDT 2008
On Wednesday 30 April 2008, Ross Moore wrote:
> Hello Albert,
>
> On 30/04/2008, at 7:56 AM, Albert Astals Cid wrote:
> > Available from
> > http://poppler.freedesktop.org/poppler-0.8.2.tar.gz
> >
> > Testing, patches and bug reports welcome.
>
> I joined this list recently, to see whether the Poppler versions
> of the Xpdf utilities worked any differently from the non-Poppler
> versions.
>
> I'm working on a Mac, with MacOS X v10.4.11, and have successfully
> built the utilities from this latest release.
>
>
> All of pdfinfo, pdffonts, pdftohtml, pdftotext, pdftops, pdftoppm
> and pdfimages work fine on a simple 1-page PDF that I created
> with pdfTeX.
>
> However, all of these fail with a "Bus error" on more
> complicated multi-page PDFs, which you can find here:
>
> http://www.maths.mq.edu.au/~ross/5019-e-cmap.pdf
> http://www.maths.mq.edu.au/~ross/5019-e-mmap.pdf
This is due to a problem in how annotations are handled. I found a possible
way to track down the problem, but I'd like to ask you a question first. In
the PDF code I can see some buttons with text like "Shift-click image then
move mouse to shift image; click again (no Shift) anchors at destination".
Can you see these buttons in Acrobat? I tried to find them but could not.
>
> I'm particularly interested in pdffonts, pdftohtml, pdftotext
> as I want a free tool to be able to correctly extract the text
> from documents such as the above PDFs.
>
> They must extract the *complete* textual contents, using the
> CMap font-encoding resources that these PDFs contain.
>
>
> Non-poppler versions of the utilities; e.g.
>
> rossmoor% pdftotext -v
> pdftotext version 3.02
> Copyright 1996-2007 Glyph & Cog, LLC
>
> work to some extent, but certainly not completely.
> (pdfimages works but the output is incomplete and useless
> and pdftoppm also gives a Bus error.)
>
>
> For example, this is part of the text extracted from 5019-e-mmap.pdf
> using pdftotext (v3.02)
>
> Figure 1: The Moebius strip. Consider the two-sheeted covering
> \pi : \BbbS 2 \rightar P and the inverse image \pi - 1 (L)
> of one of these circles.
>
> It's pretty good, except that \rightarrow has been truncated
> to 8 characters. There are many similar instances within the
> full text. However, the Poppler version doesn't get far enough
> through the document to see this --- at least not for me.
That would be a separate bug.
Let's sort out the first one first.
Albert
>
>
> BTW, the text selection in Adobe Reader (versions 7.* & 8.*)
> does extract the text more completely; so there is either
> a bug or a design flaw within the pdftotext utility.
>
> > Albert
>
> Hope this helps,
> and that you can help me.
>
>
> Cheers,
>
> Ross
>
> ------------------------------------------------------------------------
> Ross Moore ross at maths.mq.edu.au
> Mathematics Department office: E7A-419
> Macquarie University tel: +61 (0)2 9850 8955
> Sydney, Australia 2109 fax: +61 (0)2 9850 8114
> ------------------------------------------------------------------------