# [poppler] Poppler 0.8.2 released

Albert Astals Cid aacid at kde.org
Wed Apr 30 10:51:47 PDT 2008

On Wednesday 30 April 2008, Ross Moore wrote:
> Hello Albert,
>
> On 30/04/2008, at 7:56 AM, Albert Astals Cid wrote:
> > Available from
> > http://poppler.freedesktop.org/poppler-0.8.2.tar.gz
> >
> > Testing, patches and bug reports welcome.
>
> I joined this list recently, to see whether the Poppler versions
> of the  Xpdf  utilities worked any differently from the non-Poppler
> versions.
>
> I'm working on a Mac, with MacOS X v10.4.11, and have successfully
> built the utilities from this latest release.
>
>
> All of  pdfinfo, pdffonts, pdftohtml, pdftotext, pdftops, pdftoppm
> and  pdfimages  work fine on a simple 1-page PDF that I created
> with pdfTeX.
>
> However, all of these fail with a  "Bus error" on more
> complicated multi-page PDFs, which you can find here:
>
>    http://www.maths.mq.edu.au/~ross/5019-e-cmap.pdf
>    http://www.maths.mq.edu.au/~ross/5019-e-mmap.pdf

This is due to a problem in how annotations are handled. I have a possible
way to locate the problem, but I'd like to ask you a question first. In the
PDF code I can see some buttons with text like "Shift-click image then move
mouse to shift image; click again (no Shift) anchors at destination".

Can you see these buttons in Acrobat? I looked for them but could not find
them.

>
> I'm particularly interested in  pdffonts, pdftohtml, pdftotext
> as I want a free tool to be able to correctly extract the text
> from documents such as the above PDFs.
>
> They must extract the *complete* textual contents, using the
>   CMap font-encoding resources that these PDFs contain.
>
>
> Non-poppler versions of the utilities;  e.g.
>
>    rossmoor% pdftotext -v
>    pdftotext version 3.02
>    Copyright 1996-2007 Glyph & Cog, LLC
>
> work to some extent, but certainly not completely.
>   (pdfimages  works but the output is incomplete and useless
>    and  pdftoppm  also gives a  Bus error .)
>
>
> For example, this is part of the text extracted from  5019-e-mmap.pdf
> using  pdftotext (v3.02)
>
>    Figure 1: The Moebius strip. Consider the two-sheeted covering
>     \pi  : \BbbS 2 \rightar P and the inverse image \pi   - 1 (L)
>    of one of these circles.
>
> It's pretty good, except that \rightarrow  has been truncated
> to 8 characters.  There are many similar instances within the
> full text.  However, the Poppler version doesn't get far enough
> through the document to see this --- at least not for me.

This would be a separate bug.

Let's sort out the first one first.
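
To narrow down which utilities crash on a given file, something like the
following could help. It is only a minimal sketch, not part of poppler: it
assumes the utilities are on the PATH, that a local copy of the PDF is given
as the first argument (the default file name is just the one from the report
above), and the helper names describe_exit and run_tool are made up for
illustration. It relies on Python reporting a child killed by a signal as a
negative return code, which is how a "Bus error" (SIGBUS) shows up on POSIX
systems.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Turn a subprocess return code into a short human-readable status."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, a negative return code -N means "killed by signal N".
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = "signal %d" % -returncode
        return "killed by " + name
    return "exit status %d" % returncode


def run_tool(argv):
    """Run one command, discard its output, and describe how it ended."""
    try:
        proc = subprocess.run(
            argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
    except FileNotFoundError:
        return "not installed"
    return describe_exit(proc.returncode)


if __name__ == "__main__":
    # Assumption: a local copy of the PDF; the default name is illustrative.
    pdf = sys.argv[1] if len(sys.argv) > 1 else "5019-e-mmap.pdf"
    # Only tools that need no output file are listed here;
    # "pdftotext FILE -" sends the extracted text to stdout.
    commands = [
        ["pdfinfo", pdf],
        ["pdffonts", pdf],
        ["pdftotext", pdf, "-"],
    ]
    for argv in commands:
        print("%-10s %s" % (argv[0], run_tool(argv)))
```

A tool that dies with a bus error would be listed as "killed by SIGBUS",
while an ordinary failure (for example a missing file) shows up as a plain
exit status.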

Albert

>
>
> BTW, the text selection in Adobe Reader (versions 7.* & 8.*)
> does extract the text more completely; so there is either
> a bug or a design flaw within the  pdftotext  utility.
>
> > Albert
>
> Hope this helps,
> and that you can help me.
>
>
> Cheers,
>
> 	Ross
>
> ------------------------------------------------------------------------
> Ross Moore                                       ross at maths.mq.edu.au
> Mathematics Department                           office: E7A-419
> Macquarie University                             tel: +61 (0)2 9850 8955
> Sydney, Australia  2109                          fax: +61 (0)2 9850 8114
> ------------------------------------------------------------------------