# [poppler] Poppler 0.8.2 released

> > Available from
> > http://poppler.freedesktop.org/poppler-0.8.2.tar.gz
> > Testing, patches and bug reports welcome.
> I joined this list recently, to see whether the Poppler versions
> of the  Xpdf  utilities worked any differently from the non-Poppler
> versions.
> I'm working on a Mac, with MacOS X v10.4.11, and have successfully
> built the utilities from this latest release.
> All of  pdfinfo, pdffonts, pdftohtml, pdftotext, pdftops, pdftoppm
> and  pdfimages  work fine on a simple 1-page PDF that I created
> with pdfTeX.
> However, all of these fail with a  "Bus error" on more
> complicated multi-page PDFs, which you can find here:
>    http://www.maths.mq.edu.au/~ross/5019-e-cmap.pdf
>    http://www.maths.mq.edu.au/~ross/5019-e-mmap.pdf

This is due to a problem in how Annotations are handled. I found a possible
way to track the problem down, but I'd like to ask you a question first. In the
PDF code I can see some buttons with text like "Shift-click image then move
mouse to shift image; click again (no Shift) anchors at destination".

Can you see these buttons with Acrobat? I looked for them but could not
find them.

> I'm particularly interested in  pdffonts, pdftohtml, pdftotext
> as I want a free tool to be able to correctly extract the text
> from documents such as the above PDFs.
> They must extract the *complete* textual contents, using the
>   CMap font-encoding resources that these PDFs contain.
> Non-Poppler versions of the utilities, e.g.
>
>    rossmoor% pdftotext -v
>    pdftotext version 3.02
>    Copyright 1996-2007 Glyph & Cog, LLC
> work to some extent, but certainly not completely.
>   (pdfimages  works but the output is incomplete and useless
>    and  pdftoppm  also gives a  Bus error .)
> For example, this is part of the text extracted from  5019-e-mmap.pdf
> using  pdftotext (v3.02)
>    Figure 1: The Moebius strip. Consider the two-sheeted covering
>     \pi  : \BbbS 2 \rightar P and the inverse image \pi   - 1 (L)
>    of one of these circles.
> It's pretty good, except that \rightarrow  has been truncated
> to 8 characters.  There are many similar instances within the
> full text.  However, the Poppler version doesn't get far enough
> through the document to see this --- at least not for me.

This would be a separate bug.
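
To make the symptom concrete, here is a minimal sketch of the pattern in the
quoted output: names of up to 8 characters ("\pi", "\BbbS") come through
whole, while "\rightarrow" comes out as "\rightar". The helper name
truncate_name and the fixed 8-character limit are assumptions made for
illustration only; this is not taken from the pdftotext source, and the real
cause may be different.

```python
def truncate_name(name: str, limit: int = 8) -> str:
    """Cut a name to at most `limit` characters.

    The limit of 8 is an assumption that matches the reported output;
    it is not a value read from the pdftotext code.
    """
    return name[:limit]


# Names taken from the extracted text quoted above.
for name in ["\\pi", "\\BbbS", "\\rightarrow"]:
    print(name, "->", truncate_name(name))
```

If the real limit works like this, any name longer than 8 characters
(backslash included) would be cut the same way, which would fit the "many
similar instances" reported for the full text.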

Let's sort the first one first.

Albert

> BTW, the text selection in Adobe Reader (versions 7.* & 8.*)
> does extract the text more completely; so there is either
> a bug or a design flaw within the  pdftotext  utility.
> > Albert
> Hope this helps,
> and that you can help me.
>
> Cheers,
>
> 	Ross
