[poppler] Poppler 0.8.2 released

Ross Moore ross at ics.mq.edu.au
Tue Apr 29 17:10:35 PDT 2008


Hello Albert,

On 30/04/2008, at 7:56 AM, Albert Astals Cid wrote:
> Available from
> http://poppler.freedesktop.org/poppler-0.8.2.tar.gz

> Testing, patches and bug reports welcome.

I joined this list recently, to see whether the Poppler versions
of the  Xpdf  utilities worked any differently from the non-Poppler
versions.

I'm working on a Mac with Mac OS X v10.4.11, and have successfully
built the utilities from this latest release.


All of  pdfinfo, pdffonts, pdftohtml, pdftotext, pdftops, pdftoppm
and  pdfimages  work fine on a simple 1-page PDF that I created
with pdfTeX.

However, all of these fail with a  "Bus error" on more
complicated multi-page PDFs, which you can find here:

   http://www.maths.mq.edu.au/~ross/5019-e-cmap.pdf
   http://www.maths.mq.edu.au/~ross/5019-e-mmap.pdf
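To make the crashes easy to reproduce, here is a minimal sketch of a test script. It assumes a POSIX shell, the utilities on PATH, and the two PDFs above downloaded to the current directory; the helper name run_and_report is hypothetical. It runs each utility on each file and reports whether the utility finished normally, exited with an error, or was killed by a signal. A "Bus error" shows up as SIGBUS.

```shell
#!/bin/sh
# Reproduction sketch (assumptions: Poppler utilities on PATH, and the two
# test PDFs downloaded into the current directory).

# Hypothetical helper: run a command, discard its output, and report how it
# terminated. An exit status above 128 means the process was killed by
# signal number (status - 128); a "Bus error" is SIGBUS.
run_and_report() {
  label="$*"
  "$@" >/dev/null 2>&1
  status=$?
  if [ "$status" -gt 128 ]; then
    echo "$label: killed by SIG$(kill -l "$((status - 128))")"
  elif [ "$status" -ne 0 ]; then
    echo "$label: exited with status $status"
  else
    echo "$label: ok"
  fi
}

# Scratch directory for the text/PostScript/image/HTML output files.
outdir=$(mktemp -d "${TMPDIR:-/tmp}/pdfcheck.XXXXXX")

for pdf in 5019-e-cmap.pdf 5019-e-mmap.pdf; do
  [ -f "$pdf" ] || { echo "$pdf: not found, skipping"; continue; }
  run_and_report pdfinfo   "$pdf"
  run_and_report pdffonts  "$pdf"
  run_and_report pdftotext "$pdf" "$outdir/out.txt"
  run_and_report pdftops   "$pdf" "$outdir/out.ps"
  run_and_report pdftoppm  "$pdf" "$outdir/page"
  run_and_report pdfimages "$pdf" "$outdir/img"
  run_and_report pdftohtml "$pdf" "$outdir/out"
done

rm -rf "$outdir"
```

Checking the exit status rather than the shell's printed "Bus error" message keeps the report the same whichever shell runs the script.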


I'm particularly interested in  pdffonts, pdftohtml, pdftotext
as I want a free tool that can correctly extract the text
from documents such as the above PDFs.

They must extract the *complete* textual contents, using the
  CMap font-encoding resources that these PDFs contain.


Non-poppler versions of the utilities;  e.g.

   rossmoor% pdftotext -v
   pdftotext version 3.02
   Copyright 1996-2007 Glyph & Cog, LLC

work to some extent, but certainly not completely.
  (pdfimages  works, but its output is incomplete and useless,
   and  pdftoppm  also gives a  Bus error.)


For example, this is part of the text extracted from  5019-e-mmap.pdf
using  pdftotext (v3.02):

   Figure 1: The Moebius strip. Consider the two-sheeted covering
    \pi  : \BbbS 2 \rightar P and the inverse image \pi   - 1 (L)
   of one of these circles.

It's pretty good, except that \rightarrow  has been truncated
to 8 characters.  There are many similar instances within the
full text.  However, the Poppler version doesn't get far enough
through the document to reach this point, at least not for me.


BTW, the text selection in Adobe Reader (versions 7.* & 8.*)
does extract the text more completely; so there is either
a bug or a design flaw within the  pdftotext  utility.


>
> Albert


Hope this helps,
and that you can help me.


Cheers,

	Ross

------------------------------------------------------------------------
Ross Moore                                       ross at maths.mq.edu.au
Mathematics Department                           office: E7A-419
Macquarie University                             tel: +61 (0)2 9850 8955
Sydney, Australia  2109                          fax: +61 (0)2 9850 8114
------------------------------------------------------------------------