[poppler] Characters with accents not correctly handled

Martin Schröder martin at oneiros.de
Sun Aug 19 16:14:49 PDT 2007


2007/8/19, Laurent Aguerreche <laurent.aguerreche at free.fr>:
> On Sunday 19 August 2007 at 22:34 +0200, Martin Schröder wrote:
> > It's a ligature. It's a feature. :-)
>
> :-/
>
> So with DejaVu fonts and the "ﬀ" character, it looks rather ugly, and
> this character is not displayed by emacs22 (just an empty rectangle).  \o/
>
> But the real problem is that it is impossible to recognize:
> - "ﬁ" as "fi" too
> - "ﬀ" as "ff" too
> Would it be possible to add a new parameter to pdftotext to make it
> ignore ligatures but still export in UTF-8?

Since 1.30.0, pdftex can disable all ligatures for a font with
\pdfnoligatures, but this produces inferior typesetting, and no, there
is no switch to disable ligatures for all fonts. It should be easy
to convert "ﬀ" to "ff" with the help of sed/awk/..., i.e. by massaging
the output of pdftotext.
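
A minimal sketch of that post-processing step, written in Python rather
than sed/awk. It assumes the ligatures arrive as the Latin code points
U+FB00 to U+FB06 from the Unicode "Alphabetic Presentation Forms" block;
the names expand_ligatures and filter_stream are made up for this
example and are not part of poppler.

```python
import sys

# Assumption: only the Latin ligatures U+FB00..U+FB06 need expanding.
LIGATURES = {
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
    "\ufb05": "st",  # long s + t ligature
    "\ufb06": "st",
}
_TABLE = str.maketrans(LIGATURES)


def expand_ligatures(text):
    """Replace each ligature code point with its plain-letter spelling."""
    return text.translate(_TABLE)


def filter_stream(src, dst):
    """Copy src to dst line by line, expanding ligatures on the way."""
    for line in src:
        dst.write(expand_ligatures(line))


# To use this as a pipe filter, call: filter_stream(sys.stdin, sys.stdout)
```

With that call added at the end of the file, the text from
"pdftotext file.pdf -" could be piped through the script, provided the
terminal locale is UTF-8 so Python decodes stdin correctly. An explicit
table is used here instead of a full Unicode compatibility
normalization so that no other characters in the text are altered.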

Best
   Martin

