[Poppler-bugs] [Bug 16939] New: man page wrong about default text encoding for pdftotext

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Fri Aug 1 04:37:18 PDT 2008


http://bugs.freedesktop.org/show_bug.cgi?id=16939

           Summary: man page wrong about default text encoding for pdftotext
           Product: poppler
           Version: unspecified
          Platform: Other
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: medium
         Component: general
        AssignedTo: poppler-bugs at lists.freedesktop.org
        ReportedBy: seb128 at ubuntu.com


This bug was originally opened at:
https://bugs.launchpad.net/ubuntu/+source/poppler/+bug/251002

"The man page for pdftotext(1) says -enc defaults to Latin1, but my testing
shows that I get identical output with no -enc and with -enc UTF-8. -enc
Latin1 gives different output. I'm using a French PDF, and viewing the text with
less(1). In a LANG=en_CA xterm, the -enc Latin1 text looks right. In a
LANG=en_CA.utf8 gnome-terminal, the default/-enc UTF-8 output looks right. When
it's mismatched, you see an inverse-video question-mark sort of glyph, or
less's highlighting of control characters, depending on what locale less is
using.

xpdfrc(5) says the default for textEncoding is Latin1. pdftotext(1) says this
config option corresponds to -enc.

Anyway, UTF-8 output seems to work properly; it's just the documentation that
says it's not the default."
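
The mismatch symptoms described in the report can be reproduced without a PDF.
The following is a minimal Python sketch, assuming a made-up French sample
string; looks_like_utf8 is a hypothetical helper for illustration and is not
part of poppler or xpdf. It shows why UTF-8 and Latin1 output differ for
accented text, and what each looks like when read with the wrong encoding.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Assumed sample text; any string with accented characters would do.
sample = "Révisé à Québec"

as_utf8 = sample.encode("utf-8")      # accented letters take two bytes each
as_latin1 = sample.encode("latin-1")  # accented letters take one byte each

# Latin1 bytes read as UTF-8: each accented byte is invalid and becomes
# U+FFFD, the replacement character a UTF-8 terminal draws as a "?" glyph.
latin1_read_as_utf8 = as_latin1.decode("utf-8", errors="replace")

# UTF-8 bytes read as Latin1: every byte is accepted, so each accented
# letter turns into two unrelated characters instead of raising an error.
utf8_read_as_latin1 = as_utf8.decode("latin-1")

print(looks_like_utf8(as_utf8))    # True
print(looks_like_utf8(as_latin1))  # False
print(latin1_read_as_utf8)
print(utf8_read_as_latin1)
```

Applying the same check to the bytes pdftotext writes with no -enc option
would indicate which encoding is actually the default. Pure-ASCII output is
valid under both encodings, so the test document needs accented characters.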



