[Poppler-bugs] [Bug 16939] New: man page wrong about default text encoding for pdftotext
bugzilla-daemon at freedesktop.org
Fri Aug 1 04:37:18 PDT 2008
http://bugs.freedesktop.org/show_bug.cgi?id=16939
Summary: man page wrong about default text encoding for pdftotext
Product: poppler
Version: unspecified
Platform: Other
OS/Version: All
Status: NEW
Severity: normal
Priority: medium
Component: general
AssignedTo: poppler-bugs at lists.freedesktop.org
ReportedBy: seb128 at ubuntu.com
The bug was originally opened at
https://bugs.launchpad.net/ubuntu/+source/poppler/+bug/251002
"The man page for pdftotext(1) says -enc defaults to Latin1, but my testing
shows that I get identical output with no -enc and with -enc UTF-8. -enc
Latin1 gives different output. I'm using a French PDF, and viewing the text
with less(1). In a LANG=en_CA xterm, the -enc Latin1 text looks right. In a
LANG=en_CA.utf8 gnome-terminal, the default/-enc UTF-8 output looks right. When
it's mismatched, you see an inverse-video question-mark sort of glyph, or
less's highlighting of control characters, depending on what locale less is
using.
xpdfrc(5) says the default for textEncoding is Latin1. pdftotext(1) says this
config option corresponds to -enc.
Anyway, UTF-8 output seems to work properly; it's just the documentation that
says it's not the default."
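
The mismatch symptoms described in the report can be reproduced without
pdftotext at all. The following is only a minimal sketch: the sample string
stands in for the text of the French PDF (an assumption, not the reporter's
file), and looks_like_utf8 is a hypothetical helper, not part of poppler. It
shows why Latin1 bytes turn into replacement glyphs in a UTF-8 locale, and one
rough way to check which encoding a given output file actually uses.

```python
# Stand-in for text extracted from a French PDF (assumed sample, not the
# reporter's actual document).
text = "Résumé: déjà vu à Québec"

# The same text as it would be written under each encoding.
latin1_bytes = text.encode("latin-1")
utf8_bytes = text.encode("utf-8")

# Latin1 bytes interpreted as UTF-8: each accented byte is an invalid
# sequence, so it is replaced with U+FFFD (the "question-mark" glyph).
latin1_in_utf8_locale = latin1_bytes.decode("utf-8", errors="replace")

# UTF-8 bytes interpreted as Latin1: every accented character shows up
# as two unrelated characters instead of one.
utf8_in_latin1_locale = utf8_bytes.decode("latin-1")


def looks_like_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if data decodes cleanly as UTF-8.

    Pure-ASCII input is valid in both encodings, so this only tells the
    two apart when the text contains non-ASCII characters.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(latin1_in_utf8_locale)
print(utf8_in_latin1_locale)
print(looks_like_utf8(utf8_bytes), looks_like_utf8(latin1_bytes))
```

Reading the bytes of a file produced with no -enc option and passing them to a
check like looks_like_utf8 would be one way to confirm the observed default
independently of the terminal's locale.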