[Poppler-bugs] [Bug 26077] Mis-placed ActualText strings
James Cloos
cloos at jhcloos.com
Tue Jul 9 13:02:15 PDT 2013
Lualatex creates a better pdf than xelatex does from that .tex file:
evince, pdftotext and mupdf¹ all extract its text exactly as they
should.
Acrobat extracts the correct text even from the xelatex-generated pdf.
The significant difference between the xelatex and lualatex output is
that xe encases each ActualText Span in «q cm ... Q».
That is, lua’s output looks like:
BT
/F21 11.95517 Tf 1 0 0 1 128.413 654.247 Tm [<001C>]TJ
ET
/Span<</ActualText(\376\377\000b)>>BDC
and xe’s like:
BT
/F1 11.955 Tf 56.41 -65.75 Td[<001c>]TJ
ET
q
1 0 0 1 62.27 -65.75 cm
/Span<</ActualText(\376\377\000b)>>BDC
Q
The use of Td by xe versus Tm by lua to position the text may also come into play.
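A minimal Python sketch may make the structural difference concrete. It tracks the CTM stack through q/cm/Q and the text line matrix through BT/Td/Tm, and records the state in effect at each ActualText BDC. It assumes a deliberately simplified tokenizer that handles only the two fragments quoted above. It is not how poppler processes content streams, and the helpers `actualtext_spans`, `decode_literal`, `mul` and the `Span` record are hypothetical names. Because the fragments omit the outer CTM and the closing EMC, all coordinates are relative to whatever state surrounds them.

```python
import re
from dataclasses import dataclass

# Simplified tokenizer, sufficient only for the fragments quoted above:
# no nested parentheses in literal strings, no inline images, no comments.
TOKEN_RE = re.compile(
    r"<<|>>"                      # dictionary delimiters
    r"|<[0-9A-Fa-f\s]*>"          # hex string
    r"|\((?:\\.|[^\\()])*\)"      # literal string (unnested)
    r"|\[|\]"                     # array delimiters
    r"|/[^\s/\[\]()<>{}%]+"       # name
    r"|[+-]?(?:\d+\.?\d*|\.\d+)"  # number
    r"|[A-Za-z'\"*]+"             # operator
)
IDENTITY = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0)
ESCAPES = {"n": b"\n", "r": b"\r", "t": b"\t", "b": b"\b", "f": b"\f"}


def mul(m, n):
    """Return m x n for PDF affine matrices stored as (a, b, c, d, e, f)."""
    a, b, c, d, e, f = m
    A, B, C, D, E, F = n
    return (a * A + b * C, a * B + b * D,
            c * A + d * C, c * B + d * D,
            e * A + f * C + E, e * B + f * D + F)


def decode_literal(tok):
    """Decode a (...) literal string; UTF-16BE if it begins with FE FF."""
    body, out, i = tok[1:-1], bytearray(), 0
    while i < len(body):
        if body[i] != "\\":
            out += body[i].encode("latin-1")
            i += 1
            continue
        octal = re.match(r"[0-7]{1,3}", body[i + 1:])
        if octal:  # \ddd octal escape
            out.append(int(octal.group(), 8) & 0xFF)
            i += 1 + len(octal.group())
        else:      # \n, \(, \\ ...; unknown escapes yield the char itself
            out += ESCAPES.get(body[i + 1], body[i + 1].encode("latin-1"))
            i += 2
    data = bytes(out)
    if data.startswith(b"\xfe\xff"):
        return data[2:].decode("utf-16-be")
    return data.decode("latin-1")  # PDFDocEncoding approximated as latin-1


@dataclass
class Span:
    text: str           # decoded ActualText value
    ctm_origin: tuple   # (e, f) of the CTM when BDC was seen
    q_depth: int        # unmatched q operators at that point
    text_origin: tuple  # (e, f) of the last text line matrix (Td/Tm)


def actualtext_spans(stream):
    """Record the graphics/text state in effect at each ActualText BDC."""
    ctm, saved, tlm = IDENTITY, [], IDENTITY
    operands, spans = [], []
    for tok in TOKEN_RE.findall(stream):
        if not (tok[0].isalpha() or tok[0] in "'\""):
            operands.append(tok)  # operand (or array/dict delimiter)
            continue
        if tok == "q":
            saved.append(ctm)
        elif tok == "Q":
            ctm = saved.pop() if saved else IDENTITY
        elif tok == "cm":  # CTM' = M x CTM
            ctm = mul(tuple(map(float, operands[-6:])), ctm)
        elif tok == "BT":
            tlm = IDENTITY
        elif tok == "Tm":
            tlm = tuple(map(float, operands[-6:]))
        elif tok == "Td":  # Tlm' = translate(tx, ty) x Tlm
            tx, ty = map(float, operands[-2:])
            tlm = mul((1.0, 0.0, 0.0, 1.0, tx, ty), tlm)
        elif tok == "BDC" and "/ActualText" in operands:
            raw = operands[operands.index("/ActualText") + 1]
            spans.append(Span(decode_literal(raw), ctm[4:], len(saved), tlm[4:]))
        # ET leaves tlm as-is: PDF leaves it undefined; we keep it for reporting.
        operands = []
    return spans


LUA = r"""
BT
/F21 11.95517 Tf 1 0 0 1 128.413 654.247 Tm [<001C>]TJ
ET
/Span<</ActualText(\376\377\000b)>>BDC
"""

XE = r"""
BT
/F1 11.955 Tf 56.41 -65.75 Td[<001c>]TJ
ET
q
1 0 0 1 62.27 -65.75 cm
/Span<</ActualText(\376\377\000b)>>BDC
Q
"""

if __name__ == "__main__":
    for label, stream in (("lualatex", LUA), ("xelatex", XE)):
        for span in actualtext_spans(stream):
            print(label, span)
```

Run on the two fragments, the sketch reports that lua's BDC is reached at q depth 0 under an unchanged CTM. By contrast, xe's BDC is reached one q deep, with the CTM translated by (62.27, -65.75). The immediately following Q discards that translation, so anything after it (including the span's EMC, not shown in the excerpt) runs under the outer CTM again.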
————————————————
¹ mupdf, though, lacks any support for ActualText.