[Poppler-bugs] [Bug 26077] Mis-placed ActualText strings

James Cloos cloos at jhcloos.com
Tue Jul 9 13:02:15 PDT 2013


Lualatex creates a better pdf than xelatex does from that .tex file:
evince, pdftotext and mupdf¹ all extract the text from it exactly as
they should.

Acrobat, by contrast, extracts the correct text from the xelatex-generated pdf.

The significant difference between the xelatex and lualatex output is
that xe encases each ActualText Span in «q cm ... Q».

That is, lua’s output looks like:

BT
/F21 11.95517 Tf 1 0 0 1 128.413 654.247 Tm [<001C>]TJ
ET
/Span<</ActualText(\376\377\000b)>>BDC

and xe’s like:

BT
/F1 11.955 Tf 56.41 -65.75 Td[<001c>]TJ
ET
q
1 0 0 1 62.27 -65.75 cm
/Span<</ActualText(\376\377\000b)>>BDC
Q
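
To check a content stream for this pattern mechanically, something like the following rough sketch might do. It is not how poppler parses anything: the regexes only cover the exact shapes quoted above, and the names LUA, XE, decode_literal and describe are made up for illustration. It decodes the ActualText literal (octal escapes, then UTF-16BE if a BOM is present) and reports whether the Span is wrapped in q ... cm ... Q.

```python
import re

# The two content-stream fragments quoted above, verbatim.
LUA = r"""BT
/F21 11.95517 Tf 1 0 0 1 128.413 654.247 Tm [<001C>]TJ
ET
/Span<</ActualText(\376\377\000b)>>BDC
"""

XE = r"""BT
/F1 11.955 Tf 56.41 -65.75 Td[<001c>]TJ
ET
q
1 0 0 1 62.27 -65.75 cm
/Span<</ActualText(\376\377\000b)>>BDC
Q
"""

NUM = r"[-+]?\d*\.?\d+"
# Assumption: the Span dictionary holds only ActualText, as in both samples.
SPAN = r"/Span<</ActualText\((?P<text>[^)]*)\)>>\s*BDC"
# q, six numbers, cm, the Span BDC, then Q.
WRAPPED = re.compile(r"\bq\s+(?:%s\s+){6}cm\s+%s\s+Q\b" % (NUM, SPAN))


def decode_literal(s):
    """Decode a PDF literal string body; only \\ddd octal escapes are handled."""
    raw = re.sub(
        rb"\\([0-7]{1,3})",
        lambda m: bytes([int(m.group(1), 8) & 0xFF]),
        s.encode("latin-1"),
    )
    if raw.startswith(b"\xfe\xff"):      # UTF-16BE byte order mark
        return raw[2:].decode("utf-16-be")
    return raw.decode("latin-1")         # rough stand-in for PDFDocEncoding


def describe(stream):
    """Return (decoded ActualText, True if the Span sits inside q ... cm ... Q)."""
    m = re.search(SPAN, stream)
    return decode_literal(m.group("text")), bool(WRAPPED.search(stream))


for name, stream in (("lualatex", LUA), ("xelatex", XE)):
    print(name, describe(stream))
```

Both fragments carry the same ActualText; only the xelatex one has the q ... cm ... Q wrapper around the Span.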

The use of Td (xe) versus Tm (lua) to position the text may also come into play.

————————————————
¹ mupdf, though, lacks any support for ActualText.

