[Poppler-bugs] [Bug 102760] pdftops generates crumbled text

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Mon Sep 18 03:53:07 UTC 2017


https://bugs.freedesktop.org/show_bug.cgi?id=102760

--- Comment #7 from William Bader <williambader at hotmail.com> ---
When I run
  pdffonts -f 1 -l 1 bug.pdf
it lists ArialBold twice.
The first copy has 19 glyphs, and the second copy has 15 glyphs.
The missing characters are the glyphs in positions 16 to 19 of the first copy.
After reading the second copy, pdftops thinks that the font has only 15 glyphs,
so it skips positions 16 to 19.
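A minimal sketch of the failure mode described above, assuming a table keyed only by font name where the last write wins, and assuming zero-based glyph IDs (so positions 16 to 19 are IDs 15 to 18). `GlyphCountTable`, `recordGlyphCountOverwrite` and `emittableGlyphs` are hypothetical names for illustration, not poppler's actual identifiers.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical per-font glyph-count table keyed only by font name.
using GlyphCountTable = std::unordered_map<std::string, int>;

// Last write wins: a second font with the same name (e.g. ArialBold with
// 15 glyphs) silently replaces the count recorded for the first (19 glyphs).
void recordGlyphCountOverwrite(GlyphCountTable &table,
                               const std::string &fontName, int glyphCount) {
    assert(glyphCount >= 0);
    table[fontName] = glyphCount;
}

// Returns the glyph IDs that would actually be emitted: any ID at or above
// the recorded count for this font name is treated as invalid and skipped.
std::vector<int> emittableGlyphs(const GlyphCountTable &table,
                                 const std::string &fontName,
                                 const std::vector<int> &glyphIds) {
    std::vector<int> out;
    auto it = table.find(fontName);
    for (int gid : glyphIds) {
        if (it != table.end() && gid >= it->second) {
            continue;  // dropped glyph: the "crumbled text" symptom
        }
        out.push_back(gid);
    }
    return out;
}
```

Recording 19 and then 15 for `"ArialBold"` leaves 15 in the table, so IDs 15 to 18 from the first copy are filtered out even though that copy defines them.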

The question is how to handle it.
In my test files, the last defined glyph is under 200, the referenced glyphs
are in the 1000s, and each font appears in the PDF only once.

1. When I look at the PDF, object 59 starts /BaseFont/ArialBold/ and objects 57
and 58 start /DescendantFonts ... /BaseFont/ArialBold, so maybe there is a way
to differentiate the copy with 19 glyphs from the copy with 15 glyphs, although
I think that information is lost by the time it reaches
PSOutputDev::drawString(), which is why I had to create the hash and couldn't
just add a maxGlyphs field to GfxFont.

2. I could change the code in PSOutputDev::setupExternalCIDTrueTypeFont() and
PSOutputDev::setupEmbeddedCIDTrueTypeFont() so that when it sees a font for the
second time, instead of always updating the hash mapping font names to glyph
counts, it could update the glyph count only if the new number is larger. That
should fix this file without breaking my test files. If this is OK, I can
submit a patch within a day or two.
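The rule in option 2 (raise a font's recorded glyph count but never lower it) can be sketched as below. This is a hypothetical C++17 illustration under the same assumptions as before (name-keyed table, zero-based glyph IDs); `recordGlyphCountMax` and `glyphInRange` are invented names, and the real change would live in PSOutputDev::setupExternalCIDTrueTypeFont() and PSOutputDev::setupEmbeddedCIDTrueTypeFont().

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical per-font glyph-count table keyed by font name.
using GlyphCountTable = std::unordered_map<std::string, int>;

// First sighting of a name stores its count; later sightings of the same
// name can only raise the stored count, never lower it.
void recordGlyphCountMax(GlyphCountTable &table,
                         const std::string &fontName, int glyphCount) {
    assert(glyphCount >= 0);
    auto [it, inserted] = table.try_emplace(fontName, glyphCount);
    if (!inserted) {
        it->second = std::max(it->second, glyphCount);
    }
}

// A glyph ID is accepted if no limit is recorded for the font, or if the
// ID is below the recorded count.
bool glyphInRange(const GlyphCountTable &table,
                  const std::string &fontName, int gid) {
    auto it = table.find(fontName);
    return it == table.end() || gid < it->second;
}
```

With both ArialBold copies recorded, the table keeps 19, so IDs up to 18 pass. The trade-off is that the limit for a name becomes the largest copy's count; in the test files described above the bogus references are in the 1000s while the counts are under 200, so they would still be rejected.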

William
