[poppler] [PATCH] Drop chars in TextOutputDev with no unicode CMap
Albert Astals Cid
aacid at kde.org
Thu May 28 01:24:48 PDT 2015
Please use Bugzilla for patches; it's much easier to track them there than on
the mailing list.
Cheers,
Albert
On Wednesday, 27 May 2015, at 22:12:39, Peter Waller wrote:
> poppler/Gfx.cc | 6 ++++++
> poppler/OutputDev.h | 4 ++++
> poppler/TextOutputDev.h | 4 ++++
> 3 files changed, 14 insertions(+)
>
> New commits:
> Author: Peter Waller <p at pwaller.net>
> Date: Wed, 27 May 2015 22:02:28 +0100
>
> If the font has no Unicode CMap, it is not possible to output text for
> that encoding, so rather than risk corrupting the textual output, the
> characters are dropped. They are still kept for rendering.
>
> It may be possible to keep the characters for text output if they
> happen to lie in the set of printable characters, but my first priority
> is to fix crashes where the glib API returns an inconsistent number
> of glyphs via poppler_page_get_text and
> poppler_page_get_text_layout.
>
> Original bug: https://bugs.freedesktop.org/show_bug.cgi?id=73885
>
> diff --git a/poppler/Gfx.cc b/poppler/Gfx.cc
> index 07d95b3..130363d 100644
> --- a/poppler/Gfx.cc
> +++ b/poppler/Gfx.cc
> @@ -3934,6 +3934,12 @@ void Gfx::doShowText(GooString *s) {
> int len, n, uLen, nChars, nSpaces, i;
>
> font = state->getFont();
> +
> + if (out->needUnicodeText() && !font->hasToUnicodeCMap()) {
> + // No conversion to unicode available, drop characters.
> + return;
> + }
> +
> wMode = font->getWMode();
>
> if (out->useDrawChar()) {
> diff --git a/poppler/OutputDev.h b/poppler/OutputDev.h
> index e8a7a47..7e63739 100644
> --- a/poppler/OutputDev.h
> +++ b/poppler/OutputDev.h
> @@ -116,6 +116,10 @@ public:
> // Does this device need non-text content?
> virtual GBool needNonText() { return gTrue; }
>
> + // Does this device expect valid UTF-8 text? (i.e., discard characters for
> + // which it cannot determine UTF-8 equivalents due to a missing Unicode mapping)
> + virtual GBool needUnicodeText() { return gFalse; }
> +
> // Does this device require incCharCount to be called for text on
> // non-shown layers?
> virtual GBool needCharCount() { return gFalse; }
> diff --git a/poppler/TextOutputDev.h b/poppler/TextOutputDev.h
> index a0aa6f8..8bbd018 100644
> --- a/poppler/TextOutputDev.h
> +++ b/poppler/TextOutputDev.h
> @@ -762,6 +762,10 @@ public:
> // Does this device need non-text content?
> virtual GBool needNonText() { return gFalse; }
>
> + // Does this device expect valid UTF-8 text? (i.e., discard characters for
> + // which it cannot determine UTF-8 equivalents due to a missing Unicode mapping)
> + virtual GBool needUnicodeText() { return gTrue; }
> +
> // Does this device require incCharCount to be called for text on
> // non-shown layers?
> virtual GBool needCharCount() { return gTrue; }
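The patch relies on a capability query: the `OutputDev` base class gains a virtual `needUnicodeText()` that defaults to false, `TextOutputDev` overrides it to return true, and `Gfx::doShowText` checks it before emitting text. The following is a simplified, self-contained model of that pattern, not poppler code. `Font`, `TextDev`, `RenderDev`, `showText` and the use of `bool` instead of `GBool` are hypothetical stand-ins; only `needUnicodeText`, `hasToUnicodeCMap` and `doShowText` are names taken from the patch.

```cpp
#include <cassert>  // for the usage checks
#include <cstddef>
#include <string>

// Hypothetical stand-in for poppler's GfxFont: only tracks whether the
// font carries a ToUnicode CMap.
struct Font {
    bool toUnicode;
    bool hasToUnicodeCMap() const { return toUnicode; }
};

// Hypothetical, simplified OutputDev with the new capability query.
class OutputDev {
public:
    virtual ~OutputDev() = default;
    // Default: the device does not require valid Unicode text.
    virtual bool needUnicodeText() { return false; }
    virtual void showText(const std::string &s) = 0;
};

// Text-extraction device: opts in to dropping unmappable characters.
class TextDev : public OutputDev {
public:
    bool needUnicodeText() override { return true; }
    void showText(const std::string &s) override { text += s; }
    std::string text;
};

// Rendering device: keeps the default, so every glyph is still drawn.
class RenderDev : public OutputDev {
public:
    void showText(const std::string &s) override { glyphs += s.size(); }
    std::size_t glyphs = 0;
};

// Models the early return the patch adds to Gfx::doShowText.
void doShowText(OutputDev *out, const Font &font, const std::string &s) {
    if (out->needUnicodeText() && !font.hasToUnicodeCMap()) {
        return;  // no Unicode mapping available: drop the characters
    }
    out->showText(s);
}
```

Calling `doShowText` with a font that lacks a ToUnicode CMap leaves a `TextDev` untouched while a `RenderDev` still receives the glyphs, matching the commit message: characters are dropped from text output but kept for rendering. Putting the check behind a virtual default keeps every existing output device's behaviour unchanged unless it explicitly opts in.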