[poppler] [PATCH] Drop chars in TextOutputDev with no unicode CMap

Albert Astals Cid aacid at kde.org
Thu May 28 01:24:48 PDT 2015


Please use Bugzilla for patches; they are much easier to track there than on
the mailing list.

Cheers,
  Albert

On Wednesday, 27 May 2015, at 22:12:39, Peter Waller wrote:
>  poppler/Gfx.cc          | 6 ++++++
>  poppler/OutputDev.h     | 4 ++++
>  poppler/TextOutputDev.h | 4 ++++
>  3 files changed, 14 insertions(+)
> 
> New commits:
> Author: Peter Waller <p at pwaller.net>
> Date: Wed, 27 May 2015 22:02:28 +0100
> 
> If the font has no unicode cmap, it's not possible to output text for
> that encoding, so rather than potentially corrupting the textual
> output, the characters are dropped. These characters are kept
> for rendering.
> 
> It may be possible to keep the characters for text output if they
> happen to lie in the set of printable characters, but my first priority
> is to fix crashes where the glib API returns an inconsistent number
> of glyphs via poppler_page_get_text and
> poppler_page_get_text_layout.
> 
> Original bug: https://bugs.freedesktop.org/show_bug.cgi?id=73885
> 
> diff --git a/poppler/Gfx.cc b/poppler/Gfx.cc
> index 07d95b3..130363d 100644
> --- a/poppler/Gfx.cc
> +++ b/poppler/Gfx.cc
> @@ -3934,6 +3934,12 @@ void Gfx::doShowText(GooString *s) {
>    int len, n, uLen, nChars, nSpaces, i;
> 
>    font = state->getFont();
> +
> +  if (out->needUnicodeText() && !font->hasToUnicodeCMap()) {
> +    // No conversion to unicode available, drop characters.
> +    return;
> +  }
> +
>    wMode = font->getWMode();
> 
>    if (out->useDrawChar()) {
> diff --git a/poppler/OutputDev.h b/poppler/OutputDev.h
> index e8a7a47..7e63739 100644
> --- a/poppler/OutputDev.h
> +++ b/poppler/OutputDev.h
> @@ -116,6 +116,10 @@ public:
>    // Does this device need non-text content?
>    virtual GBool needNonText() { return gTrue; }
> 
> +  // Does this device expect valid UTF-8 text? (i.e, discard characters for
> +  // which cannot determine UTF-8 equivalents due to a missing unicode mapping)
> +  virtual GBool needUnicodeText() { return gFalse; }
> +
>    // Does this device require incCharCount to be called for text on
>    // non-shown layers?
>    virtual GBool needCharCount() { return gFalse; }
> diff --git a/poppler/TextOutputDev.h b/poppler/TextOutputDev.h
> index a0aa6f8..8bbd018 100644
> --- a/poppler/TextOutputDev.h
> +++ b/poppler/TextOutputDev.h
> @@ -762,6 +762,10 @@ public:
>    // Does this device need non-text content?
>    virtual GBool needNonText() { return gFalse; }
> 
> +  // Does this device expect valid UTF-8 text? (i.e, discard characters for
> +  // which cannot determine UTF-8 equivalents due to a missing unicode mapping)
> +  virtual GBool needUnicodeText() { return gTrue; }
> +
>    // Does this device require incCharCount to be called for text on
>    // non-shown layers?
>    virtual GBool needCharCount() { return gTrue; }
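
For readers following the patch, the opt-in check it adds can be sketched
outside poppler roughly as follows. This is only an illustration under stated
assumptions: Font, RenderDev, TextDev, showText and demo are hypothetical
stand-ins, not poppler classes, and only needUnicodeText() and
hasToUnicodeCMap() take their names from the diff above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for poppler's font object; only the one query
// used by the patch is modelled.
struct Font {
  bool toUnicode;
  bool hasToUnicodeCMap() const { return toUnicode; }
};

// Stand-in for OutputDev: by default a device does not require a
// Unicode mapping, matching the gFalse default in the patch.
class OutputDev {
public:
  virtual ~OutputDev() = default;
  virtual bool needUnicodeText() const { return false; }
  virtual void drawString(const std::string &s) = 0;
};

// Rendering-style device: keeps every run, since glyphs can still be drawn.
class RenderDev : public OutputDev {
public:
  void drawString(const std::string &s) override { drawn.push_back(s); }
  std::vector<std::string> drawn;
};

// Text-style device: opts in, matching the gTrue override in the patch.
class TextDev : public OutputDev {
public:
  bool needUnicodeText() const override { return true; }
  void drawString(const std::string &s) override { text += s; }
  std::string text;
};

// Mirrors the early return added to Gfx::doShowText.
void showText(OutputDev &out, const Font &font, const std::string &s) {
  if (out.needUnicodeText() && !font.hasToUnicodeCMap()) {
    return;  // no conversion to Unicode available, drop the characters
  }
  out.drawString(s);
}

// Usage: the text device drops the unmapped run, the renderer keeps both.
void demo() {
  Font mapped{true};
  Font unmapped{false};
  TextDev text;
  RenderDev render;
  showText(text, mapped, "ab");
  showText(text, unmapped, "??");
  showText(render, mapped, "ab");
  showText(render, unmapped, "??");
  assert(text.text == "ab");
  assert(render.drawn.size() == 2);
}
```

Putting the decision behind a virtual query on the device keeps the check in
one place (the text-showing path) while letting rendering devices continue to
receive every character, which is the behaviour the commit message describes.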


