[poppler] [PATCH] Drop chars in TextOutputDev with no unicode CMap

Peter Waller peter at scraperwiki.com
Thu May 28 01:26:12 PDT 2015


I've attached it to https://bugs.freedesktop.org/show_bug.cgi?id=73885
- please let me know if that was not the right thing to do.

On 28 May 2015 at 09:24, Albert Astals Cid <aacid at kde.org> wrote:
> Please use bugzilla for patches, it's much easier to track them there than in
> the mailing list.
>
> Cheers,
>   Albert
>
> On Wednesday, 27 May 2015, at 22:12:39, Peter Waller wrote:
>>  poppler/Gfx.cc          | 6 ++++++
>>  poppler/OutputDev.h     | 4 ++++
>>  poppler/TextOutputDev.h | 4 ++++
>>  3 files changed, 14 insertions(+)
>>
>> New commits:
>> Author: Peter Waller <p at pwaller.net>
>> Date: Wed, 27 May 2015 22:02:28 +0100
>>
>> If the font has no unicode cmap, it's not possible to output text for
>> that encoding, so rather than potentially corrupting the textual
>> output, the characters are dropped. These characters are kept
>> for rendering.
>>
>> It may be possible to keep the characters for text output if they
>> happen to lie in the set of printable characters, but my first priority
>> is to fix crashes where the glib API returns an inconsistent number
>> of glyphs via poppler_page_get_text and
>> poppler_page_get_text_layout.
>>
>> Original bug: https://bugs.freedesktop.org/show_bug.cgi?id=73885
>>
>> diff --git a/poppler/Gfx.cc b/poppler/Gfx.cc
>> index 07d95b3..130363d 100644
>> --- a/poppler/Gfx.cc
>> +++ b/poppler/Gfx.cc
>> @@ -3934,6 +3934,12 @@ void Gfx::doShowText(GooString *s) {
>>    int len, n, uLen, nChars, nSpaces, i;
>>
>>    font = state->getFont();
>> +
>> +  if (out->needUnicodeText() && !font->hasToUnicodeCMap()) {
>> +    // No conversion to unicode available, drop characters.
>> +    return;
>> +  }
>> +
>>    wMode = font->getWMode();
>>
>>    if (out->useDrawChar()) {
>> diff --git a/poppler/OutputDev.h b/poppler/OutputDev.h
>> index e8a7a47..7e63739 100644
>> --- a/poppler/OutputDev.h
>> +++ b/poppler/OutputDev.h
>> @@ -116,6 +116,10 @@ public:
>>    // Does this device need non-text content?
>>    virtual GBool needNonText() { return gTrue; }
>>
>> +  // Does this device expect valid UTF-8 text? (i.e, discard characters for
>> +  // which cannot determine UTF-8 equivalents due to a missing unicode mapping)
>> +  virtual GBool needUnicodeText() { return gFalse; }
>> +
>>    // Does this device require incCharCount to be called for text on
>>    // non-shown layers?
>>    virtual GBool needCharCount() { return gFalse; }
>> diff --git a/poppler/TextOutputDev.h b/poppler/TextOutputDev.h
>> index a0aa6f8..8bbd018 100644
>> --- a/poppler/TextOutputDev.h
>> +++ b/poppler/TextOutputDev.h
>> @@ -762,6 +762,10 @@ public:
>>    // Does this device need non-text content?
>>    virtual GBool needNonText() { return gFalse; }
>>
>> +  // Does this device expect valid UTF-8 text? (i.e, discard characters for
>> +  // which cannot determine UTF-8 equivalents due to a missing unicode mapping)
>> +  virtual GBool needUnicodeText() { return gTrue; }
>> +
>>    // Does this device require incCharCount to be called for text on
>>    // non-shown layers?
>>    virtual GBool needCharCount() { return gTrue; }
>
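
For anyone skimming the diff, here is a minimal standalone sketch of the
pattern the patch relies on: a virtual query on the output device, overridden
by the text device, and an early return in the show-text path. It is not
poppler code. The *Stub classes, drawString(), received and demo() are
hypothetical stand-ins for illustration; only the names needUnicodeText() and
hasToUnicodeCMap() are taken from the patch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a font; exposes only the query the patch uses.
struct FontStub {
    bool toUnicode;
    bool hasToUnicodeCMap() const { return toUnicode; }
};

// Hypothetical stand-in for OutputDev: by default a device does not
// insist on Unicode text, so nothing is dropped for it.
class OutputDevStub {
public:
    virtual ~OutputDevStub() = default;
    virtual bool needUnicodeText() { return false; }
    // Assumed sink that records whatever text reaches the device.
    void drawString(const std::string &s) { received.push_back(s); }
    std::vector<std::string> received;
};

// Hypothetical stand-in for TextOutputDev: opts in to dropping text
// that cannot be mapped to Unicode.
class TextOutputDevStub : public OutputDevStub {
public:
    bool needUnicodeText() override { return true; }
};

// Mirrors the early return the patch adds to Gfx::doShowText.
void doShowText(OutputDevStub &out, const FontStub &font,
                const std::string &s) {
    if (out.needUnicodeText() && !font.hasToUnicodeCMap()) {
        return;  // No conversion to Unicode available, drop characters.
    }
    out.drawString(s);
}

// Usage: a rendering-style device keeps the text, a text device drops it.
void demo() {
    OutputDevStub render;
    TextOutputDevStub text;
    FontStub unmapped{false};
    doShowText(render, unmapped, "abc");
    doShowText(text, unmapped, "abc");
    assert(render.received.size() == 1);
    assert(text.received.empty());
}
```

Because the base class returns false, only devices that override the query
are affected, which matches the commit message's point that the characters
are still kept for rendering.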

