[poppler] [PATCH] Drop chars in TextOutputDev with no unicode CMap
Peter Waller
peter at scraperwiki.com
Thu May 28 01:26:12 PDT 2015
I've attached it to https://bugs.freedesktop.org/show_bug.cgi?id=73885
- please let me know if that was not the right thing to do.
On 28 May 2015 at 09:24, Albert Astals Cid <aacid at kde.org> wrote:
> Please use bugzilla for patches, it's much easier to track them there than in
> the mailing list.
>
> Cheers,
> Albert
>
> On Wednesday, 27 May 2015, at 22:12:39, Peter Waller wrote:
>> poppler/Gfx.cc | 6 ++++++
>> poppler/OutputDev.h | 4 ++++
>> poppler/TextOutputDev.h | 4 ++++
>> 3 files changed, 14 insertions(+)
>>
>> New commits:
>> Author: Peter Waller <p at pwaller.net>
>> Date: Wed, 27 May 2015 22:02:28 +0100
>>
>> If the font has no unicode cmap, it's not possible to output text for
>> that encoding, so rather than potentially corrupting the textual
>> output, the characters are dropped. These characters are kept
>> for rendering.
>>
>> It may be possible to keep the characters for text output if they
>> happen to lie in the set of printable characters, but my first priority
>> is to fix crashes where the glib API returns an inconsistent number
>> of glyphs via poppler_page_get_text and
>> poppler_page_get_text_layout.
>>
>> Original bug: https://bugs.freedesktop.org/show_bug.cgi?id=73885
>>
>> diff --git a/poppler/Gfx.cc b/poppler/Gfx.cc
>> index 07d95b3..130363d 100644
>> --- a/poppler/Gfx.cc
>> +++ b/poppler/Gfx.cc
>> @@ -3934,6 +3934,12 @@ void Gfx::doShowText(GooString *s) {
>> int len, n, uLen, nChars, nSpaces, i;
>>
>> font = state->getFont();
>> +
>> + if (out->needUnicodeText() && !font->hasToUnicodeCMap()) {
>> + // No conversion to unicode available, drop characters.
>> + return;
>> + }
>> +
>> wMode = font->getWMode();
>>
>> if (out->useDrawChar()) {
>> diff --git a/poppler/OutputDev.h b/poppler/OutputDev.h
>> index e8a7a47..7e63739 100644
>> --- a/poppler/OutputDev.h
>> +++ b/poppler/OutputDev.h
>> @@ -116,6 +116,10 @@ public:
>> // Does this device need non-text content?
>> virtual GBool needNonText() { return gTrue; }
>>
>> + // Does this device expect valid UTF-8 text? (i.e, discard characters for
>> + // which cannot determine UTF-8 equivalents due to a missing unicode mapping)
>> + virtual GBool needUnicodeText() { return gFalse; }
>> +
>> // Does this device require incCharCount to be called for text on
>> // non-shown layers?
>> virtual GBool needCharCount() { return gFalse; }
>> diff --git a/poppler/TextOutputDev.h b/poppler/TextOutputDev.h
>> index a0aa6f8..8bbd018 100644
>> --- a/poppler/TextOutputDev.h
>> +++ b/poppler/TextOutputDev.h
>> @@ -762,6 +762,10 @@ public:
>> // Does this device need non-text content?
>> virtual GBool needNonText() { return gFalse; }
>>
>> + // Does this device expect valid UTF-8 text? (i.e, discard characters for
>> + // which cannot determine UTF-8 equivalents due to a missing unicode mapping)
>> + virtual GBool needUnicodeText() { return gTrue; }
>> +
>> // Does this device require incCharCount to be called for text on
>> // non-shown layers?
>> virtual GBool needCharCount() { return gTrue; }
>
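For anyone who wants to see the idea of the quoted patch without a poppler checkout, here is a minimal standalone sketch. It is not poppler code: Font, OutputDev, RenderDev, TextDev, drawString, showText and selfCheck are simplified, hypothetical stand-ins, and only the names needUnicodeText() and hasToUnicodeCMap() are taken from the patch. It assumes the guard behaves as the commit message describes: a device that needs Unicode text skips runs from fonts without a ToUnicode CMap, while a rendering device still receives them.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a font; models only the query used by the patch.
struct Font {
    bool toUnicode;
    bool hasToUnicodeCMap() const { return toUnicode; }
};

// Hypothetical stand-in for OutputDev. As in the patch, the default answer
// is "no", so existing devices keep their current behaviour.
class OutputDev {
public:
    virtual ~OutputDev() = default;
    virtual bool needUnicodeText() { return false; }
    virtual void drawString(const std::string &s) = 0;
};

// A rendering device: keeps every run it is given.
class RenderDev : public OutputDev {
public:
    std::vector<std::string> drawn;
    void drawString(const std::string &s) override { drawn.push_back(s); }
};

// A text-extraction device: opts in to the Unicode requirement.
class TextDev : public OutputDev {
public:
    std::string text;
    bool needUnicodeText() override { return true; }
    void drawString(const std::string &s) override { text += s; }
};

// Mirrors the guard added at the top of Gfx::doShowText in the patch.
void showText(OutputDev &out, const Font &font, const std::string &s) {
    if (out.needUnicodeText() && !font.hasToUnicodeCMap()) {
        return;  // no conversion to Unicode available, drop the run
    }
    out.drawString(s);
}

// Sends the same two runs to both devices and checks what each one kept.
void selfCheck() {
    Font mapped{true};
    Font unmapped{false};
    RenderDev render;
    TextDev text;

    showText(render, mapped, "Hello");
    showText(render, unmapped, "\x01\x02");
    showText(text, mapped, "Hello");
    showText(text, unmapped, "\x01\x02");

    assert(render.drawn.size() == 2);  // rendering keeps both runs
    assert(text.text == "Hello");      // text output drops the unmapped run
}
```

Calling selfCheck() from main() exercises both devices. Putting the decision in a virtual method with a default of false means only devices that opt in, such as the text device here, change behaviour.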