[poppler] [PATCH] Fixup LaTeX composed characters

Jonathan Kew jfkthame at googlemail.com
Sat May 7 11:02:30 PDT 2011

On 7 May 2011, at 17:43, Albert Astals Cid wrote:

> On Friday, April 01, 2011, Albert Astals Cid wrote:
>> On Friday, 1 April 2011, Tim Brody wrote:
>>> On Thu, 31 Mar 2011 23:28:02 +0100, Albert Astals Cid <aacid at kde.org>
>>> wrote:
>>>> On Wednesday, 30 March 2011, you wrote:
>>>>> On Tue, 2011-03-29 at 22:45 +0100, Albert Astals Cid wrote:
>>>>>>>> I still get
>>>>>>>> -R. L¨wen and B. Polster
>>>>>>>> -o
>>>>>>>> +R. Lowen and B. Polster
>>>>>>>> Maybe you sent an old version of the patch? Can anyone confirm if
>>>> My bad, somehow vi/diff/less are showing me o, but if I open it in kate
>>>> I see
>>>> an ö
>>> That will be because it's separate characters (X + combining char). You
>>> could normalise with unicodeNormalizeNFKC but I thought it probably
>>> better to leave text - as far as possible - unchanged from the PDF
>>> source.
>> Hmmmmmm, since we are already changing the "real" representation of the
>> text (i.e transforming it from broken to not broken), i think i prefer one
>> that is easy to use (i.e. shows ö in most of the tools), what do others
>> think?
> Since the others are not there, please do what i want and output a real ö

If you're going to apply a Unicode normalization process, please use NFC rather than NFKC. This will deal with creating precomposed letter+accent combinations, but avoids introducing "compatibility" changes that may lose significant distinctions in the text.
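
To illustrate the difference, here is a minimal sketch using Python's standard unicodedata module. It is not poppler code, only a stand-in for whatever normalization routine poppler would call, and the sample strings are made up for illustration. It shows that both forms compose "o" + U+0308 into a single ö, while only NFKC also rewrites a compatibility character such as the U+FB01 "fi" ligature.

```python
import unicodedata

# Hypothetical samples: "o" followed by U+0308 COMBINING DIAERESIS, like the
# extracted text discussed in this thread, and a word starting with the
# U+FB01 LATIN SMALL LIGATURE FI compatibility character.
decomposed = "Lo\u0308wen"
ligature = "\ufb01gure"

nfc = unicodedata.normalize("NFC", decomposed)
nfkc = unicodedata.normalize("NFKC", decomposed)

# Both forms compose o + U+0308 into the single code point U+00F6 (ö),
# so the string shrinks from 6 code points to 5.
print(len(decomposed), len(nfc), len(nfkc))

# NFC leaves the ligature untouched; NFKC replaces it with plain "f" + "i",
# which is the kind of distinction that can be lost.
print(unicodedata.normalize("NFC", ligature) == ligature)
print(unicodedata.normalize("NFKC", ligature))
```

With NFC the output gains the precomposed ö that most tools display correctly, and anything else in the extracted text stays as the PDF had it.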

