[poppler] [PATCH] Fixup LaTeX composed characters
Albert Astals Cid
aacid at kde.org
Mon May 9 11:52:38 PDT 2011
On Monday, May 09, 2011, Tim Brody wrote:
> On Sat, 2011-05-07 at 19:02 +0100, Jonathan Kew wrote:
> > On 7 May 2011, at 17:43, Albert Astals Cid wrote:
> > > On Friday, April 01, 2011, Albert Astals Cid wrote:
> > >> On Friday, 1 April 2011, Tim Brody wrote:
> > >>> On Thu, 31 Mar 2011 23:28:02 +0100, Albert Astals Cid <aacid at kde.org> wrote:
> > >>>> On Wednesday, 30 March 2011, you wrote:
> > >>>>> On Tue, 2011-03-29 at 22:45 +0100, Albert Astals Cid wrote:
> > >>>>>>>> I still get
> > >>>>>>>>
> > >>>>>>>> -R. L¨wen and B. Polster
> > >>>>>>>> -o
> > >>>>>>>> +R. Lowen and B. Polster
> > >>>>>>>>
> > >>>>>>>> Maybe you sent an old version of the patch? Can anyone confirm if
> > >>>>
> > >>>> My bad, somehow vi/diff/less show me an o, but if I open it in kate I
> > >>>> see an ö.
> > >>>
> > >>> That will be because it is two separate characters (base letter +
> > >>> combining character). You could normalise with unicodeNormalizeNFKC, but I
> > >>> thought it probably better to leave the text, as far as possible,
> > >>> unchanged from the PDF source.
> > >>
> > >> Hmmmmmm, since we are already changing the "real" representation of
> > >> the text (i.e. transforming it from broken to not broken), I think I
> > >> prefer a form that is easy to use (i.e. one that shows ö in most tools).
> > >> What do others think?
> > >
> > > Since nobody else has replied, please do what I want and output a real
> > > ö.
> >
> > If you're going to apply a Unicode normalization process, please use
> > NFC rather than NFKC. This will deal with creating precomposed
> > letter+accent combinations, but avoids introducing "compatibility"
> > changes that may lose significant distinctions in the text.
>
> For reference:
> NFC = pre-composed (canonical composition)
> NFKC = pre-composed plus compatibility mappings, e.g. ligatures split ('ﬁ' => 'f'+'i')
>
> I agree, but there is no NFC implementation in poppler. Writing one from
> scratch in Poppler seems a waste of time; is there really no Unicode
> library that provides normalisations?
Couldn't you have said that (that we have no code to compose characters) when I
asked the list whether we wanted composed characters or not?
Honestly, I am quite hesitant to apply your patch, since it "breaks" pdftotext
usage in the console: most console applications seem unable to display the
non-composed (decomposed) form correctly.
Albert
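To make the composed/decomposed and NFC/NFKC distinctions in this thread concrete, here is a minimal Python sketch using the standard `unicodedata` module (not poppler's C++ API). The `fixup_accents` helper and its `SPACING_TO_COMBINING` table are hypothetical, assuming the LaTeX accent comes out of the PDF as a spacing character placed before its base letter; they only illustrate producing the decomposed "letter + combining mark" form that console tools show as a plain "o".

```python
import unicodedata

# Hypothetical table: spacing accents as they may appear in text extracted
# from LaTeX-generated PDFs, mapped to the matching combining marks.
SPACING_TO_COMBINING = {
    "\u00a8": "\u0308",  # DIAERESIS -> COMBINING DIAERESIS
    "\u00b4": "\u0301",  # ACUTE ACCENT -> COMBINING ACUTE ACCENT
}


def fixup_accents(text: str) -> str:
    """Hypothetical helper: rewrite 'spacing accent, letter' as
    'letter, combining mark' (the decomposed form).

    Assumes the accent is emitted before its base letter; this is an
    illustration only, not poppler's actual fixup logic.
    """
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch in SPACING_TO_COMBINING and nxt.isalpha():
            # Base letter first, then the combining mark.
            out.append(nxt)
            out.append(SPACING_TO_COMBINING[ch])
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


if __name__ == "__main__":
    decomposed = fixup_accents("R. L\u00a8owen")
    nfc = unicodedata.normalize("NFC", decomposed)
    # The decomposed form has one more code point than the NFC form.
    print(len(decomposed), len(nfc))  # 9 8
    print(nfc == "R. L\u00f6wen")  # True

    # NFC keeps the U+FB01 ligature; NFKC replaces it with plain "fi".
    lig = "\ufb01nal"
    print(unicodedata.normalize("NFC", lig) == lig)  # True
    print(unicodedata.normalize("NFKC", lig))  # final
```

NFC alone is enough to turn the decomposed output into a single precomposed ö that console tools display; NFKC would additionally rewrite ligatures and similar compatibility characters, which is the kind of change Jonathan Kew cautions against.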
>
> All the best,
> Tim.
>
> _______________________________________________
> poppler mailing list
> poppler at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/poppler