[poppler] [PATCH] Fixup LaTeX composed characters
Albert Astals Cid
aacid at kde.org
Fri Mar 25 13:43:59 PDT 2011
On Friday, 25 March 2011, you wrote:
> On Fri, 25 Mar 2011 19:02:46 +0000, Albert Astals Cid <aacid at kde.org> wrote:
> > On Friday, 25 March 2011, Tim Brody wrote:
> >> Hi All,
> >>
> >> Attached is a patch to address the previous problem I wrote about with
> >> pdflatex-produced PDFs that contain overlapping-diacritics/accents.
> >>
> >> This patch contains:
> >> - a table mapping diacritics to Unicode combining-character code points
> >> - if an overlapping character is detected, checks whether the first (in
> >>   stream sequence) character is in the table
> >> - pops the diacritic off the word
> >> - appends the diacritic to the character as a Unicode combining
> >> character
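The patch steps listed above can be sketched roughly as follows. This is a minimal Python illustration, not the patch itself (poppler's TextOutputDev is C++). The `Glyph` type, the `add_glyph`/`word_text` helpers, the simple horizontal-overlap test and the particular table entries are all hypothetical, and the sketch assumes, as the description says, that the spacing diacritic arrives before its base letter in stream order.

```python
import unicodedata
from dataclasses import dataclass

# Hypothetical table: spacing diacritic glyphs (as a PDF text layer might
# report them) mapped to the corresponding Unicode combining characters.
DIACRITIC_TO_COMBINING = {
    "\u00a8": "\u0308",  # DIAERESIS -> COMBINING DIAERESIS
    "\u00b4": "\u0301",  # ACUTE ACCENT -> COMBINING ACUTE ACCENT
    "`": "\u0300",       # GRAVE ACCENT -> COMBINING GRAVE ACCENT
    "\u02c6": "\u0302",  # MODIFIER LETTER CIRCUMFLEX -> COMBINING CIRCUMFLEX
    "\u02dc": "\u0303",  # SMALL TILDE -> COMBINING TILDE
    "\u00af": "\u0304",  # MACRON -> COMBINING MACRON
    "\u02d8": "\u0306",  # BREVE -> COMBINING BREVE
    "\u02d9": "\u0307",  # DOT ABOVE -> COMBINING DOT ABOVE
    "\u02da": "\u030a",  # RING ABOVE -> COMBINING RING ABOVE
    "\u02dd": "\u030b",  # DOUBLE ACUTE ACCENT -> COMBINING DOUBLE ACUTE
    "\u02c7": "\u030c",  # CARON -> COMBINING CARON
    "\u00b8": "\u0327",  # CEDILLA -> COMBINING CEDILLA
    "\u02db": "\u0328",  # OGONEK -> COMBINING OGONEK
}


@dataclass
class Glyph:
    """Hypothetical glyph record: its text and horizontal extent."""
    text: str
    x_min: float
    x_max: float


def overlaps(a: Glyph, b: Glyph) -> bool:
    # Horizontal extents intersect; glyphs that merely touch do not count.
    return a.x_min < b.x_max and b.x_min < a.x_max


def add_glyph(word: list[Glyph], glyph: Glyph) -> None:
    """Append glyph to word, folding a preceding overlapping diacritic."""
    if word and overlaps(word[-1], glyph):
        combining = DIACRITIC_TO_COMBINING.get(word[-1].text)
        if combining is not None:
            word.pop()  # drop the spacing diacritic from the word
            # Re-attach it to the base letter as a combining character.
            word.append(Glyph(glyph.text + combining, glyph.x_min, glyph.x_max))
            return
    word.append(glyph)


def word_text(word: list[Glyph]) -> str:
    # NFC composes e.g. "o" + U+0308 into the precomposed "\u00f6".
    return unicodedata.normalize("NFC", "".join(g.text for g in word))


if __name__ == "__main__":
    word: list[Glyph] = []
    # The diaeresis is emitted first and sits over the "o" that follows it.
    for g in [Glyph("L", 0.0, 6.0), Glyph("\u00a8", 6.5, 10.5),
              Glyph("o", 6.0, 11.0), Glyph("w", 11.0, 18.0),
              Glyph("e", 18.0, 22.5), Glyph("n", 22.5, 28.0)]:
        add_glyph(word, g)
    print(word_text(word))  # Löwen
```

The final NFC step is a convenience choice of this sketch: it turns the decomposed `o` + U+0308 into `ö`, which is the `R. Löwen` form expected later in the thread. Below-the-letter accents such as `\b{o}` or `\d{o}` would never reach the table lookup here, because their glyphs do not overlap the base letter horizontally in the way this test requires.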
> >>
> >> This does not fix \b{o} or \d{o} because TeX places them on the next
> >> line (so they aren't detected as overlapping).
> >>
> >> Yes, this is an issue with pdflatex, but there are 100,000s of
> >> TeX-produced PDFs for which we don't have the source ...
> >
> > Hmmm, is it supposed to just kill the diacritic mark?
> >
> > R. L¨wen and B. Polster
> > o
> > gets converted to
> > R. Lowen and B. Polster
> > shouldn't it be
> > R. Löwen and B. Polster
> > ?
>
> It should do - can you send me this PDF?
http://www.maths.mq.edu.au/~ross/5019-e-cmap.pdf
>
> I get this from TeX:
> R. L\"owen and B. Polster => R. Löwen and B. Polster
>
> NB I just tried extracting from a Word-generated PDF and TextOutputDev
> didn't see the line with the diacritic at all.
And are you sure it's not Word's fault?
Albert