[poppler] [PATCH] Fixup LaTeX composed characters

Albert Astals Cid aacid at kde.org
Fri Mar 25 13:43:59 PDT 2011


On Friday, 25 March 2011, you wrote:
> On Fri, 25 Mar 2011 19:02:46 +0000, Albert Astals Cid <aacid at kde.org> wrote:
> > On Friday, 25 March 2011, Tim Brody wrote:
> >> Hi All,
> >> 
> >> Attached is a patch to address the previous problem I wrote about with
> >> pdflatex-produced PDFs that contain overlapping-diacritics/accents.
> >> 
> >> This patch contains:
> >>  - a table mapping diacritics to Unicode combining-character code points
> >>  - if an overlapping character is detected, checks whether the first (in stream-sequence) character is in the table
> >>   - pops the diacritic off the word
> >>   - appends the diacritic to the character as a Unicode combining
> >>   character
> >> 
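The steps in the list above can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the actual patch: the `Glyph` type, the horizontal-extent `overlaps` test, the `build_word` helper and the exact table entries are all hypothetical. It keeps the order the patch describes (the diacritic arrives first in the stream, is popped off the word, and comes back as a combining mark after the base character). The final NFC normalization, which turns "o" + U+0308 into the precomposed "ö", is an extra step the patch description does not mention.

```python
from __future__ import annotations

import unicodedata
from dataclasses import dataclass

# Hypothetical table: spacing accent glyphs, as they might appear in a
# TeX-produced PDF, mapped to Unicode combining marks (TeX command in brackets).
DIACRITIC_TO_COMBINING = {
    "\u00a8": "\u0308",  # diaeresis    [\"]
    "\u00b4": "\u0301",  # acute        [\']
    "`": "\u0300",       # grave        [\`]
    "^": "\u0302",       # circumflex   [\^]
    "~": "\u0303",       # tilde        [\~]
    "\u00af": "\u0304",  # macron       [\=]
    "\u02d8": "\u0306",  # breve        [\u]
    "\u02d9": "\u0307",  # dot above    [\.]
    "\u02da": "\u030a",  # ring above   [\r]
    "\u02dd": "\u030b",  # double acute [\H]
    "\u02c7": "\u030c",  # caron        [\v]
    "\u00b8": "\u0327",  # cedilla      [\c]
    "\u02db": "\u0328",  # ogonek       [\k]
}


@dataclass
class Glyph:
    """A hypothetical extracted glyph: its text and horizontal extent."""
    text: str
    x_min: float
    x_max: float


def overlaps(a: Glyph, b: Glyph) -> bool:
    """Assumed overlap test: the glyphs' horizontal extents intersect."""
    return a.x_min < b.x_max and b.x_min < a.x_max


def build_word(glyphs: list[Glyph]) -> str:
    """Assemble glyphs (in content-stream order) into a word, folding a
    spacing diacritic into the character it overlaps."""
    word: list[Glyph] = []
    for g in glyphs:
        prev = word[-1] if word else None
        if (prev is not None and prev.text in DIACRITIC_TO_COMBINING
                and overlaps(prev, g)):
            word.pop()  # pop the diacritic off the word...
            combined = g.text + DIACRITIC_TO_COMBINING[prev.text]
            # ...and re-append the base character followed by the combining mark
            word.append(Glyph(combined, min(prev.x_min, g.x_min),
                              max(prev.x_max, g.x_max)))
        else:
            word.append(g)
    # Extra step (assumption): NFC turns "o" + U+0308 into precomposed U+00F6.
    return unicodedata.normalize("NFC", "".join(x.text for x in word))


if __name__ == "__main__":
    # "L", a diaeresis drawn over the following "o", then "wen".
    glyphs = [Glyph("L", 0, 6), Glyph("\u00a8", 6.5, 9.5), Glyph("o", 6, 11),
              Glyph("w", 11, 18), Glyph("e", 18, 22), Glyph("n", 22, 27)]
    print(build_word(glyphs))  # Löwen
```

With the made-up coordinates above, the diaeresis overlaps the "o" and the word comes out as "Löwen"; a diacritic that does not overlap its neighbour is passed through unchanged.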
> >> This does not fix \b{o} or \d{o} because TeX places them on the next line
> >> (so they aren't detected as overlapping).
> >> 
> >> Yes, this is an issue with pdflatex, but there are 100,000s of TeX-produced
> >> PDFs for which we don't have the source ...
> > 
> > Hmmm, is it supposed to just kill the diacritic mark?
> > 
> > R. L¨wen and B. Polster
> > o
> > gets converted to
> > R. Lowen and B. Polster
> > shouldn't it be
> > R. Löwen and B. Polster
> > ?
> 
> It should do - can you send me this PDF?

http://www.maths.mq.edu.au/~ross/5019-e-cmap.pdf

> 
> I get this from TeX:
> R. L\"owen and B. Polster => R. Löwen and B. Polster
> 
> NB I just tried extracting from a Word-generated PDF and TextOutputDev
> didn't see the line with the diacritic at all.

And are you sure that's not Word's fault?

Albert

