[poppler] [PATCH] Fixup LaTeX composed characters

Albert Astals Cid aacid at kde.org
Sat May 14 05:01:43 PDT 2011


On Thursday, May 12, 2011, Albert Astals Cid wrote:
> On Thursday, May 12, 2011, Tim Brody wrote:
> > On Wed, 2011-05-11 at 19:56 +0100, Albert Astals Cid wrote:
> > > On Wednesday, May 11, 2011, Tim Brody wrote:
> > > > On Tue, 10 May 2011 19:15:51 +0100, Albert Astals Cid <aacid at kde.org> wrote:
> > > > > On Tuesday, May 10, 2011, Tim Brody wrote:
> > > > > >> > Sincerely, I am quite hesitant to apply your patch since it
> > > > > >> > "breaks" pdftotext usage in the console (since it seems most of
> > > > > >> > the apps in the console are not able to understand the
> > > > > >> > non-composed form)
> > > > >> 
> > > > >> Anyway, my patch is only a fix-up of overprinting characters that
> > > > > >> would otherwise get mangled by pdftotext. It just makes it more
> > > > >> apparent that your tool-chain is broken because it's producing
> > > > >> more non-ASCII7 code-points.
> > > > > 
> > > > > By tool-chain you mean pdftotext?
> > > > 
> > > > I mean whatever you're piping to. I haven't encountered a problem
> > > > with decomposed Unicode in bash/less/vim.
> > > 
> > > Really? My vim doesn't seem to like those files (i.e. it shows o instead
> > > of ö). Does anyone have any idea what might be causing that?
> > 
> > Ubuntu 10.04
> > VIM - Vi IMproved 7.2 (2008 Aug 9, compiled Apr 16 2010 13:27:36)
> > Included patches: 1-330
> > GNU bash, version 4.1.5(1)-release (x86_64-pc-linux-gnu)
> > less 436
> 
> Just found out it is my terminal that doesn't render those characters
> correctly.
> 
> > I can copy-n-paste decomposed chars from shell into vim (editing a Perl
> > script). 2 characters (=3 bytes) get pasted although only one is
> > highlighted (i.e. it's behaving as you'd expect).
> > 
> > But anyway ... are you going to apply the LaTeX-fix?
> 
> Now that I have found the culprit I can run the regression tests, and if they
> show no regressions I'll commit it.

Can you take a look at the attached file? In one sentence, pdftotext changes

En konstant vandtemperatur ved brygningen på mellem 92° – 96° er optimal,
92
96
idet de velsmagende komponenter frigøres ved denne temperatur.

to

En konstant vandtemperatur ved brygningen på mellem 92v̊arme
92 – 96° er optimal,
96
idet de velsmagende komponenter frigøres ved denne temperatur.

It seems like you are composing the degree symbol with the v? (I have no idea
where that extra "varme" comes from, either.)
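
For reference, here is a minimal Python sketch (not poppler code) of the
composed/decomposed distinction discussed in this thread. It assumes the stray
mark on the "v" in the output above is U+030A COMBINING RING ABOVE, which I
have not verified against the patch; the helper name code_points is only for
illustration.

```python
import unicodedata


def code_points(text):
    """Return the Unicode name of each code point in text."""
    return [unicodedata.name(ch) for ch in text]


# Decomposed o-umlaut: a base letter followed by a combining mark.
decomposed = "o\u0308"
# NFC normalisation folds the pair into the single precomposed letter.
composed = unicodedata.normalize("NFC", decomposed)

# Assumption: the mark attached to "v" in the output above is U+030A.
v_ring = "v\u030a"
# The degree sign is an ordinary spacing character, not a combining mark.
degree = "\u00b0"

if __name__ == "__main__":
    # Code point names and UTF-8 byte length of each form of o-umlaut.
    print(code_points(decomposed), len(decomposed.encode("utf-8")))
    print(code_points(composed), len(composed.encode("utf-8")))
    # There is no precomposed "v with ring", so NFC leaves the pair alone.
    print(code_points(v_ring), unicodedata.normalize("NFC", v_ring) == v_ring)
    # Combining class 0 means the character does not attach to a base letter.
    print(unicodedata.name(degree), unicodedata.combining(degree))
```

The decomposed ö is two code points and three UTF-8 bytes, matching what Tim
saw when pasting into vim. If the assumption above holds, the degree sign
(which has combining class 0) is being treated like a combining ring and
attached to the neighbouring letter.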

Thanks,
  Albert

> 
> Albert
> 
> > If you want to add normalisation to pdftotext output I would use icu but
> > I'd rather have that discussion separately.
> > /Tim.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: bug143479.pdf
Type: application/pdf
Size: 902389 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/poppler/attachments/20110514/ef1d236a/attachment-0001.pdf>

