[poppler] [PATCH] Fixup LaTeX composed characters

Albert Astals Cid aacid at kde.org
Thu May 12 15:04:13 PDT 2011


On Thursday, May 12, 2011, Tim Brody wrote:
> On Wed, 2011-05-11 at 19:56 +0100, Albert Astals Cid wrote:
> > On Wednesday, May 11, 2011, Tim Brody wrote:
> > > On Tue, 10 May 2011 19:15:51 +0100, Albert Astals Cid <aacid at kde.org> wrote:
> > > > On Tuesday, May 10, 2011, Tim Brody wrote:
> > > >> > Honestly, I am quite hesitant to apply your patch since it "breaks"
> > > >> > pdftotext usage in the console (it seems most console apps are not
> > > >> > able to understand the non-composed form).
> > > >> 
> > > >> Anyway, my patch is only a fix-up of overprinting characters that
> > > >> would otherwise get mangled by pdftotext. It just makes it more
> > > >> apparent that your tool-chain is broken because it's producing more
> > > >> non-ASCII7 code-points.
> > > > 
> > > > By tool-chain you mean pdftotext?
> > > 
> > > I mean whatever you're piping to. I haven't encountered a problem with
> > > decomposed Unicode in bash/less/vim.
> > 
> > Really? My vim doesn't seem to like those files (i.e. it shows o instead
> > of ö). Does anyone have any idea what might be causing that?
> 
> Ubuntu 10.04
> VIM - Vi IMproved 7.2 (2008 Aug 9, compiled Apr 16 2010 13:27:36)
> Included patches: 1-330
> GNU bash, version 4.1.5(1)-release (x86_64-pc-linux-gnu)
> less 436

I just found out it is my terminal that doesn't render those characters
correctly.
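
For anyone following along, here is a minimal sketch of the two forms being discussed. It is written in Python purely for illustration and is not part of the patch. The helper name `describe` is made up for this example. It shows that the decomposed form of ö is a plain "o" followed by a combining diaeresis, which is why a terminal that ignores combining marks shows only "o".

```python
import unicodedata

composed = "\u00f6"      # ö as a single precomposed code point
decomposed = "o\u0308"   # "o" followed by COMBINING DIAERESIS


def describe(text):
    """Return (code point count, UTF-8 byte count, code point names)."""
    names = [unicodedata.name(ch) for ch in text]
    return len(text), len(text.encode("utf-8")), names


for label, text in (("composed", composed), ("decomposed", decomposed)):
    count, size, names = describe(text)
    print(f"{label}: {count} code point(s), {size} byte(s): {names}")
```

Both strings should look identical when the terminal handles combining characters, even though they differ in length.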
 
> I can copy-n-paste decomposed chars from shell into vim (editing a Perl
> script). 2 characters (=3 bytes) get pasted although only one is
> highlighted (i.e. it's behaving as you'd expect).
> 
> But anyway ... are you going to apply the LaTeX-fix?

Now that I have found the culprit I can run the regression tests, and if they
show no regressions I'll commit it.

Albert

> If you want to add normalisation to pdftotext output I would use icu but
> I'd rather have that discussion separately.
> /Tim.
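
As a rough illustration of what normalising the output could look like, here is a sketch that uses Python's standard `unicodedata` module rather than ICU. The function name `normalize_stream` and the sample text are assumptions made for this example. Nothing here reflects how pdftotext behaves or how a poppler implementation would be written.

```python
import io
import unicodedata


def normalize_stream(src, dst, form="NFC"):
    """Copy text from src to dst, normalising each line to the given form."""
    for line in src:
        dst.write(unicodedata.normalize(form, line))


# Sample input in decomposed form: base letters followed by combining marks.
sample = io.StringIO("Go\u0308del, Erdo\u030bs\n")
out = io.StringIO()
normalize_stream(sample, out)
print(repr(out.getvalue()))
```

Passing `sys.stdin` and `sys.stdout` instead of the `StringIO` objects would turn this into a filter for text piped from another tool, assuming the locale encoding is UTF-8.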

