[poppler] [PATCH] Fixup LaTeX composed characters

Albert Astals Cid aacid at kde.org
Sat Jun 4 09:51:34 PDT 2011


On Tuesday, May 17, 2011, Tim Brody wrote:
> On Mon, 2011-05-16 at 20:22 +0100, Albert Astals Cid wrote:
> > On Monday, May 16, 2011, Tim Brody wrote:
> > > On Sat, 2011-05-14 at 13:01 +0100, Albert Astals Cid wrote:
> > > > Can you have a look at [bug143479.pdf]? pdftotext in one sentence
> > > > changes
> > > > 
> > > > En konstant vandtemperatur ved brygningen på mellem 92° – 96° er
> > > > optimal, 92
> > > > 96
> > > > idet de velsmagende komponenter frigøres ved denne temperatur.
> > > > 
> > > > to
> > > > 
> > > > En konstant vandtemperatur ved brygningen på mellem 92v̊arme
> > > > 92 – 96° er optimal,
> > > > 96
> > > > idet de velsmagende komponenter frigøres ved denne temperatur.
> > > > 
> > > > It seems like you are composing the degree symbol with the v
> > > > (actually no idea where that extra varme comes from either)?
> > > 
> > > There's something wrong in the calling code. The stream of chars coming
> > > into addChar() is this (varme is from the line below):
> > > 9-2-°-v-a-r-m-e
> > > 
> > > (Note '92 96' also get repeated)
> > > 
> > > I didn't add vertical overlap-checking because AFAIK that should happen
> > > in ActualText.
> > 
> > I'm not sure I understand you. Are you saying we should accept your
> > patch with this regression?
> 
> This appears to fix the regression:
>         // whitespace along main axis
>         sp > minWordBreakSpace * curWord->fontSize ||
> +        // overlaps more than a character
> +        fabs(sp) > (minWordBreakSpace+1) * curWord->fontSize ||
> 
> But there's something odd going on - this line gets drawn three times
> (note: it line-breaks on the em-dash):
> varme og luft. En konstant vandtemperatur ved brygningen på mellem 92°
> 
> Can you look at the decompressed stream to see what it contains?
> 
> I expect that is what is causing the repeated '92 96'.
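
For readers following the thread, the two conditions in the quoted hunk can be
read as a small stand-alone predicate. This is only a sketch: breaksWord() is a
hypothetical name, not a function in poppler, and it assumes sp is the gap along
the main axis between the current word and the incoming character, with a
negative value meaning overlap.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-alone reading of the two conditions in the quoted hunk.
// sp:                gap along the main axis (assumed negative when glyphs overlap)
// fontSize:          stands in for curWord->fontSize
// minWordBreakSpace: passed in here so the sketch is self-contained
bool breaksWord(double sp, double fontSize, double minWordBreakSpace) {
    assert(fontSize > 0.0);
    // whitespace along main axis
    const bool gap = sp > minWordBreakSpace * fontSize;
    // overlaps more than a character
    const bool bigOverlap = std::fabs(sp) > (minWordBreakSpace + 1) * fontSize;
    return gap || bigOverlap;
}
```

With fontSize 10 and minWordBreakSpace 0.1 (values picked for illustration), a
gap of 2 or an overlap of 12 starts a new word, while an overlap of 5 does not,
so a character drawn more than a glyph width back no longer joins the word.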

That file is good enough for me with your improved patch. However, I've found
another regression in http://www.kde.cat/aacid/bug165809.pdf

The old output was
    There are inequalities on the A's
and the new one is
    There are inequalities on the Aś

The first one is correct: looking at the document, you can see that the ' is
not meant to sit over the s.
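
One possible way to tell the two cases apart is to check how much of the accent
glyph actually lies over the base glyph before composing. The following is only
a sketch of that idea, not what the patch does: Extent, overlapFraction() and
shouldCompose() are hypothetical names, and the 0.5 threshold is an arbitrary
illustration.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical horizontal extent of a glyph along the main axis.
struct Extent {
    double xMin;
    double xMax;
};

// Fraction of the accent's width that lies over the base glyph (0.0 to 1.0).
double overlapFraction(const Extent &base, const Extent &accent) {
    const double width = accent.xMax - accent.xMin;
    if (width <= 0.0) {
        return 0.0;
    }
    const double shared =
        std::min(base.xMax, accent.xMax) - std::max(base.xMin, accent.xMin);
    return std::max(0.0, shared) / width;
}

// Compose only when enough of the accent sits over the base glyph.
// The default threshold is illustrative, not a tuned value.
bool shouldCompose(const Extent &base, const Extent &accent,
                   double threshold = 0.5) {
    assert(threshold > 0.0 && threshold <= 1.0);
    return overlapFraction(base, accent) >= threshold;
}
```

With an s spanning 10 to 15, an apostrophe spanning 8 to 10 merely touches it
and is left alone, while an accent spanning 11 to 14 lies fully over it and
would be composed.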

Could you please have a look to see if you can fix it?

Thanks,
  Albert

P.S.: Please mail the list and not me directly.


> /Tim.

