[poppler] Solving 8 chars maximum limit on a glyph representation

Albert Astals Cid aacid at kde.org
Sun Jun 1 07:28:28 PDT 2008


On Monday 12 May 2008, Ross Moore wrote:
> Hi Albert,
>
> On 04/05/2008, at 9:07 AM, Albert Astals Cid wrote:
> > On Sunday 04 May 2008, Albert Astals Cid wrote:
> >> As Ross's PDF showed, we have a hardcoded limit of 8 characters for
> >> the representation of a glyph, so even if a glyph identifies itself
> >> as \rightarrow, pdftotext only outputs \rightar
> >>
> >> I'm fixing this hardcoded limit with the attached patch. As side
> >> effects, we get a speed boost, since I stop copying things when
> >> calling CharCodeToUnicode::mapToUnicode, and lower memory usage,
> >> since each CharCodeToUnicodeString now uses only the exact memory
> >> it needs rather than a fixed 8 as before.
> >>
> >> I'm attaching the patch for further review. If no one disagrees,
> >> I'll commit on Sunday the 11th.
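
For readers following the thread, here is a minimal sketch of the idea
described above. It is an illustration only, not the attached patch and not
poppler's actual code: CodePoint, FixedMapping, SizedMapping, makeFixed,
makeSized, view and toString are all hypothetical names, and the glyph name
is stored one code point per character purely to make the truncation visible.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a Unicode code point type.
using CodePoint = unsigned int;

// Old approach (illustrative): every mapping reserves exactly 8 slots,
// so anything longer is silently cut off.
constexpr std::size_t kFixedMax = 8;

struct FixedMapping {
    CodePoint u[kFixedMax];
    std::size_t len;
};

FixedMapping makeFixed(const std::string &name) {
    FixedMapping m{};
    m.len = name.size() < kFixedMax ? name.size() : kFixedMax;
    for (std::size_t i = 0; i < m.len; ++i)
        m.u[i] = static_cast<unsigned char>(name[i]);
    return m;
}

// New approach (illustrative): storage is sized to the sequence itself,
// so short mappings use less memory and long ones are not truncated.
struct SizedMapping {
    std::vector<CodePoint> u;
};

SizedMapping makeSized(const std::string &name) {
    SizedMapping m;
    m.u.reserve(name.size());
    for (char c : name)
        m.u.push_back(static_cast<unsigned char>(c));
    return m;
}

// Hands back a pointer to the stored data instead of copying it into a
// caller-supplied buffer; the length is returned through `len`.
const CodePoint *view(const SizedMapping &m, std::size_t *len) {
    *len = m.u.size();
    return m.u.data();
}

// Helper for inspecting a mapping as text (assumes ASCII code points).
std::string toString(const CodePoint *u, std::size_t len) {
    std::string s;
    for (std::size_t i = 0; i < len; ++i)
        s.push_back(static_cast<char>(u[i]));
    return s;
}
```

With these definitions, makeFixed("\\rightarrow") keeps only the first 8
characters, while makeSized("\\rightarrow") keeps all 11 and view() exposes
them without a copy.
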
>
> No disagreement from me.
> I've applied the patch, and the earlier ones related to Annotations,
> etc.
>
> All the  utils/pdfto*  work much better (no Bus Error) with my
> example PDFs, except for  pdfimages  (which generates image files
> of size 8 bytes!)

That wasn't working before either. Please open a bug about it on
bugs.freedesktop.org so we can work on it in the future.

>
>
> Thanks very much for your work on this.
>
>
> However, there are still some problems with the actual text strings
> extracted using  pdftotext .

You mean the problems with "Introduction A flat stable plane (��, ℒ)" etc? 
Adobe Reader can't extract text from there either, so I'm not sure it's
entirely our fault.

Albert

