[poppler] pdftotext needs support for surrogates outside the BMP plane

Koji Otani sho at bbr.jp
Sun Jun 1 20:50:10 PDT 2008



From: Albert Astals Cid <aacid at kde.org>
Subject: Re: [poppler] pdftotext needs support for surrogates outside the BMP plane
Date: Sun, 1 Jun 2008 17:28:11 +0200
Message-ID: <200806011728.11948.aacid at kde.org>

aacid> On Thursday 29 May 2008, Koji Otani wrote:
aacid> > Hi, All.
aacid> >
aacid> > I'd like to commit this patch to the trunk tree.
aacid> > Should I register this to Bugzilla before doing it?
aacid> 
aacid> No, but I'd like to confirm that "it works" before committing it. I can see
aacid> that your patch gives a different output, but I don't have any font installed
aacid> on my system that can "draw" the characters. What font are you using?
aacid> 
aacid> Albert
aacid>

The output is a UTF-8 text file. I don't have any fonts that can draw this
text file either. I checked that it is correct with a hexdump application.
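
For reference, here is a minimal sketch of the kind of byte-level check a
hexdump makes possible. It is not the patch itself. The function names and
the sample character U+1D400 (MATHEMATICAL BOLD CAPITAL A) are assumptions
chosen for illustration; the actual characters in the test PDF may differ.
The sketch combines a UTF-16 surrogate pair into one code point, encodes it
as UTF-8, and contrasts the expected 4-byte result with the 6-byte form
produced when each surrogate is encoded separately.

```python
def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 surrogate pair into a single Unicode code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def encode_utf8(cp: int) -> bytes:
    """Encode one value as UTF-8, hand-rolled to show the byte layout.

    Note: this helper does not reject lone surrogates, so it can also
    reproduce the 3-byte-per-surrogate form that strict decoders refuse.
    """
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


# Illustrative sample: the surrogate pair D835 DC00 stands for U+1D400.
high, low = 0xD835, 0xDC00

# Correct output: one 4-byte UTF-8 sequence for the combined code point.
good = encode_utf8(combine_surrogates(high, low))

# Problematic output: each surrogate encoded on its own, 2 x 3 = 6 bytes.
bad = encode_utf8(high) + encode_utf8(low)

print(good.hex(" "))  # f0 9d 90 80
print(bad.hex(" "))   # ed a0 b5 ed b0 80
```

In a hexdump, a lead byte of f0 followed by three continuation bytes is
what the corrected output should show for such a character, whereas two
consecutive sequences starting with ed a0..af and ed b0..bf indicate that
the surrogates were written out separately.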

This problem was reported by Dr. Ross Moore. He viewed the output with a Mac
text editor, but I can't view it with my Mac text editor.

Dr. Ross Moore, what font are you using?

---------------
Koji Otani



aacid> > --------------
aacid> > Koji Otani.
aacid> >
aacid> > From: Ross Moore <ross at ics.mq.edu.au>
aacid> > Subject: Re: [poppler] pdftotext needs support for surrogates outside the BMP plane
aacid> > Date: Thu, 29 May 2008 09:06:24 +1000
aacid> > Message-ID: <29E1BEE5-11BA-4A5F-A881-29DFA63A7E8A at maths.mq.edu.au>
aacid> >
aacid> > ross>
aacid> > ross> On 28/05/2008, at 6:25 PM, Koji Otani wrote:
aacid> > ross> > Hi.
aacid> > ross> >
aacid> > ross> > ross> There are many pieces of software that do not regard the
aacid> > ross> > ross> 6-byte sequences as being valid UTF-8. Thus there needs to
aacid> > ross> > ross> be an extra step that translates these 2 x 3 = 6-byte
aacid> > ross> > ross> sequences into the proper UTF-8 4-byte sequence.
aacid> > ross> > ross>
aacid> > ross> > ross> Is anybody working on this kind of thing?
aacid> > ross> > ross>
aacid> > ross> >
aacid> > ross> > I've made a patch that fixes this bug and attached it to this mail.
aacid> > ross>
aacid> > ross> Thank you very much for this.
aacid> > ross> It works brilliantly.
aacid> > ross>
aacid> > ross> The attached image shows the result of using
aacid> > ross>
aacid> > ross>       pdftotext -layout testmath.pdf
aacid> > ross>
aacid> > ross> on the example PDF from my previous message,
aacid> > ross> viewed with a standard Mac text-editor application.
aacid> > ross>
aacid> > _______________________________________________
aacid> > poppler mailing list
aacid> > poppler at lists.freedesktop.org
aacid> > http://lists.freedesktop.org/mailman/listinfo/poppler

