[poppler] pdftotext needs support for surrogates outside the BMP plane
Albert Astals Cid
aacid at kde.org
Mon Jun 2 11:35:18 PDT 2008
On Monday 02 June 2008, Koji Otani wrote:
> Thank you.
>
> I could view the text file with the Unicode Symbols font.
>
> > Albert
>
> Could you confirm the patch with this information?
Works here with the Unicode Symbols font too.
One last thing:
+    if (u[i] >= 0xd800 && u[i] < 0xdc00) { /* surrogate pair */
+      if (i + 1 < uLen) {
+        Unicode uu = (((u[i] & 0x3ff) << 10) | (u[i+1] & 0x3ff)) + 0x10000;
+        i++;
+        curWord->addChar(state, x1 + i*w1, y1 + i*h1, w1, h1, c, uu);
+      }
+    } else {
+      curWord->addChar(state, x1 + i*w1, y1 + i*h1, w1, h1, c, u[i]);
+    }
What happens if "if (i + 1 < uLen) {" is false? Do we lose a char? Or should
that never happen, making it an error? If it's an error, I think we should have
an else branch with something like
  } else {
    error(-1, "Got surrogate pair start char but did not have second char");
  }
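For illustration, the decoding with an explicit failure branch can be sketched
standalone. This is only a sketch under assumptions: decodeUtf16 and its errors
counter are hypothetical names, not poppler API, and the extra check that the
second unit is a low surrogate (0xdc00..0xdfff) is an addition made for the
sketch. The offset 0x10000 is added rather than OR-ed, because the shifted high
bits can already occupy bit 16.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of poppler): turns UTF-16 code units into
// Unicode code points. Unpaired surrogates are skipped and counted in *errors
// so the caller can report them instead of losing them silently.
std::vector<uint32_t> decodeUtf16(const std::vector<uint32_t> &u, int *errors) {
    std::vector<uint32_t> out;
    *errors = 0;
    for (std::size_t i = 0; i < u.size(); ++i) {
        if (u[i] >= 0xd800 && u[i] < 0xdc00) {
            // High surrogate: it must be followed by a low surrogate.
            if (i + 1 < u.size() && u[i + 1] >= 0xdc00 && u[i + 1] < 0xe000) {
                out.push_back((((u[i] & 0x3ff) << 10) | (u[i + 1] & 0x3ff)) + 0x10000);
                ++i;  // the low surrogate has been consumed
            } else {
                ++*errors;  // pair start without a valid second unit
            }
        } else if (u[i] >= 0xdc00 && u[i] < 0xe000) {
            ++*errors;  // low surrogate with no preceding high surrogate
        } else {
            out.push_back(u[i]);  // ordinary BMP character
        }
    }
    return out;
}
```

For example, the units 0xd835 0xdc00 decode to U+1D400, while a trailing
0xd835 on its own is counted as an error rather than dropped without notice.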
Albert
> -----------
> Koji Otani.
>
>
> From: Ross Moore <ross at ics.mq.edu.au>
> Subject: Re: [poppler] pdftotext needs support for surrogates outside the
>  BMP plane
> Date: Mon, 2 Jun 2008 15:53:54 +1000
> Message-ID: <308EB069-DD16-407F-B467-5B5F524F9887 at maths.mq.edu.au>
>
> ross> Hi Koji,
> ross>
> ross> On 02/06/2008, at 1:50 PM, Koji Otani wrote:
> ross> >
> ross> >
> ross> > From: Albert Astals Cid <aacid at kde.org>
> ross> > Subject: Re: [poppler] pdftotext needs support for surrogates
> ross> > outside the BMP plane
> ross> > Date: Sun, 1 Jun 2008 17:28:11 +0200
> ross> > Message-ID: <200806011728.11948.aacid at kde.org>
> ross> >
> ross> > aacid> On Thursday 29 May 2008, Koji Otani wrote:
> ross> > aacid> > Hi, All.
> ross> > aacid> >
> ross> > aacid> > I'd like to commit this patch to the trunk tree.
> ross> > aacid> > Should I register this to Bugzilla before doing it?
> ross> > aacid>
> ross> > aacid> No, but I'd like to confirm that "it works" before
> ross> > aacid> committing it. I can see that your patch gives a different
> ross> > aacid> output, but I don't have any font installed on my system
> ross> > aacid> that can "draw" the characters. What font are you using?
> ross> > aacid>
> ross> > aacid> Albert
> ross> > aacid>
> ross> >
> ross> > Output is a UTF-8 text file. I don't have fonts that can draw this
> ross> > text file either. I checked that it is correct with a hexdump
> ross> > application.
> ross> >
> ross> > This problem was reported by Dr. Ross Moore. He viewed it with a Mac
> ross> > text editor, but I can't view it with my Mac text editor.
> ross> >
> ross> >> Dr. Ross Moore
> ross> > What font are you using?
> ross>
> ross> I have several which can show these glyphs.
> ross>
> ross> In TextEdit, the default font that is being used is "Unicode Symbols",
> ross> as shown in one of the attached screenshots.
> ross> Get it from http://users.teilar.gr/~g1951d/ .
> ross>
> ross> The other screenshot shows which fonts I have installed
> ross> that support Plane 1 characters.
> ross>
> ross>
> ross> Other possibilities are Code2000/Code2001/Code2002
> ross> e.g., from http://www.code2000.net/code2001.htm .
> ross>
> ross> The STIX fonts are scheduled for release soon:
> ross> http://www.stixfonts.org/rel_sched.html
> ross> (The beta testing release is no longer available.)
> ross>
> ross> Other free fonts are also available; e.g. Asana Math
> ross> http://openfontlibrary.org/media/files/asyropoulos/219 .
> ross>
> ross> Or if you are prepared to try Microsoft's "Cambria Math",
> ross> then that should work.
> ross>
> _______________________________________________
> poppler mailing list
> poppler at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/poppler