[poppler] pdftotext needs support for surrogates outside the BMP plane

Koji Otani sho at bbr.jp
Wed May 28 20:31:03 PDT 2008


Hi, All.

I'd like to commit this patch to trunk.
Should I file it in Bugzilla before doing so?
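
For anyone following the thread, here is a rough sketch of the idea in
Python. It is only an illustration and not the patch itself: the function
names combine_surrogates and to_utf8 are made up for this example, and
replacing an unpaired surrogate with U+FFFD is my own assumption. A
character outside the BMP arrives as a UTF-16 surrogate pair; encoding each
half separately gives the 2 x 3 = 6-byte form discussed in the quoted
message, so the pair has to be merged into one code point before it is
written out as UTF-8.

```python
def combine_surrogates(units):
    """Merge UTF-16 surrogate pairs in a list of code units into code points."""
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        is_high = 0xD800 <= u <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and has_low:
            # High + low surrogate: rebuild the code point above U+FFFF.
            out.append(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        elif 0xD800 <= u <= 0xDFFF:
            # Unpaired surrogate: substitute U+FFFD (assumption for this sketch).
            out.append(0xFFFD)
            i += 1
        else:
            out.append(u)
            i += 1
    return out


def to_utf8(units):
    """Encode UTF-16 code units as UTF-8, one 4-byte sequence per pair."""
    return "".join(chr(cp) for cp in combine_surrogates(units)).encode("utf-8")
```

For example, U+1D400 (MATHEMATICAL BOLD CAPITAL A) is the pair D835 DC00 in
UTF-16, and to_utf8([0xD835, 0xDC00]) gives the 4 bytes F0 9D 90 80 instead
of two 3-byte sequences.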
--------------
Koji Otani.

From: Ross Moore <ross at ics.mq.edu.au>
Subject: Re: [poppler] pdftotext needs support for surrogates outside the BMP plane
Date: Thu, 29 May 2008 09:06:24 +1000
Message-ID: <29E1BEE5-11BA-4A5F-A881-29DFA63A7E8A at maths.mq.edu.au>

ross> 
ross> On 28/05/2008, at 6:25 PM, Koji Otani wrote:
ross> > Hi.
ross> >
ross> > ross> There are many pieces of software that do not regard the
ross> > ross> 6-byte sequences as being valid UTF-8. Thus there needs to be
ross> > ross> an extra step that translates these 2 x 3 = 6-byte sequences
ross> > ross> into the proper UTF-8 4-byte sequence.
ross> > ross>
ross> > ross> Is anybody working on this kind of thing?
ross> > ross>
ross> >
ross> > I've made a patch that fixes this bug, and attached it to this mail.
ross> 
ross> Thank you very much for this.
ross> It works brilliantly.
ross> 
ross> The attached image shows the result of using
ross> 
ross>       pdftotext -layout testmath.pdf
ross> 
ross> on the example PDF from my previous message,
ross> viewed with a standard Mac text-editor application.
ross> 

