[poppler] commit? bounding box html in pdftotext

Kenneth Berland ken at hero.com
Sun May 30 09:31:39 PDT 2010


1)  Since I sent my last diff, I've:

 	a) added string processing, applied to each word, so that no HTML 
reserved characters are placed into the output.
 	b) altered the HTML so that XML parsers can handle it: I've added a 
title tag (possibly empty) and added end tags to the meta tags.

2)  Addressing your concerns:

 	a) I've removed the initialization of f to stdout.

 	b) I now close f and reopen it.  This also removes the compiler warning.

 	c) A user running with the -bbox option wants word bounding 
boxes.  If there are no words, I think printing a line to stderr is 
appropriate.

-KB


On Wed, 26 May 2010, Albert Astals Cid wrote:

> On Wednesday, 26 May 2010, Kenneth Berland wrote:
>> I get a compiler warning without it.
>> 
>> pdftotext.cc: In function ‘int main(int, char**)’:
>> pdftotext.cc:164: warning: ‘f’ may be used uninitialized in this function
>
> Sorry, that change will not be accepted; initializing f to stdout is not a 
> solution.
>
> Also, I do not like that you do not close f when you are writing the 
> bbox. Can't you just open it again, as the code already does?
>
> Also, I do not understand why the code treats a page with no text as an 
> error.
>
> Albert
>
>> 
>> 
>> -KB
>> 
>> On Wed, 26 May 2010, Albert Astals Cid wrote:
>> > On Sunday, 9 May 2010, Kenneth Berland wrote:
>> >> List,
>> >> 
>> >> I've attached a small addition to pdftotext that outputs bounding box
>> >> information to html like this:
>> >> 
>> >> <doc>
>> >>   <page width="612.000000" height="792.000000">
>> >>     <word xMin="56.800000" yMin="57.208000" xMax="75.412000" yMax="70.492000">The</word>
>> >>   </page>
>> >> </doc>
>> >> 
>> >> I needed it; maybe others will too.
>> >> 
>> >> -KB
>> > 
>> > Why is this change necessary?
>> > 
>> > -  FILE *f;
>> > +  FILE *f = stdout;
>> > 
>> > Albert
> _______________________________________________
> poppler mailing list
> poppler at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/poppler
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: diff.txt
URL: <http://lists.freedesktop.org/archives/poppler/attachments/20100530/89825275/attachment.txt>

