[poppler] commit? bounding box html in pdftotext

Kenneth Berland ken at hero.com
Mon Jul 5 20:36:56 PDT 2010


Can I use std::string within any GooString methods I write (e.g. replace) 
or am I limited to the C Standard library (i.e. string.h)?

-KB
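
For illustration only, here is a minimal sketch of the "get the char * and
use the C string functions" route: a replace-all helper built on string.h
alone, with no std::string. The name replaceAll and its signature are
hypothetical and are not part of poppler or GooString. It assumes the caller
passes in the char * obtained from the GooString and frees the returned
buffer.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical helper (not a poppler API): returns a newly malloc'd copy of
// src in which every non-overlapping occurrence of from is replaced by to.
// The caller must free() the result. Returns NULL if from is empty or if
// the allocation fails.
char *replaceAll(const char *src, const char *from, const char *to)
{
  const size_t fromLen = strlen(from);
  const size_t toLen = strlen(to);
  if (fromLen == 0) {
    return NULL;
  }

  // First pass: count the matches so the output buffer can be sized once.
  size_t count = 0;
  for (const char *p = strstr(src, from); p != NULL;
       p = strstr(p + fromLen, from)) {
    ++count;
  }

  const size_t outLen = strlen(src) + count * toLen - count * fromLen;
  char *out = static_cast<char *>(malloc(outLen + 1));
  if (out == NULL) {
    return NULL;
  }

  // Second pass: copy the text between matches, then the replacement.
  char *dst = out;
  const char *cur = src;
  const char *hit;
  while ((hit = strstr(cur, from)) != NULL) {
    const size_t n = static_cast<size_t>(hit - cur);
    memcpy(dst, cur, n);
    dst += n;
    memcpy(dst, to, toLen);
    dst += toLen;
    cur = hit + fromLen;
  }
  strcpy(dst, cur); // copies the remaining tail and the terminating NUL
  return out;
}
```

For HTML escaping, one would presumably apply it to "&" first (replacing it
with "&amp;") and only then to "<" and ">", so that the ampersands introduced
by the later replacements are not escaped a second time.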


On Tue, 8 Jun 2010, Albert Astals Cid wrote:

> On Tuesday, 8 June 2010, you wrote:
>> Does GooString have a replace() method?  I could not find one.  Does this
>> mean I should write one?
>
> Yes, you'll have to write one or get the char * from the GooString and use
> c-string ones.
>
> Albert
>
>> 
>> -KB
>> 
>> On Sun, 30 May 2010, Albert Astals Cid wrote:
>> > On Sunday, 30 May 2010, Kenneth Berland wrote:
>> >> 1)  Since I sent my last diff, I've:
>> >>     a) added some string processing to make sure no HTML reserved
>> >>        characters are placed into the output.  I process each word.
>> >>     b) altered the html a bit so that XML parsers can deal with it.
>> >>        I've put in a title tag or an empty title tag and added end
>> >>        tags to the meta tags.
>> >> 
>> >> 2)  Addressing your concerns:
>> >>     a) I've removed the initialization of f to stdout.
>> >>     b) I close f now and reopen it.  This also removes the warning.
>> >>     c) If a user is running with the -bbox option, they want word
>> >>        bounding boxes.  If there are no words, I think a line to
>> >>        stderr is appropriate.
>> > 
>> > Cool, though we try not to use the C++ standard library (yeah, it sucks,
>> > I know). Can you use either GooString or char * instead?
>> > 
>> > 
>> > Thanks,
>> > 
>> >  Albert
>> > 
>> >> -KB
>> >> 
>> >> On Wed, 26 May 2010, Albert Astals Cid wrote:
>> >>> On Wednesday, 26 May 2010, Kenneth Berland wrote:
>> >>>> I get a compiler warning without it.
>> >>>> 
>> >>>> pdftotext.cc: In function ‘int main(int, char**)’:
>> >>>> pdftotext.cc:164: warning: ‘f’ may be used uninitialized in this
>> >>>> function
>> >>> 
>> >>> That change will not get accepted, sorry, initializing f to stdout is
>> >>> not a solution.
>> >>> 
>> >>> Also, I do not like the fact that you do not close f if you are
>> >>> writing the bbox. Can't you just open it again like the code already
>> >>> does?
>> >>> 
>> >>> Also i do not understand why the code considers a page having no text
>> >>> an error.
>> >>> 
>> >>> Albert
>> >>> 
>> >>>> -KB
>> >>>> 
>> >>>> On Wed, 26 May 2010, Albert Astals Cid wrote:
>> >>>>> On Sunday, 9 May 2010, Kenneth Berland wrote:
>> >>>>>> List,
>> >>>>>> 
>> >>>>>> I've attached a small addition to pdftotext that outputs bounding
>> >>>>>> box information to html like this:
>> >>>>>> 
>> >>>>>> <doc>
>> >>>>>>   <page width="612.000000" height="792.000000">
>> >>>>>>     <word xMin="56.800000" yMin="57.208000" xMax="75.412000"
>> >>>>>>           yMax="70.492000">The</word>
>> >>>>>>   </page>
>> >>>>>> </doc>
>> >>>>>> 
>> >>>>>> I had a need, maybe others will too.
>> >>>>>> 
>> >>>>>> -KB
>> >>>>> 
>> >>>>> Why is this change necessary?
>> >>>>> 
>> >>>>> -  FILE *f;
>> >>>>> +  FILE *f = stdout;
>> >>>>> 
>> >>>>> Albert
>> >>> 
>> >>> _______________________________________________
>> >>> poppler mailing list
>> >>> poppler at lists.freedesktop.org
>> >>> http://lists.freedesktop.org/mailman/listinfo/poppler

