[poppler] commit? bounding box html in pdftotext

Albert Astals Cid aacid at kde.org
Tue Jun 8 14:35:00 PDT 2010


On Tuesday, 8 June 2010, you wrote:
> Does GooString have a replace() method?  I could not find one.  Does this
> mean I should write one?

Yes, you'll have to write one, or get the char * from the GooString and use
the C string functions on it.
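
For the second route, something along these lines might do. It is only a
sketch: replaceAll is a made-up name, not anything that exists in poppler, and
it assumes you hand it the char * you got from the GooString (getCString())
and build a new GooString from the result afterwards.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical helper, not part of poppler: returns a newly malloc'd copy of
// `src` with every occurrence of `from` replaced by `to`. The caller frees
// the result. Returns NULL if the allocation fails.
static char *replaceAll(const char *src, const char *from, const char *to) {
  assert(src && from && to && from[0] != '\0');
  const size_t fromLen = strlen(from);
  const size_t toLen = strlen(to);

  // First pass: count the non-overlapping matches so the result can be
  // sized exactly.
  size_t count = 0;
  for (const char *p = strstr(src, from); p; p = strstr(p + fromLen, from)) {
    ++count;
  }

  const size_t srcLen = strlen(src);
  char *out = (char *)malloc(srcLen - count * fromLen + count * toLen + 1);
  if (!out) {
    return NULL;
  }

  // Second pass: copy the text before each match, then the replacement.
  char *dst = out;
  const char *cur = src;
  for (const char *p = strstr(cur, from); p; p = strstr(cur, from)) {
    memcpy(dst, cur, p - cur);
    dst += p - cur;
    memcpy(dst, to, toLen);
    dst += toLen;
    cur = p + fromLen;
  }
  strcpy(dst, cur);  // copy the tail, including the terminating NUL
  return out;
}
```

Counting first keeps it to a single allocation, at the cost of scanning the
string twice.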

Albert

> 
> -KB
> 
> On Sun, 30 May 2010, Albert Astals Cid wrote:
> > On Sunday, 30 May 2010, Kenneth Berland wrote:
> >> 1)  Since I sent my last diff, I've:
> >>     a) added some string processing to make sure no HTML reserved
> >>        characters are placed into the output.  I process each word.
> >>     b) altered the html a bit so that XML parsers can deal with it.
> >>        I've put in a title tag or an empty title tag and added end tags
> >>        to the meta tags.
> >> 
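
For the per-word escaping in 1a, a single pass over the characters is one way
to do it without a general replace() at all. A minimal sketch, assuming each
word is available as a plain C string; escapeHtml is a hypothetical name and
the set of four characters is my assumption about what "HTML reserved" should
cover here.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical helper, not part of poppler: returns a newly malloc'd copy of
// `word` with & < > " replaced by their entities. The caller frees the
// result. Returns NULL if the allocation fails.
static char *escapeHtml(const char *word) {
  // Worst case: every character becomes "&quot;", which is 6 bytes long.
  char *out = (char *)malloc(strlen(word) * 6 + 1);
  if (!out) {
    return NULL;
  }

  char *dst = out;
  for (const char *p = word; *p; ++p) {
    const char *rep = NULL;
    switch (*p) {
      case '&': rep = "&amp;"; break;
      case '<': rep = "&lt;"; break;
      case '>': rep = "&gt;"; break;
      case '"': rep = "&quot;"; break;
    }
    if (rep) {
      const size_t n = strlen(rep);
      memcpy(dst, rep, n);
      dst += n;
    } else {
      *dst++ = *p;  // ordinary character, copied unchanged
    }
  }
  *dst = '\0';
  return out;
}
```

Doing it in one pass also sidesteps the ordering problem you get with repeated
replace calls, where replacing "&" after "<" would turn "&lt;" into
"&amp;lt;".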
> >> 2)  Addressing your concerns:
> >>     a) I've removed the initialization of f to stdout.
> >>     b) I close f now and reopen it.  This also removes the warning.
> >>     c) If a user is running with the -bbox option, they want word
> >>        bounding boxes.  If there are no words, I think a line to stderr
> >>        is appropriate.
> > 
> > Cool, though we try not to use std:: (yeah, it sucks, I know). Can you
> > use either GooString or char * instead?
> > 
> > 
> > Thanks,
> > 
> >  Albert
> >  
> >> -KB
> >> 
> >> On Wed, 26 May 2010, Albert Astals Cid wrote:
> >>> On Wednesday, 26 May 2010, Kenneth Berland wrote:
> >>>> I get a compiler warning without it.
> >>>> 
> >>>> pdftotext.cc: In function ‘int main(int, char**)’:
> >>>> pdftotext.cc:164: warning: ‘f’ may be used uninitialized in this
> >>>> function
> >>> 
> >>> That change will not get accepted, sorry. Initializing f to stdout is
> >>> not a solution.
> >>> 
> >>> Also, I do not like the fact that you do not close f if you are writing
> >>> the bbox. Can't you just open it again like the code already does?
> >>> 
> >>> Also, I do not understand why the code considers a page having no text
> >>> an error.
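
To make the "open it again" suggestion concrete, here is a rough sketch of the
pattern. openOutput and closeOutput are hypothetical names, not functions in
pdftotext.cc, and the sketch assumes the usual convention that an output name
of "-" means stdout.

```cpp
#include <cstdio>
#include <cstring>

// Hypothetical helper: returns stdout when the name is "-", otherwise opens
// the named file with the given mode. Returns NULL if fopen fails.
static FILE *openOutput(const char *name, const char *mode) {
  if (strcmp(name, "-") == 0) {
    return stdout;
  }
  return fopen(name, mode);
}

// Hypothetical helper: closes the stream unless it is stdout or NULL, so the
// caller can pair every openOutput with a closeOutput.
static void closeOutput(FILE *f) {
  if (f && f != stdout) {
    fclose(f);
  }
}
```

With this shape f is assigned from openOutput() on every path before it is
used, so there is nothing left for the compiler to warn about. The header
would be written after opening with "wb", and the footer after reopening with
"ab".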
> >>> 
> >>> Albert
> >>> 
> >>>> -KB
> >>>> 
> >>>> On Wed, 26 May 2010, Albert Astals Cid wrote:
> >>>>> On Sunday, 9 May 2010, Kenneth Berland wrote:
> >>>>>> List,
> >>>>>> 
> >>>>>> I've attached a small addition to pdftotext that outputs bounding
> >>>>>> box information to html like this:
> >>>>>> 
> >>>>>> <doc>
> >>>>>>    <page width="612.000000" height="792.000000"/>
> >>>>>>      <word xMin="56.800000" yMin="57.208000" xMax="75.412000"
> >>>>>>            yMax="70.492000">The</word>
> >>>>>>    </page>
> >>>>>> </doc>
> >>>>>> 
> >>>>>> I had a need, maybe others will too.
> >>>>>> 
> >>>>>> -KB
> >>>>> 
> >>>>> Why is this change necessary?
> >>>>> 
> >>>>> -  FILE *f;
> >>>>> +  FILE *f = stdout;
> >>>>> 
> >>>>> Albert
> >>> 