[poppler] commit? bounding box html in pdftotext

Albert Astals Cid aacid at kde.org
Wed Sep 29 14:41:25 PDT 2010


On Wednesday, 22 September 2010, you wrote:
> Very funny.
> 
> The old diff, using std:: is at:
> 
> http://lists.freedesktop.org/archives/poppler/attachments/20100530/89825275/attachment.txt
> 
> You can commit either today's diff or the 2010-05-30 (std::) diff.  I
> think the std:: version is less likely to have pointer-related bugs.
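
For readers following the thread, the std:: approach to keeping HTML reserved characters out of the output can be quite short. The following is only a sketch, not the code from either diff: the function name `escapeHtml` and the exact set of escaped characters (&, <, >, ") are assumptions made for illustration.

```cpp
#include <string>

// Illustrative helper (hypothetical name, not taken from the patch):
// returns a copy of `word` with HTML reserved characters replaced by
// their entities, so the word can be written between XML/HTML tags.
static std::string escapeHtml(const std::string &word) {
  std::string out;
  out.reserve(word.size());
  for (std::string::size_type i = 0; i < word.size(); ++i) {
    switch (word[i]) {
      case '&':  out += "&amp;";  break;
      case '<':  out += "&lt;";   break;
      case '>':  out += "&gt;";   break;
      case '"':  out += "&quot;"; break;
      default:   out += word[i];  break;
    }
  }
  return out;
}
```

Building a new string rather than replacing in place avoids the index bookkeeping that a C-string version would need, which is presumably the kind of pointer-related bug the std:: version sidesteps.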

textOut = new TextOutputDev("/dev/null",

Hard-coding "/dev/null" doesn't look portable (it won't exist on Windows, for example). Can you please fix it?

Thanks,
  Albert

> 
> -KB
> 
> On Wed, 22 Sep 2010, Albert Astals Cid wrote:
> > On Wednesday, 22 September 2010, Kenneth Berland wrote:
> >> I have rewritten the replace function with standard C.
> > 
> > This is where you'll hate me, but for a few weeks now we have accepted
> > std:: code if it's *obvious* it adds value over existing code.
> > 
> > So can you please send again your patch to the mailing list?
> > 
> > Sorry, totally forgot to tell you.
> > 
> > Albert
> > 
> >> -KB
> >> 
> >> On Sun, 11 Jul 2010, Albert Astals Cid wrote:
> >>> On Tuesday, 6 July 2010, Kenneth Berland wrote:
> >>>> Can I use std::string within any GooString methods I write (e.g.
> >>>> replace) or am I limited to the C Standard library (i.e. string.h)?
> >>> 
> >>> No std:: usage anywhere in poppler (except in the cpp frontend).
> >>> 
> >>> Albert
> >>> 
> >>> On Mon, 5 Jul 2010, Kenneth Berland wrote:
> >>>> Can I use std::string within any GooString methods I write (e.g.
> >>>> replace) or am I limited to the C Standard library (i.e. string.h)?
> >>>> 
> >>>> -KB
> >>>> 
> >>>> On Tue, 8 Jun 2010, Albert Astals Cid wrote:
> >>>>> On Tuesday, 8 June 2010, you wrote:
> >>>>>> Does GooString have a replace() method?  I could not find one.  Does
> >>>>>> this mean I should write one?
> >>>>> 
> >>>>> Yes, you'll have to write one or get the char * from the GooString
> >>>>> and use c-string ones.
> >>>>> 
> >>>>> Albert
> >>>>> 
> >>>>>> -KB
> >>>>>> 
> >>>>>> On Sun, 30 May 2010, Albert Astals Cid wrote:
> >>>>>>> On Sunday, 30 May 2010, Kenneth Berland wrote:
> >>>>>>>> 1)  Since I sent my last diff, I've:
> >>>>>>>>     a) added some string processing to make sure no HTML reserved
> >>>>>>>>        characters are placed into the output.  I process each word.
> >>>>>>>>     b) altered the html a bit so that XML parsers can deal with it.
> >>>>>>>>        I've put in a title tag or an empty title tag and added end
> >>>>>>>>        tags to the meta tags.
> >>>>>>>> 
> >>>>>>>> 2)  Addressing your concerns:
> >>>>>>>>     a) I've removed the initialization of stdout.
> >>>>>>>>     b) I close f now and reopen it.  This also removes the warning.
> >>>>>>>>     c) If a user is running with the -bbox option, they want word
> >>>>>>>>        bounding boxes.  If there are no words, I think a line to
> >>>>>>>>        stderr is appropriate.
> >>>>>>>> 
> >>>>>>> Cool, though we try not to use the std (yeah it sucks i know), can
> >>>>>>> you either use GooString or char *?
> >>>>>>> 
> >>>>>>> Thanks,
> >>>>>>>   Albert
> >>>>>>> 
> >>>>>>>> -KB
> >>>>>>>> 
> >>>>>>>> On Wed, 26 May 2010, Albert Astals Cid wrote:
> >>>>>>>>> On Wednesday, 26 May 2010, Kenneth Berland wrote:
> >>>>>>>>>> I get a compiler warning without it.
> >>>>>>>>>> 
> >>>>>>>>>> pdftotext.cc: In function ‘int main(int, char**)’:
> >>>>>>>>>> pdftotext.cc:164: warning: ‘f’ may be used uninitialized in this
> >>>>>>>>>> function
> >>>>>>>>> 
> >>>>>>>>> That change will not get accepted, sorry, initializing f to stdout
> >>>>>>>>> is not a solution.
> >>>>>>>>> 
> >>>>>>>>> Also i do not like the fact that you do not close f if you are
> >>>>>>>>> writing the bbox? Can't you just open it again like the code
> >>>>>>>>> already does?
> >>>>>>>>> 
> >>>>>>>>> Also i do not understand why the code considers a page having no
> >>>>>>>>> text an error.
> >>>>>>>>> 
> >>>>>>>>> Albert
> >>>>>>>>> 
> >>>>>>>>>> -KB
> >>>>>>>>>> 
> >>>>>>>>>> On Wed, 26 May 2010, Albert Astals Cid wrote:
> >>>>>>>>>>> On Sunday, 9 May 2010, Kenneth Berland wrote:
> >>>>>>>>>>>> List,
> >>>>>>>>>>>> 
> >>>>>>>>>>>> I've attached a small addition to pdftotext that outputs
> >>>>>>>>>>>> bounding box information to html like this:
> >>>>>>>>>>>> 
> >>>>>>>>>>>> <doc>
> >>>>>>>>>>>>   <page width="612.000000" height="792.000000"/>
> >>>>>>>>>>>>     <word xMin="56.800000" yMin="57.208000" xMax="75.412000"
> >>>>>>>>>>>>     yMax="70.492000">The</word>
> >>>>>>>>>>>>   </page>
> >>>>>>>>>>>> </doc>
> >>>>>>>>>>>> 
> >>>>>>>>>>>> I had a need, maybe others will too.
> >>>>>>>>>>>> -KB
> >>>>>>>>>>> 
> >>>>>>>>>>> Why is this change necessary?
> >>>>>>>>>>> -  FILE *f;
> >>>>>>>>>>> +  FILE *f = stdout;
> >>>>>>>>>>> 
> >>>>>>>>>>> Albert
> >>>>>>>>>>>> 
> >>>>>>>>>>>> _______________________________________________
> >>>>>>>>>>>> poppler mailing list
> >>>>>>>>>>>> poppler at lists.freedesktop.org
> >>>>>>>>>>>> http://lists.freedesktop.org/mailman/listinfo/poppler

