[poppler] commit? bounding box html in pdftotext

Albert Astals Cid aacid at kde.org
Wed Sep 29 16:22:34 PDT 2010


On Wednesday, 29 September 2010, Kenneth Berland wrote:
> There exist OS's without /dev/null!?
> 
> ;)
> 
> You're pretty good at this code review thing, nice catch.  NULL appears to
> be what TextOutputDev expects.  Updated patch attached.

Could you also please add the new option to the pdftotext.1 man page?

Thanks,
  Albert
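
To illustrate the NULL versus "/dev/null" point quoted above, here is a rough sketch of the convention: a null file name means "keep the text in memory and open no file", so no platform-specific path is needed. SketchTextSink and its methods are made-up names for illustration only. This is not poppler's TextOutputDev, whose real constructor takes more arguments.

```cpp
#include <cstdio>
#include <string>

// Hypothetical stand-in for an output device. NOT poppler's TextOutputDev.
class SketchTextSink {
public:
    // A null fileName means: collect text in memory only and open no file.
    explicit SketchTextSink(const char *fileName) : file_(nullptr) {
        if (fileName != nullptr) {
            file_ = std::fopen(fileName, "w");
        }
    }
    ~SketchTextSink() {
        if (file_ != nullptr) {
            std::fclose(file_);
        }
    }
    SketchTextSink(const SketchTextSink &) = delete;
    SketchTextSink &operator=(const SketchTextSink &) = delete;

    // Always buffers the text; also writes it out when a file is open.
    void addText(const std::string &s) {
        buffer_ += s;
        if (file_ != nullptr) {
            std::fputs(s.c_str(), file_);
        }
    }
    bool writesFile() const { return file_ != nullptr; }
    const std::string &text() const { return buffer_; }

private:
    std::FILE *file_;
    std::string buffer_;
};
```

Constructing the sink with a null pointer behaves the same on every OS, which is the whole advantage over passing "/dev/null".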

> 
> -KB
> 
> On Wed, 29 Sep 2010, Albert Astals Cid wrote:
> > On Wednesday, 22 September 2010, you wrote:
> >> Very funny.
> >> 
> >> The old diff, using std:: is at:
> >> 
> >> http://lists.freedesktop.org/archives/poppler/attachments/20100530/89825275/attachment.txt
> >> 
> >> You can commit either today's diff or the 2010-05-30 (std::) diff.  I
> >> think the std:: version is less likely to have pointer-related bugs.
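
For readers who do not have the attachment at hand, a minimal sketch of what a std::string based escaping step could look like. This is not the code from either diff. The name escapeHtml and the choice of four escaped characters are assumptions made for illustration.

```cpp
#include <string>

// Hypothetical helper, not taken from the patch: returns a copy of `in`
// with the HTML reserved characters replaced by entities.
std::string escapeHtml(const std::string &in) {
    std::string out;
    out.reserve(in.size());
    for (char c : in) {
        switch (c) {
        case '&': out += "&amp;";  break;
        case '<': out += "&lt;";   break;
        case '>': out += "&gt;";   break;
        case '"': out += "&quot;"; break;
        default:  out += c;        break;
        }
    }
    return out;
}
```

Building the result in a single pass avoids the double-escaping problem that chained replace() calls can run into (an already produced "&amp;" is never scanned again), and there is no manual buffer sizing to get wrong.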
> > 
> > textOut = new TextOutputDev("/dev/null",
> > 
> > This doesn't look portable. Can you please fix it?
> > 
> > Thanks,
> > 
> >  Albert
> >  
> >> -KB
> >> 
> >> On Wed, 22 Sep 2010, Albert Astals Cid wrote:
> >> > On Wednesday, 22 September 2010, Kenneth Berland wrote:
> >> >> I have rewritten the replace function with standard C.
> >> > 
> >> > Now is when you hate me, but as of a few weeks ago we accept std:: code
> >> > if it's *obvious* it adds value over existing code.
> >> > 
> >> > So can you please send again your patch to the mailing list?
> >> > 
> >> > Sorry, totally forgot to tell you.
> >> > 
> >> > Albert
> >> > 
> >> >> -KB
> >> >> 
> >> >> On Sun, 11 Jul 2010, Albert Astals Cid wrote:
> >> >>> On Tuesday, 6 July 2010, Kenneth Berland wrote:
> >> >>>> Can I use std::string within any GooString methods I write (e.g.
> >> >>>> replace) or am I limited to the C Standard library (i.e. string.h)?
> >> >>> 
> >> >>> No std:: usage anywhere in poppler (except in the cpp frontend).
> >> >>> 
> >> >>> Albert
> >> >>> 
> >> >>>> -KB
> >> >>>> 
> >> >>>> On Tue, 8 Jun 2010, Albert Astals Cid wrote:
> >> >>>>> On Tuesday, 8 June 2010, you wrote:
> >> >>>>>> Does GooString have a replace() method?  I could not find one. 
> >> >>>>>> Does this mean I should write one?
> >> >>>>> 
> >> >>>>> Yes, you'll have to write one or get the char * from the GooString
> >> >>>>> and use c-string ones.
> >> >>>>> 
> >> >>>>> Albert
> >> >>>>> 
> >> >>>>>> -KB
> >> >>>>>> 
> >> >>>>>> On Sun, 30 May 2010, Albert Astals Cid wrote:
> >> >>>>>>> On Sunday, 30 May 2010, Kenneth Berland wrote:
> >> >>>>>>>> 1)  Since I sent my last diff, I've:
> >> >>>>>>>>     a) added some string processing to make sure no HTML reserved
> >> >>>>>>>>        characters are placed into the output.  I process each word.
> >> >>>>>>>>     b) altered the html a bit so that XML parsers can deal with it.
> >> >>>>>>>>        I've put in a title tag or an empty title tag and added end
> >> >>>>>>>>        tags to the meta tags.
> >> >>>>>>>> 
> >> >>>>>>>> 2)  Addressing your concerns:
> >> >>>>>>>>     a) I've removed the initialization of f to stdout.
> >> >>>>>>>>     b) I close f now and reopen it.  This also removes the warning.
> >> >>>>>>>>     c) If a user is running with the -bbox option, they want word
> >> >>>>>>>>        bounding boxes.  If there are no words, I think a line to
> >> >>>>>>>>        stderr is appropriate.
> >> >>>>>>> 
> >> >>>>>>> Cool, though we try not to use the std (yeah, it sucks, I know). Can
> >> >>>>>>> you either use GooString or char *?
> >> >>>>>>> 
> >> >>>>>>> Thanks,
> >> >>>>>>> 
> >> >>>>>>>  Albert
> >> >>>>>>> 
> >> >>>>>>>> -KB
> >> >>>>>>>> 
> >> >>>>>>>> On Wed, 26 May 2010, Albert Astals Cid wrote:
> >> >>>>>>>>> On Wednesday, 26 May 2010, Kenneth Berland wrote:
> >> >>>>>>>>>> I get a compiler warning without it.
> >> >>>>>>>>>> 
> >> >>>>>>>>>> pdftotext.cc: In function ‘int main(int, char**)’:
> >> >>>>>>>>>> pdftotext.cc:164: warning: ‘f’ may be used uninitialized in
> >> >>>>>>>>>> this function
> >> >>>>>>>>> 
> >> >>>>>>>>> That change will not get accepted, sorry. Initializing f to
> >> >>>>>>>>> stdout is not a solution.
> >> >>>>>>>>> 
> >> >>>>>>>>> Also, I do not like the fact that you do not close f if you are
> >> >>>>>>>>> writing the bbox. Can't you just open it again like the code
> >> >>>>>>>>> already does?
> >> >>>>>>>>> 
> >> >>>>>>>>> Also, I do not understand why the code considers a page having
> >> >>>>>>>>> no text an error.
> >> >>>>>>>>> 
> >> >>>>>>>>> Albert
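
A small sketch of the "close it and open it again" approach asked for above, written so that f is assigned on every path before it is used. That is one way to keep the compiler from warning about a possibly uninitialized f without defaulting it to stdout. The helper name writeChunk and the use of "-" to mean stdout are assumptions for illustration, not lines from pdftotext.cc.

```cpp
#include <cstdio>
#include <cstring>

// Hypothetical helper: writes `text` to `path`, or to stdout when path is "-".
// With append == true the file is reopened in append mode, so an earlier
// writer can close it and a later one can continue where it left off.
// Returns false if the file cannot be opened.
bool writeChunk(const char *path, const char *text, bool append) {
    std::FILE *f;  // assigned on every path below before any use
    if (std::strcmp(path, "-") == 0) {
        f = stdout;
    } else {
        f = std::fopen(path, append ? "a" : "w");
        if (f == nullptr) {
            return false;
        }
    }
    std::fputs(text, f);
    if (f != stdout) {
        std::fclose(f);  // never close stdout; close real files every time
    }
    return true;
}
```

Calling it once with append set to false for the header and again with append set to true for the trailer mirrors the close-then-reopen pattern described in the review.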
> >> >>>>>>>>>> 
> >> >>>>>>>>>> -KB
> >> >>>>>>>>>> 
> >> >>>>>>>>>> On Wed, 26 May 2010, Albert Astals Cid wrote:
> >> >>>>>>>>>>> On Sunday, 9 May 2010, Kenneth Berland wrote:
> >> >>>>>>>>>>>> List,
> >> >>>>>>>>>>>> 
> >> >>>>>>>>>>>> I've attached a small addition to pdftotext that outputs
> >> >>>>>>>>>>>> bounding box information to html like this:
> >> >>>>>>>>>>>> 
> >> >>>>>>>>>>>> <doc>
> >> >>>>>>>>>>>>   <page width="612.000000" height="792.000000"/>
> >> >>>>>>>>>>>>     <word xMin="56.800000" yMin="57.208000" xMax="75.412000" yMax="70.492000">The</word>
> >> >>>>>>>>>>>>   </page>
> >> >>>>>>>>>>>> </doc>
> >> >>>>>>>>>>>> 
> >> >>>>>>>>>>>> I had a need, maybe others will too.
> >> >>>>>>>>>>>> -KB
> >> >>>>>>>>>>> 
> >> >>>>>>>>>>> Why is this change necessary?
> >> >>>>>>>>>>> -  FILE *f;
> >> >>>>>>>>>>> +  FILE *f = stdout;
> >> >>>>>>>>>>> 
> >> >>>>>>>>>>> Albert
> >> >>>>>>>>>>> 
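
For reference, a sketch of how one word element in the sample output above could be produced. The WordBox struct and formatWord function are made up for this illustration and are not the patch's code. It assumes the word text has already had HTML reserved characters escaped, and that the default "C" locale is in effect so "%f" prints a dot as the decimal separator.

```cpp
#include <cstdio>
#include <string>

// Hypothetical container for one word and its bounding box, in page units.
struct WordBox {
    double xMin, yMin, xMax, yMax;
    std::string text;  // assumed to be HTML-escaped already
};

// Formats one <word> element in the style of the sample output.
// "%f" prints six decimal places, matching values such as 56.800000.
std::string formatWord(const WordBox &w) {
    char attrs[256];
    std::snprintf(attrs, sizeof attrs,
                  "<word xMin=\"%f\" yMin=\"%f\" xMax=\"%f\" yMax=\"%f\">",
                  w.xMin, w.yMin, w.xMax, w.yMax);
    return std::string(attrs) + w.text + "</word>";
}
```

Keeping the formatting in one small function makes it easy to check the attribute order and precision against the sample before wiring it into the real output path.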
> >> >>>>>>>>>>>> _______________________________________________
> >> >>>>>>>>> 
> >> >>>>>>>>> poppler mailing list
> >> >>>>>>>>> poppler at lists.freedesktop.org
> >> >>>>>>>>> http://lists.freedesktop.org/mailman/listinfo/poppler