[poppler] commit? bounding box html in pdftotext

Kenneth Berland ken at hero.com
Wed Sep 29 18:53:28 PDT 2010


Attached.

-KB


On Thu, 30 Sep 2010, Albert Astals Cid wrote:

> On Wednesday, 29 September 2010, Kenneth Berland wrote:
>> There exist OS's without /dev/null!?
>> 
>> ;)
>> 
>> You're pretty good at this code review thing, nice catch.  NULL appears to
>> be what TextOutputDev expects.  Updated patch attached.
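A minimal sketch of why NULL is the portable choice: the output object opens a file only when it is given a name, so extracting text without writing it needs no platform-specific null device such as "/dev/null". The `TextSink` class is hypothetical. It assumes TextOutputDev treats a NULL file name roughly this way (text is still collected, nothing is written), as the thread suggests, and it also assumes a "-" means stdout convention. It is not poppler's code.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Hypothetical stand-in for an output device; NOT poppler's TextOutputDev.
class TextSink {
public:
  // fileName == nullptr : collect text but write nothing (no /dev/null needed)
  // fileName == "-"     : write to stdout (assumed convention)
  // otherwise           : open the named file for writing
  explicit TextSink(const char *fileName)
      : stream_(nullptr), needClose_(false), ok_(true) {
    if (fileName == nullptr) {
      return;  // portable "discard output"
    }
    if (std::strcmp(fileName, "-") == 0) {
      stream_ = stdout;
    } else if ((stream_ = std::fopen(fileName, "wb")) != nullptr) {
      needClose_ = true;
    } else {
      ok_ = false;  // could not open the requested file
    }
  }
  ~TextSink() {
    if (needClose_) std::fclose(stream_);
  }
  TextSink(const TextSink &) = delete;
  TextSink &operator=(const TextSink &) = delete;

  bool isOk() const { return ok_; }
  bool writesOutput() const { return stream_ != nullptr; }

  // Writes only when a stream is attached; returns the number of bytes written.
  std::size_t write(const char *text) {
    if (stream_ == nullptr) return 0;
    return std::fwrite(text, 1, std::strlen(text), stream_);
  }

private:
  std::FILE *stream_;
  bool needClose_;
  bool ok_;
};
```

`TextSink sink(nullptr);` covers the "extract but don't write" case that "/dev/null" was standing in for, on any platform.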
>
> Could you also please add the new option to the pdftohtml.1 man page?
>
> Thanks,
>  Albert
>
>> 
>> -KB
>> 
>> On Wed, 29 Sep 2010, Albert Astals Cid wrote:
>> > On Wednesday, 29 September 2010, you wrote:
>> >> Very funny.
>> >> 
>> >> The old diff, using std::, is at:
>> >> 
>> >> http://lists.freedesktop.org/archives/poppler/attachments/20100530/89825275/attachment.txt
>> >> 
>> >> You can commit either today's diff or the 2010-05-30 (std::) diff.  I
>> >> think the std:: version is less likely to have pointer-related bugs.
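For comparison, a replace-all written against std::string needs no manual length or pointer arithmetic, which is the practical reason a std:: version is less prone to pointer-related bugs. This is only a sketch: `replaceAll` is a hypothetical name, and neither patch's actual code is reproduced here.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: replaces every non-overlapping occurrence of `from`
// in `s` with `to`, scanning left to right.
std::string replaceAll(std::string s, const std::string &from,
                       const std::string &to) {
  if (from.empty()) return s;  // an empty pattern would loop forever
  std::string::size_type pos = 0;
  while ((pos = s.find(from, pos)) != std::string::npos) {
    s.replace(pos, from.size(), to);
    pos += to.size();  // skip the inserted text so it is never rescanned
  }
  return s;
}
```

`replaceAll("a<b", "<", "&lt;")` yields `"a&lt;b"`. Advancing `pos` past the inserted text keeps a replacement that itself contains `from` (for example `&` to `&amp;`) from being expanded again.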
>> > 
>> > textOut = new TextOutputDev("/dev/null",
>> > 
>> > This doesn't look portable. Can you please fix it?
>> > 
>> > Thanks,
>> > 
>> >  Albert
>> > 
>> >> -KB
>> >> 
>> >> On Wed, 22 Sep 2010, Albert Astals Cid wrote:
>> >> > On Wednesday, 22 September 2010, Kenneth Berland wrote:
>> >> >> I have rewritten the replace function with standard C.
>> >> > 
>> >> > Now is when you hate me, but for a few weeks now we have accepted std::
>> >> > code if it's *obvious* that it adds value over the existing code.
>> >> > 
>> >> > So can you please send your patch to the mailing list again?
>> >> > 
>> >> > Sorry, totally forgot to tell you.
>> >> > 
>> >> > Albert
>> >> > 
>> >> >> -KB
>> >> >> 
>> >> >> On Sun, 11 Jul 2010, Albert Astals Cid wrote:
>> >> >>> On Tuesday, 6 July 2010, Kenneth Berland wrote:
>> >> >>>> Can I use std::string within any GooString methods I write (e.g.
>> >> >>>> replace) or am I limited to the C Standard library (i.e. string.h)?
>> >> >>> 
>> >> >>> No std:: usage anywhere in poppler (except in the cpp frontend).
>> >> >>> 
>> >> >>> Albert
>> >> >>> 
>> >> >>> On Mon, 5 Jul 2010, Kenneth Berland wrote:
>> >> >>>> Can I use std::string within any GooString methods I write (e.g.
>> >> >>>> replace) or am I limited to the C Standard library (i.e. string.h)?
>> >> >>>> 
>> >> >>>> -KB
>> >> >>>> 
>> >> >>>> On Tue, 8 Jun 2010, Albert Astals Cid wrote:
>> >> >>>>> On Tuesday, 8 June 2010, you wrote:
>> >> >>>>>> Does GooString have a replace() method?  I could not find one. 
>> >> >>>>>> Does this mean I should write one?
>> >> >>>>> 
>> >> >>>>> Yes, you'll have to write one, or get the char * from the GooString
>> >> >>>>> and use the C string functions.
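Following the second suggestion (take the char * out of the GooString and use the C string functions), a replace-all can be built from strstr, strlen, and memcpy. A minimal sketch, assuming the caller owns and frees the returned buffer; `replaceAllC` is a hypothetical name, and converting the result back into a GooString is left out because it depends on the GooString API in use.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: returns a newly malloc'd copy of `src` in which every
// non-overlapping occurrence of `from` is replaced by `to`. The caller must
// free() the result. Returns nullptr if `from` is empty or malloc fails.
char *replaceAllC(const char *src, const char *from, const char *to) {
  std::size_t fromLen = std::strlen(from);
  if (fromLen == 0) return nullptr;
  std::size_t toLen = std::strlen(to);

  // First pass: count matches so the output is allocated exactly once.
  std::size_t count = 0;
  for (const char *p = std::strstr(src, from); p;
       p = std::strstr(p + fromLen, from))
    ++count;

  std::size_t outLen = std::strlen(src) - count * fromLen + count * toLen;
  char *out = static_cast<char *>(std::malloc(outLen + 1));
  if (!out) return nullptr;

  // Second pass: copy each unchanged run, then the replacement.
  char *dst = out;
  const char *cur = src;
  for (const char *p = std::strstr(cur, from); p; p = std::strstr(cur, from)) {
    std::size_t run = static_cast<std::size_t>(p - cur);
    std::memcpy(dst, cur, run);
    dst += run;
    std::memcpy(dst, to, toLen);
    dst += toLen;
    cur = p + fromLen;
  }
  std::strcpy(dst, cur);  // copies the tail and the terminating NUL
  return out;
}
```

Counting first keeps it to a single allocation, and searching the source rather than the partly built output means a replacement that contains `from` is never re-expanded.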
>> >> >>>>> 
>> >> >>>>> Albert
>> >> >>>>> 
>> >> >>>>>> -KB
>> >> >>>>>> 
>> >> >>>>>> On Sun, 30 May 2010, Albert Astals Cid wrote:
>> >> >>>>>>> On Sunday, 30 May 2010, Kenneth Berland wrote:
>> >> >>>>>>>> 1)  Since I sent my last diff, I've:
>> >> >>>>>>>>    a) added some string processing to make sure no HTML reserved
>> >> >>>>>>>>       characters are placed into the output.  I process each word.
>> >> >>>>>>>>    b) altered the HTML a bit so that XML parsers can deal with it.
>> >> >>>>>>>>       I've put in a title tag or an empty title tag and added end
>> >> >>>>>>>>       tags to the meta tags.
>> >> >>>>>>>>
>> >> >>>>>>>> 2)  Addressing your concerns:
>> >> >>>>>>>>    a) I've removed the initialization of f to stdout.
>> >> >>>>>>>>    b) I close f now and reopen it.  This also removes the warning.
>> >> >>>>>>>>    c) If a user is running with the -bbox option, they want word
>> >> >>>>>>>>       bounding boxes.  If there are no words, I think a line to
>> >> >>>>>>>>       stderr is appropriate.
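Point 1a (escape HTML-reserved characters in each word before writing it) can be sketched as a single-pass, per-character mapping. The thread does not say which characters the patch escapes; this sketch assumes the five characters XML reserves, and uses `&#39;` for the apostrophe because it is valid in both HTML and XML. `escapeHtml` is a hypothetical name.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: escapes characters reserved in HTML/XML text and
// attribute values so a word can be placed inside <word>...</word>.
// Single pass: the '&' produced by an entity is never escaped a second time.
std::string escapeHtml(const std::string &word) {
  std::string out;
  out.reserve(word.size());
  for (char c : word) {
    switch (c) {
      case '&':  out += "&amp;";  break;
      case '<':  out += "&lt;";   break;
      case '>':  out += "&gt;";   break;
      case '"':  out += "&quot;"; break;
      case '\'': out += "&#39;";  break;
      default:   out += c;        break;
    }
  }
  return out;
}
```

Escaping per word, as described, keeps the markup around each `<word>` element untouched while its text content stays parseable.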
>> >> >>>>>>>> 
>> >> >>>>>>> Cool, though we try not to use std:: (yeah, it sucks, I know); can
>> >> >>>>>>> you use either GooString or char * instead?
>> >> >>>>>>>
>> >> >>>>>>> Thanks,
>> >> >>>>>>>  Albert
>> >> >>>>>>>
>> >> >>>>>>>> -KB
>> >> >>>>>>>>
>> >> >>>>>>>> On Wed, 26 May 2010, Albert Astals Cid wrote:
>> >> >>>>>>>>> On Wednesday, 26 May 2010, Kenneth Berland wrote:
>> >> >>>>>>>>>> I get a compiler warning without it.
>> >> >>>>>>>>>>
>> >> >>>>>>>>>> pdftotext.cc: In function ‘int main(int, char**)’:
>> >> >>>>>>>>>> pdftotext.cc:164: warning: ‘f’ may be used uninitialized in this function
>> >> >>>>>>>>>>
>> >> >>>>>>>>> That change will not get accepted, sorry; initializing f to stdout
>> >> >>>>>>>>> is not a solution.
>> >> >>>>>>>>>
>> >> >>>>>>>>> Also, I do not like that you do not close f if you are writing the
>> >> >>>>>>>>> bbox. Can't you just open it again like the code already does?
>> >> >>>>>>>>>
>> >> >>>>>>>>> Also, I do not understand why the code considers a page with no
>> >> >>>>>>>>> text an error.
>> >> >>>>>>>>>
>> >> >>>>>>>>> Albert
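The close-and-reopen approach requested here (and adopted in point 2b of the later revision) can be sketched as: open with "wb", write the header, close; let another writer use the file; then reopen with "ab" and append. Declaring `f` where it is assigned, instead of declaring it uninitialized at the top of main() and filling it in inside branches, also leaves the compiler no uninitialized path to warn about. The helper names, the "-" means stdout convention, and the empty `<title>` are illustrative assumptions, not the patch's code.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Hypothetical helpers: "-" selects stdout, which is never closed.
static std::FILE *openOutput(const char *name, const char *mode) {
  return std::strcmp(name, "-") == 0 ? stdout : std::fopen(name, mode);
}

static void closeOutput(std::FILE *f) {
  if (f != stdout) std::fclose(f);
}

// Writes the document header, then closes the file so that another writer
// (for example a text output device opened in append mode) can use it.
bool writeHeader(const char *name) {
  std::FILE *f = openOutput(name, "wb");  // assigned at declaration
  if (!f) return false;
  std::fputs("<html>\n<head>\n<title></title>\n</head>\n<body>\n<doc>\n", f);
  closeOutput(f);
  return true;
}

// Reopens the same file in append mode and writes the closing tags.
bool writeFooter(const char *name) {
  std::FILE *f = openOutput(name, "ab");
  if (!f) return false;
  std::fputs("</doc>\n</body>\n</html>\n", f);
  closeOutput(f);
  return true;
}
```

Because each function opens and closes its own handle, no `FILE *` outlives the step that uses it, and nothing needs a placeholder value such as stdout.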
>> >> >>>>>>>>>
>> >> >>>>>>>>>> -KB
>> >> >>>>>>>>>>
>> >> >>>>>>>>>> On Wed, 26 May 2010, Albert Astals Cid wrote:
>> >> >>>>>>>>>>> On Sunday, 9 May 2010, Kenneth Berland wrote:
>> >> >>>>>>>>>>>> List,
>> >> >>>>>>>>>>>>
>> >> >>>>>>>>>>>> I've attached a small addition to pdftotext that outputs bounding
>> >> >>>>>>>>>>>> box information to HTML like this:
>> >> >>>>>>>>>>>>
>> >> >>>>>>>>>>>> <doc>
>> >> >>>>>>>>>>>>   <page width="612.000000" height="792.000000">
>> >> >>>>>>>>>>>>     <word xMin="56.800000" yMin="57.208000" xMax="75.412000" yMax="70.492000">The</word>
>> >> >>>>>>>>>>>>   </page>
>> >> >>>>>>>>>>>> </doc>
>> >> >>>>>>>>>>>>
>> >> >>>>>>>>>>>> I had a need; maybe others will too.
>> >> >>>>>>>>>>>> -KB
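The output format in this message can be produced with printf-style formatting: `%f` prints six decimal places, which is where values such as `612.000000` come from. A sketch with hypothetical `Word` and `writeBboxPage` names, not the patch's code; it assumes the caller has already HTML-escaped each word and that each `<page>` element encloses its words.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical record for one word and its bounding box, in PDF points.
struct Word {
  double xMin, yMin, xMax, yMax;
  std::string text;  // assumed to be HTML-escaped already
};

// Appends one <page> element, with a <word> child per word, to `out`.
void writeBboxPage(std::string &out, double width, double height,
                   const std::vector<Word> &words) {
  char buf[256];
  std::snprintf(buf, sizeof buf, "  <page width=\"%f\" height=\"%f\">\n",
                width, height);
  out += buf;
  for (const Word &w : words) {
    std::snprintf(buf, sizeof buf,
                  "    <word xMin=\"%f\" yMin=\"%f\" xMax=\"%f\" yMax=\"%f\">",
                  w.xMin, w.yMin, w.xMax, w.yMax);
    out += buf;
    out += w.text;
    out += "</word>\n";
  }
  out += "  </page>\n";
}
```

Calling it once per page between `<doc>` and `</doc>` gives `<page>` elements that enclose their `<word>` children, which is what lets an XML parser read the result.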
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> Why is this change necessary?
>> >> >>>>>>>>>>> -  FILE *f;
>> >> >>>>>>>>>>> +  FILE *f = stdout;
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> Albert
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> _______________________________________________
>> >> >>>>>>>>>>> poppler mailing list
>> >> >>>>>>>>>>> poppler at lists.freedesktop.org
>> >> >>>>>>>>>>> http://lists.freedesktop.org/mailman/listinfo/poppler
-------------- next part --------------
Name: patch.txt
URL: <http://lists.freedesktop.org/archives/poppler/attachments/20100929/a8a0957f/attachment.txt>

