[poppler] commit? bounding box html in pdftotext

Kenneth Berland ken at hero.com
Wed Sep 29 15:19:54 PDT 2010


There exist OS's without /dev/null!?

;)

You're pretty good at this code review thing, nice catch.  NULL appears to 
be what TextOutputDev expects.  Updated patch attached.

-KB
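Why NULL is the portable choice: "/dev/null" exists on POSIX systems but not on Windows, where the null device is called NUL. A common alternative is to let a null file name mean "write no file at all". The sketch below shows that pattern with hypothetical helpers (`openTextSink`, `writeWord`). They are not poppler's API; the thread only says that TextOutputDev accepts NULL.

```cpp
#include <cstdio>

// Hypothetical helper, not poppler's API: a null fileName means "write no
// file", so no platform-specific null device ("/dev/null", "NUL") is needed.
FILE *openTextSink(const char *fileName) {
    if (fileName == nullptr)
        return nullptr;  // caller keeps results in memory only
    return std::fopen(fileName, "w");
}

// Hypothetical helper: writes a word only if a sink exists.
// Returns true if something was written.
bool writeWord(FILE *sink, const char *word) {
    if (sink == nullptr)
        return false;
    return std::fprintf(sink, "%s\n", word) >= 0;
}
```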


On Wed, 29 Sep 2010, Albert Astals Cid wrote:

> On Wednesday, 22 September 2010, you wrote:
>> Very funny.
>> 
>> The old diff, using std:: is at:
>> 
>> http://lists.freedesktop.org/archives/poppler/attachments/20100530/89825275/attachment.txt
>> 
>> You can commit either today's diff or the 2010-05-30 (std::) diff.  I
>> think the std:: version is less likely to have pointer-related bugs.
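Neither patch is reproduced in this archive, so the following is not KB's code. It is a minimal sketch, assuming "the replace function" means a replace-all helper, of why a std::string version leaves little room for pointer bugs: the string owns and resizes its own buffer, and the only bookkeeping is the search position. The name `replaceAll` is hypothetical.

```cpp
#include <string>

// Hypothetical replace-all built on std::string: no manual allocation,
// no pointer arithmetic, the string grows or shrinks as needed.
std::string replaceAll(std::string s, const std::string &from, const std::string &to) {
    if (from.empty())
        return s;  // an empty pattern would match forever
    std::string::size_type pos = 0;
    while ((pos = s.find(from, pos)) != std::string::npos) {
        s.replace(pos, from.size(), to);
        pos += to.size();  // continue after the inserted text
    }
    return s;
}
```

Advancing `pos` past the inserted text matters when `to` contains `from` (for example `&` to `&amp;`); otherwise the loop would keep matching its own output.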
>
> textOut = new TextOutputDev("/dev/null",
>
> This doesn't look portable. Can you please fix it?
>
> Thanks,
>  Albert
>
>> 
>> -KB
>> 
>> On Wed, 22 Sep 2010, Albert Astals Cid wrote:
>> > On Wednesday, 22 September 2010, Kenneth Berland wrote:
>> >> I have rewritten the replace function with standard C.
>> > 
>> > Now is when you hate me, but for a few weeks now we have accepted std::
>> > code if it's *obvious* that it adds value over the existing code.
>> > 
>> > So can you please send your patch to the mailing list again?
>> > 
>> > Sorry, totally forgot to tell you.
>> > 
>> > Albert
>> > 
>> >> -KB
>> >> 
>> >> On Sun, 11 Jul 2010, Albert Astals Cid wrote:
>> >>> On Tuesday, 6 July 2010, Kenneth Berland wrote:
>> >>>> Can I use std::string within any GooString methods I write (e.g.
>> >>>> replace) or am I limited to the C Standard library (i.e. string.h)?
>> >>> 
>> >>> No std:: usage anywhere in poppler (except in the cpp frontend).
>> >>> 
>> >>> Albert
>> >>> 
>> >>> On Mon, 5 Jul 2010, Kenneth Berland wrote:
>> >>>> Can I use std::string within any GooString methods I write (e.g.
>> >>>> replace) or am I limited to the C Standard library (i.e. string.h)?
>> >>>> 
>> >>>> -KB
>> >>>> 
>> >>>> On Tue, 8 Jun 2010, Albert Astals Cid wrote:
>> >>>>> On Tuesday, 8 June 2010, you wrote:
>> >>>>>> Does GooString have a replace() method?  I could not find one.  Does
>> >>>>>> this mean I should write one?
>> >>>>> 
>> >>>>> Yes, you'll have to write one or get the char * from the GooString
>> >>>>> and use the C-string ones.
>> >>>>> 
>> >>>>> Albert
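Albert's second option, working on the char * taken from a GooString with C string functions, means managing the buffer by hand. Here is a hedged sketch of a C-style replace-all under that approach: one pass counts matches to size the result exactly, and a second pass copies. `replaceAllC` is a hypothetical name, and the ownership rule (the caller frees the returned buffer) is an assumption of this sketch.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical C-style replace-all over a NUL-terminated string.
// Returns a malloc'd result the caller must free(), or NULL on failure.
char *replaceAllC(const char *s, const char *from, const char *to) {
    std::size_t sLen = std::strlen(s);
    std::size_t fromLen = std::strlen(from);
    std::size_t toLen = std::strlen(to);
    if (fromLen == 0) {
        // Nothing to search for: return a plain copy.
        char *copy = static_cast<char *>(std::malloc(sLen + 1));
        if (copy)
            std::strcpy(copy, s);
        return copy;
    }
    // Pass 1: count non-overlapping matches to size the output exactly.
    std::size_t count = 0;
    for (const char *p = std::strstr(s, from); p; p = std::strstr(p + fromLen, from))
        ++count;
    std::size_t outLen = sLen - count * fromLen + count * toLen;
    char *out = static_cast<char *>(std::malloc(outLen + 1));
    if (!out)
        return NULL;
    // Pass 2: copy the text before each match, then the replacement.
    char *dst = out;
    const char *src = s;
    for (const char *p = std::strstr(src, from); p; p = std::strstr(src, from)) {
        std::size_t chunk = static_cast<std::size_t>(p - src);
        std::memcpy(dst, src, chunk);
        dst += chunk;
        std::memcpy(dst, to, toLen);
        dst += toLen;
        src = p + fromLen;
    }
    std::strcpy(dst, src);  // remaining tail plus the terminating NUL
    return out;
}
```

Every length here is computed by hand, which is the kind of pointer bookkeeping the std::string version avoids.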
>> >>>>> 
>> >>>>>> -KB
>> >>>>>> 
>> >>>>>> On Sun, 30 May 2010, Albert Astals Cid wrote:
>> >>>>>>> On Sunday, 30 May 2010, Kenneth Berland wrote:
>> >>>>>>>> 1)  Since I sent my last diff, I've:
>> >>>>>>>>   a) added some string processing to make sure no HTML reserved
>> >>>>>>>> characters are placed into the output.  I process each word.
>> >>>>>>>>
>> >>>>>>>>   b) altered the html a bit so that XML parsers can deal with it.
>> >>>>>>>> I've put in a title tag or an empty title tag and added end tags to
>> >>>>>>>> the meta tags.
>> >>>>>>>> 
>> >>>>>>>> 2)  Addressing your concerns:
>> >>>>>>>>   a) I've removed the initialization of f to stdout.
>> >>>>>>>>
>> >>>>>>>>   b) I close f now and reopen it.  This also removes the warning.
>> >>>>>>>>
>> >>>>>>>>   c) If a user is running with the -bbox option, they want word
>> >>>>>>>> bounding boxes.  If there are no words, I think a line to stderr is
>> >>>>>>>> appropriate.
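Point 1 above describes two output changes: escaping HTML reserved characters in each word, and making the header XML-parsable with an always-present title and closed meta tags. A minimal sketch of both, assuming the five usual entities (&amp;, &lt;, &gt;, &quot;, &#39;) and a self-closing `<meta .../>` form. The function names and the Author meta field are illustrative, not taken from the patch.

```cpp
#include <string>

// Escape characters reserved in HTML/XML text and attribute values.
std::string escapeHtml(const std::string &in) {
    std::string out;
    out.reserve(in.size());
    for (char c : in) {
        switch (c) {
            case '&':  out += "&amp;";  break;
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '"':  out += "&quot;"; break;
            case '\'': out += "&#39;";  break;
            default:   out += c;        break;
        }
    }
    return out;
}

// Hypothetical header writer: always emits <title> (empty when the PDF has
// no title) and closes each <meta> tag so an XML parser accepts it.
std::string xmlFriendlyHead(const std::string &title, const std::string &author) {
    std::string head = "<head>\n";
    head += "<title>" + escapeHtml(title) + "</title>\n";
    if (!author.empty())
        head += "<meta name=\"Author\" content=\"" + escapeHtml(author) + "\"/>\n";
    head += "</head>\n";
    return head;
}
```

Escaping `&` as its own case, rather than as a later string replacement, avoids double-escaping the `&` that the other entities introduce.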
>> >>>>>>>> 
>> >>>>>>> Cool, though we try not to use std (yeah, it sucks, I know). Can
>> >>>>>>> you use either GooString or char * instead?
>> >>>>>>>
>> >>>>>>> Thanks,
>> >>>>>>>   Albert
>> >>>>>>>> 
>> >>>>>>>> -KB
>> >>>>>>>>
>> >>>>>>>> On Wed, 26 May 2010, Albert Astals Cid wrote:
>> >>>>>>>>> On Wednesday, 26 May 2010, Kenneth Berland wrote:
>> >>>>>>>>>> I get a compiler warning without it.
>> >>>>>>>>>>
>> >>>>>>>>>> pdftotext.cc: In function ‘int main(int, char**)’:
>> >>>>>>>>>> pdftotext.cc:164: warning: ‘f’ may be used uninitialized in this function
>> >>>>>>>>>> 
>> >>>>>>>>> That change will not get accepted, sorry; initializing f to stdout
>> >>>>>>>>> is not a solution.
>> >>>>>>>>>
>> >>>>>>>>> Also, I do not like the fact that you do not close f if you are
>> >>>>>>>>> writing the bbox. Can't you just open it again like the code
>> >>>>>>>>> already does?
>> >>>>>>>>>
>> >>>>>>>>> Also, I do not understand why the code considers a page having no
>> >>>>>>>>> text an error.
>> >>>>>>>>>
>> >>>>>>>>> Albert
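Albert's objections point at a common pattern. Rather than initializing `f` to stdout just to silence the "may be used uninitialized" warning, declare the FILE * where it is opened, close it when done, and reopen the same file in append mode for later output. The sketch below assumes that is what "open it again like the code already does" means; `writeHeader`, `writeFooter`, and the `<doc>` wrapper text are illustrative and do not reproduce pdftotext.cc.

```cpp
#include <cstdio>
#include <string>

// Writes the opening wrapper and closes the file, so other code can
// append to it (e.g. per-page output) before the footer is written.
bool writeHeader(const std::string &path) {
    // f is initialized at its declaration, so no path uses it uninitialized.
    FILE *f = std::fopen(path.c_str(), "wb");
    if (f == nullptr)
        return false;
    std::fputs("<doc>\n", f);
    std::fclose(f);
    return true;
}

// Reopens the same file in append mode ("ab") instead of keeping the
// original FILE * open for the whole run.
bool writeFooter(const std::string &path) {
    FILE *f = std::fopen(path.c_str(), "ab");
    if (f == nullptr)
        return false;
    std::fputs("</doc>\n", f);
    std::fclose(f);
    return true;
}
```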
>> >>>>>>>>>>>> 
>> >>>>>>>>>> -KB
>> >>>>>>>>>>
>> >>>>>>>>>> On Wed, 26 May 2010, Albert Astals Cid wrote:
>> >>>>>>>>>>> On Sunday, 9 May 2010, Kenneth Berland wrote:
>> >>>>>>>>>>>> List,
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> I've attached a small addition to pdftotext that outputs bounding
>> >>>>>>>>>>>> box information to html like this:
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> <doc>
>> >>>>>>>>>>>>   <page width="612.000000" height="792.000000"/>
>> >>>>>>>>>>>>     <word xMin="56.800000" yMin="57.208000" xMax="75.412000" yMax="70.492000">The</word>
>> >>>>>>>>>>>>   </page>
>> >>>>>>>>>>>> </doc>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> I had a need, maybe others will too.
>> >>>>>>>>>>>> -KB
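A minimal sketch of emitting the per-page bounding-box markup in the example above, assuming a hypothetical `WordBox` record (poppler's own word classes differ) whose text the caller has already escaped. It writes an opening `<page>` tag so the nested `<word>` elements and the closing `</page>` stay well-formed, and it uses printf's `%f`, whose default six decimals produce values like `612.000000`.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical word record; text is assumed to be escaped already.
struct WordBox {
    double xMin, yMin, xMax, yMax;
    std::string text;
};

// Formats one page: an opening <page> tag, one <word> per box, </page>.
std::string formatPage(double width, double height, const std::vector<WordBox> &words) {
    std::string out;
    char buf[256];  // ample for page-sized coordinates; snprintf never overflows
    std::snprintf(buf, sizeof buf, "<page width=\"%f\" height=\"%f\">\n", width, height);
    out += buf;
    for (const WordBox &w : words) {
        std::snprintf(buf, sizeof buf, "  <word xMin=\"%f\" yMin=\"%f\" xMax=\"%f\" yMax=\"%f\">",
                      w.xMin, w.yMin, w.xMax, w.yMax);
        out += buf;
        out += w.text;
        out += "</word>\n";
    }
    out += "</page>\n";
    return out;
}
```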
>> >>>>>>>>>>>>>>>> 
>> >>>>>>>>>>> Why is this change necessary?
>> >>>>>>>>>>> -  FILE *f;
>> >>>>>>>>>>> +  FILE *f = stdout;
>> >>>>>>>>>>>
>> >>>>>>>>>>> Albert
>> >>>>>>>>>>>> 
>> >>>>>>>>>>> _______________________________________________
>> >>>>>>>>>>> poppler mailing list
>> >>>>>>>>>>> poppler at lists.freedesktop.org
>> >>>>>>>>>>> http://lists.freedesktop.org/mailman/listinfo/poppler
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: patch.txt
URL: <http://lists.freedesktop.org/archives/poppler/attachments/20100929/e47c7286/attachment-0001.txt>

