[poppler] commit? bounding box html in pdftotext

Kenneth Berland ken at hero.com
Tue Jun 8 12:58:38 PDT 2010


Does GooString have a replace() method?  I could not find one.  Does this 
mean I should write one?
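
One way to avoid needing a replace() at all is to build the escaped copy of
each word in a single pass. Below is a minimal sketch of that idea using only
char * and the C library (no std::). The helper name escapeHtmlWord is
hypothetical and is not an existing poppler function, and the set of escaped
characters (&, <, >, ") is an assumption about what "HTML reserved
characters" should cover here.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical helper (not part of poppler): returns a newly malloc'd copy
// of `word` with HTML reserved characters replaced by entities.
// The caller must free() the result. Returns NULL if allocation fails.
static char *escapeHtmlWord(const char *word) {
  assert(word != NULL);

  // The longest replacement, "&quot;", is 6 bytes, so 6x the input length
  // plus one byte for the terminator is always enough.
  size_t len = strlen(word);
  char *out = (char *)malloc(len * 6 + 1);
  if (!out) {
    return NULL;
  }

  char *p = out;
  for (size_t i = 0; i < len; ++i) {
    const char *rep = NULL;
    switch (word[i]) {
      case '&': rep = "&amp;"; break;
      case '<': rep = "&lt;"; break;
      case '>': rep = "&gt;"; break;
      case '"': rep = "&quot;"; break;
      default: break;
    }
    if (rep) {
      // Copy the entity in place of the reserved character.
      size_t n = strlen(rep);
      memcpy(p, rep, n);
      p += n;
    } else {
      *p++ = word[i];
    }
  }
  *p = '\0';
  return out;
}
```

Each word would be passed through the helper just before it is written, and
the returned buffer freed afterwards. The same loop could append to a
GooString one character or entity at a time instead of filling a malloc'd
buffer.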

-KB


On Sun, 30 May 2010, Albert Astals Cid wrote:

> On Sunday, 30 May 2010, Kenneth Berland wrote:
>> 1)  Since I sent my last diff, I've:
>>
>>  	a) added some string processing to make sure no HTML reserved
>>  	   characters are placed into the output.  I process each word.
>>  	b) altered the HTML a bit so that XML parsers can deal with it.
>>  	   I've put in a title tag or an empty title tag and added end tags
>>  	   to the meta tags.
>>
>> 2)  Addressing your concerns:
>>
>>  	a) I've removed the initialization of stdout.
>>
>>  	b) I close f now and reopen it.  This also removes the warning.
>>
>>  	c) If a user is running with the -bbox option, they want word
>> bounding boxes.  If there are no words, I think a line to stderr is
>> appropriate.
>
> Cool, though we try not to use std:: (yeah, it sucks, I know). Can you use
> either GooString or char * instead?
>
>
> Thanks,
>  Albert
>
>
>>
>> -KB
>>
>> On Wed, 26 May 2010, Albert Astals Cid wrote:
>>> On Wednesday, 26 May 2010, Kenneth Berland wrote:
>>>> I get a compiler warning without it.
>>>>
>>>> pdftotext.cc: In function ‘int main(int, char**)’:
>>>> pdftotext.cc:164: warning: ‘f’ may be used uninitialized in this function
>>>
>>> That change will not get accepted, sorry; initializing f to stdout is not
>>> a solution.
>>>
>>> Also, I do not like the fact that you do not close f if you are writing
>>> the bbox. Can't you just open it again like the code already does?
>>>
>>> Also, I do not understand why the code considers a page having no text an
>>> error.
>>>
>>> Albert
>>>
>>>> -KB
>>>>
>>>> On Wed, 26 May 2010, Albert Astals Cid wrote:
>>>>> On Sunday, 9 May 2010, Kenneth Berland wrote:
>>>>>> List,
>>>>>>
>>>>>> I've attached a small addition to pdftotext that outputs bounding box
>>>>>> information to html like this:
>>>>>>
>>>>>> <doc>
>>>>>>   <page width="612.000000" height="792.000000"/>
>>>>>>     <word xMin="56.800000" yMin="57.208000" xMax="75.412000" yMax="70.492000">The</word>
>>>>>>   </page>
>>>>>> </doc>
>>>>>>
>>>>>> I had a need, maybe others will too.
>>>>>>
>>>>>> -KB
>>>>>
>>>>> Why is this change necessary?
>>>>>
>>>>> -  FILE *f;
>>>>> +  FILE *f = stdout;
>>>>>
>>>>> Albert
>>>
>

