[poppler] commit? bounding box html in pdftotext

Kenneth Berland ken at hero.com
Wed Sep 22 11:18:57 PDT 2010


Yikes,

I have rewritten the replace function in standard C (and attached it
this time).

-KB
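
For readers without the attachment, the following is a minimal sketch of what a replace helper limited to the C standard library (string.h and stdlib.h) could look like. It is not the code in the attached diff: the name replace_all, its signature, and the decision to return a newly allocated string are all assumptions made for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, NOT the attached diff: returns a newly malloc'd copy
 * of src with every occurrence of from replaced by to, or NULL on error.
 * The caller is responsible for freeing the result. */
char *replace_all(const char *src, const char *from, const char *to)
{
    size_t from_len = strlen(from);
    size_t to_len = strlen(to);
    size_t count = 0;
    const char *p;
    char *out, *dst;

    if (from_len == 0)
        return NULL; /* an empty pattern would match everywhere */

    /* First pass: count non-overlapping matches to size the result. */
    for (p = src; (p = strstr(p, from)) != NULL; p += from_len)
        count++;

    /* Add before subtracting so the unsigned arithmetic cannot wrap. */
    out = malloc(strlen(src) + count * to_len - count * from_len + 1);
    if (out == NULL)
        return NULL;

    /* Second pass: copy the text before each match, then the replacement. */
    dst = out;
    while ((p = strstr(src, from)) != NULL) {
        size_t gap = (size_t)(p - src);
        memcpy(dst, src, gap);
        dst += gap;
        memcpy(dst, to, to_len);
        dst += to_len;
        src = p + from_len;
    }
    strcpy(dst, src); /* tail after the last match, plus the terminator */
    return out;
}
```

If a helper like this were used to escape HTML reserved characters in each word, "&" would need to be replaced first, so that the ampersands introduced by later replacements (such as "&lt;" for "<") are not escaped a second time.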

On Wed, 22 Sep 2010, Kenneth Berland wrote:

> I have rewritten the replace function with standard C.
>
> -KB
>
>
> On Sun, 11 Jul 2010, Albert Astals Cid wrote:
>
>> On Tuesday, 6 July 2010, Kenneth Berland wrote:
>> 
>>> Can I use std::string within any GooString methods I write (e.g. replace) 
>>> or am I limited to the C Standard library (i.e. string.h)?
>> 
>> No std:: usage anywhere in poppler (except in the cpp frontend).
>> 
>> Albert
>> 
>> 
>> On Mon, 5 Jul 2010, Kenneth Berland wrote:
>> 
>>> Can I use std::string within any GooString methods I write (e.g. replace) 
>>> or am I limited to the C Standard library (i.e. string.h)?
>>> 
>>> -KB
>>> 
>>> 
>>> On Tue, 8 Jun 2010, Albert Astals Cid wrote:
>>> 
>>>> On Tuesday, 8 June 2010, you wrote:
>>>>> Does GooString have a replace() method?  I could not find one.  Does
>>>>> this mean I should write one?
>>>> 
>>>> Yes, you'll have to write one or get the char * from the GooString and
>>>> use the C string functions.
>>>> 
>>>> Albert
>>>> 
>>>>> 
>>>>> -KB
>>>>> 
>>>>> On Sun, 30 May 2010, Albert Astals Cid wrote:
>>>>> > On Sunday, 30 May 2010, Kenneth Berland wrote:
>>>>> >> 1)  Since I sent my last diff, I've:
>>>>> >>     a) added some string processing to make sure no HTML reserved
>>>>> >>        characters are placed into the output.  I process each word.
>>>>> >>     b) altered the HTML a bit so that XML parsers can deal with it.
>>>>> >>        I've put in a title tag or an empty title tag and added end
>>>>> >>        tags to the meta tags.
>>>>> >>
>>>>> >> 2)  Addressing your concerns:
>>>>> >>     a) I've removed the initialization of f to stdout.
>>>>> >>     b) I close f now and reopen it.  This also removes the warning.
>>>>> >>     c) If a user is running with the -bbox option, they want word
>>>>> >>        bounding boxes.  If there are no words, I think a line to
>>>>> >>        stderr is appropriate.
>>>>> >
>>>>> > Cool, though we try not to use the std (yeah, it sucks, I know). Can
>>>>> > you use either GooString or char * instead?
>>>>> >
>>>>> > Thanks,
>>>>> >   Albert
>>>>> >
>>>>> >> -KB
>>>>> >>
>>>>> >> On Wed, 26 May 2010, Albert Astals Cid wrote:
>>>>> >>> On Wednesday, 26 May 2010, Kenneth Berland wrote:
>>>>> >>>> I get a compiler warning without it.
>>>>> >>>>
>>>>> >>>> pdftotext.cc: In function ‘int main(int, char**)’:
>>>>> >>>> pdftotext.cc:164: warning: ‘f’ may be used uninitialized in this
>>>>> >>>> function
>>>>> >>>
>>>>> >>> That change will not get accepted, sorry; initializing f to stdout
>>>>> >>> is not a solution.
>>>>> >>>
>>>>> >>> Also, I do not like the fact that you do not close f if you are
>>>>> >>> writing the bbox. Can't you just open it again like the code
>>>>> >>> already does?
>>>>> >>>
>>>>> >>> Also, I do not understand why the code considers a page having no
>>>>> >>> text an error.
>>>>> >>>
>>>>> >>> Albert
>>>>> >>>
>>>>> >>>> -KB
>>>>> >>>>
>>>>> >>>> On Wed, 26 May 2010, Albert Astals Cid wrote:
>>>>> >>>>> On Sunday, 9 May 2010, Kenneth Berland wrote:
>>>>> >>>>>> List,
>>>>> >>>>>>
>>>>> >>>>>> I've attached a small addition to pdftotext that outputs bounding
>>>>> >>>>>> box information to html like this:
>>>>> >>>>>>
>>>>> >>>>>> <doc>
>>>>> >>>>>>    <page width="612.000000" height="792.000000"/>
>>>>> >>>>>>      <word xMin="56.800000" yMin="57.208000" xMax="75.412000" yMax="70.492000">The</word>
>>>>> >>>>>>    </page>
>>>>> >>>>>> </doc>
>>>>> >>>>>>
>>>>> >>>>>> I had a need, maybe others will too.
>>>>> >>>>>>
>>>>> >>>>>> -KB
>>>>> >>>>>
>>>>> >>>>> Why is this change necessary?
>>>>> >>>>>
>>>>> >>>>> -  FILE *f;
>>>>> >>>>> +  FILE *f = stdout;
>>>>> >>>>>
>>>>> >>>>> Albert
>>>>> >>>
>>>>> >>> _______________________________________________
>>>>> >>> poppler mailing list
>>>>> >>> poppler at lists.freedesktop.org
>>>>> >>> http://lists.freedesktop.org/mailman/listinfo/poppler
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: diff.txt
URL: <http://lists.freedesktop.org/archives/poppler/attachments/20100922/e9e41f0c/attachment.txt>

