[poppler] commit? bounding box html in pdftotext
Albert Astals Cid
aacid at kde.org
Sat Sep 25 05:35:32 PDT 2010
On Saturday, 25 September 2010, Kenneth Berland wrote:
> Is this on-track for being committed?
Yes, it is. It's in the queue of poppler-related things I have to do, and it
will be committed when I find time for it.
Thanks for the patch and sorry for the delay.
Albert
>
> (sorry to bug you)
>
> -KB
>
> On Wed, 22 Sep 2010, Kenneth Berland wrote:
> > Very funny.
> >
> > The old diff, using std:: is at:
> >
> > http://lists.freedesktop.org/archives/poppler/attachments/20100530/89825275/attachment.txt
> >
> > You can commit either today's diff or the 2010-05-30 (std::) diff. I
> > think the std:: version is less likely to have pointer-related bugs.
> >
> > -KB
> >
> > On Wed, 22 Sep 2010, Albert Astals Cid wrote:
> >> On Wednesday, 22 September 2010, Kenneth Berland wrote:
> >>> I have rewritten the replace function with standard C.
> >>
> >> This is where you'll hate me, but for the past few weeks we have accepted
> >> std:: code if it's *obvious* that it adds value over existing code.
> >>
> >> So can you please send again your patch to the mailing list?
> >>
> >> Sorry, totally forgot to tell you.
> >>
> >> Albert
> >>
> >>> -KB
> >>>
> >>> On Sun, 11 Jul 2010, Albert Astals Cid wrote:
> >>>> On Tuesday, 6 July 2010, Kenneth Berland wrote:
> >>>>> Can I use std::string within any GooString methods I write (e.g.
> >>>>> replace) or am I limited to the C Standard library (i.e. string.h)?
> >>>>
> >>>> No std:: usage anywhere in poppler (except in the cpp frontend).
> >>>>
> >>>> Albert
> >>>>
> >>>> On Mon, 5 Jul 2010, Kenneth Berland wrote:
> >>>>> Can I use std::string within any GooString methods I write (e.g.
> >>>>> replace) or am I limited to the C Standard library (i.e. string.h)?
> >>>>>
> >>>>> -KB
> >>>>>
> >>>>> On Tue, 8 Jun 2010, Albert Astals Cid wrote:
> >>>>>> On Tuesday, 8 June 2010, you wrote:
> >>>>>>> Does GooString have a replace() method? I could not find one.
> >>>>>>> Does this mean I should write one?
> >>>>>>
> >>>>>> Yes, you'll have to write one, or get the char * from the GooString
> >>>>>> and use the C string functions.
> >>>>>>
> >>>>>> Albert
> >>>>>>
> >>>>>>> -KB
> >>>>>>>
> >>>>>>> On Sun, 30 May 2010, Albert Astals Cid wrote:
> >>>>>>>> On Sunday, 30 May 2010, Kenneth Berland wrote:
> >>>>>>>>> 1) Since I sent my last diff, I've:
> >>>>>>>>> a) added some string processing to make sure no HTML reserved
> >>>>>>>>> characters are placed into the output. I process each word.
> >>>>>>>>> b) altered the html a bit so that XML parsers can deal with it.
> >>>>>>>>> I've put in a title tag or an empty title tag and added end
> >>>>>>>>> tags to the meta tags.
> >>>>>>>>>
> >>>>>>>>> 2) Addressing your concerns:
> >>>>>>>>> a) I've removed the initialization of stdout.
> >>>>>>>>> b) I close f now and reopen it. This also removes the warning.
> >>>>>>>>> c) If a user is running with the -bbox option, they want word
> >>>>>>>>> bounding boxes. If there are no words, I think a line to
> >>>>>>>>> stderr is appropriate.
> >>>>>>>>
> >>>>>>>> Cool, though we try not to use the std (yeah, it sucks, I know). Can
> >>>>>>>> you use either GooString or char *?
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Albert
> >>>>>>>>
> >>>>>>>>> -KB
> >>>>>>>>>
> >>>>>>>>> On Wed, 26 May 2010, Albert Astals Cid wrote:
> >>>>>>>>>> On Wednesday, 26 May 2010, Kenneth Berland wrote:
> >>>>>>>>>>> I get a compiler warning without it.
> >>>>>>>>>>>
> >>>>>>>>>>> pdftotext.cc: In function ‘int main(int, char**)’:
> >>>>>>>>>>> pdftotext.cc:164: warning: ‘f’ may be used uninitialized in
> >>>>>>>>>>> this function
> >>>>>>>>>>
> >>>>>>>>>> That change will not get accepted, sorry. Initializing f to stdout
> >>>>>>>>>> is not a solution.
> >>>>>>>>>>
> >>>>>>>>>> Also, I do not like the fact that you do not close f if you are
> >>>>>>>>>> writing the bbox. Can't you just open it again like the code
> >>>>>>>>>> already does?
> >>>>>>>>>>
> >>>>>>>>>> Also, I do not understand why the code considers a page having no
> >>>>>>>>>> text an error.
> >>>>>>>>>>
> >>>>>>>>>> Albert
> >>>>>>>>>>
> >>>>>>>>>>> -KB
> >>>>>>>>>>>
> >>>>>>>>>>> On Wed, 26 May 2010, Albert Astals Cid wrote:
> >>>>>>>>>>>> On Sunday, 9 May 2010, Kenneth Berland wrote:
> >>>>>>>>>>>>> List,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I've attached a small addition to pdftotext that outputs
> >>>>>>>>>>>>> bounding box information to html like this:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> <doc>
> >>>>>>>>>>>>> <page width="612.000000" height="792.000000"/>
> >>>>>>>>>>>>> <word xMin="56.800000" yMin="57.208000" xMax="75.412000"
> >>>>>>>>>>>>> yMax="70.492000">The</word> </page>
> >>>>>>>>>>>>> </doc>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I had a need, maybe others will too.
> >>>>>>>>>>>>> -KB
> >>>>>>>>>>>>
> >>>>>>>>>>>> Why is this change necessary?
> >>>>>>>>>>>> - FILE *f;
> >>>>>>>>>>>> + FILE *f = stdout;
> >>>>>>>>>>>>
> >>>>>>>>>>>> Albert
> >>>>>>>>>>>>
> >>>>>>>>>>>> _______________________________________________
> >>>>>>>>>>>> poppler mailing list
> >>>>>>>>>>>> poppler at lists.freedesktop.org
> >>>>>>>>>>>> http://lists.freedesktop.org/mailman/listinfo/poppler
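
For anyone following the thread without the attachment: the "string processing" for HTML reserved characters and the replace() helper discussed above could look roughly like the sketch below. This is not the code from either diff. The names replaceAll and escapeHtml are hypothetical, and it assumes the std::string approach rather than the GooString or char * one.

```cpp
#include <string>

// Hypothetical helper (not from the patch): replace every occurrence of
// `from` in `s` with `to`, scanning left to right.
static void replaceAll(std::string &s, const std::string &from, const std::string &to)
{
    if (from.empty()) {
        return; // nothing to search for; avoids an endless loop
    }
    std::string::size_type pos = 0;
    while ((pos = s.find(from, pos)) != std::string::npos) {
        s.replace(pos, from.length(), to);
        // Continue after the inserted text so the '&' of "&amp;" is not
        // matched again.
        pos += to.length();
    }
}

// Hypothetical helper: escape the characters that are reserved in HTML/XML
// text and attribute values. '&' is handled first so that the entities
// introduced by the later replacements are not escaped a second time.
static std::string escapeHtml(std::string word)
{
    replaceAll(word, "&", "&amp;");
    replaceAll(word, "<", "&lt;");
    replaceAll(word, ">", "&gt;");
    replaceAll(word, "\"", "&quot;");
    return word;
}
```

Applied to each word before it is written inside a <word> element, this keeps the output parseable by XML tools. The order of the replacements matters: escaping '&' last would turn "&lt;" into "&amp;lt;".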