[poppler] commit? bounding box html in pdftotext
Albert Astals Cid
aacid at kde.org
Sun Oct 17 06:24:59 PDT 2010
On Sunday, 17 October 2010, Albert Astals Cid wrote:
> On Wednesday, 29 September 2010, Kenneth Berland wrote:
> > There exist OS's without /dev/null!?
> >
> > ;)
> >
> > You're pretty good at this code review thing, nice catch. NULL appears
> > to be what TextOutputDev expects. Updated patch attached.
>
> Did this ever work?
>
> string myString = myXmlTokenReplace( (char*) word->getText() );
>
> seems like a no go to me.
Anyway, I fixed a lot of style issues, some bugs and some memory leaks, and
committed your code. It will be available in poppler >= 0.15.1.
Albert
>
> Albert
>
> > -KB
> >
> > On Wed, 29 Sep 2010, Albert Astals Cid wrote:
> > > On Wednesday, 22 September 2010, you wrote:
> > >> Very funny.
> > >>
> > >> The old diff, using std:: is at:
> > >>
> > >> http://lists.freedesktop.org/archives/poppler/attachments/20100530/89825275/attachment.txt
> > >>
> > >> You can commit either today's diff or the 2010-05-30 (std::) diff. I
> > >> think the std:: version is less likely to have pointer-related bugs.
> > >
> > > textOut = new TextOutputDev("/dev/null",
> > >
> > > This doesn't look portable. Can you please fix it?
> > >
> > > Thanks,
> > >
> > > Albert
> > >
> > >> -KB
> > >>
> > >> On Wed, 22 Sep 2010, Albert Astals Cid wrote:
> > >> > On Wednesday, 22 September 2010, Kenneth Berland wrote:
> > >> >> I have rewritten the replace function with standard C.
> > >> >
> > >> > Now is when you hate me, but for a few weeks now we have accepted
> > >> > std:: code if it's *obvious* it adds value over existing code.
> > >> >
> > >> > So can you please send your patch to the mailing list again?
> > >> >
> > >> > Sorry, totally forgot to tell you.
> > >> >
> > >> > Albert
> > >> >
> > >> >> -KB
> > >> >>
> > >> >> On Sun, 11 Jul 2010, Albert Astals Cid wrote:
> > >> >>> On Tuesday, 6 July 2010, Kenneth Berland wrote:
> > >> >>>> Can I use std::string within any GooString methods I write (e.g.
> > >> >>>> replace) or am I limited to the C Standard library (i.e.
> > >> >>>> string.h)?
> > >> >>>
> > >> >>> No std:: usage anywhere in poppler (except in the cpp frontend).
> > >> >>>
> > >> >>> Albert
> > >> >>>
> > >> >>> On Mon, 5 Jul 2010, Kenneth Berland wrote:
> > >> >>>> Can I use std::string within any GooString methods I write (e.g.
> > >> >>>> replace) or am I limited to the C Standard library (i.e.
> > >> >>>> string.h)?
> > >> >>>>
> > >> >>>> -KB
> > >> >>>>
> > >> >>>> On Tue, 8 Jun 2010, Albert Astals Cid wrote:
> > >> >>>>> On Tuesday, 8 June 2010, you wrote:
> > >> >>>>>> Does GooString have a replace() method? I could not find one.
> > >> >>>>>> Does this mean I should write one?
> > >> >>>>>
> > >> >>>>> Yes, you'll have to write one or get the char * from the
> > >> >>>>> GooString and use C-string ones.
> > >> >>>>>
> > >> >>>>> Albert
> > >> >>>>>
> > >> >>>>>> -KB
> > >> >>>>>>
> > >> >>>>>> On Sun, 30 May 2010, Albert Astals Cid wrote:
> > >> >>>>>>> On Sunday, 30 May 2010, Kenneth Berland wrote:
> > >> >>>>>>>> 1) Since I sent my last diff, I've:
> > >> >>>>>>>> a) added some string processing to make sure no HTML reserved
> > >> >>>>>>>> characters are placed into the output. I process each word.
> > >> >>>>>>>> b) altered the html a bit so that XML parsers can deal with it.
> > >> >>>>>>>> I've put in a title tag or an empty title tag and added end
> > >> >>>>>>>> tags to the meta tags.
> > >> >>>>>>>>
> > >> >>>>>>>> 2) Addressing your concerns:
> > >> >>>>>>>> a) I've removed the initialization of stdout.
> > >> >>>>>>>> b) I close f now and reopen it. This also removes the warning.
> > >> >>>>>>>> c) If a user is running with the -bbox option, they want word
> > >> >>>>>>>> bounding boxes. If there are no words, I think a line to
> > >> >>>>>>>> stderr is appropriate.
> > >> >>>>>>>>
> > >> >>>>>>> Cool, though we try not to use the std (yeah it sucks i know),
> > >> >>>>>>> can you either use GooString or char *?
> > >> >>>>>>>
> > >> >>>>>>>>> Thanks,
> > >> >>>>>>>>>
> > >> >>>>>>>> Albert
> > >> >>>>>>>>
> > >> >>>>>>>>> -KB
> > >> >>>>>>>>>
> > >> >>>>>>>> On Wed, 26 May 2010, Albert Astals Cid wrote:
> > >> >>>>>>>>> On Wednesday, 26 May 2010, Kenneth Berland wrote:
> > >> >>>>>>>>>> I get a compiler warning without it.
> > >> >>>>>>>>>>
> > >> >>>>>>>>>> pdftotext.cc: In function ‘int main(int, char**)’:
> > >> >>>>>>>>>> pdftotext.cc:164: warning: ‘f’ may be used uninitialized in
> > >> >>>>>>>>>> this function
> > >> >>>>>>>>>
> > >> >>>>>>>>> That change will not get accepted, sorry. Initializing f to
> > >> >>>>>>>>> stdout is not a solution.
> > >> >>>>>>>>>
> > >> >>>>>>>>> Also, I do not like the fact that you do not close f if you
> > >> >>>>>>>>> are writing the bbox. Can't you just open it again like the
> > >> >>>>>>>>> code already does?
> > >> >>>>>>>>>
> > >> >>>>>>>>> Also, I do not understand why the code considers a page
> > >> >>>>>>>>> having no text an error.
> > >> >>>>>>>>>
> > >> >>>>>>>>>>>> Albert
> > >> >>>>>>>>>>>>
> > >> >>>>>>>>>>>>> -KB
> > >> >>>>>>>>>>>>>
> > >> >>>>>>>>>> On Wed, 26 May 2010, Albert Astals Cid wrote:
> > >> >>>>>>>>>>> On Sunday, 9 May 2010, Kenneth Berland wrote:
> > >> >>>>>>>>>>>> List,
> > >> >>>>>>>>>>>>
> > >> >>>>>>>>>>>> I've attached a small addition to pdftotext that outputs
> > >> >>>>>>>>>>>> bounding box information to html like this:
> > >> >>>>>>>>>>>>
> > >> >>>>>>>>>>>> <doc>
> > >> >>>>>>>>>>>>   <page width="612.000000" height="792.000000">
> > >> >>>>>>>>>>>>     <word xMin="56.800000" yMin="57.208000"
> > >> >>>>>>>>>>>>      xMax="75.412000" yMax="70.492000">The</word>
> > >> >>>>>>>>>>>>   </page>
> > >> >>>>>>>>>>>> </doc>
> > >> >>>>>>>>>>>>
> > >> >>>>>>>>>>>> I had a need, maybe others will too.
> > >> >>>>>>>>>>>> -KB
> > >> >>>>>>>>>>>
> > >> >>>>>>>>>>> Why is this change necessary?
> > >> >>>>>>>>>>> - FILE *f;
> > >> >>>>>>>>>>> + FILE *f = stdout;
> > >> >>>>>>>>>>>
> > >> >>>>>>>>>>> Albert
> > >> >>>>>>>>>>>
> > >> >>>>>>>>>>>> _______________________________________________
> > >> >>>>>>>>>>>> poppler mailing list
> > >> >>>>>>>>>>>> poppler at lists.freedesktop.org
> > >> >>>>>>>>>>>> http://lists.freedesktop.org/mailman/listinfo/poppler
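
For readers following the thread: the patch discussion turns on escaping
HTML/XML reserved characters in each word before writing it inside a <word>
element. The block below is only a minimal sketch of that idea using
std::string, not the code that was committed to poppler. The names escapeXml
and formatWord are hypothetical, and the set of characters escaped here is an
assumption about what "reserved" was meant to cover.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper (not poppler API): replace characters that are
// reserved in XML text and attribute values with their predefined entities.
std::string escapeXml(const std::string &in) {
    std::string out;
    out.reserve(in.size());
    for (char c : in) {
        switch (c) {
            case '&': out += "&amp;";  break;
            case '<': out += "&lt;";   break;
            case '>': out += "&gt;";   break;
            case '"': out += "&quot;"; break;
            default:  out += c;        break;
        }
    }
    return out;
}

// Hypothetical helper: build one <word> element in the shape quoted in the
// thread, with the bounding box printed via %f (six decimal places).
std::string formatWord(double xMin, double yMin, double xMax, double yMax,
                       const std::string &text) {
    char attrs[256];
    std::snprintf(attrs, sizeof attrs,
                  "<word xMin=\"%f\" yMin=\"%f\" xMax=\"%f\" yMax=\"%f\">",
                  xMin, yMin, xMax, yMax);
    return std::string(attrs) + escapeXml(text) + "</word>";
}
```

Building the result in a std::string avoids the manual buffer management that
a char * version would need, which is the trade-off raised in the thread when
comparing the std:: diff with the standard C rewrite.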