[poppler] poppler get bounding box of whole content
Albert Astals Cid
aacid at kde.org
Wed May 29 15:01:01 PDT 2013
On Wednesday, 29 May 2013, at 23:57:44, Mihai Niculescu wrote:
> Mailing list included. Reply below.
>
> On 05/29/2013 11:39 PM, Albert Astals Cid wrote:
> > On Wednesday, 29 May 2013, at 21:54:43, Mihai Niculescu wrote:
> >> Hi,
> >>
> >> I am trying to get the bounding box of all content in a page (text,
> >> images, tables, etc.) in poppler, but I can’t figure this out.
> >>
> >> For example, I want to duplicate the result given by Ghostscript:
> >> gs -sDEVICE=bbox golfer.ps
> >>
> >> prints out
> >>
> >> %%BoundingBox: 0 25 583 732
> >>
> >> %%HiResBoundingBox: 0.808497 25.009496 582.994503 731.809445
> >>
> >> How can this be done with poppler?
> >
> > Is that the "real" bounding box or one of the pdf boxes (crop, bleed,
> > etc)?
> >
> > Cheers,
> >
> > Albert
> >>
> >> Thanks,
> >> Mihai
> >> _______________________________________________
> >> poppler mailing list
> >> poppler at lists.freedesktop.org
> >> http://lists.freedesktop.org/mailman/listinfo/poppler
> >
>
> Not the PDF boxes. I mean the union of the bounding boxes of all
> elements (text, images, tables, glyphs, etc.) in the PDF. Let me explain more.
>
> I tried looping over the QList<Poppler::TextBox*> returned by textList(),
> but it does not include other glyphs in the PDF (e.g. the Greek sigma used
> for summation in LaTeX), or maybe poppler does not see them as text:
>
> QList<Poppler::TextBox*> wholetext = pdfPage->textList();
>
> //float minX, maxX, minY, maxY;
> QRectF unitedTextbbox, textbbox;
>
> for (int i = 0; i < wholetext.size(); ++i) {
>     Poppler::TextBox* textBox = wholetext.at(i);
>     textbbox = textBox->boundingBox();
>     if (i == 0) {
>         unitedTextbbox = textbbox;
>     } else {
>         unitedTextbbox = unitedTextbbox.united(textbbox);
>     }
> }
>
>
> This works great when there is only simple text in the PDF, but not when
> there are other symbols. I need something like the example above that
> includes all elements in the PDF. I'll switch to using only poppler
> (without the Qt4 wrapper) if that gets me this.
I can't think of an easy way to get what you want, to be honest.
As a quick workaround, you can render the page at a relatively low resolution
and work out the bbox from the image: check whether the 4 corners are the same
color and iterate on that.
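
To make the render-and-scan idea concrete, here is a minimal sketch. It is
not poppler API: contentBox, toPoints, PixelBox and PointBox are hypothetical
helpers. It assumes the page has already been rendered into a 32-bit pixel
buffer (for instance from Poppler::Page::renderToImage in the Qt wrapper) and
that the background is one known color, e.g. the color of a corner pixel.
Instead of iterating inward from the corners, it simply scans every pixel,
which is cheap at low resolution. The second helper converts the pixel box
into points with a bottom-left origin, so it can be compared with the
%%BoundingBox values that gs prints.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper types, not part of poppler.
// Pixel box: inclusive on all sides, origin top-left, y grows downward.
struct PixelBox { int left, top, right, bottom; };
// Point box: 1/72 inch units, origin bottom-left, y grows upward.
struct PointBox { double llx, lly, urx, ury; };

// Finds the smallest box enclosing every pixel that differs from
// `background`. Returns false (and leaves `out` untouched) for a blank page.
// `pixels` is assumed to be row-major with exactly width * height entries.
bool contentBox(const std::vector<std::uint32_t>& pixels,
                int width, int height,
                std::uint32_t background, PixelBox& out)
{
    assert(width >= 0 && height >= 0);
    assert(pixels.size() ==
           static_cast<std::size_t>(width) * static_cast<std::size_t>(height));

    int left = width, top = height, right = -1, bottom = -1;
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const std::size_t idx =
                static_cast<std::size_t>(y) * static_cast<std::size_t>(width)
                + static_cast<std::size_t>(x);
            if (pixels[idx] == background)
                continue;
            if (x < left)   left = x;
            if (x > right)  right = x;
            if (y < top)    top = y;
            if (y > bottom) bottom = y;
        }
    }
    if (right < 0)
        return false;  // no non-background pixel found

    out = PixelBox{left, top, right, bottom};
    return true;
}

// Converts an inclusive pixel box to points, given the image height in
// pixels and the resolution the page was rendered at. Flips the y axis.
PointBox toPoints(const PixelBox& box, int heightPx, double dpi)
{
    const double scale = 72.0 / dpi;  // points per pixel
    return PointBox{
        box.left * scale,
        (heightPx - (box.bottom + 1)) * scale,
        (box.right + 1) * scale,
        (heightPx - box.top) * scale
    };
}
```

The result is only accurate to one pixel, i.e. 72/dpi points, so the
rendering resolution is a trade-off between speed and precision. With
antialiasing on, edge pixels may be close to but not equal to the background
color, so a real version would probably want a tolerance instead of the exact
comparison used here.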
Cheers,
Albert
>
> Cheers,
> Mihai