[poppler] poppler get bounding box of whole content

Albert Astals Cid aacid at kde.org
Wed May 29 15:24:02 PDT 2013


On Thursday, 30 May 2013, at 00:12:12, Mihai Niculescu wrote:
> On 05/30/2013 12:01 AM, Albert Astals Cid wrote:
> > On Wednesday, 29 May 2013, at 23:57:44, Mihai Niculescu wrote:
> >> Mailing list included. Reply below.
> >> 
> >> On 05/29/2013 11:39 PM, Albert Astals Cid wrote:
> >>> On Wednesday, 29 May 2013, at 21:54:43, Mihai Niculescu wrote:
> >>>> Hi,
> >>>> 
> >>>> I am trying to get the bounding box of all content on a page (text,
> >>>> images, tables, etc.) in poppler, but I can’t figure this out.
> >>>> 
> >>>> For example, I want to duplicate the result given by ghostscript:
> >>>>                gs -sDEVICE=bbox golfer.ps
> >>>>     
> >>>>     prints out
> >>>>     
> >>>>                %%BoundingBox: 0 25 583 732
> >>>>                %%HiResBoundingBox: 0.808497 25.009496 582.994503 731.809445
> >>>> 
> >>>> How can this be done with poppler?
> >>> 
> >>> Is that the "real" bounding box or one of the pdf boxes (crop, bleed,
> >>> etc)?
> >>> 
> >>> Cheers,
> >>> 
> >>>     Albert
> >>>> 
> >>>> Thanks,
> >>>> Mihai
> >>>> _______________________________________________
> >>>> poppler mailing list
> >>>> poppler at lists.freedesktop.org
> >>>> http://lists.freedesktop.org/mailman/listinfo/poppler
> >>> 
> >> 
> >> Not the PDF boxes. I mean the union of the bounding boxes of all
> >> elements (text, images, tables, glyphs, etc.) in the PDF. Let me explain.
> >> 
> >> I tried looping over the QList<Poppler::TextBox*> from textList(), but it
> >> does not include other glyphs in the PDF (the Greek sigma used for
> >> summation in LaTeX), or maybe poppler does not treat them as text:
> >>     // textList() returns heap-allocated boxes owned by the caller.
> >>     QList<Poppler::TextBox*> wholetext = pdfPage->textList();
> >>     QRectF unitedTextbbox;
> >>     for (int i = 0; i < wholetext.size(); ++i) {
> >>         // united() with a null rectangle returns the other rectangle,
> >>         // so the first box needs no special case.
> >>         unitedTextbbox = unitedTextbbox.united(wholetext.at(i)->boundingBox());
> >>     }
> >>     qDeleteAll(wholetext);  // free the boxes once the union is computed
> >> 
> >> This works well when the PDF contains only simple text, but not when it
> >> contains other symbols. I need something like the example above that
> >> includes all elements in the PDF. I'll use poppler directly (without the
> >> qt4 wrapper) if that makes this possible.
> > 
> > To be honest, I can't think of an easy way to get what you want.
> > 
> > As a quick solution, you can render the page at a relatively low resolution
> > and work out the bbox from the image: check that the 4 corners are the same
> > color, then scan for the pixels that differ from it.
> > 
> > Cheers,
> > 
> >    Albert
> 
> That is an approach I don't like and hope to avoid. Can you give me
> some direction on how I should approach this problem?

The only way I can think of is to implement an OutputDev and keep track of the
bounding boxes there.
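
A minimal sketch of the bookkeeping such an OutputDev would need, assuming
every drawing callback reports the device-space extent of what it draws. The
BBoxAccumulator type and exampleUnion() are hypothetical names for
illustration, not part of poppler. In a real subclass the calls would come
from overridden callbacks such as drawChar, stroke, fill and drawImage (names
as I recall them from OutputDev.h, so check them against your poppler
version).

```cpp
#include <algorithm>
#include <cassert>
#include <limits>

// Hypothetical helper (not part of poppler): accumulates the union of
// everything reported to it, as an axis-aligned box.
struct BBoxAccumulator {
    double xMin = std::numeric_limits<double>::infinity();
    double yMin = std::numeric_limits<double>::infinity();
    double xMax = -std::numeric_limits<double>::infinity();
    double yMax = -std::numeric_limits<double>::infinity();

    // True until at least one point has been added.
    bool empty() const { return xMin > xMax || yMin > yMax; }

    // Grow the box to include a single point, e.g. a path vertex.
    void addPoint(double x, double y) {
        xMin = std::min(xMin, x);
        xMax = std::max(xMax, x);
        yMin = std::min(yMin, y);
        yMax = std::max(yMax, y);
    }

    // Grow the box to include a rectangle given by two opposite corners,
    // e.g. a glyph box or the transformed unit square of an image.
    void addRect(double x0, double y0, double x1, double y1) {
        addPoint(x0, y0);
        addPoint(x1, y1);
    }
};

// Example with made-up coordinates standing in for what the drawing
// callbacks of an OutputDev subclass would report.
BBoxAccumulator exampleUnion() {
    BBoxAccumulator bbox;
    bbox.addRect(72.0, 700.0, 300.0, 712.0);  // a line of text
    bbox.addRect(100.0, 25.0, 583.0, 400.0);  // an image
    bbox.addPoint(0.5, 731.5);                // a path vertex
    assert(!bbox.empty());
    return bbox;
}
```

Starting from an inverted (infinite) box avoids special-casing the first
element. Clipping is ignored here, so the result can be larger than what is
actually visible on the page.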

Cheers,
  Albert
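
For comparison, the rendering fallback mentioned earlier in the thread could
be sketched roughly as follows. This is only an illustration under stated
assumptions: the page has already been rendered into a row-major buffer of
32-bit pixels (with the qt4 wrapper, Poppler::Page::renderToImage() returns a
QImage that could be copied into one), and PixelBounds, cornersAgree() and
contentBounds() are hypothetical names, not poppler API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Inclusive pixel bounds of everything that differs from the background.
struct PixelBounds {
    bool found = false;
    int left = 0, top = 0, right = 0, bottom = 0;
};

// True if the four corner pixels share one color, which is then taken as
// the background color (the corner check suggested in the thread).
bool cornersAgree(const std::vector<std::uint32_t>& px, int w, int h) {
    if (w <= 0 || h <= 0) return false;
    const std::size_t lastRow = static_cast<std::size_t>(h - 1) * w;
    const std::uint32_t c = px[0];
    return px[w - 1] == c && px[lastRow] == c && px[lastRow + w - 1] == c;
}

// Scan the image and return the bounds of all non-background pixels.
// Returns found == false if the corners disagree or the page is blank.
PixelBounds contentBounds(const std::vector<std::uint32_t>& px, int w, int h) {
    assert(w <= 0 || h <= 0 ||
           px.size() >= static_cast<std::size_t>(w) * h);
    PixelBounds b;
    if (!cornersAgree(px, w, h)) return b;  // no reliable background color
    const std::uint32_t background = px[0];
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            if (px[static_cast<std::size_t>(y) * w + x] == background) continue;
            if (!b.found) {
                b.found = true;
                b.left = b.right = x;
                b.top = b.bottom = y;
            } else {
                b.left = std::min(b.left, x);
                b.right = std::max(b.right, x);
                b.bottom = std::max(b.bottom, y);
            }
        }
    }
    return b;
}
```

The pixel bounds still have to be scaled by 72 / dpi to get PDF points, and
image rows run top to bottom while PDF coordinates run bottom to top. The
result is only as precise as the render resolution, and anti-aliased edges
count as content.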

> 
> >> Cheers,
> >> Mihai