[poppler] poppler get bounding box of whole content

Mihai Niculescu q.quark at gmail.com
Wed May 29 15:12:12 PDT 2013


On 05/30/2013 12:01 AM, Albert Astals Cid wrote:
> On Wednesday, 29 May 2013, at 23:57:44, Mihai Niculescu wrote:
>> Mailing list included. Reply below.
>>
>> On 05/29/2013 11:39 PM, Albert Astals Cid wrote:
>>> On Wednesday, 29 May 2013, at 21:54:43, Mihai Niculescu wrote:
>>>> Hi,
>>>>
>>>> I am trying to get the bounding box of all content in a page (text,
>>>> images, tables, etc.) in poppler, but I can’t figure this out.
>>>>
>>>> For example, I want to duplicate the result given by Ghostscript:
>>>>                gs -sDEVICE=bbox golfer.ps
>>>>     
>>>>     prints out
>>>>     
>>>>                %%BoundingBox: 0 25 583 732
>>>>                
>>>>                %%HiResBoundingBox: 0.808497 25.009496 582.994503 731.809445
>>>>
>>>> How can this be done with poppler?
>>> Is that the "real" bounding box or one of the PDF boxes (crop, bleed,
>>> etc.)?
>>>
>>> Cheers,
>>>
>>>     Albert
>>>> Thanks,
>>>> Mihai
>>>> _______________________________________________
>>>> poppler mailing list
>>>> poppler at lists.freedesktop.org
>>>> http://lists.freedesktop.org/mailman/listinfo/poppler
>> Not the PDF boxes. I mean the union of the bounding boxes of all
>> elements (text, images, tables, glyphs, etc.) in the PDF. Let me explain.
>>
>> I tried looping over the QList<Poppler::TextBox*> returned by textList(),
>> but it does not include other glyphs in the PDF (e.g. the Greek sigma
>> used for summation in LaTeX); maybe poppler does not see them as text:
>>
>>     // textList() transfers ownership of the boxes to the caller.
>>     QList<Poppler::TextBox*> wholetext = pdfPage->textList();
>>
>>     // A null QRectF united with another rectangle yields that rectangle,
>>     // so no special case is needed for the first box.
>>     QRectF unitedTextbbox;
>>     for (int i = 0; i < wholetext.size(); ++i) {
>>         unitedTextbbox = unitedTextbbox.united(wholetext.at(i)->boundingBox());
>>     }
>>
>>     qDeleteAll(wholetext);
>>
>>
>> This works well when the PDF contains only simple text, but not when
>> there are other symbols. I need something like the example above that
>> includes all elements in the PDF. I am willing to use poppler directly
>> (without the Qt4 wrapper) if that makes it possible.
> To be honest, I can't think of an easy way to get what you want.
>
> As a quick solution, you can render the page at a relatively low resolution
> and work out the bbox from it: check whether the 4 corners are the same color
> and iterate from there.
>
> Cheers,
>    Albert
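
The render-and-scan idea quoted above could be sketched roughly as follows.
This is only an illustration under assumptions: PixelBox and contentBox are
hypothetical names, not poppler or Qt API, and the pixel buffer is assumed to
come from a low-resolution render of the page (for instance the pixels of the
QImage returned by Poppler::Page::renderToImage()), stored row by row as
32-bit values. The background color is taken from the top-left corner, and
the function gives up if the four corners disagree.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper type, not part of poppler or Qt.
struct PixelBox {
    int left, top, right, bottom;  // inclusive pixel coordinates
};

// Writes to 'out' the smallest box enclosing every pixel that differs from
// the background color. Returns false if the four corners do not share one
// color (no obvious background) or if the page is blank.
bool contentBox(const std::vector<std::uint32_t>& pixels,
                int width, int height, PixelBox& out)
{
    assert(width > 0 && height > 0);
    const std::size_t w = static_cast<std::size_t>(width);
    const std::size_t h = static_cast<std::size_t>(height);
    assert(pixels.size() == w * h);

    // Assumption: the page background is whatever color the corners share.
    const std::uint32_t bg = pixels[0];
    if (pixels[w - 1] != bg || pixels[(h - 1) * w] != bg ||
        pixels[h * w - 1] != bg) {
        return false;
    }

    // Grow the box around every non-background pixel.
    PixelBox box = {width, height, -1, -1};
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            if (pixels[static_cast<std::size_t>(y) * w + x] == bg) continue;
            box.left = std::min(box.left, x);
            box.top = std::min(box.top, y);
            box.right = std::max(box.right, x);
            box.bottom = std::max(box.bottom, y);
        }
    }
    if (box.right < 0) return false;  // nothing but background
    out = box;
    return true;
}
```

The resulting box is in pixels with the origin at the top-left corner. To
compare it with the Ghostscript output it would still need scaling by
72/dpi and a vertical flip, and it can only be as accurate as one pixel at
the chosen resolution.
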
That is an approach I don't like and hope to avoid. Can you give me
some pointers on how I should approach this problem?

>
>> Cheers,
>> Mihai


