[poppler] poppler get bounding box of whole content
Mihai Niculescu
q.quark at gmail.com
Wed May 29 15:28:30 PDT 2013
On 05/30/2013 12:24 AM, Albert Astals Cid wrote:
> On Thursday, 30 May 2013, at 00:12:12, Mihai Niculescu wrote:
>> On 05/30/2013 12:01 AM, Albert Astals Cid wrote:
>>> On Wednesday, 29 May 2013, at 23:57:44, Mihai Niculescu wrote:
>>>> Mailing list included. Reply below.
>>>>
>>>> On 05/29/2013 11:39 PM, Albert Astals Cid wrote:
>>>>> On Wednesday, 29 May 2013, at 21:54:43, Mihai Niculescu wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I am trying to get the bounding box of all content in a page (text,
>>>>>> images, tables, etc.) in poppler, but I can’t figure this out.
>>>>>>
>>>>>> For example, I want to duplicate the result given by ghostscript:
>>>>>> gs -sDEVICE=bbox golfer.ps
>>>>>>
>>>>>> prints out
>>>>>>
>>>>>> %%BoundingBox: 0 25 583 732
>>>>>>
>>>>>> %%HiResBoundingBox: 0.808497 25.009496 582.994503 731.809445
>>>>>>
>>>>>> How can this be done with poppler?
>>>>> Is that the "real" bounding box or one of the pdf boxes (crop, bleed,
>>>>> etc)?
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Albert
>>>>>> Thanks,
>>>>>> Mihai
>>>>>> _______________________________________________
>>>>>> poppler mailing list
>>>>>> poppler at lists.freedesktop.org
>>>>>> http://lists.freedesktop.org/mailman/listinfo/poppler
>>>> Not the PDF boxes. I mean the union of the bounding boxes of all
>>>> elements (text, images, tables, glyphs, etc.) in the PDF. Let me explain
>>>> more.
>>>>
>>>> I tried looping over the QList<Poppler::TextBox*> returned by textList(),
>>>> but it does not include other glyphs in the PDF (for example the Greek
>>>> capital sigma used for summation in LaTeX), or maybe poppler does not
>>>> see them as text:
>>>>
>>>> QList<Poppler::TextBox*> wholetext = pdfPage->textList();
>>>> //float minX, maxX, minY, maxY;
>>>> QRectF unitedTextbbox, textbbox;
>>>> for (int i = 0; i < wholetext.size(); ++i) {
>>>>     Poppler::TextBox* textBox = wholetext.at(i);
>>>>     textbbox = textBox->boundingBox();
>>>>     if (i == 0) {
>>>>         unitedTextbbox = textbbox;
>>>>     } else {
>>>>         unitedTextbbox = unitedTextbbox.united(textbbox);
>>>>     }
>>>> }
>>>>
>>>> This works great when the PDF contains only simple text, but not when
>>>> there are other symbols. I need something like the example above that
>>>> includes all elements in the PDF. I am willing to use poppler directly
>>>> (without the Qt4 wrapper) if that gets me this.
>>> Can't think of how to get what you want easily, to be honest.
>>>
>>> As a quick solution you can render the page at a relatively low
>>> resolution and work out the bbox from it: just check if the 4 corners
>>> are the same color and iterate on that.
>>>
>>> Cheers,
>>>
>>> Albert
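The render-and-scan idea above can be sketched without any poppler calls. What follows is only a minimal sketch, assuming the page has already been rendered into a row-major buffer of 32-bit pixels and that the top-left corner pixel is background. `PixelBox` and `contentBox` are hypothetical names, not poppler or Qt API, and the code does a plain full scan rather than the corner-checking iteration suggested above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Pixel-space box, inclusive on all sides. `empty` stays true for a blank page.
struct PixelBox {
    int left = 0, top = 0, right = 0, bottom = 0;
    bool empty = true;
};

// Scan a row-major raster and return the smallest box enclosing every pixel
// that differs from the background.
// Assumption: the top-left pixel has the background color.
PixelBox contentBox(const std::vector<std::uint32_t>& pixels, int width, int height)
{
    assert(width > 0 && height > 0);
    assert(pixels.size() ==
           static_cast<std::size_t>(width) * static_cast<std::size_t>(height));

    const std::uint32_t background = pixels[0];
    PixelBox box;
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            if (pixels[static_cast<std::size_t>(y) * width + x] == background)
                continue;  // background pixel, does not affect the box
            if (box.empty) {
                // First content pixel: the box is exactly this pixel.
                box.left = box.right = x;
                box.top = box.bottom = y;
                box.empty = false;
            } else {
                // Grow the box just enough to include this pixel.
                box.left = std::min(box.left, x);
                box.right = std::max(box.right, x);
                box.top = std::min(box.top, y);
                box.bottom = std::max(box.bottom, y);
            }
        }
    }
    return box;
}
```

The result is in pixels with the origin at the top-left. To compare it with the gs output it would have to be scaled by 72 / dpi (PDF points are 1/72 inch) and flipped vertically, and its precision is limited by the render resolution.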
>> That is an approach I don't like and hope to avoid. Can you give me
>> some directions on how I should approach this problem?
> Implementing an OutputDev and keeping track of the bounding boxes there is
> the only way I can think of.
>
> Cheers,
> Albert
Thanks, I'll look into this! This seems to be just as in gs or MuPDF:
creating an output device to compute the bounding boxes. Too bad poppler
doesn't already have one implemented.
Cheers,
Mihai
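
The bookkeeping such an output device would need is small. Below is a minimal sketch of just that part, written as standalone C++ so it does not depend on any poppler headers. `BBox` and `BBoxTracker` are hypothetical names. The assumption is that a class derived from poppler's OutputDev would call `add()` from its drawing callbacks (such as stroke, fill, drawChar and drawImage) with each element's extent already transformed to page coordinates. Those callbacks and their exact signatures depend on the poppler version and are not shown here.

```cpp
#include <algorithm>

// Axis-aligned box in page units. `empty` is true until something is added.
struct BBox {
    double x0 = 0.0, y0 = 0.0, x1 = 0.0, y1 = 0.0;
    bool empty = true;
};

// Hypothetical helper: accumulates the union of every extent reported to it.
class BBoxTracker {
public:
    // Record one drawn element. The two corners may be given in any order.
    void add(double ax, double ay, double bx, double by)
    {
        const double lx = std::min(ax, bx), hx = std::max(ax, bx);
        const double ly = std::min(ay, by), hy = std::max(ay, by);
        if (box_.empty) {
            // First element: the union is the element itself.
            box_.x0 = lx; box_.y0 = ly;
            box_.x1 = hx; box_.y1 = hy;
            box_.empty = false;
        } else {
            // Later elements: extend each side only where needed.
            box_.x0 = std::min(box_.x0, lx);
            box_.y0 = std::min(box_.y0, ly);
            box_.x1 = std::max(box_.x1, hx);
            box_.y1 = std::max(box_.y1, hy);
        }
    }

    // Union of everything added so far.
    const BBox& box() const { return box_; }

private:
    BBox box_;
};
```

This is the same union logic as the QRectF loop quoted above, only fed by every drawing operation instead of by text boxes alone.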
>
>>>> Cheers,
>>>> Mihai