[poppler] Regarding page search

amit aggarwal amitcs06 at gmail.com
Mon Mar 15 20:17:27 PDT 2010


Hello Albert,

Thanks for your reply. How are you?

I have a question about word search. This algorithm is O(n^3+K), and if
I want to find all hits on one page it becomes O(n^4+K), which makes the
application very slow. Also, every call starts from the top of the page
and skips up to the previous hit, so a lot of comparisons are wasted.

So if I write an algorithm that returns all the search hits of one page,
would it be acceptable? And can we also improve the performance of this
search algorithm?



On Tue, Mar 16, 2010 at 5:01 AM, Albert Astals Cid <aacid at kde.org> wrote:
> On Monday, 15 March 2010, amit aggarwal wrote:
>> Yes, it is a problem with floating-point comparison in the findText algorithm.
>> In my use cases it occurs here:
>>
>>       // check: is the line above the top limit?
>>       if (!startAtTop &&
>>           (backward ? line->yMin > yStart : line->yMin < yStart)) {
>>         continue;
>>       }
>> (the line->yMin < yStart comparison), but the same problem may also
>> occur in other comparisons, such as:
>>
>>     // check: is the block above the top limit?
>>     if (!startAtTop && (backward ? blk->yMin > yStart : blk->yMax < yStart)) {
>>       continue;
>>     }
>>
>> (the blk->yMax < yStart comparison), and the same applies to the
>> bottom-limit check.
>>
>> Please let me know your comments.
>
> Yeah, there is an ugly double<->float conversion hitting you there. Pino and I
> are looking for the fix that makes the most sense; give us a few days.
>
> Albert
>
>>
>> On Mon, Mar 15, 2010 at 3:52 PM, amit aggarwal <amitcs06 at gmail.com> wrote:
>> > Hello Albert,
>> >
>> > I have done some analysis and found that the problem is in a floating-point
>> > comparison.
>> >
>> > In the log line "yStart :226.279999 line->yMin: 226.280000", yStart is
>> > the previous searchHit result and line->yMin (226.28) is the new value. The
>> > effect of this is that
>> >
>> >      if (!startAtTop &&
>> >          (backward ? line->yMin > yStart : line->yMin < yStart)) {
>> >        continue;
>> >      }
>> > the above condition in findText (TextOutputDev.cc) does not skip the
>> > line, so it returns the same coordinates every time.
>> >
>> > Looking forward to your comments.
>> >
>> >
>> > Here are some more logs; maybe they will make the problem clearer.
>> >
>> > Debug: startSearch**
>> > Debug:
>> > ***************************searchPageForward***********************
>> >
>> > Debug: "PdfPageWidget:: Load time: 0.023000 s."
>> >
>> >  coming here constructor TextPage to make haveLastFind zero
>> > startAtLast:0 haveLastFind:0 *xMin: 0.000000,*yMin: 0.000000
>> > blk->yMax: 82.285000 yStart :0.000000line->yMin: 69.492000blk->yMax:
>> > 272.800000 yStart :0.000000line->yMin: 226.280000
>> > Making found true xMin1:391.900000 yMin1:226.280000 blk->yMax:
>> > 316.570000 yStart :0.000000line->yMin: 290.984000blk->yMax: 334.285000
>> > yStart :0.000000line->yMin: 321.492000blk->yMax: 359.685000 yStart
>> > :0.000000line->yMin: 346.892000blk->yMax: 377.685000 yStart
>> > :0.000000line->yMin: 364.892000blk->yMax: 425.154000 yStart
>> > :0.000000line->yMin: 409.530000blk->yMax: 542.485000 yStart
>> > :0.000000line->yMin: 516.892000line->yMin: 529.692000
>> >
>> > lastFindXMin: 391.900000 lastFindYMin: 226.280000 haveLastFind: 1
>> > *xMin 391.900000 *yMin 226.280000
>> > Debug: From Top
>> > Debug: **********TopPage: 1 qrect: QRectF(391.9,226.28 85.88x46.52)
>> > searchText: "Offic"
>> >
>> >  coming to here constructor TextPage to make haveLastFind zero
>> >
>> > startAtLast:1 haveLastFind:0 *xMin: 391.899994,*yMin: 226.279999
>> > blk->yMax: 82.285000 yStart :226.279999blk->yMax: 272.800000 yStart
>> > :226.279999line->yMin: 226.280000
>> >
>> > Making found true xMin1:391.900000 yMin1:226.280000 blk->yMax:
>> > 316.570000 yStart :226.279999line->yMin: 290.984000blk->yMax:
>> > 334.285000 yStart :226.279999line->yMin: 321.492000blk->yMax:
>> > 359.685000 yStart :226.279999line->yMin: 346.892000blk->yMax:
>> > 377.685000 yStart :226.279999line->yMin: 364.892000blk->yMax:
>> > 425.154000 yStart :226.279999line->yMin: 409.530000blk->yMax:
>> > 542.485000 yStart :226.279999line->yMin: 516.892000line->yMin:
>> > 529.692000
>> > lastFindXMin: 391.900000 lastFindYMin: 226.280000 haveLastFind: 1
>> > *xMin 391.900000 *yMin 226.280000
>> > Debug: Poppler1 Next sLeft 391.9 sTop 226.28 sRight 477.78 sBottom 272.8
>> > Debug: **********Page: 1 qrect: QRectF(391.9,226.28 85.88x46.52)
>> > searchText: "Offic"
>> >
>> > On Fri, Mar 12, 2010 at 2:21 PM, Albert Astals Cid <aacid at kde.org> wrote:
>> >> On Friday, 12 March 2010, amit aggarwal wrote:
>> >>> Hello Albert,
>> >>> Thanks for your reply. Please let me know if I can enable logging and
>> >>> send you the logs so that you can analyze them. In the meantime, I am
>> >>> also looking into how to fix the issue.
>> >>>
>> >>> Please share any knowledge or help you can, because that will help
>> >>> me fix this issue quickly.
>> >>
>> >> I don't have any suggestion besides "debug the same code in the two
>> >> machines at the same time and see why it works in one and not in the
>> >> other".
>> >>
>> >> Albert
>> >>
>> >>> On Fri, Mar 12, 2010 at 1:05 AM, Albert Astals Cid <aacid at kde.org> wrote:
>> >>> > On Thursday, 11 March 2010, amit aggarwal wrote:
>> >>> >> Hello All,
>> >>> >>
>> >>> >> According to my analysis, the search always starts from the top and
>> >>> >> every time it gets the first hit and returns the same coordinates.
>> >>> >>
>> >>> >> >>> >     while(mDocument->page(pageindex)->search(
>> >>> >> >>> >                searchText,
>> >>> >> >>> >                searchHit,
>> >>> >> >>> >                Poppler::Page::NextResult,
>> >>> >> >>> >                Poppler::Page::CaseInsensitive)) {
>> >>> >>
>> >>> >> So my question is: how can I make it move to the next hit, or in
>> >>> >> other words, how can I make it not always start from the top of
>> >>> >> the page?
>> >>> >>
>> >>> >> Looking forward to your help.
>> >>> >
>> >>> > If it works on x86 and does not work on ARM, it probably means
>> >>> > something is overflowing. I can't help you since I don't have any ARM
>> >>> > gadget to play with.
>> >>> >
>> >>> > Sorry,
>> >>> >  Albert
>> >>> >
>> >>> >> On Thu, Mar 11, 2010 at 7:49 AM, amit aggarwal <amitcs06 at gmail.com> wrote:
>> >>> >> > Hello Albert,
>> >>> >> >
>> >>> >> > Thanks for your reply. Yes, I see the same thing: it works fine
>> >>> >> > on a normal PC, but when I use the same code on an ARM-based
>> >>> >> > machine it returns the same coordinates every time. It does not
>> >>> >> > move to the next hit even though the page has another hit.
>> >>> >> >
>> >>> >> > On Thu, Mar 11, 2010 at 4:15 AM, Albert Astals Cid <aacid at kde.org> wrote:
>> >>> >> >> On Wednesday, 10 March 2010, amit aggarwal wrote:
>> >>> >> >>> Hello,
>> >>> >> >>>
>> >>> >> >>> One thing I noticed: for the same page and the same searchText,
>> >>> >> >>> the algorithm below works fine on x86, but it returns the same
>> >>> >> >>> coordinates every time on an ARM processor.
>> >>> >> >>>
>> >>> >> >>> Please help: is there a problem in the page search API, or am I
>> >>> >> >>> not using it correctly?
>> >>> >> >>
>> >>> >> >> You mean the code works in a regular PC but fails in an arm based
>> >>> >> >> machine?
>> >>> >> >>
>> >>> >> >> Albert
>> >>> >> >>
>> >>> >> >>> On Wed, Mar 10, 2010 at 6:03 PM, amit aggarwal <amitcs06 at gmail.com> wrote:
>> >>> >> >>> > Hi All,
>> >>> >> >>> >
>> >>> >> >>> > I am using the page search API in a separate thread to search
>> >>> >> >>> > for a word on a particular page. The problem is that it returns
>> >>> >> >>> > the same coordinates every time, even though the page contains
>> >>> >> >>> > more than two search hits, so the while loop never ends. Please
>> >>> >> >>> > help me with this and let me know if I am doing something wrong.
>> >>> >> >>> >
>> >>> >> >>> > I am attaching my code snippet and log.
>> >>> >> >>> >
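>> >>> >> >>> >     QRectF searchHit;
>> >>> >> >>> >     // Document::page() returns a new Poppler::Page owned by the
>> >>> >> >>> >     // caller, so create it once and delete it after the loop.
>> >>> >> >>> >     Poppler::Page *popplerPage = mDocument->page(pageIndex);
>> >>> >> >>> >     while (popplerPage && popplerPage->search(
>> >>> >> >>> >                searchText,
>> >>> >> >>> >                searchHit,
>> >>> >> >>> >                Poppler::Page::NextResult,
>> >>> >> >>> >                Poppler::Page::CaseInsensitive)) {
>> >>> >> >>> >         qDebug() << "**********Page:" << pageIndex + 1
>> >>> >> >>> >                  << "qrect:" << searchHit
>> >>> >> >>> >                  << "searchText:" << searchText;
>> >>> >> >>> >         mResults[page].append(searchHit);
>> >>> >> >>> >     }
>> >>> >> >>> >     delete popplerPage;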
>> >>> >> >>> >
>> >>> >> >>> > Debug: **********Page: 1 qrect: QRectF(391.9,226.28
>> >>> >> >>> > 110.48x46.52) searchText: "Office"
>> >>> >> >>> > Debug: **********Page: 1 qrect: QRectF(391.9,226.28
>> >>> >> >>> > 110.48x46.52) searchText: "Office"
>> >>> >> >>> > Debug: **********Page: 1 qrect: QRectF(391.9,226.28
>> >>> >> >>> > 110.48x46.52) searchText: "Office"
>> >>> >> >>> > Debug: **********Page: 1 qrect: QRectF(391.9,226.28
>> >>> >> >>> > 110.48x46.52) searchText: "Office"
>> >>> >> >>> > Debug: **********Page: 1 qrect: QRectF(391.9,226.28
>> >>> >> >>> > 110.48x46.52) searchText: "Office"
>> >>> >> >>> > Debug: **********Page: 1 qrect: QRectF(391.9,226.28
>> >>> >> >>> > 110.48x46.52) searchText: "Office"
>> >>> >> >>> > Debug: **********Page: 1 qrect: QRectF(391.9,226.28
>> >>> >> >>> > 110.48x46.52) searchText: "Office"
>> >>> >> >>> > Debug: **********Page: 1 qrect: QRectF(391.9,226.28
>> >>> >> >>> > 110.48x46.52) searchText: "Office"
>> >>> >> >>> > Debug: **********Page: 1 qrect: QRectF(391.9,226.28
>> >>> >> >>> > 110.48x46.52) searchText: "Office"
>> >>> >> >>> >
>> >>> >> >>> >
>> >>> >> >>> >
>> >>> >> >>> > --
>> >>> >> >>> > Thanks
>> >>> >> >>> > Amit Aggarwal
>> >>> >> >>
>> >>> >> >> _______________________________________________
>> >>> >> >> poppler mailing list
>> >>> >> >> poppler at lists.freedesktop.org
>> >>> >> >> http://lists.freedesktop.org/mailman/listinfo/poppler
>> >>> >> >
>> >>> >> > --
>> >>> >> > Thanks
>> >>> >> > Amit Aggarwal
>> >>> >
>> >>
>> >
>> > --
>> > Thanks
>> > Amit Aggarwal
>



-- 
Thanks
Amit Aggarwal

