[poppler] Regarding page search

Albert Astals Cid aacid at kde.org
Mon Mar 22 13:30:36 PDT 2010


I've pushed an overload that accepts doubles instead of a QRectF and that 
should fix your problem.

Albert

On Tuesday, 16 March 2010, you wrote:
> Hello Albert,
> 
> Thanks for your reply. How are you?
> 
> I have one question regarding word search. This algorithm is
> O(n^3+K), and if I want to find all hits on one page it becomes
> O(n^4+K), which makes the application very slow. Also, every
> search starts from the top and skips up to the previous hit,
> so a lot of comparisons are wasted.
> 
> So if I write an algorithm that returns all the search hits of
> one page, would it be acceptable? And can we also improve the
> performance of this search algorithm?
> 
> On Tue, Mar 16, 2010 at 5:01 AM, Albert Astals Cid <aacid at kde.org> wrote:
> > On Monday, 15 March 2010, amit aggarwal wrote:
> >> Yes, it is a problem with a floating-point comparison in the findText
> >> algorithm. In my use case it shows up in
> >> 
> >>       // check: is the line above the top limit?
> >>       if (!startAtTop &&
> >>           (backward ? line->yMin > yStart : line->yMin < yStart)) {
> >>         continue;
> >>       }
> >> 
> >> that is, in line->yMin < yStart, but the same problem could also
> >> occur in other comparisons, such as
> >> 
> >>     // check: is the block above the top limit?
> >>     if (!startAtTop &&
> >>         (backward ? blk->yMin > yStart : blk->yMax < yStart)) {
> >>       continue;
> >>     }
> >> 
> >> that is, in blk->yMax < yStart, and likewise in the bottom-limit checks.
> >> 
> >> Please let me know your comments on this.
> > 
> > Yeah, there is an ugly double<->float conversion hitting you there. Pino
> > and I are looking for the fix that makes the most sense; give us a few days.
> > 
> > Albert
> > 
> >> On Mon, Mar 15, 2010 at 3:52 PM, amit aggarwal <amitcs06 at gmail.com> wrote:
> >> > Hello Albert,
> >> > 
> >> > I have done some analysis and found that the problem is in a
> >> > floating-point comparison.
> >> > 
> >> > The log shows yStart: 226.279999 and line->yMin: 226.280000, where
> >> > yStart is the previous searchHit result and the new value is
> >> > 226.28. The effect is that the condition
> >> > 
> >> >      if (!startAtTop &&
> >> >          (backward ? line->yMin > yStart : line->yMin < yStart)) {
> >> >        continue;
> >> >      }
> >> > 
> >> > in findText in TextOutputDev.cc is bypassed, so it returns the same
> >> > coordinates every time.
> >> > 
> >> > Looking forward to your comments.
> >> > 
> >> > 
> >> > Pasting some more logs; maybe they will make the problem clearer.
> >> > 
> >> > Debug: startSearch**
> >> > Debug:
> >> > ***************************searchPageForward***********************
> >> > 
> >> > Debug: "PdfPageWidget:: Load time: 0.023000 s."
> >> > 
> >> >  coming here constructor TextPage to make haveLastFind zero
> >> > startAtLast:0 haveLastFind:0 *xMin: 0.000000,*yMin: 0.000000
> >> > blk->yMax: 82.285000 yStart :0.000000line->yMin: 69.492000blk->yMax:
> >> > 272.800000 yStart :0.000000line->yMin: 226.280000
> >> > Making found true xMin1:391.900000 yMin1:226.280000 blk->yMax:
> >> > 316.570000 yStart :0.000000line->yMin: 290.984000blk->yMax: 334.285000
> >> > yStart :0.000000line->yMin: 321.492000blk->yMax: 359.685000 yStart
> >> > 
> >> > :0.000000line->yMin: 346.892000blk->yMax: 377.685000 yStart
> >> > :0.000000line->yMin: 364.892000blk->yMax: 425.154000 yStart
> >> > :0.000000line->yMin: 409.530000blk->yMax: 542.485000 yStart
> >> > :0.000000line->yMin: 516.892000line->yMin: 529.692000
> >> > 
> >> > lastFindXMin: 391.900000 lastFindYMin: 226.280000 haveLastFind: 1
> >> > *xMin 391.900000 *yMin 226.280000
> >> > Debug: From Top
> >> > Debug: **********TopPage: 1 qrect: QRectF(391.9,226.28 85.88x46.52)
> >> > searchText: "Offic"
> >> > 
> >> >  coming to here constructor TextPage to make haveLastFind zero
> >> > 
> >> > startAtLast:1 haveLastFind:0 *xMin: 391.899994,*yMin: 226.279999
> >> > blk->yMax: 82.285000 yStart :226.279999blk->yMax: 272.800000 yStart
> >> > 
> >> > :226.279999line->yMin: 226.280000
> >> > 
> >> > Making found true xMin1:391.900000 yMin1:226.280000 blk->yMax:
> >> > 316.570000 yStart :226.279999line->yMin: 290.984000blk->yMax:
> >> > 334.285000 yStart :226.279999line->yMin: 321.492000blk->yMax:
> >> > 359.685000 yStart :226.279999line->yMin: 346.892000blk->yMax:
> >> > 377.685000 yStart :226.279999line->yMin: 364.892000blk->yMax:
> >> > 425.154000 yStart :226.279999line->yMin: 409.530000blk->yMax:
> >> > 542.485000 yStart :226.279999line->yMin: 516.892000line->yMin:
> >> > 529.692000
> >> > lastFindXMin: 391.900000 lastFindYMin: 226.280000 haveLastFind: 1
> >> > *xMin 391.900000 *yMin 226.280000
> >> > Debug: Poppler1 Next sLeft 391.9 sTop 226.28 sRight 477.78 sBottom
> >> > 272.8 Debug: **********Page: 1 qrect: QRectF(391.9,226.28
> >> > 85.88x46.52) searchText: "Offic"
> >> > 
> >> > On Fri, Mar 12, 2010 at 2:21 PM, Albert Astals Cid <aacid at kde.org> wrote:
> >> >> On Friday, 12 March 2010, amit aggarwal wrote:
> >> >>> Hello Albert,
> >> >>> Thanks for your reply. Please let me know if I can enable logging
> >> >>> and send you the logs so that you can analyse them. In the meantime
> >> >>> I am also looking into how to fix the issue.
> >> >>> 
> >> >>> If you have any knowledge or help to share, please do, because that
> >> >>> will help me fix this issue quickly.
> >> >> 
> >> >> I don't have any suggestion besides "debug the same code in the two
> >> >> machines at the same time and see why it works in one and not in the
> >> >> other".
> >> >> 
> >> >> Albert
> >> >> 
> >> >>> On Fri, Mar 12, 2010 at 1:05 AM, Albert Astals Cid <aacid at kde.org> wrote:
> >> >>> > On Thursday, 11 March 2010, amit aggarwal wrote:
> >> >>> >> Hello All,
> >> >>> >> 
> >> >>> >> As per my analysis, the search always starts from the top, so
> >> >>> >> every time it gets the first hit and returns the same
> >> >>> >> coordinates.
> >> >>> >> 
> >> >>> >> >>> >     while(mDocument->page(pageindex)->search(
> >> >>> >> >>> > 
> >> >>> >> >>> >                searchText,
> >> >>> >> >>> >                searchHit,
> >> >>> >> >>> >                Poppler::Page::NextResult,
> >> >>> >> >>> >                Poppler::Page::CaseInsensitive)) {
> >> >>> >> 
> >> >>> >> So my question is: how can I make it move to the next hit, or,
> >> >>> >> put another way, how can I make it not always start from the
> >> >>> >> top of the page?
> >> >>> >> 
> >> >>> >> Looking forward to your help.
> >> >>> > 
> >> >>> > If it works on x86 and does not work on ARM, it probably means
> >> >>> > something is overflowing. I can't help you since I don't have any
> >> >>> > ARM gadget to play with.
> >> >>> > 
> >> >>> > Sorry,
> >> >>> >  Albert
> >> >>> > 
> >> >>> >> On Thu, Mar 11, 2010 at 7:49 AM, amit aggarwal <amitcs06 at gmail.com> wrote:
> >> >>> >> > Hello Albert,
> >> >>> >> > 
> >> >>> >> > Thanks for your reply. Yes, that is what I observe: it works
> >> >>> >> > fine on a normal PC, but when I use the same code on an
> >> >>> >> > ARM-based machine it returns the same coordinates every time.
> >> >>> >> > It does not move to the next hit even though the page has one.
> >> >>> >> > 
> >> >>> >> > On Thu, Mar 11, 2010 at 4:15 AM, Albert Astals Cid <aacid at kde.org> wrote:
> >> >>> >> >> On Wednesday, 10 March 2010, amit aggarwal wrote:
> >> >>> >> >>> Hello,
> >> >>> >> >>> 
> >> >>> >> >>> One thing I noticed is that, for the same page and the same
> >> >>> >> >>> searchText, the code below works fine on x86 but returns the
> >> >>> >> >>> same coordinates on an ARM processor.
> >> >>> >> >>> 
> >> >>> >> >>> Please help: is there a problem in the page search API, or am
> >> >>> >> >>> I not using it correctly?
> >> >>> >> >> 
> >> >>> >> >> You mean the code works on a regular PC but fails on an
> >> >>> >> >> ARM-based machine?
> >> >>> >> >> 
> >> >>> >> >> Albert
> >> >>> >> >> 
> >> >>> >> >>> On Wed, Mar 10, 2010 at 6:03 PM, amit aggarwal <amitcs06 at gmail.com> wrote:
> >> >>> >> >>> > Hi All,
> >> >>> >> >>> > 
> >> >>> >> >>> > I am using the page search API in a separate thread to search
> >> >>> >> >>> > for a word on a particular page. The problem is that it
> >> >>> >> >>> > returns the same coordinates every time, even though the
> >> >>> >> >>> > page contains more than two search hits, so the while loop
> >> >>> >> >>> > never ends. Please help me with this and let me know if I
> >> >>> >> >>> > am doing something wrong.
> >> >>> >> >>> > 
> >> >>> >> >>> > I am attaching my code snippet and the log.
> >> >>> >> >>> > 
> >> >>> >> >>> >     QRectF searchHit;
> >> >>> >> >>> >     while(mDocument->page(pageindex)->search(
> >> >>> >> >>> >                searchText,
> >> >>> >> >>> >                searchHit,
> >> >>> >> >>> >                Poppler::Page::NextResult,
> >> >>> >> >>> >                Poppler::Page::CaseInsensitive)) {
> >> >>> >> >>> > 
> >> >>> >> >>> > 
> >> >>> >> >>> >         qDebug() << "**********Page:" << pageIndex+1
> >> >>> >> >>> >                  << "qrect:" << searchHit
> >> >>> >> >>> >                  << "searchText:" << searchText;
> >> >>> >> >>> >         (mResults[page]).append(searchHit);
> >> >>> >> >>> >     }
> >> >>> >> >>> > 
> >> >>> >> >>> > Debug: **********Page: 1 qrect: QRectF(391.9,226.28
> >> >>> >> >>> > 110.48x46.52) searchText: "Office"
> >> >>> >> >>> > Debug: **********Page: 1 qrect: QRectF(391.9,226.28
> >> >>> >> >>> > 110.48x46.52) searchText: "Office"
> >> >>> >> >>> > Debug: **********Page: 1 qrect: QRectF(391.9,226.28
> >> >>> >> >>> > 110.48x46.52) searchText: "Office"
> >> >>> >> >>> > Debug: **********Page: 1 qrect: QRectF(391.9,226.28
> >> >>> >> >>> > 110.48x46.52) searchText: "Office"
> >> >>> >> >>> > Debug: **********Page: 1 qrect: QRectF(391.9,226.28
> >> >>> >> >>> > 110.48x46.52) searchText: "Office"
> >> >>> >> >>> > Debug: **********Page: 1 qrect: QRectF(391.9,226.28
> >> >>> >> >>> > 110.48x46.52) searchText: "Office"
> >> >>> >> >>> > Debug: **********Page: 1 qrect: QRectF(391.9,226.28
> >> >>> >> >>> > 110.48x46.52) searchText: "Office"
> >> >>> >> >>> > Debug: **********Page: 1 qrect: QRectF(391.9,226.28
> >> >>> >> >>> > 110.48x46.52) searchText: "Office"
> >> >>> >> >>> > Debug: **********Page: 1 qrect: QRectF(391.9,226.28
> >> >>> >> >>> > 110.48x46.52) searchText: "Office"
> >> >>> >> >>> > 
> >> >>> >> >>> > 
> >> >>> >> >>> > 
> >> >>> >> >>> > --
> >> >>> >> >>> > Thanks
> >> >>> >> >>> > Amit Aggarwal
> >> >>> >> >> 
> >> >>> >> >> _______________________________________________
> >> >>> >> >> poppler mailing list
> >> >>> >> >> poppler at lists.freedesktop.org
> >> >>> >> >> http://lists.freedesktop.org/mailman/listinfo/poppler
> >> >>> >> > 
> >> >>> >> > --
> >> >>> >> > Thanks
> >> >>> >> > Amit Aggarwal
> >> >>> > 
> >> >> 
> >> > 
> >> > --
> >> > Thanks
> >> > Amit Aggarwal
> > 

