2008/5/2 Mikkel Kamstrup Erlandsen <<a href="mailto:mikkel.kamstrup@gmail.com" target="_blank">mikkel.kamstrup@gmail.com</a>>:<br><div class="gmail_quote"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
I have a handful of comments about this (Jos also asked about the same on<br>
IRC recently).<br>
It was in fact a design decision, but I am writing this from my mobile<br>
since I'm on holiday, so I'll elaborate when I get home Tuesday.<br>
<br>
Cheers,<br>
Mikkel<br>
</blockquote><div><br>As promised...<br><br>Let's first establish some terminology. A Paged Model is one where you can request hits with an offset and a count. A Streaming Model is one like we have now, where you specify how many hits to read on each request and then read hits sequentially (like file reading without seeking).<br>
<br>It should be noted that the Xesam Search spec is designed for desktop search (and not generic search on a database or Google-style web search with millions of hits). Furthermore, it should be feasible to implement in a host of different backends, not just full-fledged search engines.<br>
<br>There are basically three kinds of backends where a paged model can be problematic: web services, aggregated searches, and grep/find-like implementations.<br><br> * Web services. While Google's GData Query API does allow paging, not all web services do. For example, the OAI-PMH[1] standard does not do paging, merely sequential reading. Of course OAI-PMH is a standard for harvesting metadata, but I could imagine a "search engine" extracting metadata from the OAI-PMH result on the fly.<br>
<br> * Aggregated search. Consider a setup where the Xesam search engine is proxying a collection of other search engines. It is a classical problem to look up hits 1000-1010 in this setup. The search engine will have to retrieve the first 1010 hits from all sub-search engines to get it right. Maybe there is a cleverer algorithm for this, but I have not heard of it. This is of course also a problem in a streaming model, but a streaming API will not trick developers into believing that GetHits(s, 1000, 1010) is a cheap call.<br>
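<br>To illustrate the aggregated case, here is a minimal Python sketch. It is not part of the Xesam spec: merged_page and make_backend are hypothetical names, and each backend is assumed to return its own top n hits as (score, hit) pairs in descending score order. It shows why a page at a given offset forces the proxy to fetch offset + count hits from every sub-engine.<br>

```python
import heapq
from itertools import islice


def make_backend(name, scores):
    """Hypothetical sub-engine: returns its top n (score, hit) pairs, best first."""
    ranked = sorted(((s, f"{name}:{s}") for s in scores), reverse=True)

    def top(n):
        return ranked[:n]

    return top


def merged_page(backends, offset, count):
    """Return (page, hits_fetched) for one page of the merged ranking.

    In the worst case all of the first offset + count merged hits come
    from a single backend, so that many hits are requested from each one.
    """
    needed = offset + count
    fetched = [backend(needed) for backend in backends]
    # heapq.merge expects inputs sorted ascending by key; negating the
    # score turns "descending score" into "ascending key".
    merged = heapq.merge(*fetched, key=lambda pair: -pair[0])
    page = list(islice(merged, offset, needed))
    return page, sum(len(f) for f in fetched)


evens = make_backend("a", range(0, 100, 2))
odds = make_backend("b", range(1, 100, 2))
page, cost = merged_page([evens, odds], offset=10, count=3)
print(page, cost)
```

<br>With two backends, a three-hit page at offset 10 already costs 26 fetched hits; the cost grows with both the offset and the number of sub-engines.<br>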
<br> * Grep-like backends, or more generally backends where the search results roll in sequentially.<br><br>I think it is a bad time to break the API like this. It is in fact quite a big break if you ask me, since our current approach has been stream-based and what you propose changes the paradigm to a page-based model. It is also the wrong signal to send, making such an important change at the last minute.<br>
<br>I see a few API-stable alternatives though.<br><br>1) Add a SeekHit(in s search, in i hit_id, out i new_pos). This basically adds a cursoring mechanism to the API.<br>2) In the style of 1), but lighter: add SkipHits(in s search, in i count, out i new_pos).<br>
<br>These options also stay within the standard streaming terminology. We could make them optional by making them throw exceptions if the (new) session property vendor.paging is False.<br><br>As Jos also points out later in the thread, GetHitData is actually paging, and the workaround he describes can be made very efficient since we already have the hit.fields.extended session prop to hint which properties we will fetch.<br>
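<br>As a rough sketch of how the SkipHits option could sit on top of the streaming model, here is a small Python example. The class and method names are hypothetical and only mirror the proposed signature; the backend is assumed to be any iterator that produces hits sequentially, as a grep-like backend would.<br>

```python
from itertools import islice


class StreamingSession:
    """Hypothetical streaming search session over a sequential hit source."""

    def __init__(self, results):
        self._results = iter(results)
        self._pos = 0  # number of hits consumed so far

    def get_hits(self, count):
        """Read the next `count` hits, like reading a file without seeking."""
        batch = list(islice(self._results, count))
        self._pos += len(batch)
        return batch

    def skip_hits(self, count):
        """Discard up to `count` hits and return the new position.

        The hits are still produced by the backend; they are just not
        sent to the client.
        """
        skipped = sum(1 for _ in islice(self._results, count))
        self._pos += skipped
        return self._pos


session = StreamingSession(range(100))  # stand-in for 100 hits
print(session.skip_hits(10))
print(session.get_hits(3))
```

<br>A client that wants "page 2" would call skip_hits once and then get_hits, so the cost of skipping stays visible in the API instead of being hidden behind an offset argument.<br>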
<br>Let me make it clear that I am not refusing the change to a paging model if that is what the majority rules. We should just make an informed decision that we are sure we agree on.<br><br>Cheers,<br>Mikkel<br><br>[1]: <a href="http://www.openarchives.org/OAI/2.0/openarchivesprotocol.htm" target="_blank">http://www.openarchives.org/OAI/2.0/openarchivesprotocol.htm</a><br>
<br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><br>
<br>
2008/5/2, Jamie McCracken <<a href="mailto:jamie.mccrack@googlemail.com" target="_blank">jamie.mccrack@googlemail.com</a>>:<br>
<div><div></div><div>> For a search GUI it's essential to page results using an offset and limit<br>
> to define the page size<br>
><br>
> currently xesam api lacks the offset component (although it has a limit)<br>
><br>
> there are several workarounds:<br>
><br>
> 1) add a hit.offset property<br>
> 2) add new api : GetPagedHits (in string search, in int PageStart, in<br>
> int PageEnd, out aav results) or similar<br>
> 3) add a hit.pagesize property and have GetNextpage/getPrevPage methods<br>
><br>
> Anyway we desperately need this to make things fast otherwise putting a<br>
> huge result set over dbus is gonna be awfully slow<br>
><br>
> jamie<br>
</div></div></blockquote></div><br>