2008/5/6 Jamie McCracken <<a href="mailto:jamie.mccrack@googlemail.com">jamie.mccrack@googlemail.com</a>>:<br><div class="gmail_quote"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<div><div></div><div class="Wj3C7c"><br>
On Tue, 2008-05-06 at 16:57 +0200, Mikkel Kamstrup Erlandsen wrote:<br>
> 2008/5/2 Mikkel Kamstrup Erlandsen <<a href="mailto:mikkel.kamstrup@gmail.com">mikkel.kamstrup@gmail.com</a>>:<br>
> I have a handful of comments about this (Jos also asked about the<br>
> same on<br>
> IRC recently).<br>
> It was in fact a design decision, but I am writing this from<br>
> my mobile<br>
> since I'm on holiday, so I'll elaborate when I get home<br>
> Tuesday.<br>
><br>
> Cheers,<br>
> Mikkel<br>
><br>
> As promised...<br>
><br>
> Let's first establish some terminology. A Paged Model is one where you<br>
> can request hits with an offset and a count. A Streaming Model is one<br>
> like we have now, where you specify how many hits to read on each<br>
> request and then read hits sequentially (like file reading without<br>
> seeking).<br>
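To make the two terms concrete, here is a minimal Python sketch of both access patterns. The class and method names (`PagedResults`, `StreamingResults`, `get_page`, `read`) are made up for illustration and are not part of the Xesam spec; the hits are plain strings held in memory.

```python
from itertools import islice
from typing import Iterable, List, Sequence


class PagedResults:
    """Paged model: any window can be requested by offset and count."""

    def __init__(self, hits: Sequence[str]) -> None:
        self._hits = list(hits)

    def get_page(self, offset: int, count: int) -> List[str]:
        # Random access: the caller picks where the window starts.
        return self._hits[offset:offset + count]


class StreamingResults:
    """Streaming model: hits are read in order, with no seeking."""

    def __init__(self, hits: Iterable[str]) -> None:
        self._hits = iter(hits)

    def read(self, count: int) -> List[str]:
        # Sequential access: each call continues where the last one stopped.
        return list(islice(self._hits, count))
```

With the streaming variant, two consecutive `read` calls return consecutive batches, and there is no way to go back or jump ahead without reading.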
><br>
> It should be noted that the Xesam Search spec is designed for desktop<br>
> search (and not generic search on a database or Google-style web<br>
> search with millions of hits). Furthermore it should be feasible to<br>
> implement in a host of different backends, not just full-fledged<br>
> search engines.<br>
><br>
> There are basically three backends where a paged model can be<br>
> problematic. Web services, Aggregated searches, and Grep/Find-like<br>
> implementations.<br>
><br>
> * Web services. While Google's GData Query API does allow paging, not<br>
> all web services do this. For example, the OAI-PMH[1] standard does<br>
> not do paging, merely sequential reading. Of course OAI-PMH is a<br>
> standard for harvesting metadata, but I could imagine a "search<br>
> engine" extracting metadata from the OAI-PMH result on the fly.<br>
><br>
> * Aggregated search. Consider a setup where the Xesam search engine<br>
> is proxying a collection of other search engines. It is a classical<br>
> problem to look up hits 1000-1010 in this setup. The search engine<br>
> will have to retrieve the first 1010 hits from all sub-search engines<br>
> to get it right. Maybe there is an algorithm that does this more<br>
> cleverly, but I have not heard of it. This is of course also a problem<br>
> in a streaming model, but it will not trick developers into believing<br>
> that GetHits(s, 1000, 1010) is a cheap call.<br>
><br>
> * Grep-like backends or more generally backends where the search<br>
> results will roll in sequentially.<br>
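The cost of the aggregated case can be illustrated with a small Python sketch. It assumes each sub-engine returns its hits already ranked by descending score, and it uses the straightforward approach described above (fetch the first offset + count hits from every engine, merge, then slice). The function name `merged_page` and the `(score, id)` hit shape are hypothetical.

```python
import heapq
from itertools import islice
from typing import List, Sequence, Tuple

Hit = Tuple[float, str]  # (score, identifier); a higher score ranks first


def merged_page(engines: Sequence[Sequence[Hit]],
                offset: int, count: int) -> Tuple[List[Hit], int]:
    """Return one page of the merged ranking and the number of hits fetched."""
    needed = offset + count
    fetched = 0
    prefixes = []
    for ranked_hits in engines:
        # Every engine must supply up to offset + count hits, because the
        # requested window could in principle come entirely from one engine.
        prefix = list(ranked_hits[:needed])
        fetched += len(prefix)
        prefixes.append(prefix)
    # heapq.merge expects each input sorted in the same (descending) order.
    merged = heapq.merge(*prefixes, key=lambda hit: hit[0], reverse=True)
    return list(islice(merged, offset, needed)), fetched
```

The second return value shows the point of the argument: a page of `count` hits can cost up to `(offset + count)` fetches per sub-engine, so a deep page is far from cheap.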
><br>
> I think it is a bad time to break the API like this. It is in fact a<br>
> quite big break if you ask me, since our current approach has been<br>
> stream-based and what you propose is changing the paradigm to a page<br>
> based model. Also bad because it is the wrong signal to send with such<br>
> an important change at the last minute.<br>
><br>
> I see a few API-stable alternatives though.<br>
><br>
> 1) Add a SeekHit(in s search, in i hit_id, out i new_pos). This<br>
> basically adds a cursoring mechanism to the API<br>
> 2) In style of 1) but lighter - add SkipHits(in s search, in i count,<br>
> out i new_pos)<br>
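As a rough illustration of option 2, here is a Python sketch of a forward-only cursor with a skip operation. Only the `SkipHits(search, count) -> new_pos` shape comes from the proposal above; the `SearchCursor` class, its in-memory iterator, and the behaviour at the end of the result set (stop and report the final position) are assumptions made for the example.

```python
from itertools import islice
from typing import Iterable, List


class SearchCursor:
    """Forward-only cursor over a stream of hits (hypothetical helper)."""

    def __init__(self, hits: Iterable[str]) -> None:
        self._hits = iter(hits)
        self._pos = 0

    def get_hits(self, count: int) -> List[str]:
        # Read the next `count` hits and advance the cursor.
        batch = list(islice(self._hits, count))
        self._pos += len(batch)
        return batch

    def skip_hits(self, count: int) -> int:
        # Discard up to `count` hits without returning them; report new_pos.
        skipped = sum(1 for _ in islice(self._hits, count))
        self._pos += skipped
        return self._pos
```

Calling `skip_hits(1000)` followed by `get_hits(10)` gives the same hits a paged request would, while keeping it visible in the API that the backend still has to walk past the skipped hits.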
><br>
> These options also stay within the standard streaming terminology. We<br>
> could make them optional by making them throw exceptions unless the (new)<br>
> session property vendor.paging is True.<br>
><br>
> As Jos also points out later in the thread GetHitData is actually<br>
> paging and the workaround he describes can actually be made very<br>
> efficient since we already have the hit.fields.extended session prop<br>
> to hint what properties we will fetch.<br>
><br>
> Let me make it clear that I am not refusing the change to a paging<br>
> model if that is what the majority rules. We should just make an<br>
> informed decision that we are sure we agree on.<br>
><br>
<br>
<br>
</div></div>I'm proposing adding new API, not breaking the existing ones. The existing<br>
stuff can easily emulate paging if it lacks native support.<br>
<br>
I would prefer a new API that takes a start point param and a count/length<br>
param so we have full random access.<br>
<font color="#888888"></font></blockquote><div><br>And how is GetHitData not good enough for that?<br><br>Cheers,<br>Mikkel <br></div></div>