2008/5/6 Jamie McCracken <<a href="mailto:jamie.mccrack@googlemail.com">jamie.mccrack@googlemail.com</a>>: <div class="gmail_quote"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"> You mean pull results over D-Bus and then page at the client? </blockquote><div> No. The signature of GetHitData is (in s search_handle, in au hit_ids, in as fields, out aav hits), i.e. you request which hit IDs to fetch. To fetch a page of page_size hits starting at hit n, pass [n, n+1, ..., n+page_size-1] as hit_ids. </div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"> That's inefficient: pulling 10,000 hits over D-Bus is insanely slow (even just the URI). </blockquote><div> Hmmm, how slow is "insanely slow"? I doubt that this is true (by my standards of insanely slow). </div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"> Paging is a must-have in my book; otherwise the Tracker API will have to be used a lot instead of Xesam whenever paged results are desired (more likely we will add paged search to Xesam on top of the standard). </blockquote><div> With a seekable API, paging is easy to implement on the client.
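A minimal sketch of that client-side paging, in Python for illustration only. FakeXesamSearch, get_hit_data, page_hit_ids, fetch_page and the "uri" field are hypothetical names; the fake merely mimics the shape of GetHitData (a list of hit IDs and a list of fields in, one row of values per requested hit out) without any D-Bus binding. It also assumes hit IDs are zero-based and that the client already knows the total hit count. A page of page_size hits starting at hit n is then the IDs n through n+page_size-1, clamped to the hit count.

```python
from typing import Dict, List, Sequence


class FakeXesamSearch:
    """Hypothetical in-memory stand-in for a Xesam search session.

    Mimics only the shape of GetHitData(in s search_handle, in au hit_ids,
    in as fields, out aav hits): the caller names the hits it wants.
    """

    def __init__(self, hits: List[Dict[str, str]]):
        self._hits = hits

    def get_hit_data(self, search_handle: str, hit_ids: Sequence[int],
                     fields: Sequence[str]) -> List[List[str]]:
        # Random access: in this fake each requested hit ID is looked up
        # directly, so fetching a late page costs the same as an early one.
        return [[self._hits[i][f] for f in fields] for i in hit_ids]


def page_hit_ids(page: int, page_size: int, hit_count: int) -> List[int]:
    """Hit IDs n, n+1, ..., n+page_size-1 for zero-based `page`, clamped to hit_count."""
    start = page * page_size
    end = min(start + page_size, hit_count)
    return list(range(start, end))


def fetch_page(search: FakeXesamSearch, handle: str, page: int, page_size: int,
               hit_count: int, fields: Sequence[str]) -> List[List[str]]:
    """Client-side paging: compute the IDs for one page and fetch only those."""
    return search.get_hit_data(handle, page_hit_ids(page, page_size, hit_count), fields)
```

For example, with 25 hits and page_size=10, page 2 requests IDs 20 to 24 and gets back five rows; only the hits on the requested page cross the wire.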
Cheers,<br>Mikkel </div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"> <div><div class="Wj3C7c">
On Tue, 2008-05-06 at 17:12 +0200, Mikkel Kamstrup Erlandsen wrote:<br>
> 2008/5/6 Jamie McCracken <<a href="mailto:jamie.mccrack@googlemail.com">jamie.mccrack@googlemail.com</a>>:<br>
><br>
> > On Tue, 2008-05-06 at 16:57 +0200, Mikkel Kamstrup Erlandsen wrote:<br>
> > > 2008/5/2 Mikkel Kamstrup Erlandsen <<a href="mailto:mikkel.kamstrup@gmail.com">mikkel.kamstrup@gmail.com</a>>:<br>
> > > > I have a handful of comments about this (Jos also asked about the same on IRC recently). It was in fact a design decision, but I am writing this from my mobile since I'm on holiday, so I'll elaborate when I get home on Tuesday.<br>
> > > ><br>
> > > > Cheers,<br>
> > > > Mikkel<br>
> > ><br>
> > > As promised...<br>
> > ><br>
> > > Let's first establish some terminology. A Paged Model is one where you can request hits with an offset and a count. A Streaming Model is one like we have now, where you specify how many hits to read on each request and then read hits sequentially (like file reading without seeking).<br>
> > ><br>
> > > It should be noted that the Xesam Search spec is designed for desktop search (and not generic search on a database or Google-style web search with millions of hits). Furthermore, it should be feasible to implement it in a host of different backends, not just full-fledged search engines.<br>
> > ><br>
> > > There are basically three backends where a paged model can be problematic.
Web services, aggregated searches, and grep/find-like implementations.<br>
> > ><br>
> > > * Web services. While Google's GData Query API does allow paging, not all web services do. For example, the OAI-PMH[1] standard does not do paging, merely sequential reading. Of course OAI-PMH is a standard for harvesting metadata, but I could imagine a "search engine" extracting metadata from the OAI-PMH result on the fly.<br>
> > ><br>
> > > * Aggregated search. Consider a setup where the Xesam search engine is proxying a collection of other search engines. It is a classical problem to look up hits 1000-1010 in this setup: the search engine will have to retrieve the first 1010 hits from all sub-search engines to get it right. Maybe there is a clever algorithm to do this more efficiently, but I have not heard of it. This is of course also a problem in a streaming model, but there it will not trick developers into believing that GetHits(s, 1000, 1010) is a cheap call.<br>
> > ><br>
> > > * Grep-like backends, or more generally backends where the search results will roll in sequentially.<br>
> > ><br>
> > > I think it is a bad time to break the API like this. It is in fact quite a big break if you ask me, since our current approach has been stream-based and what you propose is changing the paradigm to a page-based model. It is also bad because making such an important change at the last minute sends the wrong signal.<br>
> > ><br>
> > > I see a few API-stable alternatives, though.<br>
> > ><br>
> > > 1) Add a SeekHit(in s search, in i hit_id, out i new_pos).
This basically adds a cursoring mechanism to the API.<br>
> > > 2) In the style of 1) but lighter: add SkipHits(in s search, in i count, out i new_pos).<br>
> > ><br>
> > > These options also stay within the standard streaming terminology. We could make them optional by making them throw exceptions if the (new) session property vendor.paging is True.<br>
> > ><br>
> > > As Jos also points out later in the thread, GetHitData is actually paging, and the workaround he describes can be made very efficient, since we already have the hit.fields.extended session prop to hint which properties we will fetch.<br>
> > ><br>
> > > Let me make it clear that I am not refusing the change to a paging model if that is what the majority wants. We should just make an informed decision that we are sure we agree on.<br>
> ><br>
> > I'm proposing adding new API, not breaking existing ones. The existing stuff can easily emulate paging if it lacks native support.<br>
> ><br>
> > I would prefer a new API that takes a start-point param and a count/length param so we have full random access.<br>
><br>
> And how is GetHitData not good enough for that?<br>
><br>
> Cheers,<br>
> Mikkel<br>
</div></div></blockquote></div>
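The aggregated-search problem from the quoted message (looking up hits 1000-1010 across several proxied engines) can be made concrete with a small Python sketch. Each sub-engine is modelled as a list of (score, uri) hits already sorted by descending relevance, which is an assumption: real engines may not expose comparable scores at all. The helpers counting and aggregated_slice are hypothetical; the merge itself is the standard library's lazy k-way heapq.merge. Even with lazy merging, producing merged hits 1000 onward means stepping over the first 1000 merged hits, and any single sub-engine may have to supply up to 1010 of them.

```python
import heapq
import itertools
from typing import Dict, Iterator, List, Tuple

Hit = Tuple[float, str]  # (relevance score, uri): a hypothetical hit shape


def counting(source: Iterator[Hit], name: str, pulled: Dict[str, int]) -> Iterator[Hit]:
    """Yield hits from one sub-engine while recording how many were read from it."""
    pulled[name] = 0
    for hit in source:
        pulled[name] += 1
        yield hit


def aggregated_slice(sources: Dict[str, List[Hit]], offset: int,
                     count: int) -> Tuple[List[Hit], Dict[str, int]]:
    """Return merged hits [offset, offset + count) plus per-engine read counts.

    Every source must be sorted by descending score. heapq.merge reads the
    sources lazily, but it still has to step over the first `offset` merged
    hits before it can produce the requested slice.
    """
    pulled: Dict[str, int] = {}
    streams = [counting(iter(hits), name, pulled) for name, hits in sources.items()]
    merged = heapq.merge(*streams, key=lambda hit: hit[0], reverse=True)
    page = list(itertools.islice(merged, offset, offset + count))
    return page, pulled
```

With three evenly interleaved sub-engines of 1,000 hits each, asking for ten hits at offset 1000 reads a little over 1,010 hits in total, about 337 from each engine, to return just ten.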
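Options 1) and 2) in the quoted message, SeekHit and SkipHits, can be emulated on top of a purely sequential hit source such as a grep-like backend. The Python sketch below is hypothetical: CursorOverStream and its get_hits, seek_hit and skip_hits methods only mirror the shape of the existing streaming read and the two proposed calls, and hits are plain strings. Hits that have already been read are cached, so seeking backwards is cheap, while seeking forwards still has to read every intermediate hit from the stream.

```python
from typing import Iterator, List


class CursorOverStream:
    """Hypothetical SeekHit/SkipHits-style cursor over a sequential hit stream."""

    def __init__(self, stream: Iterator[str]):
        self._stream = stream
        self._cache: List[str] = []  # every hit read so far, in stream order
        self.pos = 0                 # current cursor position (zero-based)

    def _fill(self, upto: int) -> None:
        # Pull from the underlying stream until `upto` hits are cached or it ends.
        while upto > len(self._cache):
            try:
                self._cache.append(next(self._stream))
            except StopIteration:
                break

    def get_hits(self, count: int) -> List[str]:
        """Streaming read: return up to `count` hits from the cursor and advance it."""
        self._fill(self.pos + count)
        hits = self._cache[self.pos:self.pos + count]
        self.pos += len(hits)
        return hits

    def seek_hit(self, hit_id: int) -> int:
        """Move the cursor to `hit_id`, clamped to the hits available; return the new position."""
        hit_id = max(0, hit_id)
        self._fill(hit_id)
        self.pos = min(hit_id, len(self._cache))
        return self.pos

    def skip_hits(self, count: int) -> int:
        """Relative variant: advance the cursor by `count` hits; return the new position."""
        return self.seek_hit(self.pos + count)
```

Caching trades memory for not re-running the search; a memory-constrained backend could instead restart the stream on a backward seek and accept the repeated work.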