Simple search API proposal, take 2
Mikkel Kamstrup Erlandsen
mikkel.kamstrup at gmail.com
Wed Jan 10 07:25:53 PST 2007
2007/1/4, Magnus Bergman <magnus.bergman at observer.net>:
> First some comments on the current draft[1]
> """""""""""""""""""""""""""""""""""""""""""
>
> I think it's a bad idea to use a query-string to identify a search for
> the following reasons:
> * It is inefficient to send a (possibly quite long) string for every
> call.
> * It isn't logical for the search engine to use the query string to
> look up the search because a query might generate a different result
> depending on when the search is started.
> * An application might create different searches from the same query
> (string) with different results ("all files created this minute").
>
> Because of these reasons I propose to provide a *search handle*
> (probably just an integer value) for each search that is created.
>
> From what I read in the discussion it seems problematic to use URIs
> as persistent identifiers to identify a hit, both for the reasons
> already mentioned and because a hit is not the same thing as a
> document. Even if a URI were a persistent identifier for a document, it
> would be illogical to use it to identify a hit. And because of this
> and the reasons mentioned above it would be even worse to use a query
> string and a URI to identify a hit.
>
> Instead I support the idea of simply using sequence numbers (and a
> search handle) to identify a hit.
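
To illustrate the point about handles, here is a minimal sketch in Python.
It is not part of the proposal: the class SearchRegistry, its methods and
the substring matching are all hypothetical stand-ins for a real engine.
It only shows that two searches created from the same query string get
distinct handles and can hold different results, and that a hit is
addressed by (handle, sequence number).

```python
import itertools


class SearchRegistry:
    """Hypothetical in-memory stand-in for a search engine."""

    def __init__(self, documents):
        self._documents = documents          # live list of document names
        self._handles = itertools.count(1)   # source of integer search handles
        self._results = {}                   # handle -> list of hits

    def new_search(self, query):
        handle = next(self._handles)
        # Snapshot the matches at the moment the search is started.
        self._results[handle] = [d for d in self._documents if query in d]
        return handle

    def count_hits(self, handle):
        return len(self._results[handle])

    def hit(self, handle, sequence_number):
        # A hit is identified by a search handle plus a sequence number.
        return self._results[handle][sequence_number]


docs = ["notes.txt"]
registry = SearchRegistry(docs)
first = registry.new_search("notes")
docs.append("notes-2007.txt")            # the document set changes
second = registry.new_search("notes")    # same query string, new search
```

Here the same query string yields two handles whose hit counts differ, which
is exactly the case a query string alone could not tell apart.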
>
>
>
> Highlighting, streaming and snippets
> """"""""""""""""""""""""""""""""""""
>
> It isn't clear what a snippet is exactly. But my guess is that it is a
> selected part or summary of the document that demonstrates especially
> well why it matched, possibly with highlighting. And it isn't
> stored in the index but dynamically generated. Correct?
>
> I have brought up the question about a need for a document streaming
> infrastructure. But now I see that highlighting is to be supported,
> so document streaming seems to be needed anyway.
>
> The highlighting cannot be done by the application, it must be done
> by the search engine. Just highlighting every word from the query
> string isn't correct. The knowledge from the search engine is needed to
> get it right. This means that to highlight a document (or a selected
> part of it) there is no other way to do it than to stream the
> document through the search engine to the application.
>
> If snippets are going to be supported it will be easy to also support
> delivering the whole document highlighted, and even easier to just
> deliver the whole document.
>
> Streaming the document means to automatically convert it into a
> requested format (something that the indexer can extract words from
> or something that an application can show). Doing this is actually no
> big deal, doing the highlighting is the hard part.
>
> The benefit of being able to stream documents like this is that the
> documents don't need to be accessible in a way an application can
> understand (they are not required to have a URI).
>
> I'm not saying this is a feature we can't live without. But we
> practically get it for free if snippets are going to be supported.
>
>
>
> Properties for hits
> """""""""""""""""""
>
> Hits are not the same thing as documents, so these are really both
> properties of the hits and properties of the document. The properties
> of the hits include information on why the document matched the query
> and a link to the matching document. This link might be kept secret by
> the search engine, but a URI might be provided as a property of the
> document. The properties of the document are of course the usual
> document metadata. Some of these might be stored in the search
> engine's index, some might be extracted from the document dynamically,
> but that doesn't matter. The properties belonging to the document (as
> well as the document itself) can be accessed independently of a
> search, the ones belonging to the hit cannot.
>
>
>
> The actual proposal
> """""""""""""""""""
>
> ShowConfiguration ( )
>
> Open a graphical interface for configuring the search tool.
>
>
> NewSearch ( in s query , out i search )
>
> Start a new search from a query string.
> * query: The query string to execute.
> * search: A handle that is used to uniquely identify this search.
>
>
> CountHits ( in i search , out i count )
>
> Count the number of hits from a particular search. Used for paging
> and suggestion popups with hit counts.
> * search: A handle that is used to uniquely identify a search.
> * count: The number of hits from this search.
>
>
> GetHitProperties ( in i search, in i offset, in i limit,
> in as properties, out a{sa{sas}} response )
>
> Get properties for the given hits. URIs and snippets are just
> properties.
> * search: A handle that is used to uniquely identify a search.
> * offset: The offset in the result list for the first returned
> result.
> * limit: The maximum number of results that should be returned.
> * properties: A list of properties to return. An empty list is a
> request for all properties.
> * response: A map from each hit (sequence number) to a map from
> property name to a list of values.
>
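
The three search methods above can be sketched as a plain Python class to
show the shape of the data, in particular the a{sa{sas}} response. This is
only a rough model under stated assumptions, not a D-Bus implementation:
the class name SimpleSearch, the property names "uri" and "title", and the
substring matching are illustrative and not defined by the proposal. Since
the signature uses string keys, the sketch writes each sequence number as
a string.

```python
class SimpleSearch:
    """Hypothetical in-memory model of NewSearch/CountHits/GetHitProperties."""

    def __init__(self, documents):
        # Each document maps a property name to a list of string values.
        self._documents = documents
        self._searches = {}   # search handle -> list of matching documents
        self._next = 1

    def NewSearch(self, query):
        handle = self._next
        self._next += 1
        # Placeholder matching: substring test against every property value.
        self._searches[handle] = [
            doc for doc in self._documents
            if any(query in value for values in doc.values() for value in values)
        ]
        return handle

    def CountHits(self, search):
        return len(self._searches[search])

    def GetHitProperties(self, search, offset, limit, properties):
        hits = self._searches[search]
        response = {}
        for seq in range(offset, min(offset + limit, len(hits))):
            doc = hits[seq]
            # An empty property list is a request for all properties.
            wanted = properties or list(doc.keys())
            response[str(seq)] = {
                name: list(doc[name]) for name in wanted if name in doc
            }
        return response


engine = SimpleSearch([
    {"uri": ["file:///a.txt"], "title": ["Wasabi notes"]},
    {"uri": ["file:///b.txt"], "title": ["Shopping list"]},
    {"uri": ["file:///c.txt"], "title": ["More wasabi notes"]},
])
search = engine.NewSearch("notes")
total = engine.CountHits(search)
page = engine.GetHitProperties(search, 1, 10, ["uri"])
everything = engine.GetHitProperties(search, 0, 1, [])
```

A client would page through results by calling CountHits once and then
requesting successive offset/limit windows with the same handle.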
>
>
> [1] http://wiki.freedesktop.org/wiki/WasabiSearchSimple
>
There has been generally good feedback on Magnus' proposal, so I updated
the wiki: http://wiki.freedesktop.org/wiki/WasabiSearchSimple
Cheers,
Mikkel