simple search api (was Re: mimetype standardisation by testsets)

Mon Nov 20 23:10:36 EET 2006

2006/11/20, Jos van den Oever <jvdoever at gmail.com>:
>
> 2006/11/20, Mikkel Kamstrup Erlandsen <mikkel.kamstrup at gmail.com>:
> > > This notion of groups is very valuable for a nice user interface. It
> > > is however not relevant for the simplest form of search engine. The
> > > group designation of a file is usually not stored directly in the
> > > database, but inferred over the mimetype. For complex groups the query
> > > might look something like (application/msword OR application/pdf OR
> > > ...). Making such a list part of a search API makes it hard to agree
> > > on the mimetypes. I do not oppose a wrapper API that knows about the
> > > groups and expands a group-enabled query, but I don't think we should
> > > put this in the simple API. The group(s) to which a file belongs is
> > > just another type of (inferred) metadata and I don't think we should
> > > treat it specially.
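To make the wrapper idea above concrete, here is a minimal sketch of how a group-enabled query could be expanded into a plain mimetype query before it reaches the simple API. The `group:` and `mimetype:` term syntax, the `GROUPS` table and the `expand_group` name are all assumptions for illustration; nothing in this thread fixes a query syntax or a group list.

```python
# Hypothetical group table: the thread does not define which mimetypes
# belong to which group, so these entries are only placeholders.
GROUPS = {
    "document": ["application/msword", "application/pdf"],
}


def expand_group(query, groups=GROUPS):
    """Replace each 'group:NAME' term with an OR-list of 'mimetype:' terms.

    Terms are split on whitespace; unknown groups are passed through
    unchanged so the underlying engine can decide what to do with them.
    """
    expanded = []
    for term in query.split():
        if term.startswith("group:"):
            mimetypes = groups.get(term[len("group:"):])
            if mimetypes is None:
                expanded.append(term)
            else:
                alternatives = " OR ".join("mimetype:" + m for m in mimetypes)
                expanded.append("(" + alternatives + ")")
        else:
            expanded.append(term)
    return " ".join(expanded)


print(expand_group("report group:document"))
```

With this split, the simple API only ever sees mimetype terms, and the group table can live (and change) in the wrapper.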
> >
> > Given that it would be part of the search language it cannot be ruled out of
> > the simple api, unless we restrict the simple api to only support a subset
> > of the query language (which I don't think is a good idea).
> Another generalization one usually makes is that default search fields
> are used. How do we define those? Do they depend on document type or
> group?
> I'd prefer the query to be as specific as possible, but I don't expect
> the user to have to type a specific query. The application expands the
> query to one that fits the context.
>
> > It could be introspectable which switches were supported in the language,
> > such as a GetSupportedQuerySwitches(out as), but that doesn't seem to fit in
> > a "simple" api.
> >
> > Also what about items that don't have a mimetype as such: conversations,
> > emails, attachments, contacts, etc. How would an application search my
> > Contacts for "Jos"? If this called for an advanced api, that seems strange..?
> Each indexed object must have an identifier, a uri, that points to it
> and that can be interpreted. If you look in files this is easy. If you
> look for contacts, you'll need to have a different url. You can match
> on this url to specify a subset of data to search in. E.g. something
> like this (oversimplified) 'path:urn://contact/*'.
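As a rough illustration of restricting a search to a subset of data by matching on the identifier, the sketch below filters a list of result uris against a wildcard pattern such as the (oversimplified) 'urn://contact/*' from the example above. The `filter_by_path` helper and the sample uris are made up for this sketch, and shell-style globbing via Python's `fnmatch` is only one possible reading of what "match on this url" could mean.

```python
from fnmatch import fnmatchcase


def filter_by_path(uris, pattern):
    """Keep only the uris that match a shell-style wildcard pattern.

    fnmatchcase is case-sensitive and lets '*' match any run of
    characters, including '/'.
    """
    return [uri for uri in uris if fnmatchcase(uri, pattern)]


# Hypothetical index contents: one file and two contacts.
indexed = [
    "file:///home/jos/notes.txt",
    "urn://contact/jos",
    "urn://contact/mikkel",
]

contacts = filter_by_path(indexed, "urn://contact/*")
print(contacts)
```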
>
> The API defined so far returns uris for results. This is an important
> point: the resulting objects themselves are not returned, only pointers to them.
>
> > My concern is that we limit the simple api too much to be of any real value.
> Let's hope not!
> To recap, essentially we've not added functions or changed anything
> significant yet. Only the get/showConfiguration change. Am I correct
> that so far you've been swayed by my arguments? If not please repeat
> the problematic points.
>
> Also I hope others are reading this too. I don't want to end up with a
> two-man standard.

That would be a darn shame :-) Give me until tomorrow to give this api some
hard thinking. I also think we should personally email all the maintainers
of any framework we can come up with, and set a response deadline within a
week or so..?

> A point I forgot in the first API: what about returning text fragments
> that show the matches in the documents?

That is certainly a handy feature (and utilised in most search tools
nowadays), so it might prove worthwhile to add. The question is whether it
should take an array of uris or only a single uri...
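
To weigh the array-versus-single-uri question, here is a small sketch in which the array form is the primitive and the single-uri form is a thin convenience on top of it. The names `get_snippets` and `get_snippet`, the in-memory `store` dictionary and the fixed context width are all assumptions for illustration; no such methods have been agreed on in this thread.

```python
def snippet(text, term, context=20):
    """Return the text around the first case-insensitive match, or None."""
    pos = text.lower().find(term.lower())
    if pos < 0:
        return None
    start = max(0, pos - context)
    end = min(len(text), pos + len(term) + context)
    return text[start:end]


def get_snippets(term, uris, store):
    """Array form: one call returns a fragment (or None) for every uri."""
    return {uri: snippet(store.get(uri, ""), term) for uri in uris}


def get_snippet(term, uri, store):
    """Single-uri form, expressed through the array form."""
    return get_snippets(term, [uri], store)[uri]


# Hypothetical indexed text, keyed by uri.
store = {"urn://contact/jos": "Jos van den Oever"}
print(get_snippets("jos", ["urn://contact/jos", "urn://contact/nobody"], store))
```

The array form would save round trips when a client wants fragments for a whole page of hits, which may matter if the API ends up going over IPC.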

Cheers,
Mikkel