simple search api (was Re: mimetype standardisation by testsets)

Mon Nov 20 23:01:59 EET 2006

2006/11/20, Mikkel Kamstrup Erlandsen <mikkel.kamstrup at gmail.com>:
> > This notion of groups is very valuable for a nice user interface. It
> > is however not relevant for the simplest form of search engine. The
> > group designation of a file is usually not stored directly in the
> > database, but inferred from the mimetype. For complex groups the query
> > might look something like (application/msword OR application/pdf OR
> > ...). Making such a list part of a search API makes it hard to agree
> > on the mimetypes. I do not oppose a wrapper API that knows about the
> > groups and expands a group-enabled query, but I don't think we should
> > put this in the simple API. The group(s) to which a file belongs is
> > just another type of (inferred) metadata and I don't think we should
> > treat it specially.
>
> Given that it would be part of the search language it cannot be ruled out of
> the simple api, unless we restrict the simple api to only support a subset
> of the query language (which I don't think is a good idea).
Another generalization one usually makes is that default search fields
are used. How do we define those, do they depend on document type or
group?
I'd prefer the query to be as specific as possible, but I don't expect
the user to have to type a specific query. The application expands the
query to one that fits in the context.
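
As a rough sketch of the kind of expansion a wrapper or application
could do, the snippet below turns a group name into the mimetype
OR-list mentioned above. The group table, the `mimetype:` field prefix
and the function name are assumptions made for illustration; nothing
like them has been agreed on in this thread.

```python
# Hypothetical group table: the group names and mimetype lists are
# illustrative only, not an agreed standard.
GROUPS = {
    "document": ["application/msword", "application/pdf"],
    "image": ["image/png", "image/jpeg"],
}


def expand_group(query: str, group: str) -> str:
    """Restrict a plain query to the mimetypes of the given group.

    The 'mimetype:' field syntax is an assumption, not a defined
    part of the query language.
    """
    clause = " OR ".join(f"mimetype:{m}" for m in GROUPS[group])
    return f"({query}) AND ({clause})"


expanded = expand_group("Jos", "document")
print(expanded)
```

The simple API would only ever see the expanded string, so it needs no
knowledge of groups.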

> It could be introspectable which switches were supported in the language,
> such as a GetSupportedQuerySwitches(out as), but that doesn't seem to fit in
> a "simple" api.
>
> Also what about items that don't have a mimetype as such, conversations,
> emails, attachment, contacts, etc. How would an application search my
> Contacts for "Jos"? If this called for an advanced api, that seems strange..?
Each indexed object must have an identifier, a URI, that points to it
and that can be interpreted. For files this is easy. For contacts you
need a different kind of URI. You can match on this URI to restrict the
search to a subset of the data, e.g. something like this
(oversimplified): 'path:urn://contact/*'.

The API defined so far returns URIs for results. This is an important
point: the result objects themselves are not returned, only pointers
to them.
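
To make the two points above concrete, here is a minimal sketch of a
search that returns only URIs and can be scoped with a URI pattern. The
in-memory index, its entries, the `search` function and its `scope`
parameter are all made up for illustration; the pattern matching uses
shell-style wildcards from Python's standard `fnmatch` module, which is
just one possible interpretation of 'urn://contact/*'.

```python
from fnmatch import fnmatchcase

# Hypothetical in-memory index mapping a URI to its indexed text.
INDEX = {
    "file:///home/jos/notes.txt": "notes from Jos about the search api",
    "urn://contact/jos": "Jos",
    "urn://contact/mikkel": "Mikkel Kamstrup Erlandsen",
}


def search(term, scope="*"):
    """Return the URIs (not the objects) whose text contains `term`.

    `scope` is a shell-style pattern matched against the URI.
    """
    term = term.lower()
    return sorted(
        uri
        for uri, text in INDEX.items()
        if fnmatchcase(uri, scope) and term in text.lower()
    )


all_hits = search("Jos")
contact_hits = search("Jos", scope="urn://contact/*")
print(all_hits)
print(contact_hits)
```

The caller gets back plain strings and decides for itself how to open
or display whatever each URI points to.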

> My concern is that we limit the simple api too much to be of any real value.
Let's hope not!
To recap, essentially we've not added functions or changed anything
significant yet, only the get/showConfiguration change. Am I correct
that so far you've been swayed by my arguments? If not, please repeat
the problematic points.

Also, I hope others are reading this too. I don't want to end up with a
two-man standard.

A point I forgot in the first API: what about returning text fragments
that show the matches in the documents?
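
One possible shape for this, purely as a sketch: return a short window
of text around each match. The function name, the `width` parameter and
the plain substring matching are assumptions; a real engine would
probably work on tokens rather than raw characters.

```python
def fragments(text, term, width=20):
    """Return a text window of up to `width` characters on each side
    of every case-insensitive occurrence of `term`.

    Assumes lowercasing does not change the string length, which
    holds for ASCII text but not for every Unicode character.
    """
    lowered = text.lower()
    needle = term.lower()
    out = []
    start = lowered.find(needle)
    while start != -1:
        lo = max(0, start - width)
        hi = min(len(text), start + len(needle) + width)
        out.append(text[lo:hi])
        # Continue searching after the current match.
        start = lowered.find(needle, start + len(needle))
    return out


snippets = fragments("Mail from Jos about the search API", "jos", width=5)
print(snippets)
```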

Cheers,
Jos