2006/11/20, Jos van den Oever <<a href="mailto:jvdoever@gmail.com">jvdoever@gmail.com</a>>:<div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"> 2006/11/20, Mikkel Kamstrup Erlandsen <<a href="mailto:mikkel.kamstrup@gmail.com">mikkel.kamstrup@gmail.com</a>>: > > This notion of groups is very valuable for a nice user interface. It > > is however not relevant for the simplest form of search engine. The > > group designation of a file is usually not stored directly in the > > database, but inferred from the mimetype. For complex groups the query > > might look something like (application/msword OR application/pdf OR > > ...). Making such a list part of a search API makes it hard to agree > > on the mimetypes. I do not oppose a wrapper API that knows about the > > groups and expands a group-enabled query, but I don't think we should > > put this in the simple API. The group(s) to which a file belongs is > > just another type of (inferred) metadata and I don't think we should > > treat it specially. > > Given that it would be part of the search language it cannot be ruled out of > the simple API, unless we restrict the simple API to only support a subset > of the query language (which I don't think is a good idea). Another generalization one usually makes is that default search fields are used. How do we define those? Do they depend on document type or group? I'd prefer the query to be as specific as possible, but I don't expect the user to have to type a specific query. The application expands the query to one that fits the context. > It could be introspectable which switches were supported in the language, > such as a GetSupportedQuerySwitches(out as), but that doesn't seem to fit in > a "simple" API. > > Also what about items that don't have a mimetype as such: conversations, > emails, attachments, contacts, etc. How would an application search my > Contacts for "Jos"? 
If this called for an advanced API, that seems strange..? Each indexed object must have an identifier, a uri, that points to it and that can be interpreted. For files this is easy. If you look for contacts, you'll need a different kind of uri. You can match on this uri to specify a subset of data to search in, e.g. something like this (oversimplified): 'path:urn://contact/*'. The API defined so far returns uris for results. This is an important point: the resulting objects themselves are not returned, only pointers to them. > My concern is that we limit the simple API too much to be of any real value. Let's hope not! To recap, essentially we've not added functions or changed anything significant yet, only the get/showConfiguration change. Am I correct that so far you've been swayed by my arguments? If not, please repeat the problematic points. Also I hope others are reading this too. I don't want to end up with a two-man standard.</blockquote><div> That would be a darn shame :-) Give me until tomorrow to give this API some hard thought. I also think we should personally email the maintainers of every framework we can think of, and set a response deadline of a week or so..?  </div> <blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">A point I forgot in the first API: what about returning text fragments that show the matches in the documents?</blockquote><div> That is certainly a handy feature (and used in most search tools nowadays), so it might prove worthwhile to add. The question is whether it should take an array of uris or only a single uri... </div> <div> Cheers, Mikkel </div> </div>