[Wasabi Proposal] XML desktop query language

Jos van den Oever jvdoever at gmail.com
Wed Jan 17 13:21:23 PST 2007

2007/1/17, Jean-Francois Dockes <jean-francois.dockes at wanadoo.fr>:
> Jos van den Oever writes:
>  > 2007/1/17, Jean-Francois Dockes <jean-francois.dockes at wanadoo.fr>:
>  > > I think that the default should be to let the engine use stemming or not
>  > > (it will be ON in most cases). Users just expect it. It would be
>  > > inconvenient, for example, not to find plurals. Google does it, Wikipedia
>  > > does it, is there any example of an engine that has stemming off by
>  > > default?
>  > >
>  > > The user will want to turn it off in specific cases, and a clever engine
>  > > may turn it off sometimes (e.g. when searching an author field).
>  > >
>  > > Jos is right that this is a tricky area, but having to search for
>  > > (example OR examples) would come as a bit of a surprise for most
>  > > people.
>  >
>  > I get the feeling that this is a point on which people will always
>  > disagree, so I'll just say that my experience is different. If I want
>  > to look for both 'example' and 'examples', I use example*. You're
>  > right that Google has had stemming enabled for some time, so I have to
>  > use '+example' quite often when I want an exact match. So we could vote
>  > on having it on by default or not.
> OK, and after checking I see, to my surprise, that neither Apple Search
> Kit nor MSN Search seems to use stemming, so I guess that we either need
> an interface to let the user set this kind of preference, or let the
> backend use its own default.
>  > Because stemming can be turned on and off, the index should not store
>  > stemmed terms; stemming should be applied when searching instead.
>  > This is an important difference if we want to have the same results
>  > from the same queries.
> Well, there wouldn't be much point in having different backends if they
> returned the same answers! Freedom to the free software authors! :)
Of course there would be a point! Different file viewers or audio
players are supposed to produce the same output too. Search engines
are no exception. If we have a standard for the query format then we
are saying that the user's question reaches the index in the same way,
so the answer should be the same too, unless the indexes have
different feature sets.
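
To make the point about stemming at search time concrete, here is a
minimal sketch in Python. It is only an illustration under stated
assumptions, not part of the proposal or of any existing backend: the
names naive_stem and Index are hypothetical, and the "stemmer" merely
strips a trailing plural 's'. The index stores terms exactly as they
appear, so the same index can answer both stemmed and exact queries.

```python
from collections import defaultdict


def naive_stem(word):
    """Toy stemmer (assumption): strip one trailing plural 's'.

    A real engine would use a proper stemming algorithm instead.
    """
    w = word.lower()
    if len(w) > 3 and w.endswith("s") and not w.endswith("ss"):
        return w[:-1]
    return w


class Index:
    """Hypothetical index that stores unstemmed terms."""

    def __init__(self):
        # exact (lowercased) term -> set of document ids
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, term, stem=True):
        term = term.lower()
        if not stem:
            # Exact match only, comparable to '+example'.
            return set(self.postings.get(term, set()))
        # Stemming happens here, at query time: collect every indexed
        # term that shares a stem with the query term.
        target = naive_stem(term)
        hits = set()
        for indexed_term, doc_ids in self.postings.items():
            if naive_stem(indexed_term) == target:
                hits |= doc_ids
        return hits


index = Index()
index.add(1, "one example")
index.add(2, "many examples")
stemmed_hits = index.search("example")            # documents 1 and 2
exact_hits = index.search("example", stem=False)  # document 1 only
```

Scanning every indexed term keeps the sketch short but is slow; a real
backend would more likely keep a second mapping from stem to terms. The
design point is unchanged: if the index had stored only stems, the exact
query could no longer be answered.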

