simple search api (was Re: mimetype standardisation by testsets)

Jean-Francois Dockes jean-francois.dockes at
Thu Nov 23 20:33:35 EET 2006

mikkel.kamstrup at (Mikkel Kamstrup Erlandsen) writes:
> magnus.bergman at (Magnus Bergman) writes:
> > One thing that English users seldom consider is the use of several
> > languages. Which language is being used is important to know in order
> > to decide what stemming rules to use, and which stop-words to use (in
> > English "the" is a stop-word, while in Swedish it means tea and is
> > something that is reasonable to search for). People using other languages
> > are very often multilingual (using English as well). Therefore it is
> > interesting to know which language the query is in (search engines
> > might also be able to translate queries to search in documents written
> > in different languages).
> This is a good point. However, I suggest leaving this up to the actual
> implementations. After all, which stemmer to use when indexing a document
> is an indexing-time question...

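This is not true. An indexer can choose to perform stemming at query
time rather than at indexing time. Recoll is one that does, but I don't
think it's the only one. There are quite good reasons to do so: for
example, if the index stores unstemmed terms, the stemming language can
be chosen per query, which addresses exactly the multilingual case
described above.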
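A minimal sketch of the query-time approach, assuming a toy index that stores only unstemmed terms. The suffix-stripping functions `stem_en` and `stem_sv`, the tiny stop-word lists and the `Index` class are hypothetical illustrations, not Recoll's code or API; a real system would use proper stemmers (e.g. Snowball). The point is that the query language, and therefore the stemmer and stop-word list, is picked per query, so one index can serve both English and Swedish queries.

```python
from collections import defaultdict

# Hypothetical toy stemmers: strip a few common suffixes per language.
def stem_en(word: str) -> str:
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def stem_sv(word: str) -> str:
    for suffix in ("arna", "erna", "ar", "er", "en"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

STEMMERS = {"en": stem_en, "sv": stem_sv}
# Illustrative stop-word lists: "the" is a stop-word in English only.
STOPWORDS = {"en": {"the", "a", "of"}, "sv": {"och", "en", "att"}}

class Index:
    """Stores unstemmed terms; stemming happens only at query time."""

    def __init__(self) -> None:
        self.postings: dict[str, set[int]] = defaultdict(set)

    def add(self, doc_id: int, text: str) -> None:
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def expand(self, term: str, lang: str) -> set[str]:
        # Stem expansion: every indexed term sharing the query term's stem.
        # (Scans the whole vocabulary; a real engine would precompute a
        # stem -> terms table per language.)
        stem = STEMMERS[lang]
        target = stem(term)
        return {t for t in self.postings if stem(t) == target}

    def search(self, query: str, lang: str) -> set[int]:
        results = None
        for term in query.lower().split():
            if term in STOPWORDS[lang]:
                continue  # dropped only for this query's language
            docs: set[int] = set()
            for t in self.expand(term, lang):
                docs |= self.postings[t]
            results = docs if results is None else results & docs
        return results or set()

idx = Index()
idx.add(1, "searching mailing lists")
idx.add(2, "jag dricker the")          # Swedish: "I drink tea"
print(idx.search("searches", "en"))    # {1}
print(idx.search("the", "en"))         # set()
print(idx.search("the", "sv"))         # {2}
```

Because nothing language-specific was baked into the index, the same "the" query is a stop-word in English and a real search term in Swedish.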


