simple search api (was Re: mimetype standardisation by testsets)
jean-francois.dockes at wanadoo.fr
Thu Nov 23 20:33:35 EET 2006
mikkel.kamstrup at gmail.com (Mikkel Kamstrup Erlandsen) writes:
> magnus.bergman at observer.net (Magnus Bergman) writes:
> > One thing that English users seldom consider is the use of several
> > languages. Knowing which language is being used is important in order
> > to decide which stemming rules and which stop-words to use (in
> > English "the" is a stop-word, while in Swedish it means tea and is
> > a perfectly reasonable thing to search for). People using other
> > languages are very often multilingual (using English as well).
> > Therefore it is interesting to know which language the query is in
> > (search engines might also be able to translate queries in order to
> > search documents written in different languages).
> This is a good point. However, I suggest leaving this up to the actual
> implementations. After all, which stemmer to use when indexing a
> document is an indexing-time question...
This is not true. An indexer can choose to perform stemming at query
time. Recoll does this, and I don't think it's the only one. There are
quite good reasons to do so.
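
To make the query-time approach concrete, here is a minimal sketch in
Python. It is only an illustration under stated assumptions, not Recoll's
actual implementation: the index stores unstemmed terms, and the query
language picks both the stop-word list and the stemmer used to expand the
query term. The names toy_stem, build_index, search, SUFFIXES and
STOP_WORDS are hypothetical, and the suffix tables are toy data rather
than a real stemming algorithm.

```python
from collections import defaultdict

# Toy per-language suffix tables and stop-word lists (illustrative
# assumptions only, not real linguistic data).
SUFFIXES = {
    "en": ("ing", "es", "s"),
    "sv": ("arna", "erna", "ar", "er", "en", "et"),
}
STOP_WORDS = {
    "en": {"the"},
    "sv": set(),
}


def toy_stem(word, lang):
    """Strip the first matching suffix, keeping at least 3 characters."""
    for suffix in SUFFIXES.get(lang, ()):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_index(docs):
    """Map each unstemmed, lowercased term to the ids of docs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query_term, lang):
    """Apply stop-words and stem expansion at query time, per language."""
    term = query_term.lower()
    if term in STOP_WORDS.get(lang, set()):
        return set()
    target = toy_stem(term, lang)
    hits = set()
    # Expand the query to every indexed term that shares its stem.
    for indexed_term, doc_ids in index.items():
        if toy_stem(indexed_term, lang) == target:
            hits |= doc_ids
    return hits


docs = {
    1: "indexing documents",
    2: "the index",
    3: "bilar och the",
}
index = build_index(docs)
```

With this layout the same index can answer search(index, "the", "sv"),
where "the" is an ordinary term, and search(index, "the", "en"), where it
is dropped as a stop-word, without reindexing anything. A real engine
would precompute the stem expansion instead of scanning every term.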
Recoll: desktop search for all Unix environments. http://www.recoll.org