2006/11/23, Jean-Francois Dockes <<a href="mailto:jean-francois.dockes@wanadoo.fr">jean-francois.dockes@wanadoo.fr</a>>:<div><span class="gmail_quote"></span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
mikkel.kamstrup at <a href="http://gmail.com">gmail.com</a> (Mikkel Kamstrup Erlandsen) writes:<br>> magnus.bergman at <a href="http://observer.net">observer.net</a> (Magnus Bergman) writes:<br>> > One thing that English users seldom consider is the use of several
<br>> > languages. Which language is being used is important to know in order<br>> > to decide what stemming rules to use, and which stop-words to use (in<br>> > English "the" is a stop-word, while in Swedish it means tea and is
<br>> > something that is reasonable to search for). People using other languages<br>> > are very often multilingual (using English as well). Therefore it is<br>> > interesting to know which language the query is in (search engines
<br>> > might also be able to translate queries to search in documents written<br>> > in different languages).<br>><br>> This is a good point. However, I suggest leaving this up to the actual<br>> implementations. After all, it is an indexing-time question what stemmer to
<br>> use when indexing a document...<br><br>This is not true. An indexer can choose to perform stem processing at query<br>time. Recoll is one, but I don't think it's the only one. There are quite<br>good reasons to do so.
</blockquote><div><br>Right. In my sleepy haze last night I was not thinking straight :-) I've put some more detail in my answer to Fabrice's post.<br><br>Cheers,<br>Mikkel<br></div><br></div>