[Xesam] Why is vendor.maxhits read-only?

Mikkel Kamstrup Erlandsen mikkel.kamstrup at gmail.com
Wed Dec 19 14:09:51 PST 2007


On 19/12/2007, Joe Shaw <joe at joeshaw.org> wrote:
> Hi,
>
> On 12/19/07, Mikkel Kamstrup Erlandsen <mikkel.kamstrup at gmail.com> wrote:
> > I think the point that I don't understand is why you need to
> > reconstruct the entire Document. I don't think Lucene is supposed to
> > be used like that. Lucene's sorting by a non-tokenized field should be
> > über fast.
>
> Are you talking about Lucene's FieldSelector?  That allows you to
> create a Document instance with only some of the fields loaded off
> disk.  If so, that's a pretty new feature of Lucene (within the last
> year) and one which hasn't made it into the .Net version yet.  We did
> forward port it, though, and we use it in a few places.  We don't use
> it in the timestamp case, actually, because filters can do additional
> checks on other properties and reject them as results (imagine a file
> which doesn't exist on disk but is still in the index for whatever
> reason).  But that's mostly an implementation detail.
>
> If you're talking about iterating across terms in the Lucene index
> using TermEnum, that is something we do.  Walking the terms within a
> field is sorted, and it's about 2.5x faster than building a document,
> from my profiling.  But if you have a million documents in your index
> and only 5000 matches for a given query, it's faster to build the
> documents for all 5000 matches and keep the top 100 than it is to walk
> across all 1 million terms (although you would short circuit if you
> hit the 5000th match earlier than the 1 millionth document.)
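
To make the trade-off in the last paragraph concrete, here is a rough sketch in plain Python rather than Lucene. It assumes the index can be modelled as a list of (timestamp, doc_id) pairs sorted by timestamp, and that "building a document" is just a timestamp lookup. The function names and the toy sizes are made up for illustration and are not Beagle's or Lucene's API. Using the 2.5x figure quoted above, building 5000 documents costs roughly as much as 12,500 term steps, far fewer than a walk that may run to 1 million terms.

```python
import heapq
from collections import deque


def top_n_by_loading(match_ids, timestamp_of, n):
    """Strategy 1: load a timestamp for every match, keep the n newest."""
    loads = len(match_ids)  # one (expensive) document build per match
    newest = heapq.nlargest(n, match_ids, key=timestamp_of)
    return newest, loads


def top_n_by_term_walk(sorted_index, match_ids, n):
    """Strategy 2: walk all terms in ascending timestamp order.

    Stops early once every match has been seen, which is the
    short circuit described in the quoted text.
    """
    remaining = set(match_ids)
    newest = deque(maxlen=n)  # only the last n matches survive
    steps = 0
    for _timestamp, doc_id in sorted_index:
        steps += 1  # one (cheaper) term step per indexed document
        if doc_id in remaining:
            remaining.discard(doc_id)
            newest.append(doc_id)
            if not remaining:
                break
    return list(reversed(newest)), steps


# Toy data (hypothetical): 1000 indexed documents, every 20th one matches.
timestamps = {doc_id: 1000 + doc_id for doc_id in range(1000)}
sorted_index = sorted((ts, doc_id) for doc_id, ts in timestamps.items())
matches = list(range(0, 1000, 20))

by_loading, loads = top_n_by_loading(matches, timestamps.__getitem__, 5)
by_walking, steps = top_n_by_term_walk(sorted_index, matches, 5)
```

Both strategies return the same documents. They differ only in how many units of work they do, and the term walk grows with the size of the index rather than with the number of matches.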


