simple search api (was Re: mimetype standardisation by testsets)

Tue Nov 28 07:53:32 EET 2006

2006/11/27, Magnus Bergman <magnus.bergman at observer.net>:
>
> On Sat, 25 Nov 2006 21:49:22 +0100
> "Mikkel Kamstrup Erlandsen" <mikkel.kamstrup at gmail.com> wrote:
>
> > 2006/11/24, Magnus Bergman <magnus.bergman at observer.net>:
> > >
> > > On Thu, 23 Nov 2006 16:26:31 +0100
> > > "Mikkel Kamstrup Erlandsen" <mikkel.kamstrup at gmail.com> wrote:
> > >
> > > > 2006/11/22, Magnus Bergman <magnus.bergman at observer.net>:
> > > >
> > > > > I have constructed an in-house application which does pretty much
> > > > > exactly what you describe (it doesn't yet speak dbus, but corba
> > > > > and soap). Sadly I'm not allowed to release the source of this
> > > > > application, but at least I can share some of my experience. (I
> > > > > haven't yet looked closely at your source, so I might have
> > > > > misunderstood some things)
> > > >
> > > >
> > > > Great! To paraphrase Linus "Given enough eyeballs all
> > > > <strike>bugs</strike> specs are shallow"  :-)
> > > >
> > > >
> > > > > If several search engines are available, the search manager lets
> > > > > the client know of each search engine according to your proposal
> > > > > (right?). I think it would be a better idea to present a list of
> > > > > indexes (of which each search engine might provide several) to
> > > > > search in, but by default search in all of them (if
> > > > > appropriate). I
> > > >
> > > > Well, the search engines are not obliged to use a particular index
> > > > format. The indexes themselves can be of any format.
> > >
> > > By "index" I mean an abstract reference to something considered an
> > > index by the backend. The consequence is that the user (or rather
> > > the application) only sees the indexes, not the engines that
> > > provide them (because that is not very important).
> >
> > Ok, I'm with you now :-) It is the same as the "group" switch of the
> > current draft on the wiki. For example, searching for
> > "magnus group:contacts" searches only through the contacts "index".
> > I'm very strongly in favor of this, although some have spoken for
> > putting this "grouping" functionality on the client side. An example
> > of client side grouping could be a music application, where searching
> > for "foo fighters" would add "mime:audio/*" to the query before
> > sending it. As I said, I'm not for client side grouping; server side
> > grouping could still facilitate client side grouping anyway.
>
> My idea of "index" was a more abstract alternative to "search engine"
> or "backend" (since several of those can run and their search results
> be merged I assume). If one single search engine/backend has several
> indexes, I thought it could be for reasons such as the indexes being
> created by different users (one for each user and one for system files
> like man-pages perhaps) or residing on different computers. But this is
> probably beyond the scope of the simple interface, which should just
> trust that the appropriate indexes are searched (and that the
> appropriate search engines/backends are used).
>
> I'm not sure I understand exactly what "group" means in the draft. It is
> rather some predefined or user-defined categories that files are sorted
> under, automatically or manually by the user. Some kind of tags to
> categorize data?
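
To make the two kinds of grouping discussed above concrete, here is a
minimal sketch. It only reuses the two query forms already mentioned in
this thread ("group:contacts" and "mime:audio/*"); the function names are
made up for illustration and are not part of the draft on the wiki.

```python
from typing import List, Tuple


def add_client_side_filter(query: str, mime_pattern: str) -> str:
    """Client side grouping: the application narrows the query itself
    before sending it (hypothetical helper, not part of any draft)."""
    return f"{query} mime:{mime_pattern}"


def split_group_switch(query: str) -> Tuple[str, List[str]]:
    """Server side grouping: separate 'group:NAME' switches from the
    free-text terms so the server can pick which "indexes" to search."""
    prefix = "group:"
    terms: List[str] = []
    groups: List[str] = []
    for token in query.split():
        if token.startswith(prefix) and len(token) > len(prefix):
            groups.append(token[len(prefix):])
        else:
            terms.append(token)
    return " ".join(terms), groups


if __name__ == "__main__":
    # A music application adding its own restriction on the client side.
    print(add_client_side_filter("foo fighters", "audio/*"))
    # A server pulling the group switch out of the incoming query.
    print(split_group_switch("magnus group:contacts"))
```

In this sketch a group is nothing more than a label the server maps to
whatever it considers the matching set of data.
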
>
> > > > > Daemon or no daemon, that is the question. This is a question
> > > > > that without doubt will arise (it always does). First we need
> > > > > to clarify that there is a difference between a daemon doing
> > > > > the indexing of documents (or rather detecting new documents
> > > > > that need to be indexed) and a daemon performing the search (and
> > > > > possibly merging several searches). Most search engines I use
> > > > > don't have a daemon for doing the searches (instead they only
> > > > > provide a library), because that is seldom considered required.
> > > > > Indexes are read-only (when searching), so the common problems
> > > > > daemons are used to solve are not present.
> > > >
> > > > The situation at hand is that we have a  handful of desktop search
> > > > engines, all implemented as daemons, both handling searches and
> > > > indexing. Having an extra daemon on top of that handling the query
> > > > one extra time before passing it to the search subsystem seems
> > > > overkill... Ideally I see the daemon/lib (or even executable) to
> > > > only be used as a means of obtaining a dbus object path given a
> > > > dbus interface name ("org.freedesktop.search.simple").
> > >
> > > I have some experience of search engines in general (and I have no
> > > idea in what way a "desktop search engine" is different). And to my
> > > knowledge the majority does not have a daemon performing the
> > > searches, rather a library. They might have a daemon doing the
> > > indexing (and detecting new documents), but that's not the same
> > > thing.
> >
> > Well, having a lib with no daemon associated used for searching is
> > possible, but there's still response time to consider. I don't know
> > about other indexers, but I would hate to create a new Lucene
> > IndexSearcher for each app that wants to do searches; this is a costly
> > affair timewise and memorywise. A daemon holding a singleton
> > IndexSearcher (or managed pool) can be more resource friendly here.
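
As a rough illustration of the singleton idea above, the sketch below shows
a daemon-side helper that creates one expensive searcher and hands the same
instance to every caller. ExpensiveSearcher is a stand-in invented for this
example; it is not Lucene's IndexSearcher, and the index path is
hypothetical.

```python
import threading
from typing import List, Optional


class ExpensiveSearcher:
    """Stand-in for a searcher that is costly to create (an assumption
    for illustration; this is not Lucene's IndexSearcher)."""

    instances_created = 0

    def __init__(self, index_path: str) -> None:
        ExpensiveSearcher.instances_created += 1
        self.index_path = index_path

    def search(self, query: str) -> List[str]:
        # A real searcher would consult the index; this just echoes.
        return [f"{self.index_path}: {query}"]


_lock = threading.Lock()
_shared: Optional[ExpensiveSearcher] = None


def get_shared_searcher(index_path: str = "/hypothetical/index") -> ExpensiveSearcher:
    """Return the one searcher held by the daemon, creating it on first
    use. The lock keeps two concurrent requests from both creating one."""
    global _shared
    with _lock:
        if _shared is None:
            _shared = ExpensiveSearcher(index_path)
        return _shared


if __name__ == "__main__":
    first = get_shared_searcher()
    second = get_shared_searcher()
    print(first is second, ExpensiveSearcher.instances_created)
```
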
>
> I assume the situation is pretty much the same as with (SQL) databases.
> For example MySQL works fine as a daemon and embedded as a library too.
> But in this case it is less complicated since the library only reads
> from the index(es). I don't have much experience with Lucene (it has
> been years since I even looked at it), so I'm sorry, I don't know what
> creating a new Lucene IndexSearcher involves. But I assume it means
> initiating the engine. So it's obviously faster to do that only once
> (which is the same with MySQL, still people insist on embedding it).
> So my suggestion is this:
>
> 1 Applications can talk with the daemon using the protocol directly like
> this:
>
> ,-------------,
> | Application |
> `----.--------'
>       |
>    protocol
>       |
>   ,---^-----.
>   | Daemon  |
>   >---------<
>   | library |
>   >---------<
>   | backend |
>   | plugin  |
>   `---------'
>
> 2 Applications can use the library either to communicate with the
> daemon or to load the backend plugin directly, like this:
>
>   ,---------------------.
>   |     Application     |
>   >---------------------<
>   |       library       |
>   >--------.-.----------<
>   | engine | | protocol |
>   | plugin | | plugin   |
>   >--------< `----.-----'
>   | search |      |
>   | engine |   protocol
>   `--------'      |
>               ,---^----,
>               | daemon |
>               `--------'
>
> If the library is used, then the library decides which "path" to take
> to each index (if using the daemon for a certain backend is more
> efficient, then the daemon will be used if it's available). And since
> the daemon uses the very same library to do the very same thing, it must
> of course be smart enough not to create an infinite loop by contacting
> itself (I'll explain that in more detail if required).
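
The path selection described above could look roughly like the sketch
below. Everything in it (the Index and SearchLibrary classes, the
daemon_is_faster hint, the inside_daemon flag) is an assumption made up for
illustration; the thread does not define such an interface.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple

SearchFn = Callable[[str], List[str]]


@dataclass
class Index:
    """One searchable index with up to two ways of reaching it."""
    name: str
    direct: Optional[SearchFn] = None      # engine plugin loaded in-process
    via_daemon: Optional[SearchFn] = None  # protocol plugin talking to the daemon
    daemon_is_faster: bool = False         # hint supplied by the backend


class SearchLibrary:
    def __init__(self, indexes: Iterable[Index], inside_daemon: bool = False) -> None:
        self.indexes = list(indexes)
        # When the daemon itself uses the library, it must never route a
        # search back through the daemon, or it would call itself forever.
        self.inside_daemon = inside_daemon

    def choose_path(self, index: Index) -> Tuple[str, SearchFn]:
        may_use_daemon = index.via_daemon is not None and not self.inside_daemon
        if may_use_daemon and (index.daemon_is_faster or index.direct is None):
            return "daemon", index.via_daemon
        if index.direct is not None:
            return "direct", index.direct
        raise LookupError(f"no usable path to index {index.name!r}")

    def search(self, query: str) -> List[str]:
        hits: List[str] = []
        for index in self.indexes:
            _, run = self.choose_path(index)
            hits.extend(run(query))
        return hits


if __name__ == "__main__":
    idx = Index("docs",
                direct=lambda q: ["direct:" + q],
                via_daemon=lambda q: ["daemon:" + q],
                daemon_is_faster=True)
    print(SearchLibrary([idx]).search("foo"))                      # application side
    print(SearchLibrary([idx], inside_daemon=True).search("foo"))  # daemon side
```

The inside_daemon flag is just one simple way to break the loop; the
approach Magnus has in mind may well differ.
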
>
> > > Has anyone thought about having a general purpose naming service based
> > > on dbus and avahi (like CORBA's CosNaming)? Or is there already
> > > something like that, that I have missed?
> >
> > I believe you are asking about dbus activation?
> > http://raphael.slinckx.net/blog/documents/dbus-tutorial/
> > I don't know what CosNaming is about...
>
> No, not really. I was thinking the other way around. When services
> become available they register themselves with a naming service,
> telling what service it is they provide and how to find them. In
> other words, what avahi (dns-sd) does, but without requiring the service
> to have an IANA registered TCP port, and without the text length
> limitation of dns-sd.
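
To pin down what such a naming service would do, here is a minimal
in-memory sketch: services register what they provide and how to reach
them, and clients look providers up by service type. The NamingService
class and the addresses are hypothetical; only the interface name
"org.freedesktop.search.simple" comes from earlier in this thread.

```python
from typing import Dict, List


class NamingService:
    """Toy registry mapping a service type to the addresses providing it.
    Purely illustrative; it is not dbus, avahi or CosNaming."""

    def __init__(self) -> None:
        self._providers: Dict[str, List[str]] = {}

    def register(self, service_type: str, address: str) -> None:
        """Called by a service when it becomes available."""
        addresses = self._providers.setdefault(service_type, [])
        if address not in addresses:
            addresses.append(address)

    def unregister(self, service_type: str, address: str) -> None:
        """Called by a service when it goes away."""
        addresses = self._providers.get(service_type, [])
        if address in addresses:
            addresses.remove(address)

    def lookup(self, service_type: str) -> List[str]:
        """Return a copy of the known addresses for a service type."""
        return list(self._providers.get(service_type, []))


if __name__ == "__main__":
    names = NamingService()
    # The addresses below are made-up placeholders.
    names.register("org.freedesktop.search.simple", "hypothetical-engine-a")
    names.register("org.freedesktop.search.simple", "hypothetical-engine-b")
    print(names.lookup("org.freedesktop.search.simple"))
```
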
>
> > > > As you point out, having a separate daemon other than the indexer
> > > > is not exactly standard (at least not to my knowledge). Also a
> > > > managing daemon is likely to re-invent functionality dbus already
> > > > provides IMHO.
> > >
> > > That might very well be the case. As I mentioned, I have though
> > > implemented an (in-house) daemon doing this. But its use is
> > > mainly to cache searches made through a web-interface (which has to
> > > be stateless since there are several web-servers sharing the load).
> > >
> > > But is your idea not to use dbus at all then (except for finding the
> > > search services), but a library instead?
> >
> > The idea is to have the indexer/search engine expose the wasabi api
> > over dbus.
>
> But the wasabi api itself does not use dbus, right? It is rather a
> library wrapping some sort of ipc mechanism provided by the search
> engine?

Well, actually the plan was to be dbus only. It is platform independent and
the de facto IPC mechanism on the free desktop. Furthermore it provides both
sync and async versions of each method.
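
As a plain-Python illustration of what "sync and async versions of each
method" means for a client, here is a small sketch. SimpleSearch is a local
stand-in invented for this example, not a real dbus proxy and not the
wasabi api; the reply/error callback pair only loosely mirrors the style
dbus bindings tend to use.

```python
import threading
from typing import Callable, Iterable, List


class SimpleSearch:
    """Local stand-in for a remote search object (an assumption for
    illustration; no dbus is involved here)."""

    def __init__(self, documents: Iterable[str]) -> None:
        self._documents = list(documents)

    def query(self, text: str) -> List[str]:
        """Synchronous version: the caller blocks until hits are ready."""
        needle = text.lower()
        return [doc for doc in self._documents if needle in doc.lower()]

    def query_async(self,
                    text: str,
                    reply_handler: Callable[[List[str]], None],
                    error_handler: Callable[[Exception], None]) -> threading.Thread:
        """Asynchronous version: return at once and deliver the result
        (or the error) to a callback from a worker thread."""
        def worker() -> None:
            try:
                hits = self.query(text)
            except Exception as exc:
                error_handler(exc)
            else:
                reply_handler(hits)

        thread = threading.Thread(target=worker)
        thread.start()
        return thread


if __name__ == "__main__":
    search = SimpleSearch(["Foo Fighters - Everlong", "notes.txt"])
    print(search.query("foo"))                       # blocks
    search.query_async("foo", print, print).join()   # callback prints the hits
```
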

Long term I see no problem in providing the api via an ordinary lib, but I
don't think it is anything we need to think through right now. If we keep it
in the back of our heads and make sure we don't burn the bridges, I think we
will reach our goal easier.

Cheers,
Mikkel