Shared documentation system

Wed Dec 10 04:02:13 EET 2003

On Mon, 8 Dec 2003 19:56:38 +0100 (CET)
Claes Holmerson <claes at it-slav.net> wrote:

> I like this idea a lot. I was thinking something similar recently. Let
> me add some thoughts to the subject, from a slightly different angle.
> 
> As start menus become more and more crowded, I think it is time to
> take a step back and consider if there are any better ways to find and
> launch programs. Instead of hierarchical presentation, I think a
> search feature would be very useful, perhaps combined with some
> history which keeps track of recent and/or most used programs. Compare
> for example Google with Yahoo's hierarchical index to see which is most
> successful :-)

I am strongly against the "start menu": overcrowded menus are bad, and the
start menu is the most overcrowded of them all. But I still think that a
hierarchical presentation is good. It used to work really well in CDE.
Wouldn't Yahoo beat Google any day if every web page was in the
hierarchy? But of course a search function could be a useful complement.

> My vision is that searching among installed programs should result in
> a result page somewhat looking like a freshmeat search. A more
> detailed description of the program, together with ways to start it,
> read more about it in its documentation, a link to it on the web for
> example. In order to do this, .desktop files need to hold more
> metadata about their programs.

Isn't the easiest way to accomplish this to have a link from the
documentation to the executable (like man has)?

> This is where this documentation proposal comes in so well. If there
> was a way to find relevant documentation from a .desktop file, that
> documentation could be indexed as belonging to this .desktop file, and
> that would allow much more text to be indexed.
> 
> For fun, I experimented with indexing all the .desktop-files I could
> find on my system (Suse 8.2). For this I used the Lucene indexer and
> search engine, which is part of the Apache Jakarta project
> (http://jakarta.apache.org/lucene). Lucene works with the concept of
> "documents", which are filled with "fields" containing the searchable
> data, and then stored in an index. Queries to the index will return
> hits which refer to the documents that were put in it. Lucene
> includes a sophisticated query parser, and this combined with the
> ability to search in a combination of fields makes it pretty powerful.
> Normal search engine syntax, such as AND, OR, NOT, as well as prefix
> operators such as + and - are supported.
> 
> Lucene is a Java library, and not an ideal dependency for the desktop,
> but it is popular enough to have many ports in progress. At
> Sourceforge, there are a number of porting projects, to Python and C++
> among others. For my prototype, this did not matter. My goal was more
> to investigate whether .desktop files contain enough information to
> build a useful index. Note that Lucene is not a web crawler or a web
> search engine. Nothing in it ties it to the web, and it can easily be
> used to index files in a file system. Another useful feature is to
> index the user's documents, but that is a different issue.

I have looked at every indexing engine I could find (including two
commercial ones). The most interesting ones I found were swish-e and
mnogosearch; both are GPL and written in C.

> My idea was to create a lucene document for every .desktop file. In
> each lucene document I stored the path to the .desktop file, which
> makes this path available in each hit. There is not a huge amount of
> data in a .desktop file that makes sense to index. Name, GenericName,
> Comment and Categories are the ones that are obvious. They are the
> only ones that contain text that the user is likely to search for.  I
> also looked up the mime type description for each MimeType from the
> mime definitions in freedesktop.org.xml and indexed that in the
> document too, in the cases mimetypes were specified.
> 
> After this, I searched against the index. It worked ok, but not great.
> In many cases grep for the same terms would give approximately the
> same results. A big problem I think is that .desktop files do not
> include that much human-readable information. The comments are
> designed to be shown in brief tooltips, and more information about the
> programs is not readily available from the .desktop file itself. A
> simple example: a search for "mp3" resulted in far fewer hits than I
> expected. The reason is that many mp3-capable players only describe
> themselves as media players, and lookup of mimetype "audio/x-mp3" in
> freedesktop.org.xml results in "MPEG layer 3 audio" rather than "MP3
> audio" or similar. That is strictly speaking correct, but not as
> likely to be searched for. With more text to index for each program, I
> believe the results would improve. The documentation is likely to
> mention mp3 I think :-)
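
The indexing step described in the quote above can be sketched roughly
as follows. This is a minimal illustration in plain Python, not Lucene
and not the prototype itself: it builds a simple word-to-path index from
the four keys named above. The sample entries, paths and function names
are made up for the example.

```python
import re
from collections import defaultdict

# Keys the quoted message singles out as worth indexing.
INDEXED_KEYS = ("Name", "GenericName", "Comment", "Categories")


def parse_desktop_entry(text):
    """Return the indexed keys found in the [Desktop Entry] group."""
    fields = {}
    in_entry = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("["):
            in_entry = (line == "[Desktop Entry]")
            continue
        if in_entry and "=" in line:
            key, value = line.split("=", 1)
            key = key.strip()
            if key in INDEXED_KEYS:  # localized keys such as Name[sv] are ignored
                fields[key] = value.strip()
    return fields


def tokenize(value):
    """Lowercase a value and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", value.lower())


def build_index(entries):
    """Map each word to the set of .desktop paths whose fields contain it.

    `entries` maps a path to the text of that .desktop file.
    """
    index = defaultdict(set)
    for path, text in entries.items():
        for value in parse_desktop_entry(text).values():
            for word in tokenize(value):
                index[word].add(path)
    return index


def search(index, query):
    """Return the paths that contain every word of the query (implicit AND)."""
    words = tokenize(query)
    if not words:
        return set()
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return hits


# Hypothetical sample data, not taken from a real system.
SAMPLE = {
    "/usr/share/applications/player.desktop": (
        "[Desktop Entry]\n"
        "Name=Example Player\n"
        "GenericName=Media Player\n"
        "Comment=Play audio files\n"
        "Categories=AudioVideo;Player;\n"
    ),
    "/usr/share/applications/editor.desktop": (
        "[Desktop Entry]\n"
        "Name=Example Editor\n"
        "GenericName=Text Editor\n"
        "Comment=Edit text files\n"
        "Categories=Utility;TextEditor;\n"
    ),
}

index = build_index(SAMPLE)
print(search(index, "media player"))
print(search(index, "mp3"))
```

With this sample data a search for "mp3" finds nothing, even though one
of the entries is a media player, which is the same weakness the quote
describes.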

Using an index instead of grepping is more a question of speed rather
than accuracy. But the accuracy could be improved by grouping synonyms
and closely related terms together. That way casual terms will match
more technical ones when searching, without the need for more data to
index.
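
As a rough sketch of what I mean by grouping synonyms, assuming a small
hand-made table (the groups and function names below are invented for
illustration, a real list would have to be curated): both the query and
the indexed text are rewritten to a canonical term before they are
compared.

```python
import re

# Hypothetical synonym groups: canonical term -> phrases treated as equal.
SYNONYMS = {
    "mp3": ["mpeg layer 3", "mp3"],
    "image": ["picture", "photo", "image"],
}


def _build_rewrites(groups):
    """Compile (pattern, canonical) pairs, longest phrase first."""
    pairs = [
        (phrase, canonical)
        for canonical, phrases in groups.items()
        for phrase in phrases
    ]
    pairs.sort(key=lambda pair: len(pair[0]), reverse=True)
    return [
        (re.compile(r"\b" + re.escape(phrase) + r"\b"), canonical)
        for phrase, canonical in pairs
    ]


REWRITES = _build_rewrites(SYNONYMS)


def normalize(text):
    """Lowercase, replace known phrases by their canonical term, split into words."""
    text = text.lower()
    for pattern, canonical in REWRITES:
        text = pattern.sub(canonical, text)
    return set(re.findall(r"[a-z0-9]+", text))


def matches(query, description):
    """True if every normalized query word occurs in the normalized description."""
    wanted = normalize(query)
    return bool(wanted) and wanted <= normalize(description)


print(matches("mp3", "MPEG layer 3 audio"))
print(matches("mp3", "Media Player"))
```

Here "mp3" matches the mime description "MPEG layer 3 audio" without any
extra text being indexed, while a program that only calls itself a media
player still does not match.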

> With this proposal, if there was a way to simply find the relevant
> documentation for a .desktop file, indexing would be much more useful.
> I was also thinking about adding documentation metadata to the
> .desktop file itself, but .desktop file format is not well suited to
> having lots of readable text in it. I also agree with the "nesting
> problem" regarding .desktop files.
> 
> Indexing documentation for its own purpose is a good idea too. We can
> imagine at least three kinds of searches:
> 
> Search for programs
> Search in documentation
> Search in user files.
> 
> Of these, at least the first two should be considered in the same
> context.