Shared documentation system

Claes Holmerson claes at it-slav.net
Mon Dec 8 20:56:38 EET 2003


I like this idea a lot. I was thinking something similar recently. Let me
add some thoughts to the subject, from a slightly different angle.

As start menus become more and more crowded, I think it is time to take a
step back and consider if there are any better ways to find and launch
programs. Instead of hierarchical presentation, I think a search feature
would be very useful, perhaps combined with some history which keeps track
of recent and/or most used programs. Compare for example Google with
Yahoo's hierarchical index to see which is most successful :-)

My vision is that searching among installed programs should produce a
result page that looks somewhat like a freshmeat search: a more detailed
description of the program, together with ways to start it, read more
about it in its documentation, and follow a link to it on the web, for
example. In order to do this, .desktop files need to hold more metadata
about their programs.

This is where this documentation proposal comes in so well. If there was a
way to find relevant documentation from a .desktop file, that
documentation could be indexed as belonging to this .desktop file, and
that would allow much more text to be indexed.

For fun, I experimented with indexing all the .desktop files I could find
on my system (Suse 8.2). For this I used the Lucene indexer and search
engine, which is part of the Apache Jakarta project
(http://jakarta.apache.org/lucene). Lucene works with the concept of
"documents", which are filled with "fields" containing the searchable
data, and then stored in an index. Queries to the index return hits
that refer to the documents that were put in it. Lucene includes a
sophisticated query parser, and this combined with the ability to search in
a combination of fields makes it pretty powerful. Normal search engine
syntax, such as AND, OR, NOT, as well as prefix operators such as + and -
are supported.
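
A minimal sketch of the document/field idea and the + and - prefix
operators, written in plain Python. This is a toy stand-in and not Lucene's
API: the TinyIndex class, its method names and the sample entries are all
invented for illustration, and there is no ranking or query parsing beyond
splitting on whitespace.

```python
from collections import defaultdict


class TinyIndex:
    """Toy document/field index (hypothetical; not Lucene's API)."""

    def __init__(self):
        self.docs = []                    # doc id -> dict of field -> text
        self.postings = defaultdict(set)  # (field, term) -> set of doc ids

    def add(self, doc):
        """Store a document and index every whitespace-separated term."""
        doc_id = len(self.docs)
        self.docs.append(doc)
        for field, text in doc.items():
            for term in text.lower().split():
                self.postings[(field, term)].add(doc_id)
        return doc_id

    def _ids(self, term, fields):
        """Ids of documents containing the term in any of the given fields."""
        hits = set()
        for field in fields:
            hits |= self.postings.get((field, term), set())
        return hits

    def search(self, query, fields):
        """+term is required, -term is excluded, bare terms are optional.

        If any +term is present, bare terms do not widen the result
        (a real engine would use them for ranking only).
        """
        required, excluded, optional = [], [], []
        for token in query.lower().split():
            if token.startswith("+"):
                required.append(token[1:])
            elif token.startswith("-"):
                excluded.append(token[1:])
            else:
                optional.append(token)
        if required:
            result = set.intersection(*(self._ids(t, fields) for t in required))
        else:
            result = set().union(*(self._ids(t, fields) for t in optional))
        for term in excluded:
            result -= self._ids(term, fields)
        return [self.docs[i] for i in sorted(result)]


# Made-up sample entries, searched across a combination of fields.
index = TinyIndex()
index.add({"name": "ExamplePlayer", "comment": "media player for audio files"})
index.add({"name": "ExampleVideo", "comment": "media player for video files"})
hits = index.search("+media -video", fields=["name", "comment"])
```

Here hits contains only the ExamplePlayer entry, since both entries match
"media" but the second one is dropped by "-video".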

Lucene is a Java library, and not an ideal dependency for the desktop, but
it is popular enough to have many ports in progress. At Sourceforge, there
are a number of porting projects, to Python and C++ among others. For my
prototype, this did not matter. My goal was more to investigate whether
.desktop files contain enough information to build a useful index. Note
that Lucene is not a web crawler or a web search engine. Nothing in it
ties it to the web, and it can easily be used to index files in a file
system. Indexing the user's documents would be another useful feature,
but that is a different issue.

My idea was to create a Lucene document for every .desktop file. In each
Lucene document I stored the path to the .desktop file, which makes this
path available in each hit. There is not a huge amount of data in a
.desktop file that makes sense to index. Name, GenericName, Comment and
Categories are the obvious ones. They are the only ones that contain text
that the user is likely to search for. I also looked up the MIME type
description for each MimeType from the MIME definitions in
freedesktop.org.xml and indexed that in the document too, in the cases
where MIME types were specified.
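
The per-file step described above could look roughly like the following
Python sketch. It is an illustration under assumptions, not the code from
the prototype: the function name desktop_to_document, the MimeDescription
field name, the sample entry and its path are invented, and the MIME
descriptions are passed in as a plain dict rather than read from
freedesktop.org.xml.

```python
import configparser

# The keys named above as worth indexing.
INDEXED_KEYS = ("Name", "GenericName", "Comment", "Categories")


def desktop_to_document(text, path, mime_descriptions=None):
    """Turn the text of one .desktop file into a dict of searchable fields.

    Returns None if the text has no [Desktop Entry] group.
    """
    parser = configparser.ConfigParser(
        interpolation=None,     # '%' is literal in .desktop files
        strict=False,           # tolerate duplicate keys
        delimiters=("=",),
        comment_prefixes=("#",),
    )
    parser.optionxform = str    # keep key case: "Name", not "name"
    parser.read_string(text)
    if not parser.has_section("Desktop Entry"):
        return None
    entry = parser["Desktop Entry"]

    doc = {"path": str(path)}   # stored so that every hit can report its file
    for key in INDEXED_KEYS:
        if key in entry:
            # Categories is a ';'-separated list; turn it into plain words.
            doc[key] = entry[key].replace(";", " ").strip()

    # Add the human-readable description of each declared MIME type, if known.
    mime_descriptions = mime_descriptions or {}
    mime_types = [m for m in entry.get("MimeType", "").split(";") if m]
    described = [mime_descriptions[m] for m in mime_types if m in mime_descriptions]
    if described:
        doc["MimeDescription"] = " ".join(described)
    return doc


# Invented sample entry; a real run would read each file from disk instead.
SAMPLE = """[Desktop Entry]
Type=Application
Name=Example Player
GenericName=Media Player
Comment=Play your music
Categories=AudioVideo;Player;
MimeType=audio/x-mp3;
"""

doc = desktop_to_document(
    SAMPLE,
    "/usr/share/applications/example.desktop",
    {"audio/x-mp3": "MPEG layer 3 audio"},
)
```

The resulting dict holds the path plus one text field per indexed key, which
is the shape an indexer needs for one document per .desktop file.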

After this, I searched against the index. It worked ok, but not great. In
many cases grep for the same terms would give approximately the same
results. A big problem, I think, is that .desktop files do not include
that much human-readable information. The comments are designed to be
shown in brief tooltips, and more information about the programs is not
readily available from the .desktop file itself. A simple example: a
search for "mp3" resulted in far fewer hits than I expected. The reason is
that many mp3-capable players only describe themselves as media
players, and lookup of mimetype "audio/x-mp3" in
freedesktop.org.xml results in "MPEG layer 3 audio" rather than "MP3
audio" or similar. That is strictly speaking correct, but not as likely
to be searched for. With more text to index for each program, I believe
the results would improve. The documentation is likely to mention mp3 I
think :-)
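
The mp3 case can be reproduced with a few lines of Python. This is only a
sketch of the effect: the matches helper, the player entry and the
Documentation text are invented, and matching is a plain whole-word
comparison rather than anything Lucene does.

```python
def matches(doc, term):
    """True if the term appears as a whole word in any field of the document."""
    term = term.lower()
    return any(term in text.lower().split() for text in doc.values())


# Invented entry: the only mp3-related text is the MIME type description.
player = {
    "Name": "Example Player",
    "GenericName": "Media Player",
    "MimeDescription": "MPEG layer 3 audio",
}

# The same entry with some (invented) documentation text attached.
player_with_docs = {**player, "Documentation": "Plays MP3 and Ogg Vorbis files"}

found_without_docs = matches(player, "mp3")
found_with_docs = matches(player_with_docs, "mp3")
```

The first lookup misses because "mp3" never appears literally in the
metadata; the second one hits only because of the extra text.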

With this proposal, if there was a way to simply find the relevant
documentation for a .desktop file, indexing would be much more useful. I
was also thinking about adding documentation metadata to the .desktop file
itself, but the .desktop file format is not well suited to having lots of
readable text in it. I also agree with the "nesting problem" regarding
.desktop files.

Indexing documentation for its own sake is a good idea too. We can
imagine at least three kinds of searches:

Search for programs
Search in documentation
Search in user files

Of these, at least the first two should be considered in the same context.

Claes



