Xesam meta-meta-data spec needs attention.

Sat May 12 05:47:50 PDT 2007

On Saturday 12 May 2007 13:23:59 Mikkel Kamstrup Erlandsen wrote:
> 2007/5/12, Fabrice Colin <fabrice.colin at gmail.com>:
> > Hi all,
> >
> > On 5/12/07, Joe Shaw <joe at joeshaw.org> wrote:
> > > I haven't been following this thread super closely.  Why define these
> > > in .desktop-like files rather than in some sort of documented
> > > specification?  Code is what ultimately will be setting these, so it
> > > will have to obey them.
> >
> > I agree.
> >
> > I am not sure I understand the benefit of defining these in some sort
> > of user-editable configuration, instead of in a spec.
> > If the user defines a new field, it won't have any effect as the engine
> > has no way to automagically know how that new field maps to the
> > underlying file format. The corresponding metadata extractor will have
> > to be updated to support the new field and make sure it is retrieved
> > from files.
>
> It was not the idea that an ordinary user should install field definitions.
> Applications with special needs could do so, but most wouldn't need to. Do
> I understand correctly in that you don't see the need to have the ontology
> defined in a machine readable way? Just specced out in some document?

I believe the idea was that xesam-core-lib (or whatever) comes with a 
hard-coded onto (ontology).

In fact, if an app knows how to deal with a particular field, it won't be 
hard for that app to register/create the field via the xesam-core API.

Having xesam-core define our core onto is what is going to enforce the 
standard.

Should somebody want to visualize the onto, they can do an RDF dump and use 
any of the RDF visualization tools. The code to dump an RDF+XML 
representation is very small (Strigi does that already).

The only problem with this approach is localization. All localization strings 
have to be hardcoded internally or provided via an external file.
We have 3 localizable values for each field: a human-readable name, a 
human-readable description (e.g. for tooltips), and a controlled-vocabulary 
list of values.

One possible solution is to use .desktop files to provide only the 
localization info, and to define the onto structure via the API.
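
To make that split concrete, here is a rough sketch of a reader for the 
localization-only part. The group name, the keys (Name, Description) and the 
field name content.author are made up for the example and are not taken from 
any xesam or Strigi file; only the Key[locale]=value convention is borrowed 
from .desktop files.

```python
def parse_desktop(text):
    """Parse .desktop-like text into {group: {key: value}}."""
    groups = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            current = groups.setdefault(line[1:-1], {})
        elif "=" in line and current is not None:
            key, value = line.split("=", 1)
            current[key.strip()] = value.strip()
    return groups


def localized(entries, key, locale=None):
    """Return the 'key[locale]' entry if present, else the plain key."""
    if locale:
        hit = entries.get(f"{key}[{locale}]")
        if hit is not None:
            return hit
    return entries.get(key)


# Hypothetical localization file for one field.
SAMPLE = """\
[content.author]
Name=Author
Name[da]=Forfatter
Description=Person who created the content
"""

fields = parse_desktop(SAMPLE)
print(localized(fields["content.author"], "Name", "da"))
print(localized(fields["content.author"], "Description", "da"))
```

The file carries no structure at all here: types, parents and the like would 
still come from whatever registers the field through the API.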

Our current approach in Strigi is that analyzers request the fields they want 
to use. The onto is maintained internally. Should the need arise, it can be 
dumped as either RDF+XML or .dot. Localization info is read from 
.desktop-like files.
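
Roughly, the "request a field, dump on demand" idea looks like the sketch 
below. This is not Strigi code: the FieldRegistry class, its methods and the 
field names are invented for illustration, and only the .dot dump is shown.

```python
class FieldRegistry:
    """Hypothetical in-memory ontology: field name -> parent field name."""

    def __init__(self):
        self._parents = {}

    def request(self, name, parent=None):
        """Register a field on first request; later requests reuse it."""
        if name not in self._parents:
            if parent is not None and parent not in self._parents:
                raise KeyError(f"unknown parent field: {parent}")
            self._parents[name] = parent
        return name

    def to_dot(self):
        """Dump the field hierarchy as a Graphviz digraph."""
        lines = ["digraph ontology {"]
        for name in sorted(self._parents):
            lines.append(f'  "{name}";')
            parent = self._parents[name]
            if parent is not None:
                lines.append(f'  "{name}" -> "{parent}";')
        lines.append("}")
        return "\n".join(lines)


registry = FieldRegistry()
registry.request("content.creator")
registry.request("content.author", parent="content.creator")
registry.request("content.author")  # a second analyzer asking for the same field
print(registry.to_dot())
```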

> While 
> this could be done, the machine readable ontology does have quite a few
> benefits. Fx:
>
>  * You could update the ontology without updating any applications or
> search engine code

It is unlikely that the onto will be updated without app updates. If you need 
to update the xesam onto, it means we made either a mistake or an omission. I 
doubt this will happen too often, since apps rely on a specific onto 
structure.

Updating of localization strings seems to be a more frequent event.

I'd like feedback on this: how is the localization process going to affect us?

>  * Applications could create dynamic guis that reflects the ontology

You seem to expect xesam-core to parse onto files and provide an API. So 
for apps using the API this doesn't matter.

>  * 3rd parties could extend the ontology by installing their own ones

The same can be done by extending the onto via the API.

> Ok let me explain why I personally prefer the .desktop like approach given
> that we install the ontology on the hard drive in a machine readable way...
>
> 1) It doesn't introduce dependencies on new 3rd party libs (maybe it does
> for qt/kde I'm uncertain on their situation). GLib has really good support
> for .desktop files (with i18n which we need too) in what is known as the
> keyfile api. A rdf parser is likely to require either a good deal of code
> in libxesam or a 3rd party lib. 3rd party libs are a big deal we shouldn't
> accept without a great deal of thought.

A correction: a fully-featured N3 derivative parser is a good deal of code. 
However, if we limit ourselves to a simple subset without named graphs, 
nesting, etc., it's a trivial piece of code.

In fact, Strigi doesn't use any external libs to read the .desktop format, 
and the code is 1-2 pages long depending on how you look at it. An RDF N3 
subset markup has the same complexity.
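
To give a feel for the size, here is a sketch of a reader for one possible 
restricted subset: @prefix lines plus one "subject predicate object ." 
statement per line, with no nesting, no named graphs, and no escapes in 
literals. The subset itself is an assumption made for this example, not 
something we have agreed on, and the xesam namespace URI is a placeholder.

```python
import re

# One term: <IRI>, "plain literal" (no escapes), or a prefixed name.
TERM = r'(<[^>]*>|"[^"]*"|[\w-]*:[\w-]+)'
PREFIX = re.compile(r'^@prefix\s+([\w-]*):\s*<([^>]*)>\s*\.$')
TRIPLE = re.compile(rf'^{TERM}\s+{TERM}\s+{TERM}\s*\.$')


def parse_n3_subset(text):
    """Return a list of (subject, predicate, object) with prefixes expanded."""
    prefixes, triples = {}, []

    def expand(term):
        if term.startswith("<"):
            return term[1:-1]          # absolute IRI, strip the brackets
        if term.startswith('"'):
            return term                # keep literals quoted
        prefix, local = term.split(":", 1)
        return prefixes[prefix] + local  # KeyError on an undeclared prefix

    for number, raw in enumerate(text.splitlines(), 1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = PREFIX.match(line)
        if match:
            prefixes[match.group(1)] = match.group(2)
            continue
        match = TRIPLE.match(line)
        if not match:
            raise ValueError(f"line {number}: unsupported syntax: {line!r}")
        triples.append(tuple(expand(term) for term in match.groups()))
    return triples


# Placeholder xesam namespace; the rdfs one is the standard RDF Schema URI.
SAMPLE_N3 = """\
@prefix xesam: <http://example.org/xesam#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
xesam:author rdfs:subPropertyOf xesam:creator .
xesam:author rdfs:label "Author" .
"""

for triple in parse_n3_subset(SAMPLE_N3):
    print(triple)
```

Anything outside the subset is rejected with an error rather than guessed 
at, which keeps the door open for swapping in a real N3 parser later.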

If at any time we decide we need a full-blown N3 parser, we just swap libs.

GLib, on the other hand, is a lib that Strigi has no intention of linking 
against, just like any KDE lib. This is due to several usage scenarios and 
to Strigi being desktop-agnostic.

> 2) Application developers can easily create their own ontologies. Anybody
> can understand the .desktop approach by looking at one or two field
> definitions. That is not necessarily the case with rdf.
>
> 3) Although RDF has (relatively) simple representations it might scare
> developers off by pure reputation.

We don't have to tell them it's RDF ;) and we shouldn't, since the initial 
implementation is going to be limited (like "we have RDF, but you can't use 
it" :)

> 4) .desktop is already a de facto standard on the desktop (and xesam is all
> about the desktop). RDF is a standard, but it's not greatly used in desktop
> applications.

> 5) RDF is extensible to just about any point imaginable, while this is
> normally good, I think it would be healthy to restrict ourselves to some
> extent, so that we don't fly off into abstraction space.

RDF is increasingly used on the Linux desktop. Btw, as you remember, Jamie 
has already figured out how to stretch .desktop to have functionality 
equivalent to RDF. And it wasn't a far stretch. If you want to enforce some 
limits, that's best done at the API level.

> 6) If it turns out eventually that .desktop is too simple, it would be
> possible to allow rdf ontologies as well. It would not be the most
> beautiful solution, but I think it is acceptable.

--Evgeny