[Xesam] dumping NID3/NEXIF for NMM

Tue Jun 16 06:48:45 PDT 2009

Hello Xesamies,

My comments within the text.

Urho Konttori wrote:

<snip>

>> I would say we agree that we continue to maintain both NID3, NEXIF, take 
>> up NMM into the xesam/oscaf standardization track now, and we can give 
>> feedback about NMM.
>> Once all of us (not only you) clearly see that NMM is better than NID3 
>> and NEXIF, we can think about dropping something.
>>   
> Sure, it needs to be common understanding, so bringing issues to the 
> open is very important.
>> I want to shed some light on the details of NID3 or NEXIF, I think your 
>> arguments would be different if you knew them.
>> NID3 is integrated with NCO - the artists are represented as contacts - 
>> so they ARE integrated with the "semantic desktop".
>> also Evgeny contributed to them, so saying "they would better never have 
>> existed" is saying that we wasted our time, which I think we did not :-)
>>   
> I know Evgeny contributed to them, but also know that he wanted to do 
> the things differently. You can see his approach in the xesam ontology.
> http://www.xesam.org/main/XesamOntology95
> 
> What I did, was, I combined the ontologies and reshaped them into a less 
> plain structure.
> 
>> we took standards that worked very well for the it industry for the last 
>> years, this is the common process when designing an ontology: look for 
>> existing standards and copy them.
>>   
> Which is not always the right choice. This is also obvious in the 
> calendar ontology. You guys took the specs of ical too literally. Ivan 
> is working on a proposal that drops all of the union classes and 
> replaces them with a single superclass, which makes the ncal a nice and 
> clean structure. I mean, ncal is for the most part really nice, but the 
> drive to copy the ical exactly lead to those horrible union classes.
> 

My 2c. The road towards the present ncal had two stages:

1. Dan Connolly and the people from the www-rdf-calendar community at
w3c devised a python+xslt script that generates the icaltzd ontology
from the plain text file of the RFC itself. You can't get any nearer to
the original standard definition.
2. We took Connolly's ontology and translated it to NRL, first with
a Java program, then by tweaking the result manually. All of this is
documented in [1]. The union classes appeared as an NRL equivalent of the
owl:unionOf construct present in the original OWL ontology.

If you have an idea for a third stage that makes it better and easier
while still expressing most of the information from ICAL files
without loss, I'm personally all for it.

The most important goal of the union classes was validation: if a
property can appear on an Event but not on a Journal entry, spotting
it on a Journal entry means we have a bug in the ICAL->NCAL converter.
The conversion process itself is quite error-prone. Every level of
validation was welcome.
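
To illustrate the validation argument, here is a minimal Python sketch.
It is not the actual converter or the NCAL schema: the "ex:" names and
the domain table are hypothetical, and a union-class domain is modelled
simply as a set of member classes.

```python
# Hypothetical domain table: property -> classes it may appear on.
# A union-class domain is modelled as the set of its member classes.
ALLOWED_DOMAINS = {
    "ex:dtend": {"ex:Event", "ex:Freebusy"},
    "ex:summary": {"ex:Event", "ex:Todo", "ex:Journal"},
}


def find_domain_violations(triples, types):
    """Return (subject, property) pairs where the property is used on a
    subject whose class is outside the property's allowed domain."""
    violations = []
    for subject, prop, _value in triples:
        allowed = ALLOWED_DOMAINS.get(prop)
        if allowed is None:
            continue  # property not covered by the table, nothing to check
        if types.get(subject) not in allowed:
            violations.append((subject, prop))
    return violations


# Converter output (made up for the example).
types = {"ex:e1": "ex:Event", "ex:j1": "ex:Journal"}
triples = [
    ("ex:e1", "ex:dtend", "2009-06-16T10:00:00"),
    ("ex:j1", "ex:summary", "Notes"),
    ("ex:j1", "ex:dtend", "2009-06-16T10:00:00"),  # converter bug
]

print(find_domain_violations(triples, types))
```

The only flagged pair is the end date on the journal entry, which is
the kind of converter bug the union-class domains were meant to expose.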

The union classes weren't considered a problem because the converter
didn't have to generate them, and they never come up in the data seen by
the user (that is, by the application the user works with). Nobody needs to
write code that 'understands' them. Could you elaborate more on the
problems you have with those union classes apart from them being 'ugly'?

>> Look, even the W3C people did  base their calendar ontology on the vCal 
>> standard, do not shoot at this approach, it is a good one. Consider the 
>> time it took to make MPEG7 (by the way, you may think about porting that 
>> one to an ontology, at least partly, instead of rolling your own) - we 
>> did consider this and say: invest time into what we need, not endless 
>> standardization discussions that others did before you.
>>   
> Sure. And mpeg7 is also a very nice example on how nid3 is not a good idea.

Allow me to disagree with that. I spent two weeks in February 2007
trying to come to grips with MPEG7. My private conclusion was that MPEG7
is a specification bloated beyond anything I'd seen. The initial idea
was to take the MPEG7 ontology developed at DFKI for the SmartWeb
project and adapt it for nepomuk. After two weeks of banging my head
against the wall I dropped it and settled on NEXIF/NID3. The two most
important disadvantages of MPEG7 were:

- many intermediary rdf nodes were needed to express a simple thing. E.g.
saying that an mp3 file has a composer whose name is "Smith" took
around 5 or 6 rdf triples, whereas in NID3 it takes two.
- it tries to cover all levels of abstraction, from basic technical
metadata to semantic meaning like "this part of a picture is a face
of a person". If you're after simplicity (which seems to be the case),
mpeg7 would be a very bad choice. Please correct me if I'm wrong.

> Ok, let me put it this way, we can support tens of different 'copy' 
> ontologies, as long as we also provide an abstraction ontology that 
> makes the use of the data easy. Think about it. mpeg7 will have totally 
> different names for the same fields that are in nid3, ogg ontology would 
> also have different names and so forth. Now, for a media player, you 
> have additional libraries (e.g. gstreamer) that handle the hassle of 
> playback for you. You don't need to care about the format at all. Now, 
> when you are showing the available music on an application window, you 
> don't want to query for the metadata from different ontologies. You want 
> one, that combines the most common features of various audio file types, 
> various video file types and various image file types.

The basic idea was to use the nid3 ontology for all audio metadata. It
seems that the 'ID3' in the ontology name may be misleading. My
intention was to extend it with properties beyond the id3 standard if
need be. There was supposed to be no 'ogg' ontology: if ogg files
contain metadata fields that have no direct mappings in id3, nid3
should be extended. Perhaps we should have named it something along the
lines of Nepomuk Audio Ontology, and NEXIF the Nepomuk Image Ontology,
from the beginning, to avoid such misunderstandings.

The entire development process started with taking some common standards
and merging them, i.e. finding common stuff and expressing those
commonalities via a common parent property/parent class. Those
ontologies covered all the use cases we needed. The nepomuk project had
to keep a finite scope. I'm all for discussing new use cases, throwing new
stuff into the mix, and finding new commonalities.
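
As a rough illustration of the "common parent property" idea, here is a
minimal Python sketch. The property names are hypothetical and are not
taken from NID3 or any other ontology; the point is only that an
application asks for the parent property and gets the values of the
format-specific subproperties as well.

```python
# Hypothetical subproperty links: format-specific property -> common parent.
SUB_PROPERTY_OF = {
    "ex:id3LeadArtist": "ex:performer",
    "ex:vorbisArtist": "ex:performer",
}


def values_for(triples, subject, parent_property):
    """Collect values of parent_property on subject, including values
    stated through any direct subproperty of parent_property."""
    matches = []
    for s, p, o in triples:
        if s != subject:
            continue
        if p == parent_property or SUB_PROPERTY_OF.get(p) == parent_property:
            matches.append(o)
    return matches


# Extracted metadata (made up for the example).
triples = [
    ("file:a.mp3", "ex:id3LeadArtist", "Smith"),
    ("file:b.ogg", "ex:vorbisArtist", "Jones"),
    ("file:b.ogg", "ex:title", "Some Track"),
]

# A media player queries one property regardless of the file format.
print(values_for(triples, "file:a.mp3", "ex:performer"))
print(values_for(triples, "file:b.ogg", "ex:performer"))
```

In a real RDF store the same effect would normally come from
rdfs:subPropertyOf inference rather than from a hand-written lookup.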

> I was proposing much more detailed ontology previously, but Evgeny 
> convinced me to drop as many properties as possible.
> 
>> It was a well-thought and good decision to start with something
>> * simple
>> * that works
>> we looked at all the alternatives, and if we would have done as you say 
>> (start from scratch) -  would have failed within the small 11mio eur 
>> budget we had. (well, you get the idea...)
>>   
> Well, making a copy of id3 v 2 ontology should take one weekend at most.
>> we have running code now in aperture.sf.net. it works. its fine.
>>   
> 
> 
>> we expected that someone would come later and fix our decision - so YES 
>> - its good - we need to have NMM, but I want to see it working first 
>> before I dump the existing stuff.
>>   
> Sure, we can keep the nid3, I have really nothing against keeping 
> deprecated ontologies in repositories.
>> we are talking about money here to change aperture, it will take some 
>> time of our users to swallow this and I am not going to stand up the 
>> heat of our existing userbase if I am not fully convinced of a new ontology.
>>
>>   
> Well, you can keep the support of the old ontology as well.
>> note: you never showed us NMM, so this time, I can happily flame back 
>> and like to switch to the language I would use between gunnar and me in 
>> our office:
>> show your assets or shut ... .... up.
>>
>> googling for NMM, I find is this draft:
>> http://xesam.org/main/Hackfest2008/nmm
>> is this NMM?
>>   
> Well, this is the current git link:
> http://git.gnome.org/cgit/tracker/tree/data/ontologies/38-nmm.ontology
> 
> 
> 
>> be precise, it has a good reason that the namespace is also the HTTP 
>> address where I can download the ontology using HTTP to validate it.
>> (the draft does not even mention where I can get the ontology via HTTP....)
>>   
> As you have read, the point of that page is to design the ontology. Once 
> we agree that it's final, then we put it to proper location.
>> currently you propose to rewrite aperture.sf.net with an ontology I 
>> can't quite grasp, sorry, but this is not the way you are going to 
>> convince me. This is also why the OSCAF part of our work is here: to 
>> help establish a good documentation of what happens.
>>   
> I'm not proposing to do anything for a specific application. I'm saying 
> that NID3 is a bad ontology and NMM is the way forward. NMM is not ready 
> yet, but it should be polished together with the community.

Could you please write a more detailed explanation of what you think is
"bad" with NID3? I would imagine something along the lines of [1]. And
why is it better to do the same stuff with NMM rather than adding some
classes/properties to NID3?

In my experience it's easier to discuss use cases like:
- it's possible/impossible to express this/that
- this/that can be expressed in more than one way which may lead to
inconsistencies
- having data expressed with this ontology it is possible/impossible to
implement this/that functionality
- this/that is a plain typo in the rdfs file with the ontology
- this/that part of the documentation is misleading

It's difficult to discuss aesthetics.

Also note that it will be difficult to keep things stable once we start
adding classes like "Song", "Artist", "Movie" etc. Originally NIE was
about things that can be directly extracted from the bits/bytes without
any heuristics/NLP/image understanding/speech recognition etc.

In Nepomuk this assumption defined the scope of NIE and the border
between NIE and NAO/PIMO. I'm not saying that this dogma is to be upheld
at all costs, but keeping it will keep the development focused. Otherwise
we will have to settle questions such as whether an avi file with Madonna's
"Frozen" clip is a movie, or whether an mp3 with a recording of the
Great B-Minor Mass can be called a "Song". At the very least I'd suggest
keeping "Song" and "AudioTrack" in separate ontologies.

I'll try to take a closer look at the NMM git url you've given and
return with some more focused feedback.

Antoni Mylka
antoni.mylka at gmail.com

[1]
http://www.semanticdesktop.org/ontologies/2007/04/02/ncal/#sec-drawbacks