[Xesam] dumping NID3/NEXIF for NMM

Urho Konttori urho.konttori at nokia.com
Tue Jun 16 23:50:34 PDT 2009


Hi!

I won't repeat what Evgeny has already commented on.

>>     
>>> we took standards that worked very well for the it industry for the last
>>> years, this is the common process when designing an ontology: look for
>>> existing standards and copy them.
>>>
>>>       
>> Which is not always the right choice. This is also obvious in the
>> calendar ontology. You guys took the specs of ical too literally. Ivan
>> is working on a proposal that drops all of the union classes and
>> replaces them with a single superclass, which makes the ncal a nice and
>> clean structure. I mean, ncal is for the most part really nice, but the
>> drive to copy the ical exactly led to those horrible union classes.
>>
>>     
>
> My 2c. The road towards the present ncal had two stages
>
> 1. Dan Connolly and the people from the www-rdf-calendar community at
> w3c devised a python+xslt script that generates the icaltzd ontology
> from the plain text file of the RFC itself. You can't get any nearer to
> the original standard definition.
>   
Indeed, but at the same time, not everything has to be pixel-perfect.
You just need to be able to express as much as ical does.
> 2. We took Connolly's ontology and translated it to NRL. First with
> a Java program, then tweaked the result manually, all of this is
> documented in [1]. The union classes appeared as a NRL equivalent of the
> owl:unionOf construct present in the original OWL ontology.
>   
Well, the automated step was not needed in my opinion. ical is quite a
limited standard, so it could have been converted manually just as
well. In any case, I do applaud you for automating it.


> If you have an idea for a third stage that will make it better, easier,
> and still manage to express most of the information from ICAL files
> without loss, I personally couldn't agree more.
>   
Ivan will make the proposal after we have run it past our calendar team
a few times, so that we know it's the right one. If nothing else, we
have learned that the domain experts must validate each and every
ontology.

> The most important goal of the union classes was validation, if a
> property can appear on an Event, but can't on a Journal entry, spotting
> it on a journal entry means we have a bug in the ICAL->NCAL converter.
> The conversion process itself is quite error-prone. Every level of
> validation was welcome.
>   
Well, with a single superclass, you can still do the validation in
post-processing just as well.
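
To make that concrete, here is a rough sketch in Python of what such a
post-processing check could look like. The property names and the table
of allowed properties are illustrative assumptions only, not the actual
ncal vocabulary; a real table would have to come from the ical spec.

```python
# Hypothetical table: which properties each calendar component may carry.
# The names are illustrative only, not taken from ncal.
ALLOWED = {
    "Event": {"summary", "dtstart", "dtend", "location"},
    "Journal": {"summary", "dtstart", "description"},
}


def validate(component_type, properties):
    """Return the properties that are not allowed on this component type."""
    allowed = ALLOWED.get(component_type)
    if allowed is None:
        raise ValueError(f"unknown component type: {component_type}")
    # set() over a dict or list yields the property names to check.
    return sorted(set(properties) - allowed)


# A journal entry carrying an end time would point to a converter bug.
print(validate("Journal", {"summary": "Notes", "dtend": "2009-06-16"}))
```

The data itself only needs the one superclass; the per-type rules live
in the checker, which is where a converter bug would be caught anyway.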
> The union classes weren't considered a problem because the converter
> didn't have to generate them, and they never come up in the data seen by
> the user (that is the application used by the user). Nobody needs to
> write code that 'understands' them. Could you elaborate more on the
> problems you have with those union classes apart from them being 'ugly'?
>   
What do you mean, nobody needs to write code that 'understands' them?
People had serious trouble understanding why they exist at all.
Developers need to understand the ontology. This is why the ontology
needs to be structured in a way that is logical to a reader.


>>> Look, even the W3C people did  base their calendar ontology on the vCal
>>> standard, do not shoot at this approach, it is a good one. Consider the
>>> time it took to make MPEG7 (by the way, you may think about porting that
>>> one to an ontology, at least partly, instead of rolling your own) - we
>>> did consider this and say: invest time into what we need, not endless
>>> standardization discussions that others did before you.
>>>
>>>       
>> Sure. And mpeg7 is also a very nice example of how nid3 is not a good idea.
>>     
>
> Allow me not to agree with that. I've spent two weeks in February 2007
> trying to come to grips with MPEG7. My private conclusion was that MPEG7
> is a specification bloated beyond anything I'd seen. The initial idea
> was to take the MPEG7 ontology developed at DFKI for the SmartWeb
> project and adapt it for nepomuk. After two weeks of banging my head
> against the wall I dropped it and settled on NEXIF/NID3. The two most
> important disadvantages of MPEG7 were
>   
I couldn't agree with you more on how bloated mpeg7 is. The point that I
apparently forgot to write down was that it is, again, a completely
separate standard, fundamentally different from nid3. Now, if you were
to copy mpeg7 into nepomuk as yet another ontology, imagine trying to
write reasonably sane queries that access and combine results from both
ontologies. This is why we need an abstraction ontology.
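
As a rough illustration of what I mean by an abstraction layer, here is
a small Python sketch. All property names in it (both the per-format
ones and the "common:" ones) are made up for the example and are not
taken from the real nid3 or mpeg7 ontologies.

```python
# Hypothetical mapping from format-specific property names to the names
# of a common abstraction layer. None of these identifiers are real.
TO_COMMON = {
    "nid3": {"nid3:artistName": "common:performer", "nid3:trackTitle": "common:title"},
    "mpeg7": {"mpeg7:CreatorName": "common:performer", "mpeg7:TitleText": "common:title"},
}


def to_common(ontology, record):
    """Translate one record into the common vocabulary, dropping unmapped keys."""
    mapping = TO_COMMON[ontology]
    return {mapping[key]: value for key, value in record.items() if key in mapping}


def find_by_performer(records, name):
    """Query once against the common vocabulary, whatever the source ontology."""
    matches = []
    for ontology, record in records:
        common = to_common(ontology, record)
        if common.get("common:performer") == name:
            matches.append(common)
    return matches


library = [
    ("nid3", {"nid3:artistName": "Smith", "nid3:trackTitle": "One"}),
    ("mpeg7", {"mpeg7:CreatorName": "Smith", "mpeg7:TitleText": "Two"}),
    ("mpeg7", {"mpeg7:CreatorName": "Jones", "mpeg7:TitleText": "Three"}),
]
print(find_by_performer(library, "Smith"))
```

The application asks one question in one vocabulary; without the common
layer, every query has to know every 'copy' ontology by name.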
> - many intermediary rdf nodes needed to express a simple thing. E.g.
> saying that mp3 file has a composer whose name is "Smith" took something
> around 5 or 6 rdf triples, whereas in NID3 it takes two.
> - trying to cover all levels of abstraction, from basic technical
> metadata to the semantic meaning like "this part of a picture is a face
> of a person". If you're in for simplicity (which seems to be the case)
> mpeg7 would be a very bad choice. Please correct me if I'm wrong.
>   
No, you are exactly right.
>   
>> Ok, let me put it this way, we can support tens of different 'copy'
>> ontologies, as long as we also provide an abstraction ontology that
>> makes the use of the data easy. Think about it. mpeg7 will have totally
>> different names for the same fields that are in nid3, ogg ontology would
>> also have different names and so forth. Now, for a media player, you
>> have additional libraries (e.g. gstreamer) that handle the hassle of
>> playback for you. You don't need to care about the format at all. Now,
>> when you are showing the available music on an application window, you
>> don't want to query for the metadata from different ontologies. You want
>> one, that combines the most common features of various audio file types,
>> various video file types and various image file types.
>>     
>
> The basic idea was to use the nid3 ontology for all audio metadata. It
> seems that the 'ID3' in the ontology name may be misleading. My
> intention was to extend it with properties beyond the id3 standard if
> need be. There was supposed to be no 'ogg' ontology, if ogg files
> contain metadata fields that have no direct mappings in id3 - the nid3
> should be extended. Perhaps we should have named it something along the
> lines of Nepomuk Audio Ontology, and NEXIF as Nepomuk Image Ontology
> from the beginning, to spare the misunderstandings.
>   
NEXIF is a copy of exif. Semantically, exif is not interesting for
anything but the geo coordinates, flash on/off, scene type, make, light
source, and the authoring metadata (which should be in nie anyway, not
in nexif). The rest may be interesting to a photo application while the
image is being viewed, but most of the time it is not.


> The entire development process started with taking some common standards
> and merging them i.e. finding common stuff and expressing those
> commonalities via a common parent property/parent class. Those
> ontologies covered all the use cases we needed. The nepomuk project had
> to keep finite scope. I'm all for discussing new use cases, throwing new
> stuff into the mix, and finding new commonalities.
>   
So, what other standards did you use to create nexif other than exif?

Anyway, I really am not against nexif. It's a nice copy of exif, and
might actually be useful for some photographers. However, with XMP, DC,
and IPTC over XMP, I really feel that we need to have the common
elements again in an abstract ontology.

If you read how those standards say the properties in a file should
supersede each other, it quickly becomes obvious that we really need an
abstraction that handles this hierarchy of value overriding.
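
A minimal sketch of that kind of override handling, in Python. The
precedence order below is only a placeholder assumption for the
example; the real order has to be taken from the standards themselves.

```python
# Assumed precedence, highest first. This order is a placeholder for
# illustration and is not taken from any of the standards.
PRECEDENCE = ["xmp", "iptc", "exif"]


def resolve(field, sources):
    """Return (value, source) from the highest-priority source that has the field."""
    for name in PRECEDENCE:
        value = sources.get(name, {}).get(field)
        if value is not None:
            return value, name
    return None, None


# The same field stored in two metadata blocks of one file.
photo = {
    "exif": {"creator": "Camera Owner"},
    "xmp": {"creator": "Edited Name"},
}
print(resolve("creator", photo))
```

The application then sees a single value per field, and the override
rules live in one place instead of in every query.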

<snip>
>   
> Could you please write a more detailed explanation of what you think is
> "bad" with NID3. I would imagine something along the lines of [1]. And
> why is it better to do the same stuff with NMM rather than adding some
> classes/properties to NID3.
>   

Evgeny replied to this section, so I'm hoping we continue the discussion 
of this in that thread.



Kind regards,
Urho

