[Xesam] dumping NID3/NEXIF for NMM
Leo Sauermann
leo.sauermann at dfki.de
Thu Jun 18 00:37:16 PDT 2009
It was Evgeny Egorochkin who said at the right time 17.06.2009 17:16 the
following words:
>
>
> Mine is "phreedom_"
>
OK, I added you and
gave you admin rights for Trac, so that you can experiment a bit with
importing data, which, as you said, is not automated :-|
best
Leo
> I looked at the SF documentation and discovered that they offer export of
> Trac data[1], so we're not locked in, and they even offer git[2] now.
> However, import is not possible (although users are asking for it[3]),
> which means we lose the previous discussions :(
>
> [1] http://sourceforge.net/apps/trac/sourceforge/wiki/Backup your data
> [2] http://sourceforge.net/apps/trac/sourceforge/wiki/Git
> [3] http://sourceforge.net/tracker/index.php?func=detail&aid=2668717&group_id=1&atid=350001
>
>
>> It was Urho Konttori who said at the right time 17.06.2009 08:50 the
>>
>> following words:
>>
>>> Hi!
>>>
>>> I'm not repeating what Evgeny has already commented on.
>>>
>>>
>>>>>> we took standards that have worked very well for the IT industry over
>>>>>> the last years. This is the common process when designing an ontology:
>>>>>> look for existing standards and copy them.
>>>>>>
>>>>> Which is not always the right choice. This is also obvious in the
>>>>> calendar ontology. You guys took the specs of ical too literally. Ivan
>>>>> is working on a proposal that drops all of the union classes and
>>>>> replaces them with a single superclass, which makes the ncal a nice and
>>>>> clean structure. I mean, ncal is for the most part really nice, but the
>>>>> drive to copy ical exactly led to those horrible union classes.
>>>>>
>>>> My 2c. The road towards the present ncal had two stages
>>>>
>>>> 1. Dan Connolly and the people from the www-rdf-calendar community at
>>>> w3c devised a python+xslt script that generates the icaltzd ontology
>>>> from the plain text file of the RFC itself. You can't get any nearer to
>>>> the original standard definition.
>>>>
>>> Indeed, but at the same time, not everything must be done pixel perfect.
>>> You just need to be able to express as much as ical.
>>>
>>>
>>>> 2. We took Connolly's ontology and translated it to NRL, first with
>>>> a Java program, then by tweaking the result manually; all of this is
>>>> documented in [1]. The union classes appeared as an NRL equivalent of the
>>>> owl:unionOf construct present in the original OWL ontology.
>>>>
>>> Well, the automated step is not needed in my opinion. ical is quite a
>>> limited standard, so it could have been converted manually just as
>>> well. In any case, I do applaud you for automating it.
>>>
>>>
>>>> If you have an idea for a third stage that will make it better, easier,
>>>> and still manage to express most of the information from ICAL files
>>>> without loss, I personally couldn't agree more.
>>>>
>>> Ivan will make the proposal after we have run it past our calendar
>>> team a few times, so that we know it's the right one. If nothing
>>> else, we have learned that the domain experts must validate each
>>> and every ontology.
>>>
>>>
>>>> The most important goal of the union classes was validation, if a
>>>> property can appear on an Event, but can't on a Journal entry, spotting
>>>> it on a journal entry means we have a bug in the ICAL->NCAL converter.
>>>> The conversion process itself is quite error-prone. Every level of
>>>> validation was welcome.
>>>>
>>> Well, with a superclass, you can still do the validation in
>>> post-processing just as well.
>>>
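The post-processing validation suggested above could look roughly like the following. This is a minimal sketch, not code from NCAL or from any converter: the table ALLOWED, the component type names, and the property names are all hypothetical and only illustrate the idea of checking "may this property appear on this kind of component?" after conversion, without encoding it in union classes.

```python
# Hypothetical table: which component types may carry which property.
# The names are illustrative and are NOT taken from the NCAL ontology.
ALLOWED = {
    "dtend": {"Event", "Freebusy"},
    "due": {"Todo"},
    "summary": {"Event", "Todo", "Journal"},
}


def validate(components):
    """Return (component id, property) pairs that violate ALLOWED.

    components maps an id to {"type": str, "properties": [str, ...]}.
    Properties missing from ALLOWED are not checked.
    """
    errors = []
    for comp_id, comp in components.items():
        for prop in comp["properties"]:
            allowed_types = ALLOWED.get(prop)
            if allowed_types is not None and comp["type"] not in allowed_types:
                errors.append((comp_id, prop))
    return errors


converted = {
    "e1": {"type": "Event", "properties": ["summary", "dtend"]},
    "j1": {"type": "Journal", "properties": ["summary", "dtend"]},
}
print(validate(converted))  # [('j1', 'dtend')]
```

A hit such as the one on "j1" would point at a bug in the converter, which is the same signal the union classes were meant to give.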
>>>
>>>> The union classes weren't considered a problem because the converter
>>>> didn't have to generate them, and they never come up in the data seen by
>>>> the user (that is the application used by the user). Nobody needs to
>>>> write code that 'understands' them. Could you elaborate more on the
>>>> problems you have with those union classes apart from them being 'ugly'?
>>>>
>>> What do you mean, nobody needs to write code that 'understands them'?
>>> People had serious issues in understanding the reason for their
>>> existence overall. Developers need to understand the ontology. This is
>>> why the ontology needs to be created in a manner that is logical for a
>>> reader.
>>>
>>>
>>>>>> Look, even the W3C people did base their calendar ontology on the
>>>>>> vCal standard, do not shoot at this approach, it is a good one.
>>>>>> Consider the time it took to make MPEG7 (by the way, you may think
>>>>>> about porting that one to an ontology, at least partly, instead of
>>>>>> rolling your own) - we did consider this and say: invest time into
>>>>>> what we need, not endless standardization discussions that others did
>>>>>> before you.
>>>>>>
>>>>> Sure. And mpeg7 is also a very nice example of how nid3 is not a good
>>>>> idea.
>>>>>
>>>> Allow me not to agree with that. I've spent two weeks in February 2007
>>>> trying to come to grips with MPEG7. My private conclusion was that MPEG7
>>>> is a specification bloated beyond anything I'd seen. The initial idea
>>>> was to take the MPEG7 ontology developed at DFKI for the SmartWeb
>>>> project and adapt it for nepomuk. After two weeks of banging my head
>>>> against the wall I dropped it and settled on NEXIF/NID3. The two most
>>>> important disadvantages of MPEG7 were
>>>>
>>> I couldn't agree with you more on how bloated mpeg7 is. The point that I
>>> apparently forgot to write down was that it is, again, a completely
>>> separate standard, fundamentally different from nid3. Now, if you
>>> were to copy mpeg7 as yet another ontology into nepomuk, imagine trying
>>> to create somewhat sane queries that access and combine results
>>> from both ontologies. This is why we need to have an abstraction
>>> ontology.
>>>
>>>
>>>> - many intermediary rdf nodes needed to express a simple thing. E.g.
>>>> saying that an mp3 file has a composer whose name is "Smith" took
>>>> around 5 or 6 rdf triples, whereas in NID3 it takes two.
>>>> - trying to cover all levels of abstraction, from basic technical
>>>> metadata to the semantic meaning like "this part of a picture is a face
>>>> of a person". If you're in for simplicity (which seems to be the case)
>>>> mpeg7 would be a very bad choice. Please correct me if I'm wrong.
>>>>
>>> Nope, you are exactly right.
>>>
>>>
>>>>> Ok, let me put it this way, we can support tens of different 'copy'
>>>>> ontologies, as long as we also provide an abstraction ontology that
>>>>> makes the use of the data easy. Think about it. mpeg7 will have totally
>>>>> different names for the same fields that are in nid3, ogg ontology
>>>>> would also have different names and so forth. Now, for a media player,
>>>>> you have additional libraries (e.g. gstreamer) that handle the hassle
>>>>> of playback for you. You don't need to care about the format at all.
>>>>> Now, when you are showing the available music on an application window,
>>>>> you don't want to query for the metadata from different ontologies. You
>>>>> want one, that combines the most common features of various audio file
>>>>> types, various video file types and various image file types.
>>>>>
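The "abstraction ontology" idea described above can be sketched as a mapping from format-specific property names onto one shared vocabulary, so that an application asks a single question regardless of file type. This is only an illustration under stated assumptions: the table COMMON_NAME and every identifier in it are hypothetical and are not taken from NID3, NMM, or any ogg-related specification.

```python
# Hypothetical mapping from (source ontology, field) to a shared field name.
# None of these identifiers are taken from the real ontologies.
COMMON_NAME = {
    ("nid3", "leadArtist"): "artist",
    ("nid3", "albumTitle"): "album",
    ("ogg", "ARTIST"): "artist",
    ("ogg", "ALBUM"): "album",
}


def to_common(source, raw):
    """Translate one file's format-specific metadata into shared names.

    Fields without an entry in COMMON_NAME are dropped.
    """
    common = {}
    for field, value in raw.items():
        name = COMMON_NAME.get((source, field))
        if name is not None:
            common[name] = value
    return common


def list_artists(files):
    """files: iterable of (source ontology, raw metadata dict) pairs."""
    artists = set()
    for source, raw in files:
        common = to_common(source, raw)
        if "artist" in common:
            artists.add(common["artist"])
    return sorted(artists)


library = [
    ("nid3", {"leadArtist": "Smith", "albumTitle": "First"}),
    ("ogg", {"ARTIST": "Jones", "BITRATE": "128"}),
]
print(list_artists(library))  # ['Jones', 'Smith']
```

The media player in the example only ever sees "artist" and "album"; the per-format names stay behind the mapping.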
>>>> The basic idea was to use the nid3 ontology for all audio metadata. It
>>>> seems that the 'ID3' in the ontology name may be misleading. My
>>>> intention was to extend it with properties beyond the id3 standard if
>>>> need be. There was supposed to be no 'ogg' ontology, if ogg files
>>>> contain metadata fields that have no direct mappings in id3 - the nid3
>>>> should be extended. Perhaps we should have named it something along the
>>>> lines of Nepomuk Audio Ontology, and NEXIF as Nepomuk Image Ontology
>>>> from the beginning, to spare the misunderstandings.
>>>>
>>> NEXIF is a copy of exif. Exif is not interesting semantically for
>>> anything but the geo coordinates, flash on, flash off, scene type,
>>> make, light source, and the authoring metadata (which should be in
>>> nie anyway, not in nexif). The rest, while interesting for a photo
>>> application when viewing the image, is most of the time not
>>> interesting.
>>>
>>>
>>>> The entire development process started with taking some common standards
>>>> and merging them i.e. finding common stuff and expressing those
>>>> commonalities via a common parent property/parent class. Those
>>>> ontologies covered all the use cases we needed. The nepomuk project had
>>>> to keep finite scope. I'm all for discussing new use cases, throwing new
>>>> stuff into the mix, and finding new commonalities.
>>>>
>>> So, what other standards did you use to create nexif other than exif?
>>>
>>> Anyway, I really am not against nexif. It's a nice copy of exif, and
>>> might actually be useful for some photographers. However, with XMP, DC,
>>> and IPTC over XMP, I really feel that we again need to have the common
>>> elements in an abstract ontology.
>>>
>>> If you read how the properties in files should supersede each other
>>> in those standards, it quickly becomes obvious that an abstraction
>>> that handles the hierarchy of value overriding is really needed.
>>>
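The value overriding mentioned above can be pictured as a precedence list that is consulted when several embedded metadata blocks set the same field. The sketch below is hypothetical: the order in PRECEDENCE is an assumption made for illustration, not the order prescribed by XMP, IPTC, or Exif, and the field names are invented.

```python
# Assumed precedence, highest first. This order is an illustration only;
# the real rules are defined by the standards and are not reproduced here.
PRECEDENCE = ["xmp", "iptc", "exif"]


def resolve(field, per_standard):
    """Return (value, standard) from the highest-priority block setting field.

    per_standard maps a standard name to its {field: value} dict.
    Returns (None, None) if no block sets the field.
    """
    for standard in PRECEDENCE:
        value = per_standard.get(standard, {}).get(field)
        if value is not None:
            return value, standard
    return None, None


photo = {
    "exif": {"creator": "Camera Owner", "make": "ACME"},
    "xmp": {"creator": "Edited By Hand"},
}
print(resolve("creator", photo))  # ('Edited By Hand', 'xmp')
print(resolve("make", photo))     # ('ACME', 'exif')
```

An abstraction layer would expose only the resolved value, so applications would not need to know which block it came from.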
>>> <snip>
>>>
>>>
>>>> Could you please write a more detailed explanation of what you think is
>>>> "bad" with NID3? I would imagine something along the lines of [1]. And
>>>> why is it better to do the same stuff with NMM rather than adding some
>>>> classes/properties to NID3?
>>>>
>>> Evgeny replied to this section, so I'm hoping we continue the discussion
>>> of this in that thread.
>>>
>
>
>
--
____________________________________________________
DI Leo Sauermann http://www.dfki.de/~sauermann
Deutsches Forschungszentrum fuer
Kuenstliche Intelligenz DFKI GmbH
Trippstadter Strasse 122
P.O. Box 2080 Fon: +49 631 20575-116
D-67663 Kaiserslautern Fax: +49 631 20575-102
Germany Mail: leo.sauermann at dfki.de
Geschaeftsfuehrung:
Prof.Dr.Dr.h.c.mult. Wolfgang Wahlster (Vorsitzender)
Dr. Walter Olthoff
Vorsitzender des Aufsichtsrats:
Prof. Dr. h.c. Hans A. Aukes
Amtsgericht Kaiserslautern, HRB 2313
____________________________________________________