[Xesam] dumping NID3/NEXIF for NMM
Evgeny Egorochkin
phreedom.stdin at gmail.com
Wed Jun 17 08:16:52 PDT 2009
On 17 June 2009 17:35:14 Leo Sauermann wrote:
> Hi guys,
>
> Very short answer from me:
> * Urho agreed that we can maintain nexif+nid3, so we do that
Of course. We can keep these for quite some time and even try remapping them
over any replacements/improvements we come up with.
> * You are discussing about NMM, so NMM is taken up on the "oscaf list of
> ontologies we care for".
ok
> * As the discussion spread over several e-mails and various websites is
> fruitless and incomprehensible to anyone reading it in a year, I
> recommend to track all arguments in tickets on sourceforge
>
> I created a component and wiki page for NMM, please fill it:
> http://sourceforge.net/apps/trac/oscaf/wiki/NMM
>
> if you are unsure what to write, I just moved PIMO from nepomuk to oscaf:
> http://sourceforge.net/apps/trac/oscaf/wiki/PIMO
>
> please give me your account names on sf.net so that I can add you there.
Mine is "phreedom_"
I looked at the SF documentation and discovered that they offer export of Trac
data[1], so we're not locked in, and they even offer git[2] now. However, import
is not possible (although users are asking for it[3]), which means we lose
previous discussions :(
[1] http://sourceforge.net/apps/trac/sourceforge/wiki/Backup your data
[2] http://sourceforge.net/apps/trac/sourceforge/wiki/Git
[3] http://sourceforge.net/tracker/index.php?func=detail&aid=2668717&group_id=1&atid=350001
> It was Urho Konttori who said at the right time 17.06.2009 08:50 the
> following words:
> > Hi!
> >
> > I'm not repeating what Evgeny has already commented on
> >
> >>>> we took standards that worked very well for the it industry for the
> >>>> last years, this is the common process when designing an ontology:
> >>>> look for existing standards and copy them.
> >>>
> >>> Which is not always the right choice. This is also obvious in the
> >>> calendar ontology. You guys took the specs of ical too literally. Ivan
> >>> is working on a proposal that drops all of the union classes and
> >>> replaces them with a single superclass, which makes the ncal a nice and
> >>> clean structure. I mean, ncal is for the most part really nice, but the
> >>> drive to copy the ical exactly led to those horrible union classes.
> >>
> >> My 2c. The road towards the present ncal had two stages
> >>
> >> 1. Dan Connolly and the people from the www-rdf-calendar community at
> >> w3c devised a python+xslt script that generates the icaltzd ontology
> >> from the plain text file of the RFC itself. You can't get any nearer to
> >> the original standard definition.
> >
> > Indeed, but at the same time, not everything must be done pixel perfect.
> > You just need to be able to express as much as ical.
> >
> >> 2. We took Connolly's ontology and translated it to NRL. First with
> >> a Java program, then tweaked the result manually, all of this is
> >> documented in [1]. The union classes appeared as a NRL equivalent of the
> >> owl:unionOf construct present in the original OWL ontology.
> >
> > Well, the automated step is not needed in my opinion anyway. The ical is
> > quite a limited standard anyway, so it could have been converted manually
> > just as well. In any case, I do applaud your way of doing it automated.
> >
> >> If you have an idea for a third stage that will make it better, easier,
> >> and still manage to express most of the information from ICAL files
> >> without loss, I personally couldn't agree more.
> >
> > Ivan will make the proposal after we have been running it through with
> > our calendar team a few times so that we know it's the right one. As, if
> > nothing else, we have learned that the domain experts must validate each
> > and every ontology.
> >
> >> The most important goal of the union classes was validation, if a
> >> property can appear on an Event, but can't on a Journal entry, spotting
> >> it on a journal entry means we have a bug in the ICAL->NCAL converter.
> >> The conversion process itself is quite error-prone. Every level of
> >> validation was welcome.
> >
> > Well, with a superclass, you can still do the validation in post
> > processing just as well.
> >
> >> The union classes weren't considered a problem because the converter
> >> didn't have to generate them, and they never come up in the data seen by
> >> the user (that is the application used by the user). Nobody needs to
> >> write code that 'understands' them. Could you elaborate more on the
> >> problems you have with those union classes apart from them being 'ugly'?
> >
> > What do you mean, nobody needs to write code that 'understands them'?
> > People had serious issues in understanding the reason for their
> > existence overall. Developers need to understand the ontology. This is
> > why the ontology needs to be created in a manner that is logical for a
> > reader.
> >
> >>>> Look, even the W3C people did base their calendar ontology on the
> >>>> vCal standard, do not shoot at this approach, it is a good one.
> >>>> Consider the time it took to make MPEG7 (by the way, you may think
> >>>> about porting that one to an ontology, at least partly, instead of
> >>>> rolling your own) - we did consider this and say: invest time into
> >>>> what we need, not endless standardization discussions that others did
> >>>> before you.
> >>>
> >>> Sure. And mpeg7 is also a very nice example on how nid3 is not a good
> >>> idea.
> >>
> >> Allow me not to agree with that. I've spent two weeks in February 2007
> >> trying to come to grips with MPEG7. My private conclusion was that MPEG7
> >> is a specification bloated beyond anything I'd seen. The initial idea
> >> was to take the MPEG7 ontology developed at DFKI for the SmartWeb
> >> project and adapt it for nepomuk. After two weeks of banging my head
> >> against the wall I dropped it and settled on NEXIF/NID3. The two most
> >> important disadvantages of MPEG7 were
> >
> > I couldn't agree with you more on how bloated mpeg7 is. The point that I
> > apparently forgot to type down was, that it's again a completely
> > different standard, which is fundamentally different than nid3. Now, if
> > you would again copy mpeg7 as a new ontology to nepomuk, imagine trying
> > to create somewhat sane queries that are accessing and combining results
> > from both ontologies. This is why we need to have an abstraction
> > ontology.
> >
> >> - many intermediary rdf nodes needed to express a simple thing. E.g.
> >> saying that mp3 file has a composer whose name is "Smith" took something
> >> around 5 or 6 rdf triples, whereas in NID3 it takes two.
> >> - trying to cover all levels of abstraction, from basic technical
> >> metadata to the semantic meaning like "this part of a picture is a face
> >> of a person". If you're in for simplicity (which seems to be the case)
> >> mpeg7 would be a very bad choice. Please correct me if I'm wrong.
> >
> > Nope, you are to-the-point correct.
> >
> >>> Ok, let me put it this way, we can support tens of different 'copy'
> >>> ontologies, as long as we also provide an abstraction ontology that
> >>> makes the use of the data easy. Think about it. mpeg7 will have totally
> >>> different names for the same fields that are in nid3, ogg ontology
> >>> would also have different names and so forth. Now, for a media player,
> >>> you have additional libraries (e.g. gstreamer) that handle the hassle
> >>> of playback for you. You don't need to care about the format at all.
> >>> Now, when you are showing the available music on an application window,
> >>> you don't want to query for the metadata from different ontologies. You
> >>> want one, that combines the most common features of various audio file
> >>> types, various video file types and various image file types.
> >>
> >> The basic idea was to use the nid3 ontology for all audio metadata. It
> >> seems that the 'ID3' in the ontology name may be misleading. My
> >> intention was to extend it with properties beyond the id3 standard if
> >> need be. There was supposed to be no 'ogg' ontology, if ogg files
> >> contain metadata fields that have no direct mappings in id3 - the nid3
> >> should be extended. Perhaps we should have named it something along the
> >> lines of Nepomuk Audio Ontology, and NEXIF as Nepomuk Image Ontology
> >> from the beginning, to spare the misunderstandings.
> >
> > NEXIF is a copy of exif. Exif is not interesting semantically for
> > anything else but the geo coordinates, flash on, flash off, scene type,
> > make, lightsource, and the authoring metadata (that should anyway be in
> > nie, not in nexif). The rest, while interesting for a photo application
> > at the time of viewing the image, really, most of the time are not
> > interesting.
> >
> >> The entire development process started with taking some common standards
> >> and merging them i.e. finding common stuff and expressing those
> >> commonalities via a common parent property/parent class. Those
> >> ontologies covered all the use cases we needed. The nepomuk project had
> >> to keep finite scope. I'm all for discussing new use cases, throwing new
> >> stuff into the mix, and finding new commonalities.
> >
> > So, what other standards did you use to create nexif other than exif?
> >
> > Anyway, I really am not against nexif. It's a nice copy of the exif, and
> > might actually be useful for some photographers. However, with XMP, DC,
> > IPTC over XMP, I really feel that we need to have the common elements
> > again in an abstract ontology.
> >
> > If you do read the hierarchy of how the properties in files should be
> > superseding each other in those standards, it becomes quite quickly
> > obvious that the abstraction that handles the hierarchy of value
> > overriding is really needed.
> >
> > <snip>
> >
> >> Could you please write a more detailed explanation of what you think is
> >> "bad" with NID3. I would imagine something along the lines of [1]. And
> >> why is it better to do the same stuff with NMM rather than adding some
> >> classes/properties to NID3.
> >
> > Evgeny replied to this section, so I'm hoping we continue the discussion
> > of this in that thread.