[XESAM] Ontology snapshot

Wed Jun 6 08:47:03 PDT 2007

On Wednesday 06 June 2007 17:54:13 jamie wrote:
> On Wed, 2007-06-06 at 16:37 +0200, Mikkel Kamstrup Erlandsen wrote:
> > I've been bugging about trying to figure out how we can please
> > everyone with regards to categories and sources.
> >
> > There seem to be consensus on the following: Each object has two
> > designated *single valued* fields Category and Source. These two
> > fields imply what other fields makes sense on the object (as implied
> > by the purple arrows in Evgenys diagram).
> >
> > Important: There is a trade off made here. We basically have two
> > choices to avoid a lot of duplication/ambiguities in the onto: Either
> > we allow multiple inheritance (on categories is all that is needed) or
> > we have multiple values for the category field. I talked this over
> > with Evgeny and we ended up with the multiple-inheritance for cats.
> > The example here could be that a SourceCode cat derives from both
> > TextDocument and Software.
>
> such a scheme screws up our search results by category in tracker
>
> we have search by cat for Development Files and Text Files but we do not
> show Dev files under Text Files. Having a deep hierarchy will also cause
> lots of dupes in search results for different cats
>
> For practical reasons I prefer it as flat as possible
>
> Current tracker onto for File based cats is:
>
> All Files
> -> Music
> -> Documents
> -> Text
> -> Videos
> -> Images
> -> Development
> -> Folders
>
> As you can see there is no need for more than one level deep inheritance
> and absolutely no need for MI. Even if you put Dev files under Text,
> Text still inherits from All Files so a need for MI is not necessary.
>
> Text in tracker does not show Docs or Dev files (even if they are text
> based) as they have their own cat. I really dont like duplicating
> results in different cats

Actually, there is a need, and there are many real use cases. What you
describe is specific to your approach, but that is not a problem.

It is possible to provide a category that acts as you describe for text
files, but not for source code files, especially with MI for categories (for
complex cases).

That is, we can have a TextFile with children SourceCode, TextDocument and
Text (for files that are neither source code nor documents). All text-related
properties, like line count, belong to TextFile. This unifies both approaches
and, on the surface, seems better, since there is in fact a distinction
between a plain-text file and a plain-text document file. You can't discern
this 100% at the software level, but you can try.
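
A minimal sketch of that hierarchy in Python, to make the idea concrete. It
is an illustration only, not the Xesam ontology or Tracker's implementation.
TextFile, SourceCode, TextDocument, Text and Software are the names used in
this thread; the "File" root, the field names "lineCount" and "license", and
the sample file names are assumptions made up for the example.

```python
# Parent links per category. "File" is a hypothetical root category.
PARENTS = {
    "File": [],
    "Software": ["File"],
    "TextFile": ["File"],
    "Text": ["TextFile"],            # plain text that is neither code nor document
    "TextDocument": ["TextFile"],
    "SourceCode": ["TextFile", "Software"],  # multiple inheritance
}

# Fields declared directly on a category (field names are hypothetical).
FIELDS = {
    "TextFile": {"lineCount"},
    "Software": {"license"},
}


def ancestors(category):
    """Return the category itself plus every category it derives from."""
    seen = set()
    stack = [category]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def fields_for(category):
    """Collect the fields implied by a category through all its ancestors."""
    fields = set()
    for current in ancestors(category):
        fields |= FIELDS.get(current, set())
    return fields


def search(objects, category):
    """Return names of objects whose single Category is, or derives from, category."""
    return sorted(name for name, cat in objects.items()
                  if category in ancestors(cat))


# Each object carries exactly one Category value (sample data is made up).
OBJECTS = {
    "notes.txt": "Text",
    "main.c": "SourceCode",
    "report.txt": "TextDocument",
}

if __name__ == "__main__":
    print(search(OBJECTS, "Text"))      # only the plain-text file
    print(search(OBJECTS, "TextFile"))  # every text-based file
    print(fields_for("SourceCode"))     # fields from TextFile and Software
```

In this sketch a search on Text does not return source code or documents,
because SourceCode and TextDocument are siblings of Text rather than children
of it, while a search on TextFile still finds all three. Each object keeps a
single Category value, so nothing is listed twice under the leaf categories.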

As for the "flatness" of your approach, it's not flat per se.

If you try to represent your approach formally with an ontology, that is, to
provide a consistent and machine-"understandable" description of the rules
(which files are assigned or imply which properties, how properties are
related, how files are spread across categories), you will end up with many
abstract categories with MI.

And that's exactly what I'm doing: a formal, unambiguous and
machine-"understandable" description.

This "flatness" is only possible if you give humans a generic description of
the ontology and rely on them to figure out the rest from their knowledge of
file formats, metadata, common sense, your source code etc., and to make
their software understand the implications.

--Evgeny