[Xesam] abstract properties?

Sat Feb 16 15:28:49 PST 2008

On 14/02/2008, Sebastian Trüg <strueg at mandriva.com> wrote:
> On Thursday 14 February 2008 10:19:30 Mikkel Kamstrup Erlandsen wrote:
> > On 14/02/2008, Evgeny Egorochkin <phreedom.stdin at gmail.com> wrote:
> > > Hi guys,
> > >
> > >  This is in response to the lengthy discussion on #xesam that happened
> > >  while I was sleeping:
> > >
> > >  >(22:35:51)  kamstrup:  in other words a field is abstract if and only
> > >  > if it has children
> > >  >(22:36:17)  jamiemcc:  yes and is not used in searches
> > >  >(22:36:30)  kamstrup:  also meaning that third parties can not extend
> > >  > fields which does not have any children in the Xesam onto
> > >  >(22:36:45)  kamstrup:  moreover I also think we agreed that you can not
> > >  > assign any value to an abstract field
> > >  >(22:36:54)  kamstrup:  (maybe obvious)
> > >  >(22:36:57)  jamiemcc:  yes
> > >  >(22:37:12)  kamstrup:  good, I think we agree then
> > >  >(22:37:15)  jamiemcc:  abstract are like intermediate classes
> > >  >(22:37:22)  kamstrup:  yes
> > >  >(22:37:31)  jamiemcc:  they are not used directly but instead are
> > >  > always inherited from
> > >  >(22:37:36)  kamstrup:  only leaf nodes of the onto can contain values
> > >
> > >  The benefits of this approach:
> > >  >(22:54:46)  kamstrup:  and having this as a restriction in Xesam does
> > >  > not render us incompatible with Nepo
> > >
> > >  This renders xesam incompatible with most if not any rdfs based
> > > approaches. xesam->rdfs_derivative mapping is ok but it breaks in the
> > > opposite direction.
> > >
> > >  >(23:37:07)  kamstrup:  but it is even more likely that there are two
> > >  > good contradictory arguments
> > >
> > >  >(23:37:44)  kamstrup:  my primary arg is simplicity
> > >
> > >  Certainly not the simplicity of the ontology and not the simplicity (and
> > >  feasibility) of onto extensions.
> >
> > FIRSTLY: let me make it clear that I am not anal about any of these
> > issues. I am prepared to change my mind given convincing arguments.
> >
> > SECOND: There are really two issues at hand here.
> >  1) Can fields with children have values?
> >  2) Can an item belong to a category that has children?
> >
> > To the case in point:
> >
> > I guess the beauty is in the eye of the beholder. I have two design
> > analogies to present to you:
> >
> >  * Unix file system. You can not put data in a directory. You can put
> > files in a dir, and data in files. That is it. It is proven to be a
> > good architecture.
>
> Not that good an example, as it is a very special and very restricted case.
>
> >  * Object Oriented Code. Java interfaces provide clear cut
> > abstractions. Allowing abstract fields to have values is like
> > programming only with normal classes and doing derivation on these.
> >
> >  Programming with Java interfaces will not give you fewer .class
> > files, but the program will be simpler to grok as a code base. Other
> > hackers can easily pinpoint the cases of abstraction, and if the
> > interfaces are properly designed then your program is more easily
> > maintainable.
> >
> > ( * Mime types? You can be "image/jpg", but can you be an "image"? I guess
> > not.)
>
> AFAIK, mimetypes live in a hierarchy. Meaning text/plain has a whole bunch of
> subtypes such as c code or whatever and you can still have plain text files.
> Or application/xsd is a subtype of application/xml to name a proper example.
> And still files (or chunks of data) can be only application/xml.

I think you (without knowing) gave me one of the most compelling
arguments to go for the structure you describe.

According to the shared mime spec
(http://standards.freedesktop.org/shared-mime-info-spec/shared-mime-info-spec-latest.html#subclassing)
mime types refer strictly to the format of the files, not the content.
The type system in the Xesam onto models the *content* of the files.
In this sense mimetypes and Xesam content types are dual to each
other.

I think it would be beautiful if we could keep our content model dual
to the mime tree. I don't know if this can be done cleanly, but I
think it can. As far as I can see the shared mime system has the
following traits:

 * Multiple inheritance for types (check, we have that too)

 * A file can be an instance of any node in the tree (with an
"exception" - see next point)

 * A file has exactly one mime type assigned (we have one content type
per object too)

 * "Root nodes" are not part of the mime tree. Meaning that "text",
"image", "application", etc are not valid mimetypes. Hence you can not
be just an "image" (which makes sense since an image file must have a
format).
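
To make those traits concrete, here is a minimal Python sketch of such a
tree. The subclass table is made up for illustration: only application/xsd
being a subtype of application/xml comes from this thread, and
"application/x-demo" is purely hypothetical, there only to show multiple
inheritance. The helper names are my own, not anything from the shared
mime spec.

```python
# Hypothetical subclass table: type -> direct parent types.
PARENTS = {
    "text/plain": [],
    "image/jpg": [],
    "application/xml": [],
    "application/xsd": ["application/xml"],
    # Purely hypothetical type with two parents (multiple inheritance).
    "application/x-demo": ["application/xsd", "text/plain"],
}


def is_valid_type(name):
    """A bare root such as "image" is not a type; only known "root/sub" names are."""
    return "/" in name and name in PARENTS


def ancestors(name):
    """Return every type that `name` derives from, directly or indirectly."""
    seen = set()
    todo = list(PARENTS[name])
    while todo:
        parent = todo.pop()
        if parent not in seen:
            seen.add(parent)
            todo.extend(PARENTS[parent])
    return seen


def is_a(name, other):
    """True if a file of type `name` also counts as type `other`."""
    return name == other or other in ancestors(name)


# One type per file, and that type may be any node in the tree,
# including an inner node such as application/xml.
file_types = {"schema.xsd": "application/xsd", "notes.xml": "application/xml"}
xml_files = sorted(f for f, t in file_types.items() if is_a(t, "application/xml"))
```

The point of the sketch is that "application/xml" is both a parent and a
perfectly valid type for a file, while "image" on its own is rejected.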

> > >  >(23:38:03)  kamstrup:  query expansion will also be easier
> > >
> > >  So your argument is that having to expand (grandparent=value) into
> > >  ((grandparent=value) or (parent=value) or (property=value)) if we get
> > > rid of "abstract" properties somehow leads to too much complexity?
> >
> > There can be multiple levels of abstract fields mind you. If you
> > expand xesam:related you get:
> >
> > With abstract fields(9): related, conflicts, depends, contains, knows,
> > links, derivedFrom, inReplyTo, supercedes
> >
> > Without(5): conflicts, contains, knows, inReplyTo, supercedes
> >
> > Of course if we use the design I agitate for, then the last case would
> > have more fields.
> >
> > >  Query expansion will necessitate property tree traversal in both cases.
> > > The only difference is what fields are included in the resulting list:
> > > all or only non-abstract.
> > >
> > >  The DB engine will discard those grandparent and parent criteria as
> > > soon as it sees that appropriate tables are empty. No performance
> > > overhead here either.
> >
> > I am not a database expert, so I don't know much here. I would just
> > assume that you could more easily optimize your db schema if you know
> > what fields can be extended in the future. In your model all fields
> > can be extended in any way.
>
> That is the beauty of it and the reason why it is so powerful. All kinds of
> data can be combined. And if you extend a type you stay backwards compatible
> since old clients will still see the new type as a subtype of the old type
> and can handle it accordingly. I don't see a problem. Only advantages.
>
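
For reference, the expansion being discussed can be sketched in a few
lines of Python. The parent/child layout below is my guess, made only for
illustration (the thread gives just the field names and the 9 versus 5
counts), and expand() and to_query() are hypothetical helpers, not part of
any Xesam API.

```python
# Hypothetical layout of the xesam:related subtree: parent -> children.
# Only the field names are taken from the discussion; the nesting is assumed.
CHILDREN = {
    "related": ["conflicts", "depends", "knows", "links"],
    "depends": ["contains"],
    "links": ["derivedFrom"],
    "derivedFrom": ["inReplyTo", "supercedes"],
}


def expand(field, include_abstract):
    """Walk the subtree below `field` depth-first and list the fields to query.

    A field counts as abstract here if and only if it has children.
    """
    result = []
    stack = [field]
    while stack:
        name = stack.pop()
        kids = CHILDREN.get(name, [])
        if include_abstract or not kids:
            result.append(name)
        # Reverse so that children are visited in their listed order.
        stack.extend(reversed(kids))
    return result


def to_query(fields, value):
    """Join the expanded fields into an OR query string."""
    return " or ".join(f"({name}={value})" for name in fields)


all_fields = expand("related", include_abstract=True)
leaf_fields = expand("related", include_abstract=False)
query = to_query(["grandparent", "parent", "property"], "value")
```

Both variants traverse the same tree; the only difference is the filter
deciding which names end up in the list.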

The "one type of node"-idea works well for our content classification
system, but not so well for the fields.

Field derivations come in roughly two shapes: the Grouping and the
Specialization.

A grouping field is one such as xesam:legal. It has child fields with
very clear meanings (xesam:license, copyright, ...), but is itself
vague at best. By my standards it is outright dangerous to set or
get data from a grouping field: you basically won't know what is in
there.

A specialization is like xesam:primaryRecipient with the child
xesam:to. Here you could put stuff in primaryRecipient and it would
still make a lot of sense.
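
A rough Python sketch of the two shapes, assuming we tagged each field
with a flag. The `grouping` flag, the Field class and assign() are all
hypothetical, invented here for illustration; nothing like them exists in
the Xesam spec. Note that this models exactly the two-kinds situation, not
the single kind of derivation I would prefer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Field:
    name: str
    parent: Optional["Field"] = None
    grouping: bool = False  # assumed flag: True means "only bundles children"


def assign(store, field, value):
    """Store a value, refusing grouping fields since their content would be vague."""
    if field.grouping:
        raise ValueError(f"{field.name} is a grouping field and takes no value")
    store[field.name] = value


def chain(field):
    """Names from the field up to its topmost ancestor."""
    names = []
    while field is not None:
        names.append(field.name)
        field = field.parent
    return names


# Grouping: the parent is vague, the children carry the meaning.
legal = Field("xesam:legal", grouping=True)
license_ = Field("xesam:license", parent=legal)

# Specialization: the parent is meaningful on its own.
primary = Field("xesam:primaryRecipient")
to = Field("xesam:to", parent=primary)

store = {}
assign(store, license_, "GPL")
assign(store, primary, "someone@example.org")
assign(store, to, "someone@example.org")
```

Calling assign(store, legal, ...) raises ValueError, while the same call
on primaryRecipient goes through, which is the asymmetry that bothers me.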

My problem is that I would really like only one type of derivation, as
this makes my mental model simpler.

Cheers,
Mikkel