[Xesam] abstract properties?

Mon Jun 23 15:15:12 PDT 2008

2008/2/17 Evgeny Egorochkin <phreedom.stdin at gmail.com>:
> On Sunday 17 February 2008 01:28:49 Mikkel Kamstrup Erlandsen wrote:
>> On 14/02/2008, Sebastian Trüg <strueg at mandriva.com> wrote:
>> > On Thursday 14 February 2008 10:19:30 Mikkel Kamstrup Erlandsen wrote:
>> > > On 14/02/2008, Evgeny Egorochkin <phreedom.stdin at gmail.com> wrote:
>> > > > Hi guys,
>> > > >
>> > > >  This is in response to the lengthy discussion on #xesam that
>> > > > happened while I was sleeping:
>> > > >
>> > > >  >(22:35:51)  kamstrup:  in other words a field is abstract if and
>> > > >  > only if it has children
>> > > >  >(22:36:17)  jamiemcc:  yes and is not used in searches
>> > > >  >(22:36:30)  kamstrup:  also meaning that third parties can not
>> > > >  > extend fields which do not have any children in the Xesam onto
>> > > >  >(22:36:45)  kamstrup:  moreover I also think we agreed that you can
>> > > >  > not assign any value to an abstract field
>> > > >  >(22:36:54)  kamstrup:  (maybe obvious)
>> > > >  >(22:36:57)  jamiemcc:  yes
>> > > >  >(22:37:12)  kamstrup:  good, I think we agree then
>> > > >  >(22:37:15)  jamiemcc:  abstract are like intermediate classes
>> > > >  >(22:37:22)  kamstrup:  yes
>> > > >  >(22:37:31)  jamiemcc:  they are not used directly but instead are
>> > > >  > always inherited from
>> > > >  >(22:37:36)  kamstrup:  only leaf nodes of the onto can contain
>> > > >  > values
>> > > >
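The rule agreed on in the log above (a field is abstract if and only if it has children, and only leaf fields may hold values) can be sketched roughly as follows. This is a minimal illustration under stated assumptions: the field tree, the `ex:` names and the `assign` helper are hypothetical and are not part of the Xesam ontology or any Xesam API.

```python
# Hypothetical field tree: parent field -> list of child fields.
# The "ex:" names are illustrative only, not real Xesam fields.
CHILDREN = {
    "ex:grandparent": ["ex:parent"],
    "ex:parent": ["ex:property"],
    "ex:property": [],
}


def is_abstract(field: str) -> bool:
    """Under the rule from the log, a field is abstract iff it has children."""
    return bool(CHILDREN.get(field))


def assign(item: dict, field: str, value: str) -> None:
    """Store a value on an item, refusing abstract (non-leaf) fields."""
    if field not in CHILDREN:
        raise KeyError(f"unknown field: {field}")
    if is_abstract(field):
        raise ValueError(f"{field} is abstract; only leaf fields hold values")
    item.setdefault(field, []).append(value)


doc = {}
assign(doc, "ex:property", "foo")      # accepted: ex:property is a leaf
try:
    assign(doc, "ex:parent", "bar")    # rejected: ex:parent has children
except ValueError as err:
    print(err)
```

Note the consequence kamstrup points out: if a third party later added a child under `ex:property`, that field would become abstract under this rule and any values already stored on it would no longer be allowed, which is why leaf fields cannot be extended.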
>> > > >  The benefits of this approach:
>> > > >  >(22:54:46)  kamstrup:  and having this as a restriction in Xesam
>> > > >  > does not render us incompatible with Nepo
>> > > >
>> > > >  This renders Xesam incompatible with most, if not all, RDFS-based
>> > > > approaches. A xesam->rdfs_derivative mapping is OK, but it breaks in
>> > > > the opposite direction.
>> > > >
>> > > >  >(23:37:07)  kamstrup:  but it is even more likely that there are
>> > > >  > two good contradictory arguments
>> > > >  >(23:37:44)  kamstrup:  my primary arg is simplicity
>> > > >
>> > > >  Certainly not the simplicity of the ontology, nor the
>> > > > simplicity (and feasibility) of onto extensions.
>> > >
>> > > FIRST: let me make it clear that I am not anal about any of these
>> > > issues. I am prepared to change my mind given convincing arguments.
>> > >
>> > > SECOND: There are really two issues at hand here.
>> > >  1) Can fields with children have values?
>> > >  2) Can an item belong to a category that has children?
>> > >
>> > > To the case in point:
>> > >
>> > > I guess the beauty is in the eye of the beholder. I have two design
>> > > analogies to present to you:
>> > >
>> > >  * Unix file system. You can not put data in a directory. You can put
>> > > files in a dir, and data in files. That is it. It has proven to be a
>> > > good architecture.
>> >
>> > Not that good an example, as it is a very special and very restricted case.
>> >
>> > >  * Object Oriented Code. Java interfaces provide clear-cut
>> > > abstractions. Allowing abstract fields to have values is like
>> > > programming only with normal classes and doing derivation on these.
>> > >
>> > >  Programming with Java interfaces will not give you fewer .class
>> > > files, but the program will be simpler to grok as a code base. Other
>> > > hackers can easily pinpoint the cases of abstraction, and if the
>> > > interfaces are properly designed then your program is more easily
>> > > maintainable.
>> > >
>> > > ( * Mime types? You can be "image/jpg", but can you be an "image"? I
>> > > guess not.)
>> >
>> > AFAIK, mimetypes live in a hierarchy. That means text/plain has a whole
>> > bunch of subtypes, such as C source code, and you can still have plain
>> > text files. Or, to name a proper example, application/xsd is a subtype of
>> > application/xml, and still files (or chunks of data) can be just
>> > application/xml.
>>
>> I think you (without knowing it) gave me one of the most compelling
>> arguments to go for the structure you describe.
>>
>> According to the shared mime spec
>> (http://standards.freedesktop.org/shared-mime-info-spec/shared-mime-info-spec-latest.html#subclassing)
>> mime types refer strictly to the format of the files, not the content.
>> The type system in the Xesam onto models the *content* of the files. In
>> this sense mimetypes and Xesam content types are dual to each other.
>>
>> I think it would be beautiful if we could keep our content model dual
>> to the mime tree. I don't know if this can be done cleanly, but I
>> think it can.
>
> Not so sure, but will take a closer look.
>
>> As far as I can see the shared mime system has the
>> following traits:
>>
>>  * Multiple inheritance for types (check, we have that too)
>>
>>  * A file can be an instance of any node in the tree (with an
>> "exception" - see next point)
>>
>>  * A file has exactly one mime type assigned (we have one content type
>> per object too)
>>
>>  * "Root nodes" are not part of the mime tree. Meaning that "text",
>> "image", "application", etc are not valid mimetypes. Hence you can not
>> be just an "image" (which makes sense since an image file must have a
>> format).
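The shared-mime traits listed above (multiple inheritance, exactly one type per file, any non-root node assignable) can be modelled roughly like this. It is only a sketch: the subclass table is a small illustrative excerpt written for this example, not the real shared-mime-info database, and the helper names are hypothetical.

```python
# Illustrative subclass table: type -> set of direct parent types.
# Entries are examples only; a real table would be loaded from the
# shared-mime-info database. The table uses single parents, but
# ancestors() below also handles types with several parents.
SUBCLASS_OF = {
    "text/plain": set(),
    "text/x-csrc": {"text/plain"},
    "application/xml": {"text/plain"},
    "image/svg+xml": {"application/xml"},
}


def is_valid_type(mime: str) -> bool:
    """Root nodes such as "text" or "image" are not valid types by themselves."""
    media, sep, subtype = mime.partition("/")
    return bool(media and sep and subtype)


def ancestors(mime: str) -> set:
    """Collect all transitive parents of a type (multiple inheritance allowed)."""
    seen = set()
    stack = list(SUBCLASS_OF.get(mime, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(SUBCLASS_OF.get(parent, ()))
    return seen


def is_a(mime: str, other: str) -> bool:
    """A file with exactly one type `mime` can also be handled as `other`
    if the two are equal or `other` is one of its ancestors."""
    return mime == other or other in ancestors(mime)


print(is_valid_type("image"))                   # -> False
print(is_a("image/svg+xml", "text/plain"))      # -> True
```

In this model a file carries a single, concrete type and all the "is also a" information comes from walking the parent links, which is the property the mail suggests mirroring for Xesam content types.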
>
>> > > >  >(23:38:03)  kamstrup:  query expansion will also be easier
>> > > >
>> > > >  So your argument is that having to expand (grandparent=value) into
>> > > >  ((grandparent=value) or (parent=value) or (property=value)) if we
>> > > > get rid of "abstract" properties somehow leads to too much
>> > > > complexity?
>> > >
>> > > There can be multiple levels of abstract fields, mind you. If you
>> > > expand xesam:related you get:
>> > >
>> > > With abstract fields (9): related, conflicts, depends, contains, knows,
>> > > links, derivedFrom, inReplyTo, supercedes
>> > >
>> > > Without (5): conflicts, contains, knows, inReplyTo, supercedes
>> > >
>> > > Of course, if we use the design I advocate, then the last case would
>> > > have more fields.
>> > >
>> > > >  Query expansion will necessitate property tree traversal in both
>> > > > cases. The only difference is what fields are included in the
>> > > > resulting list: all or only non-abstract.
>> > > >
>> > > >  The DB engine will discard those grandparent and parent criteria as
>> > > > soon as it sees that the appropriate tables are empty. No performance
>> > > > overhead here either.
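The expansion of (grandparent=value) discussed above can be sketched as a walk over the field tree that keeps either every field or only the non-abstract (leaf) ones. This is a hedged illustration: the `ex:` field names mirror the grandparent/parent/property example and are hypothetical, as is the textual query format.

```python
# Hypothetical field tree: parent -> children (mirrors the example above).
CHILDREN = {
    "ex:grandparent": ["ex:parent"],
    "ex:parent": ["ex:property"],
    "ex:property": [],
}


def expand(field: str, include_abstract: bool) -> list:
    """Return `field` and all its descendants, depth first.

    With include_abstract=False only leaf (non-abstract) fields are kept,
    i.e. the fields that may hold values in the "abstract fields" model.
    """
    result = []
    seen = set()
    stack = [field]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # a field reachable via several parents is listed once
        seen.add(current)
        children = CHILDREN.get(current, [])
        if include_abstract or not children:
            result.append(current)
        # reversed() keeps siblings in declaration order when popped
        stack.extend(reversed(children))
    return result


def to_query(field: str, value: str, include_abstract: bool) -> str:
    """Rewrite (field=value) into a disjunction over the expanded fields."""
    terms = [f"({f}={value})" for f in expand(field, include_abstract)]
    return "(" + " or ".join(terms) + ")"


print(to_query("ex:grandparent", "value", include_abstract=True))
print(to_query("ex:grandparent", "value", include_abstract=False))
```

Both variants traverse the same tree; as the paragraph above says, they differ only in which fields end up in the resulting list.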
>> > >
>> > > I am not a database expert, so I don't know much here. I would just
>> > > assume that you could more easily optimize your db schema if you know
>> > > what fields can be extended in the future. In your model all fields
>> > > can be extended in any way.
>> >
>> > That is the beauty of it and the reason why it is so powerful. All kinds
>> > of data can be combined. And if you extend a type you stay backwards
>> > compatible since old clients will still see the new type as a subtype of
>> > the old type and can handle it accordingly. I don't see a problem. Only
>> > advantages.
>>
>> The "one type of node"-idea works well for our content classification
>> system, but not so well for the fields.
>>
>> Field derivations come in roughly two shapes: the grouping and the
>> specialization.
>>
>> A grouping field is one such as xesam:legal. It has child fields with
>> very clear meanings (xesam:license, copyright, ...), but is itself
>> vague at best. By my standards it is outright dangerous to set or
>> get data from a grouping field - you basically won't know what is in
>> there.
>
> Probably xesam:legal is too ambiguous and should have been named something
> like xesam:legalNotice. In fact this is about specialization too.
>
> Say we have some legal notice but no way to parse it: we put it into
> xesam:legalNotice, the most specific place possible. However, if we can
> extract specific bits related to copyright, license type, etc., we put that
> information into more specific fields like xesam:copyright and
> xesam:licenseType. We could go even further, split xesam:copyright into
> xesam:copyrightHolder and xesam:copyrightYear, and assign these fields if we
> can actually extract that level of detail.
>
> The trick here is that if we put all metadata as a plain-text comment for a
> file, only a human will be able to understand anything. If we put the info
> into more specific places, like title, comment, etc., computers can also
> understand somewhat better what's going on. This is what's called extracting
> semantics. Same for legalNotice, we always parse and represent it with the
> finest detail possible. Sometimes this means just a bunch of text that's
> vaguely related to legal matters. Sometimes we can understand what every
> character of the notice means.
>
>> A specialization is like xesam:primaryRecipent with the child
>> xesam:to. Here you could put stuff in primaryRecipent and it would
>> still make a lot of sense.
>>
>> My problem is that I would really like only one type of derivation, as
>> this makes my mental model simpler.
>

Consider this thread bumped. It is on our list of blockers and we need
a clear decision. Thread root is at:
http://lists.freedesktop.org/archives/xesam/2008-February/000098.html

Please recall that we have two cases where "abstractness" can apply:
Fields and Categories. They are highly related but not exactly the
same topic.

I am in favor of dropping the "abstract field" concept. Here are my reasons:

 * To this day it has done nothing but confuse users. Several people
have contacted me about this, either asking about them or clearly
being in the dark about their application and existence. I take this
extremely seriously. I really really don't want to confuse developers
more than necessary.

 * It is easier to keep the ontology forwards and backwards compatible
if we don't shackle ourselves

 * With abstract fields, it is impossible for 3rd parties to extend a field that is not abstract

 * Should in fact make query expansion easier

I am also in favor of dropping abstract categories:

 * Same points as above

 * Consistency of the ontology. It would be weird to have abstract
categories but not abstract fields

Cheers,
Mikkel