[Xesam] Ontology snapshot

Sun Jun 10 06:23:07 PDT 2007

On Sunday 10 June 2007 12:20:48 Mikkel Kamstrup Erlandsen wrote:
> 2007/6/9, Evgeny Egorochkin <phreedom.stdin at gmail.com>:
> > Source attached. Cute picture:
> >
> > http://www.freedesktop.org/wiki/PhreedomDraft?action=AttachFile&do=view&target=viz.png
>
> Great work. This is really starting to look like something.
>
> --------------
>
> > Design decisions proposed:
> >
> > Split ontology into Xesam Core, Xesam Convenience and Xesam Mappings
> >
> > Xesam Core expresses the full semantics of the ontology, i.e. it is
> > self-sufficient and describes all the useful information we plan to
> > index.
> >
> > Xesam Convenience contains semantically irrelevant fields which are
> > subchildren of Xesam Core fields and provide nothing except more
> > human-friendly names and descriptions.
> >
> > Xesam Mappings provides mappings for external standards like EXIF and
> > vCard.
> > For each such standard, a base set of fields and categories capturing the
> > most relevant features of the standard is provided in Xesam Core.
> > The full standard, or a more complete implementation of it, is provided
> > via Xesam Mappings.
> > The reason: excessive complexity or many irrelevant features of the
> > standard.
> >
> > Xesam Core is the primary goal for now. The rest will follow as the need
> > arises and time allows.
>
> OK, I think this (the ontology split proposal) is a good idea to avoid
> *trying* to create the all-encompassing ontology in the first take.
>
> My only gripe is that I don't like the word "Convenience", how about
> "Extended" instead?
>
> So +1 from me if we call Xesam Convenience Xesam Extended instead :-)

I called it Convenience because it's "semantically irrelevant": you can do
everything with Xesam Core, and Xesam Convenience is nothing more than a
novice-understandable mapping of Xesam Core.
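To make the "nothing more than a mapping" point concrete, here is a minimal
Python sketch. It assumes a hypothetical representation in which every
Convenience field just names the field it is a sub-child of; all field names
below are invented for illustration and are not agreed Xesam names. A query
written against Convenience fields is rewritten to Core fields, so a backend
that only understands Xesam Core loses nothing.

```python
# Hypothetical sketch: Xesam Convenience fields as pure aliases of Xesam Core.
# All field names are illustrative only, not part of any agreed ontology.

# Core fields that a backend actually indexes.
CORE_FIELDS = {"contentCreator", "contentCreated", "title"}

# Each Convenience field names the field it is a sub-child of.
CONVENIENCE_PARENT = {
    "author": "contentCreator",
    "artist": "contentCreator",
    "dateTaken": "contentCreated",
    "photographer": "artist",  # Convenience fields may nest
}


def resolve_to_core(field: str) -> str:
    """Follow parent links until a Core field is reached."""
    seen = set()
    while field not in CORE_FIELDS:
        if field in seen or field not in CONVENIENCE_PARENT:
            raise KeyError(f"unknown or cyclic field: {field}")
        seen.add(field)
        field = CONVENIENCE_PARENT[field]
    return field


def rewrite_query(terms):
    """Rewrite (field, value) query terms so they use only Core fields."""
    return [(resolve_to_core(field), value) for field, value in terms]
```

Because the Convenience layer carries no semantics of its own, dropping it
changes only how queries are spelled, never what they match.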

> > vCard compatibility:
> > It is not feasible to implement the full vCard functionality. The
> > following simplifications are made:
> > * name is a single field
>
> I guess we can split the name up into subfields (forename, surname,
> middle name) in Xesam Extended or something..?

Maybe Xesam Mappings is the right place for this. We capture the most
important features (the name in this case) in Xesam Core, and full vCard is
implemented in Xesam Mappings.

> > * postal addresses are single fields
>
> This makes sense. It would take a lot of fields to model a
> nationality-neutral postal address scheme.
>
> > * some obscure features, like the modem phone number, are dropped for
> > the sake of simplicity
>
> Good
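A minimal Python sketch of what "name is a single field" and "postal
addresses are single fields" could mean for an indexer. It relies on the
vCard 3.0 (RFC 2426) layout of the structured N value
(Family;Given;Additional;Prefix;Suffix) and ADR value (PO box;extended;
street;locality;region;postal code;country). The single target fields and
the ordering chosen for the collapsed name are assumptions for illustration,
and the escape handling is deliberately simplified.

```python
import re
from typing import List


def split_components(value: str) -> List[str]:
    # Split on semicolons not preceded by a backslash, then unescape "\;".
    # Simplification: rarer escapes such as "\\;" or "\," are not handled.
    return [part.replace("\\;", ";") for part in re.split(r"(?<!\\);", value)]


def collapse_name(n_value: str) -> str:
    """Collapse a vCard N value into one display string for a single,
    hypothetical 'name' field (order: prefix given additional family suffix)."""
    parts = split_components(n_value) + [""] * 5  # pad missing components
    family, given, additional, prefix, suffix = parts[:5]
    ordered = [prefix, given, additional, family, suffix]
    return " ".join(p for p in ordered if p)


def collapse_address(adr_value: str) -> str:
    """Collapse a vCard ADR value into one free-text postal address field,
    skipping empty components."""
    return ", ".join(p for p in split_components(adr_value) if p)
```

Keeping the collapsed strings means full-text search still works; the
structured parts are what a fuller vCard mapping would add back.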

> > New design limitations:
> > 1) Source and Content hierarchies are kept separate, that is, no class
> > can inherit from Source and Content at once
> > 2) Each file is assigned at most one Content and one Source.
>
> Good - as we all agreed on :-)
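The two limitations can be checked mechanically. Below is a minimal Python
sketch, assuming a hypothetical class tree stored as "class -> parents" with
Source and Content as the two separate roots (the class names are partly
from this thread and partly invented). Limitation 1 is a check on new
classes; limitation 2 falls out of giving each file one Source slot and one
Content slot.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Hypothetical class tree (class -> list of parents); illustration only.
PARENTS: dict[str, list[str]] = {
    "File": ["Source"],
    "ArchiveItem": ["Source"],
    "Attachment": ["Source"],
    "Media": ["Content"],
    "Audio": ["Media"],
    "Document": ["Content"],
}


def roots(cls: str) -> set[str]:
    """Return the root classes a class ultimately inherits from."""
    parents = PARENTS.get(cls, [])
    if not parents:
        return {cls}
    result: set[str] = set()
    for parent in parents:
        result |= roots(parent)
    return result


def check_new_class(parents: list[str]) -> None:
    """Limitation 1: a class may not inherit from Source and Content at once."""
    found: set[str] = set()
    for parent in parents:
        found |= roots(parent)
    if {"Source", "Content"} <= found:
        raise ValueError("a class cannot inherit from both Source and Content")


@dataclass
class IndexedFile:
    """Limitation 2: single-valued slots allow at most one of each."""
    url: str
    source: Optional[str] = None
    content: Optional[str] = None

    def __post_init__(self) -> None:
        if self.source is not None and roots(self.source) != {"Source"}:
            raise ValueError(f"{self.source} is not a Source class")
        if self.content is not None and roots(self.content) != {"Content"}:
            raise ValueError(f"{self.content} is not a Content class")
```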
>
> ------------
>
> > Issues:
> >
> > Maybe we need a better name for MailboxItem and ArchiveItem?
>
> I think we should scrap the Item part of those words. This cat name
> describes not what the object *is* but where the object comes from. With
> the Item suffix, it sounds like the object with Source=ArchiveItem comes
> from an item within an archive (e.g. a JPG in a PDF in a zip).

We need to agree on a consistent Source naming.
Source / source item examples:
Filesystem   - File
Archive      - ArchiveItem
Email        - Attachment

It seems reasonable to adopt either:
* this is contained in a [Filesystem, Archive, Email]
* this is a [File, ArchiveItem, Attachment]

But not both at the same time.

> > Still not decided on how to handle PIM stuff.
>
> Could we rename Todo to Task instead then? Sounds less nerdy :-)

This is the first time in my life someone has called Todo nerdy. I've seen it
used everywhere. I have nothing against Task, though.

> Fields on the Task cat could be Summary, Priority, DueDate. Stuff like a
> Summary and Description can be derived from fields in the Content cat.

There's a good reference: iCalendar. We'd have to strip many fields to make
it usable, though.

The problem with these PIM things is this: we have 6 fields and 5 PIM
classes. Each one uses 5 of the 6 fields, and each uses a different set.
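A small Python sketch of why that layout is awkward for a class hierarchy,
using abstract placeholder names (f1..f6, ClassA..ClassE) rather than real
iCalendar properties. If each class leaves out a different field, only one
field is used by all five classes, so a common parent could declare just
that one, and the other five must be declared elsewhere.

```python
from functools import reduce

# Abstract illustration: 6 fields, 5 classes, each class using 5 of the 6
# fields and each leaving out a different one. Names are placeholders.
FIELDS = {"f1", "f2", "f3", "f4", "f5", "f6"}
OMITTED = {"ClassA": "f1", "ClassB": "f2", "ClassC": "f3",
           "ClassD": "f4", "ClassE": "f5"}
USES = {cls: FIELDS - {missing} for cls, missing in OMITTED.items()}

# Fields every class uses: the only ones a common parent could declare.
shared = reduce(set.intersection, USES.values())

# Fields used by some but not all classes: each has to be declared
# elsewhere (duplicated per class, or via extra intermediate classes).
partial = FIELDS - shared

# How many classes use each field.
usage = {f: sum(f in used for used in USES.values()) for f in sorted(FIELDS)}
```

Here `shared` is `{"f6"}`, and each of f1..f5 is used by 4 of the 5
classes, which no single-inheritance tree can express without duplication.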

> > Need to revamp the media ontology.
> >
> > Can we count on backends being able to figure out list lengths? I.e. if
> > we have a Software.depends relation, do we need Software.dependCount? I
> > think not.
> > Either we have a *count property for things we don't describe, or we have
> > a list of things and no *count property.
>
> Hmmm... This is a tricky case. The query language cannot handle this at
> the moment (e.g. searching for all SourceCode items with more than 10
> dependencies) unless the number of deps is explicitly stored in a field.

Still, potentially every list field then asks for an item-count companion.
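A minimal Python sketch of the trade-off, assuming a toy query evaluator
that, like the current query language, can only compare stored scalar
values. The field names `depends` and `dependCount` echo the
Software.depends / Software.dependCount example above; the functions are
hypothetical. The indexer stores the list and a derived count companion, and
only the companion makes "more than 10 dependencies" answerable.

```python
from typing import Any, Dict, List


def index_software(name: str, depends: List[str]) -> Dict[str, Any]:
    """Store the dependency list plus a derived count companion field."""
    return {"name": name, "depends": list(depends), "dependCount": len(depends)}


def query_greater(items: List[Dict[str, Any]], field: str, threshold: int) -> List[str]:
    """Mimic a query language that can only compare stored scalar values."""
    return [
        item["name"]
        for item in items
        if isinstance(item.get(field), (int, float)) and item[field] > threshold
    ]
```

A query on `dependCount` finds the item with 12 dependencies, while the same
comparison against the `depends` list matches nothing, because the evaluator
never derives a length on its own.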

> > Should we elaborate comment stats for SourceCode along the lines of text
> > stats, or is commentCharacterCount sufficient?
>
> It is sufficient for the only use case I can come up with: finding
> under-documented stuff. I think we should leave it at that.

Keep in mind that Xesam, or rather Xesam indexers, will also double as
metadata extraction tools, possibly via other APIs.

For me, comments are useful for finding user-documented stuff and evaluating
just how much documentation there is. commentCharacterCount seems to be
sufficient for that.
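As a rough idea of what an extractor might compute for commentCharacterCount,
here is a minimal Python sketch for Python source files, using the standard
`tokenize` module so that a `#` inside a string literal is not mistaken for a
comment. What exactly the field counts is an assumption here: the text of
`#` comments without the marker and surrounding whitespace, with docstrings
excluded.

```python
import io
import tokenize


def comment_character_count(source: str) -> int:
    """Count characters in '#' comments of Python source code.

    Assumption for illustration: the leading '#' and surrounding whitespace
    are not counted, and docstrings are not treated as comments.
    """
    total = 0
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            total += len(tok.string.lstrip("#").strip())
    return total
```

Other languages would need their own lexer, but a single count per file is
enough for the "how documented is this?" use case.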

> Questions:
>
> AFAIK PDFs (and other office docs) can be password-protected. Perhaps
> isPasswordProtected should be moved to Content?

These are two different things. In the case of ArchiveItem, password
protection is external to the file and provided by the archiver. In the case
of documents, the password protection is internal.

At the moment it seems like a good idea to implement it the same way as
keywords. I'll think more about it, of course.

> Is there any general field that names the origin of a file? E.g. the URL
> of a downloaded file?

I see you are trying to integrate one of the FDO recommendations here. It
seems like a good idea, especially if others use this extended attribute as
well.
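A minimal Python sketch of how an indexer could pick this up, assuming the
recommendation in question is the freedesktop.org "Common Extended
Attributes" list, which names `user.xdg.origin.url` for the URL a file was
downloaded from. `os.getxattr` exists only on Linux, and a missing
attribute, a missing file and a filesystem without user xattrs all surface
as `OSError`, so the sketch treats every such case as "no origin known".

```python
import os
from typing import Optional

# Attribute name assumed from the freedesktop.org Common Extended Attributes
# recommendation.
ORIGIN_ATTR = "user.xdg.origin.url"


def read_origin_url(path: str) -> Optional[str]:
    """Return the download origin URL stored as an extended attribute, or None."""
    getxattr = getattr(os, "getxattr", None)  # only available on Linux
    if getxattr is None:
        return None
    try:
        raw = getxattr(path, ORIGIN_ATTR)
    except OSError:
        # Attribute not set, file missing, or no user xattr support.
        return None
    return raw.decode("utf-8", errors="replace")
```

The returned value could then populate whatever general origin field the
ontology ends up defining.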

> You've added MediaList and AudioList. Would it not make sense to have a
> generic List object? E.g. a series of images or documents might form a
> slideshow. Perhaps a better metaphor would be Collection. E.g. most IDEs
> have a project metaphor where a bunch of files are part of a project. With
> the collection metaphor we could model this.
>
> While I'm at it, why not a Project cat? It could be a subcat of my
> suggested Collection cat. Projects have names, versions, etc... I have
> several programs that install project files.

Basically, Content already implements collection and container
functionality. So the only thing we need to do is add a tree of collections,
since different collections imply different content types for the things
they link to. I can't think of any specific properties for most collections.
Project can be quite different, though.
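A minimal Python sketch of such a tree of collections, under assumed names:
each collection class narrows the Content class its members must belong to,
and Project is the one subclass that adds properties of its own. The Content
hierarchy and all class names here are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical Content tree (child -> parent); names for illustration only.
CONTENT_PARENT = {
    "Audio": "Media", "Image": "Media", "Media": "Content",
    "SourceCode": "Document", "Document": "Content",
}


def is_a(cls: str, ancestor: str) -> bool:
    """True if cls equals ancestor or descends from it."""
    while cls != ancestor:
        if cls not in CONTENT_PARENT:
            return False
        cls = CONTENT_PARENT[cls]
    return True


@dataclass
class Collection:
    # Unannotated class attribute, so not a dataclass field: the Content
    # class every member must belong to. Subclasses narrow it.
    member_type = "Content"
    members: list[tuple[str, str]] = field(default_factory=list)

    def add(self, url: str, content_class: str) -> None:
        if not is_a(content_class, self.member_type):
            raise TypeError(f"{type(self).__name__} cannot hold {content_class}")
        self.members.append((url, content_class))


class MediaList(Collection):
    member_type = "Media"


class AudioList(MediaList):
    member_type = "Audio"


@dataclass
class Project(Collection):
    # Unlike most collections, a project carries properties of its own.
    name: str = ""
    version: str = ""
```

Most collection classes differ only in `member_type`, which matches the
observation that few collections need specific properties.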

I added these sample collections to have people scream "so little! so
limited! we need, no, we demand more!" and actually provide a useful list :)

> Does Audio not have any fields or is it just trimmed down for display
> purposes?

Audio inherits most of Media's properties as is, but that's already
irrelevant: Media is going to be overhauled significantly soon.

-- Evgeny