[Xesam] Ontology snapshot
Mikkel Kamstrup Erlandsen
mikkel.kamstrup at gmail.com
Sun Jun 10 12:22:39 PDT 2007
2007/6/10, Evgeny Egorochkin <phreedom.stdin at gmail.com>:
> On Sunday 10 June 2007 12:20:48 Mikkel Kamstrup Erlandsen wrote:
> > 2007/6/9, Evgeny Egorochkin <phreedom.stdin at gmail.com>:
> > > Source attached. Cute picture:
> > >
> > Great work. This is really starting to look like something.
> > --------------
> > > Design decisions proposed:
> > >
> > > Split ontology into Xesam Core, Xesam Convenience and Xesam Mappings
> > >
> > > Xesam Core expresses the full semantics of the ontology i.e. it is
> > > self-sufficient and describes all the useful information we plan
> > > indexing.
> > >
> > > Xesam Convenience contains semantically irrelevant fields which are
> > > subchildren of Xesam Core fields and provide nothing except more
> > > human-friendly names and descriptions.
> > >
> > > Xesam Mappings provides mapping for external standards like EXIF and
> > > vCard.
> > > For each such standard a base set of fields and categories capturing
> > > most
> > > relevant features of the standard is provided in Xesam Core.
> > > The full standard or a more complete implementation is provided via
> > > Mappings.
> > > The reason: excessive complexity or multiple irrelevant features of
> > > standard.
> > >
> > > Xesam Core is the primary goal for now. The rest will follow as the
> > > need arises/time allows
> > Ok, I think this (onto split proposal) is a good idea to avoid *trying*
> > to create the all-encompassing onto in the first take.
> > My only gripe is that I don't like the word "Convenience", how about
> > "Extended" instead?
> > So +1 from me if we call Xesam Convenience Xesam Extended instead :-)
> I called it convenience because it's "semantically irrelevant" that is you
> do everything with Xesam core, and xesam convenience is nothing more than
> novice-understandable mapping of Xesam core.
Ok, let's just call it "convenience" for now. The exact wording is not
central at this point.
Is there anything from your current draft that will be punted to convenience?
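
As a rough illustration of the idea quoted above (Convenience as nothing more than a friendlier naming layer over Core), here is a minimal Python sketch. It is not Xesam code: the field names `title`, `creator`, `songTitle` and `photographer` are made up for the example and are not taken from the draft ontology.

```python
# Hypothetical mapping: each convenience field points at the core field
# it is a child of. None of these names come from the actual draft.
CONVENIENCE_PARENT = {
    "songTitle": "title",
    "photographer": "creator",
}


def to_core(field):
    """Return the core field behind a convenience field.

    Core fields (anything not in the mapping) are returned unchanged.
    """
    return CONVENIENCE_PARENT.get(field, field)


def rewrite_query(query):
    """Rewrite a {field: value} query so it only mentions core fields."""
    return {to_core(field): value for field, value in query.items()}


core_query = rewrite_query({"songTitle": "Yesterday", "creator": "Beatles"})
```

If the mapping really carries no extra semantics, a backend could answer every convenience query by rewriting it like this and would never need to store convenience fields separately.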
> VCard compatibility:
> > > It is not feasible to implement the full vcard functionality. The
> > > following
> > > simplifications are made:
> > > * name is a single field
> > I guess we can split the name up in subfields (forename, surname,
> > middlename) in Xesam Extended or something..?
> Maybe Xesam Mappings is the right place for this. We capture the most
> important features (name in this case) and full vCard is implemented in
> Mappings.
> > * postal addresses are single fields
> > This makes sense. It would take a lot of fields to model a
> > nationality-neutral postal address scheme.
> > > * some obscure features are dropped like modem phone number for the sake
> > > of simplicity
> > Good
> > New design limitations:
> > > 1) Source and Content hierarchies are kept separate, that is no Class may
> > > inherit source and content at once
> > > 2) Each file is assigned at max one content and one source.
> > Good - as we all agreed on :-)
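
The two limitations quoted above can be sketched in a few lines of Python. This is only an illustration under assumptions: the class names `Archive` and `Image` and the `IndexedFile` record are hypothetical stand-ins, not part of the draft.

```python
from dataclasses import dataclass
from typing import Optional, Type


class Source:
    """Root of the (hypothetical) Source hierarchy."""


class Content:
    """Root of the (hypothetical) Content hierarchy."""


class Archive(Source):
    pass


class Image(Content):  # illustrative content category, assumed
    pass


def violates_separation(cls):
    """Limitation 1: no class may descend from both Source and Content."""
    return issubclass(cls, Source) and issubclass(cls, Content)


@dataclass
class IndexedFile:
    """Limitation 2: one optional slot each, so at most one source
    and at most one content can be assigned to a file."""
    path: str
    source: Optional[Type[Source]] = None
    content: Optional[Type[Content]] = None


photo = IndexedFile("holiday1.jpg", source=Archive, content=Image)
```

Using single optional slots instead of lists makes the "at max one" rule structural rather than something to validate later.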
> > ------------
> > > Issues:
> > >
> > > Maybe we need a better name for MailboxItem and ArchiveItem?
> > I think we should scrap the Item part of those words. This cat name is
> > not describing what the object *is* but what the object comes from. With the
> > Item postfixes it sounds like the object with Source=ArchiveItem comes from
> > an item within an archive (fx a jpg in a pdf in a zip).
> We need to agree on a consistent Source naming.
> Source-Source Item examples:
> Filesystem -File
> Archive -ArchiveItem
> Email -Attachment
> It seems reasonable to adopt either:
> * this is contained in a [Filesystem,Archive,Email]
> * this is a [file, archiveitem, attachment]
> But not both at the same time.
Right. This is tricky. I really think the "this comes from"-metaphor is the
closest to the intention. The "this is a"-metaphor is already what categories
Because of this I also think that Mailbox is a better source name than
Email. The Attachment is more subtle because in some way it does make sense
to say that "holiday1.jpg comes from an attachment", I can easily imagine
several arguments against this metaphor but it is really not a clear cut
> Still not decided on how to PIM stuff.
> > Could we rename Todo to Task instead then? Sounds less nerdy :-)
> This is the first time in my life someone calls Todo nerdy.
About time someone broke it to you then :-) "Todo" *is* a geek term -
at least my wife never used it before she met me :-)
> > Fields on the Task cat could be Summary, Priority, DueDate. Stuff like a
> > Summary and Description can be derived from fields in the Content cat.
> There's a good reference: iCalendar. Have to strip many fields to make it
> usable though.
> The problem with these PIM things is like this: We have 6 fields and 5 PIM
> classes. Each one uses 5 fields out of 6, and each one uses a different
Eeek. Good that I'm not the ontology maintainer ;-P
> Need to revamp media ontology.
> > > Can we count on backends being able to figure out list lengths? i.e.
> > > we have Software.depends relation, do we need Software.dependCount? I
> > > think no.
> > > Either we have a *count property for things we don't describe, or we
> > > have a list of things and no *count property.
> > Hmmm... This is a tricky case. The query language cannot handle this atm. -
> > Ie searching for all SourceCode items with more than 10 dependencies fx -
> > unless the number of deps is explicitly stored in a field.
> Still potentially every list field asks for an item count companion.
We will need some feedback from the various projects on this. I'm not sure
it is even possible to query the length of a list if you intend to keep a
decent performance. But that is probably very implementation specific.
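
To make the trade-off concrete, here is a small Python sketch of the "count companion" idea: the list length is stored as an ordinary numeric field at index time, so a plain comparison can answer the "more than 10 dependencies" query. It is a toy under assumptions, not a backend design: the in-memory item dicts, the helper names and the `dependCount` field as written here are illustrative only.

```python
# Toy index entries; "depends" follows the Software.depends example
# from the thread, the item names are invented.
items = [
    {"name": "libfoo", "depends": ["glibc", "zlib"]},
    {"name": "bigapp", "depends": ["lib%d" % i for i in range(12)]},
]


def add_count_companion(item, list_field, count_field):
    """Return a copy of item with the list length stored explicitly."""
    enriched = dict(item)
    enriched[count_field] = len(item[list_field])
    return enriched


indexed = [add_count_companion(i, "depends", "dependCount") for i in items]


def query_greater_than(index, field, threshold):
    """Plain numeric comparison on a stored field; no list inspection."""
    return [entry["name"] for entry in index if entry[field] > threshold]


result = query_greater_than(indexed, "dependCount", 10)
```

The cost is one extra stored field per list field, which is exactly the "every list field asks for an item count companion" concern raised above.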
> > > Should we elaborate comment stats for SourceCode along the way of text
> > > stats or is commentCharacterCount sufficient?
> > It is sufficient for the only use case I can come up with. Finding
> > under-documented stuff. I think we should keep it at this.
> You should consider that Xesam or rather Xesam indexers will also double
> as a meta-data extraction tool possibly via other APIs.
> For me comments are useful to find user-documented stuff and evaluate just
> how much documentation there is. commentCharCount seems to be sufficient for
Ok, let's keep it at that for now then.
> > Afaik PDFs (and other office docs) can be password protected. Perhaps
> > isPassWordPretected should be moved to contents?
> These are two different things. In case of ArchiveItem password protection
> is external to the file, provided by archiver. In case of documents, the
> password protection is internal.
> ATM it seems like a good idea to implement it the same way as with
> Will think more about it of course.
I'm afraid I can't see the problem here. There might be different
implementations behind the different password protection mechanisms, but all
that we are interested in is whether or not the file is protected.
> > Is there any general field that names the origin of a file? Fx the url
> > of a downloaded file?
> I see you are trying to integrate here one of FDO recommendations. Seems
> like a good idea especially if others use this extended attribute as well.
> > You've added MediaList and AudioList. Would it not make sense to have a
> > generic List object? Fx a series of images or documents might form a
> > slideshow. Perhaps a better metaphor would be Collection. Fx. most IDEs
> > have a project-metaphor where a bunch of files is a part of a project. With
> > the collection metaphor we could model this.
> > Now I'm at it - why not a Project cat? It could be a subcat of my
> > Collection cat. Projects have names, versions, etc... I have several
> > programs that install project files.
> Basically Content already implements collection and container
You mean via the content.contains, links, depends fields? It might still be
useful with some cats for this though - as I assume Content will be an
> So the only thing we need to do is to add a tree of collections since
> different collections imply different content types of things they link
> Can't think of any specific properties for most collections. Project can
> be quite different though.
> I added these sample collections to have people scream "so little! so
> we need, no we demand more!" and actually provide a useful list :)
Ok, I really don't think we should forget about a Project category though.