[Xesam] Ontology snapshot

Mikkel Kamstrup Erlandsen mikkel.kamstrup at gmail.com
Mon Jun 11 12:36:41 PDT 2007

2007/6/11, Dave Cridland <dave at cridland.net>:
> On Mon Jun 11 12:29:10 2007, Mikkel Kamstrup Erlandsen wrote:
> > - Filesystem : The object data is stored on the fs
> This one I understand. But does it count only if it's a local
> filesystem? Or only one that's mounted?

Well, it might be an idea to have two subsources of File, LocalFile and
RemoteFile, since a RemoteFile can have additional metadata associated with
it. OTOH that might be taking it too far in the first iteration of the
ontology. From the ontology's point of view there is no difference between
files on mounted and unmounted file systems. It could be specced out that the
search API should only return files on mounted filesystems. This might also
be too much detail, and could probably be left unspecified without big
disaster (e.g. some indexers might want to show stuff from removable media,
together with the volume label for the source).
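
To make the File/LocalFile/RemoteFile idea concrete, here is a minimal Python
sketch of a hierarchical source lookup. Everything in it (the "Source" root,
SOURCE_PARENTS, is_subsource) is made up for illustration and is not taken
from any Xesam draft.

```python
# Hypothetical source hierarchy: each source type maps to its parent type.
# These identifiers are illustrative only, not part of the Xesam ontology.
SOURCE_PARENTS = {
    "Source": None,
    "File": "Source",
    "LocalFile": "File",
    "RemoteFile": "File",
}


def is_subsource(source, ancestor):
    """Return True if `source` equals `ancestor` or descends from it."""
    current = source
    while current is not None:
        if current == ancestor:
            return True
        current = SOURCE_PARENTS[current]
    return False


# A query for source "File" would match both local and remote files,
# while a query for "RemoteFile" would not match a plain "File".
matches_file = is_subsource("RemoteFile", "File")
matches_remote = is_subsource("File", "RemoteFile")
```

With this shape a first iteration could ship only "File" and add the two
subsources later without breaking queries that ask for "File".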

> > - Archive : The object data is contained in an archive
> > - Mailbox : The object data has been extracted from a mailbox
> A mailbox - the file sort - is an archive of mail messages - really
> there's nothing more to one than that. I'm sure that someone,
> somewhere, has used tar as a mailbox format - with some careful
> fiddling, it's probably quite a good one.

I do not think that archives and mailboxes should be intermixed. While they
can have similar representations, they are conceptually totally different
from a user's point of view, and that is what we should map to as an end
point.

Taking your tar-mailbox example... The indexer would need explicit support
for this sort of thing in order to detect tar-mailboxes as more than plain
tar files. Given that, it could install custom ontology extensions, e.g.
another source type TarMailboxFile, a subtype of *File*.
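
One benefit of making TarMailboxFile a subtype of File is that a client that
has never heard of the extension can still fall back to File. A rough Python
sketch of that fallback follows; PARENT and nearest_known are hypothetical
names, and nothing here is an actual Xesam API.

```python
# Hypothetical parent table including an indexer-installed extension type.
PARENT = {
    "File": None,
    "TarMailboxFile": "File",  # custom extension, assumed for illustration
}


def nearest_known(source_type, known):
    """Walk up the hierarchy until reaching a type the client understands.

    Returns None if no ancestor is in `known`.
    """
    current = source_type
    while current is not None:
        if current in known:
            return current
        current = PARENT[current]
    return None


# A client that only knows the core type treats the hit as a File.
core_client_view = nearest_known("TarMailboxFile", {"File"})
# A client aware of the extension sees the more specific type.
aware_client_view = nearest_known("TarMailboxFile", {"File", "TarMailboxFile"})
```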

> > - Attachment : The data of this object is stored as an email
> > attachment
> Mind you, an attachment is just a part of a message that the MUA
> decided not to display inline. Functionally, there's no difference
> otherwise. I'm not entirely sure what the distinction is between this
> and mailbox, though.

You want to be able to search for stuff you received as an attachment, and
not just all emails. The distinction is made in your mail viewer (unless it
is *really* old school :-D). You could argue that the EmailAttachment source
is a subsource of the Email source (elsewhere called Mailbox).

> The metaphor is "the content of this object is stored in".
> Immediately stored in?

That was the idea. However, Evgeny's comments have led me to reconsider. He
might be right that the StoredAs metaphor is better.

> What happens if I send you a file attached to a message in a mbox
> format mailbox I've included in a tar.gz via email, and your search
> finds that - what's the source set to, and do I care?

If you want to send me a file that is embedded in an mbox, I take it you
would send me the actual file and not the entire mbox? If we assume that you
did send me this tarball of an mbox and pointed me to an attachment inside
it, I think the indexer will have trouble, since it wouldn't want to
represent all the mails in the mbox as my emails.

In any case I think the source would be the mbox file. The source of that
would be the tarball, and the source of that would be the email attachment. I
don't see this as a problem in the ontology. I'm willing to bet that most
indexers wouldn't support it anyway.
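
The chain described above (file, then mbox, then tarball, then email
attachment) could be modelled by letting each item point at the item it was
extracted from. This is only a sketch: the Item class, the provenance helper
and the file names are invented for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Item:
    """An indexed object plus the item its data was extracted from."""
    name: str
    source: Optional["Item"] = None


def provenance(item: Item) -> List[str]:
    """List the names of the containers of `item`, innermost first."""
    names = []
    current = item.source
    while current is not None:
        names.append(current.name)
        current = current.source
    return names


# Hypothetical names for the nested example from the thread.
attachment = Item("email attachment")
tarball = Item("mail.tar.gz", source=attachment)
mbox = Item("inbox.mbox", source=tarball)
found = Item("report.txt", source=mbox)

chain = provenance(found)
```

An indexer that does not look inside tarballs would simply stop at the
tarball and never produce the inner items.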

> Maybe it's just me, but I think maybe this sort of thing gets left
> for a bit, until you guys have figured out what it is you need, here.
> Getting the basics working in an extensible way is much, much more
> interesting.

Interesting. You are the first to suggest this. If anyone else is of that
opinion, please speak up. I personally think we have to do it now, and it
really doesn't have to be rocket science. Also, it was actually one of the
(few) things we agreed on at some of our IRC meetings, i.e. "a hierarchical
type system".

In my mind the "problem" is well understood. We need a way to tell what an
object is (the current category terminology) and we need a way to determine
where and from what the object was extracted (source).
