2007/6/10, Evgeny Egorochkin <<a href="mailto:phreedom.stdin@gmail.com">phreedom.stdin@gmail.com</a>>:<div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"> On Sunday 10 June 2007 12:20:48 Mikkel Kamstrup Erlandsen wrote: > 2007/6/9, Evgeny Egorochkin <<a href="mailto:phreedom.stdin@gmail.com">phreedom.stdin@gmail.com</a>>: > > Source attached. Cute picture: > > > > <a href="http://www.freedesktop.org/wiki/PhreedomDraft?action=AttachFile&do=view&target=viz.png">http://www.freedesktop.org/wiki/PhreedomDraft?action=AttachFile&do=view&target=viz.png</a> > > Great work. This is really starting to look like something. > > -------------- > > > Design decisions proposed: > > > > Split the ontology into Xesam Core, Xesam Convenience and Xesam Mappings. > > > > Xesam Core expresses the full semantics of the ontology, i.e. it is self-sufficient and describes all the useful information we plan to index. > > > > Xesam Convenience contains semantically irrelevant fields which are subchildren of Xesam Core fields and provide nothing except more human-friendly names and descriptions. > > > > Xesam Mappings provides mappings for external standards like EXIF and vCard. > > For each such standard, a base set of fields and categories capturing the most relevant features of the standard is provided in Xesam Core. > > The full standard or a more complete implementation is provided via Xesam Mappings. > > The reason: excessive complexity or multiple irrelevant features of the standard. > > > > Xesam Core is the primary goal for now. The rest will follow as the need arises/time allows. > > Ok, I think this (onto split proposal) is a good idea to avoid *trying* to create the all-encompassing onto in the first take. > > My only gripe is that I don't like the word "Convenience", how about "Extended" instead? 
> > So +1 from me if we call Xesam Convenience Xesam Extended instead :-) I called it convenience because it's "semantically irrelevant"; that is, you can do everything with Xesam Core, and Xesam Convenience is nothing more than a novice-understandable mapping of Xesam Core.</blockquote><div> Ok, let's just call it "convenience" for now. The exact wording is not central at this point. Is there anything from your current draft that will be punted to convenience or mappings? </div> <blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"> > VCard compatibility: > > It is not feasible to implement the full vCard functionality. The following simplifications are made: > > * name is a single field > > I guess we can split the name up into subfields (forename, surname, middlename) in Xesam Extended or something..? Maybe Xesam Mappings is the right place for this. We capture the most important features (name in this case) and full vCard is implemented in Xesam Mappings. > * postal addresses are single fields > > > This makes sense. It would take a lot of fields to model a nationality-neutral postal address scheme. > > * some obscure features, like the modem phone number, are dropped for the sake of simplicity > > Good > New design limitations: > > 1) Source and Content hierarchies are kept separate, that is, no Class can inherit source and content at once > > 2) Each file is assigned at most one content and one source. > > Good - as we all agreed on :-) > > ------------ > > > Issues: > > > > Maybe we need a better name for MailboxItem and ArchiveItem? > > I think we should scrap the Item part of those words. This cat name is not describing what the object *is* but what the object comes from. With the Item postfixes it sounds like the object with Source=ArchiveItem comes from an item within an archive (fx a jpg in a pdf in a zip). We need to agree on a consistent Source naming. 
Source / Source Item examples: Filesystem / File; Archive / ArchiveItem; Email / Attachment. It seems reasonable to adopt either: * this is contained in a [Filesystem, Archive, Email] * this is a [file, archiveitem, attachment] but not both at the same time.</blockquote><div> Right. This is tricky. I really think the "this comes from" metaphor is the closest to the intention. The "this is a" metaphor is already what categories imply. Because of this I also think that Mailbox is a better source name than Email. The Attachment case is more subtle because in some way it does make sense to say that "holiday1.jpg comes from an attachment". I can easily imagine several arguments against this metaphor, but it is really not a clear-cut case. </div> <blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">> Still not decided on how to do PIM stuff. > > > Could we rename Todo to Task instead then? Sounds less nerdy :-) This is the first time in my life someone has called Todo nerdy. </blockquote><div> About time someone broke it to you then :-) "Todo" *is* a geek term - at least my wife never used it before she met me :-) </div> <blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"> > Fields on the Task cat could be Summary, Priority, DueDate. Stuff like a Summary and Description can be derived from fields in the Content cat. There's a good reference: iCalendar. We would have to strip many fields to make it usable though. The problem with these PIM things is this: we have 6 fields and 5 PIM classes. Each one uses 5 fields out of 6, and each one uses a different set.</blockquote><div> Eeek. Good that I'm not the ontology maintainer ;-P </div> <blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"> > Need to revamp the media ontology. 
> > > Can we count on backends being able to figure out list lengths? I.e. if we have a Software.depends relation, do we need Software.dependCount? I think no. > > Either we have a *count property for things we don't describe, or we have a list of things and no *count property. > > Hmmm... This is a tricky case. The query language cannot handle this atm (fx searching for all SourceCode items with more than 10 dependencies) unless the number of deps is explicitly stored in a field. Still, potentially every list field asks for an item-count companion. </blockquote><div> We will need some feedback from the various projects on this. I'm not sure it is even possible to query the length of a list if you intend to keep decent performance. But that is probably very implementation-specific. </div> <blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">> Should we elaborate comment stats for SourceCode along the lines of text stats, or is commentCharacterCount sufficient? > > It is sufficient for the only use case I can come up with: finding under-documented stuff. I think we should keep it at this. You should consider that Xesam, or rather Xesam indexers, will also double as metadata extraction tools, possibly via other APIs. For me, comments are useful to find user-documented stuff and to evaluate just how much documentation there is. commentCharCount seems to be sufficient for that.</blockquote><div> Ok, let's keep it at that for now then. </div> <blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"> > Questions: > > AFAIK PDFs (and other office docs) can be password protected. Perhaps isPassWordPretected should be moved to contents? These are two different things. In the case of ArchiveItem, password protection is external to the file, provided by the archiver. 
In the case of documents, the password protection is internal. ATM it seems like a good idea to implement it the same way as with keywords. Will think more about it, of course. </blockquote><div> I'm afraid I can't see the problem here. There might be different implementations behind the different password protection mechanisms, but all that we are interested in is whether or not the file is protected. </div> <blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">> Is there any general field that names the origin of a file? Fx the URL of a downloaded file? I see you are trying to integrate one of the FDO recommendations here. Seems like a good idea, especially if others use this extended attribute as well. > You've added MediaList and AudioList. Would it not make sense to have a generic List object? Fx a series of images or documents might form a slideshow. Perhaps a better metaphor would be Collection. Fx most IDEs have a project metaphor where a bunch of files is part of a project. With the collection metaphor we could model this. > > While I'm at it - why not a Project cat? It could be a subcat of my suggested Collection cat. Projects have names, versions, etc... I have several programs that install project files. Basically Content already implements collection and container functionality.</blockquote><div> You mean via the content.contains, links, depends fields? It might still be useful to have some cats for this though, as I assume Content will be an abstract cat. </div> <blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">So the only thing we need to do is to add a tree of collections, since different collections imply different content types for the things they link to. I can't think of any specific properties for most collections. Project can be quite different though. 
I added these sample collections to have people scream "so little! so limited! we need, no, we demand more!" and actually provide a useful list :) </blockquote><div> Ok, I really don't think we should forget about a Project category though. </div> Cheers, Mikkel </div>