[Wasabi] Kicking off the Metadata spec - brainstorm

Tue Feb 20 06:10:35 PST 2007

2007/2/20, Mikkel Kamstrup Erlandsen <mikkel.kamstrup at gmail.com>:
> 2007/2/20, Jos van den Oever <jvdoever at gmail.com>:
>
> > 2007/2/19, Mikkel Kamstrup Erlandsen <mikkel.kamstrup at gmail.com>:
> > > Let's get the ball rolling on the metadata spec. This first period will just
> > > be *brainstorming*, so let's try and avoid the nitty gritty details for now.
> > >
> > >  ** What we need:
> > >
> > >   Fields)  Metadata field names and descriptions for *desktop* objects
> > >
> > >   Types) A type grouping of metadata fields to be used in user search
> > > language. Example types could be "Email", "Image", "Audio", etc.
> > >
> > >   API) A dbus api to get/set metadata
> > >
> > >   ?Tag/Emblem) Tagging/Keywords/Emblems
> > >
> > >  ** Starting points/References:
> > >  - Adobe XMP:
> > >    http://partners.adobe.com/public/developer/en/xmp/sdk/XMPspecification.pdf
> > >
> > >  - Shared Metadata Spec:
> > >    http://freedesktop.org/wiki/Standards_2fshared_2dfilemetadata_2dspec
> > >
> > >  - Tracker metadata api:
> > >    http://svn.gnome.org/viewcvs/tracker/trunk/data/tracker-introspect.xml?view=markup
> > >
> > >  - Spotlight Metadata Spec:
> > >    http://developer.apple.com/documentation/Carbon/Reference/MetadataAttributesRef/Reference/CommonAttrs.html
> > >
> > >  - Shared Emblem Spec:
> > >    http://freedesktop.org/wiki/Standards_2fdesktop_2demblem_2dspec
> > >
> > >  - Other ideas? Nepomuk-specs? Beagle-specs?
> > >
> > >  ** My thoughts:
> > > Regarding Fields): To prevent death-by-1000-page-spec I suggest we keep the
> > > field names to a core set of commonly used attributes, i.e. not like Apple's
> > > Spotlight spec (see above), which defines every known property in the
> > > universe. When things move on, teams with expert knowledge can refine
> > > extensions to this spec. For example, a Wasabi Photography Metadata spec
> > > could be hashed out by people in the know (which could just be EXIF, but
> > > I'm not the photography expert).
> > >
> > > Regarding Types): There are some suggestions at the top of the Tracker api
> > > link above. Regarding these I think we should leave the VFS* types out, and
> > > only use single-word type names (i.e. no spaces).
> > >
> > > On the API): Obviously we need getters and setters. They probably need to
> > > operate on URIs. There probably needs to be some search functionality in
> > > here too, since we probably shouldn't assume that the indexer and metadata
> > > server are the same.
> > >
> > > Tagging/Emblems: If you ask me they should be "just another type of
> > > metadata". When the metadata spec matures a bit we can evaluate if it needs
> > > its own api to make things easier (and allow for dedicated tagging
> > > services).
> >
> > Hi All,
> >
> > First I'd like to point to the original mail I sent on this subject.
> > It already contained a relatively simple spec framework. That is, not
> > attribute names, but a way to define them, type them and check them.
> > There was also some code attached to allow testsets to check the
> > correctness of metadata extraction from files. Hence the title of the
> > mail: 'mimetype standardization by testsets'. I still stand by this
> > idea.
>
>
> Sorry Jos, how could I miss this? For reference, here's the original
> thread:
> http://lists.freedesktop.org/archives/xdg/2006-October/008682.html
Ah yes, thanks for adding the link, I forgot it.

>
> > Here is an idea for a simple proposal.
> >
> > - Each metadata type is identified by a URI. E.g.
> > http://www.freedesktop.org/metadata/xhtml1/title.
> > - For each URI there will be human readable descriptions in every
> > language and keywords in every language. In the rest of this
> > description I will use the keyword and the URI interchangeably.
>
>
> I like this idea as such. I can't readily see how it intermixes with known
> widespread standards such as DC though..?

DC also uses URIs to identify metadata types. It does not define much
more than that though. This is too little for our needs. Within RDF
Schema it is also customary to use URIs for type identification.

> > - Each metadata type also has a simple data type: integer, string, float, binary. Or the
> > more elaborate list of the tracker spec or the xml schema simple
> > types. Personally, I prefer the xml schema spec [1]. We'd need to
> > support only a subset.
> > - Each type has a maximal cardinality. This means how often a field
> > may occur per file/object. For example the metadata 'size' should
> > occur only once, but the metadata 'tag' may occur multiple times.
> > - Each type may have one parent type. The cardinality and data type of the
> > parent are inherited, but may be restricted. Having multiple parents is a
> > bad idea I think.
> > - Each type is embedded, not embedded, or unspecified.
> > - Each type is derived or not derived. E.g. 'size' is derived, but
> > 'title' is not. This means that 'title' is potentially writeable.
> >
> > Whether a metadata field is writeable depends on the implementation.
> > Using 'embedded' and 'derived' instead of 'writeable' is clearer,
> > because 'writeable' depends on a number of factors: can the software
> > write the property, is the file writeable, can the database handle
> > external metadata.
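
To make the type properties listed above concrete, here is a rough Python sketch of how such a definition could be modelled. It is only an illustration, not part of the proposal: the class and method names (MetadataType, Registry, check) and the example.org URIs are made up, and treating "no cardinality given" as "unlimited" is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Optional


@dataclass(frozen=True)
class MetadataType:
    """One metadata type, identified by its URI (hypothetical model)."""
    uri: str
    datatype: Optional[str] = None         # None: inherit from the parent
    max_cardinality: Optional[int] = None  # None: no limit set at this level
    parent: Optional[str] = None           # at most one parent URI
    embedded: Optional[bool] = None        # None: unspecified
    derived: bool = False                  # derived values are not writeable


class Registry:
    def __init__(self) -> None:
        self._types: Dict[str, MetadataType] = {}

    def add(self, t: MetadataType) -> None:
        self._types[t.uri] = t

    def _chain(self, uri: Optional[str]) -> Iterator[MetadataType]:
        # Walk from the type itself up through its single-parent chain.
        while uri is not None:
            t = self._types[uri]
            yield t
            uri = t.parent

    def datatype(self, uri: str) -> Optional[str]:
        # The nearest definition in the chain wins.
        for t in self._chain(uri):
            if t.datatype is not None:
                return t.datatype
        return None

    def max_cardinality(self, uri: str) -> Optional[int]:
        # A child may restrict, but never widen, the inherited limit.
        limits = [t.max_cardinality for t in self._chain(uri)
                  if t.max_cardinality is not None]
        return min(limits) if limits else None

    def check(self, uri: str, values: List[object]) -> bool:
        limit = self.max_cardinality(uri)
        return limit is None or len(values) <= limit


# Hypothetical URIs, except the xhtml1/title example from the proposal.
SIZE = "http://example.org/metadata/size"
TAG = "http://example.org/metadata/tag"
TITLE = "http://example.org/metadata/title"
XHTML_TITLE = "http://www.freedesktop.org/metadata/xhtml1/title"

registry = Registry()
registry.add(MetadataType(SIZE, "integer", max_cardinality=1, derived=True))
registry.add(MetadataType(TAG, "string"))
registry.add(MetadataType(TITLE, "string", max_cardinality=1))
registry.add(MetadataType(XHTML_TITLE, parent=TITLE, embedded=True))
```

With these definitions, 'size' accepts a single value, 'tag' accepts any number, and the xhtml title picks up both the data type and the cardinality of its parent.
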
> >
> > Groups are defined separately from the types. They are simple lists of
> > metadata type URIs. All children of these URIs also fall into this
> > group. Groups are also identified by a URI and they have translations
> > in different languages for the user interfaces. They may also have a
> > short keyword form. The metadata types in a group do not need to have
> > the same cardinality or data type.
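
The group membership rule described above (a group lists type URIs, and all children of those URIs belong to it as well) could be checked along these lines. Again this is just a sketch under assumptions: the PARENTS and GROUPS tables, the in_group function and the example.org URIs are all hypothetical.

```python
from typing import Dict, Optional, Set

# Child type URI -> parent type URI (None for a root type). Hypothetical data.
PARENTS: Dict[str, Optional[str]] = {
    "http://example.org/metadata/title": None,
    "http://www.freedesktop.org/metadata/xhtml1/title":
        "http://example.org/metadata/title",
    "http://example.org/metadata/size": None,
}

# Group URI -> the type URIs listed explicitly in that group.
GROUPS: Dict[str, Set[str]] = {
    "http://example.org/groups/document": {
        "http://example.org/metadata/title",
    },
}


def in_group(type_uri: str, group_uri: str) -> bool:
    """True if the type, or any of its ancestors, is listed in the group."""
    members = GROUPS[group_uri]
    uri: Optional[str] = type_uri
    while uri is not None:
        if uri in members:
            return True
        uri = PARENTS.get(uri)  # step up to the single parent, if any
    return False
```

Because each type has at most one parent, the lookup is a simple walk up the chain, which is one practical argument for avoiding multiple parents.
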
> >
> > What do you think?
>
>
>
> This was somewhat close to the things I have been thinking about. I have to
> give this a bit more thought when I get home...
>
> Cheers,
> Mikkel
>