[Wasabi] Kicking off the Metadata spec - brainstorm

Tue Feb 20 00:13:24 PST 2007

2007/2/19, Mikkel Kamstrup Erlandsen <mikkel.kamstrup at gmail.com>:
> Let's get the ball rolling on the metadata spec. This first period will just
> be *brainstorming*, so let's try and avoid the nitty gritty details for now.
>
>  ** What we need:
>
>   Fields)  Metadata field names and descriptions for *desktop* objects
>
>   Types) A type grouping of metadata fields to be used in user search
> language. Example types could be "Email", "Image", "Audio", etc.
>
>   API) A dbus api to get/set metadata
>
>   ?Tag/Emblem) Tagging/Keywords/Emblems
>
>  ** Starting points/References:
>  - Adobe XMP:
> http://partners.adobe.com/public/developer/en/xmp/sdk/XMPspecification.pdf
>
>  - Shared Metadata Spec:
> http://freedesktop.org/wiki/Standards_2fshared_2dfilemetadata_2dspec
>  - Tracker metadata api:
> http://svn.gnome.org/viewcvs/tracker/trunk/data/tracker-introspect.xml?view=markup
>
>  - Spotlight Metadata Spec:
> http://developer.apple.com/documentation/Carbon/Reference/MetadataAttributesRef/Reference/CommonAttrs.html
>
>  - Shared Emblem Spec:
> http://freedesktop.org/wiki/Standards_2fdesktop_2demblem_2dspec
>  - Other ideas? Nepomuk-specs? Beagle-specs?
>
>  ** My thoughts:
> Regarding Fields): To prevent death-by-1000-page-spec I suggest we keep the
> field names to a core set of commonly used attributes, i.e. not like Apple's
> Spotlight spec (see above), which defines every known property in the
> universe. When things move on, teams with expert knowledge can refine
> extensions to this spec. E.g. a Wasabi Photography Metadata spec could be
> hashed out by people in the know (which could just be EXIF, but I'm not the
> photography expert).
>
> Regarding Types): There are some suggestions at the top of the Tracker api
> link above. Regarding these I think we should leave the VFS* types out, and
> only use single-word type names (i.e. no spaces).
>
> On the API): Obviously we need getters and setters. They probably need to
> operate on URIs. There probably needs to be some search functionality in
> here too since we probably shouldn't assume that the indexer and metadata
> server are the same.
>
> Tagging/Emblems: If you ask me they should be "just another type of
> metadata". When the metadata spec matures a bit we can evaluate if it needs
> its own API to make things easier (and allow for dedicated tagging
> services).

Hi All,

First I'd like to point to the original mail I sent on this subject.
It already contained a relatively simple spec framework: not a list of
attribute names, but a way to define them, type them and check them.
There was also some code attached that allows test sets to check the
correctness of metadata extraction from files. Hence the title of the
mail: 'mimetype standardization by testsets'. I still stand by this
idea.

Here is an idea for a simple proposal.

- Each metadata type is identified by a URI. E.g.
http://www.freedesktop.org/metadata/xhtml1/title.
- For each URI there will be human-readable descriptions and keywords
in every language. In the rest of this description I will use the
keyword and the URI interchangeably.
- Each type also has a simple data type: integer, string, float or
binary, or one from the more elaborate lists in the Tracker spec or
the XML Schema simple types. Personally, I prefer the XML Schema spec
[1]. We'd need to support only a subset.
- Each type has a maximum cardinality, i.e. how often a field may
occur per file/object. For example the metadata 'size' should
occur only once, but the metadata 'tag' may occur multiple times.
- Each type may have one parent type. The cardinality and data type of
the parent are inherited, but may be restricted. Having multiple
parents is a bad idea, I think.
- Each type is embedded, not embedded, or unspecified.
- Each type is derived or not derived. E.g. 'size' is derived, but
'title' is not. This means that 'title' is potentially writeable.
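
To make the list above concrete, here is a minimal Python sketch of how
such a type definition might be modelled. Everything in it is an
assumption for illustration: the names MetadataType and subtype, the
example.org URIs, and the generic 'title' parent are not part of the
proposal. It only shows single-parent inheritance where the data type is
inherited and the cardinality may be narrowed but not widened.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MetadataType:
    """One metadata type definition (hypothetical model, not a real API)."""
    uri: str                          # identifies the type
    datatype: str                     # e.g. "string", "integer"
    max_cardinality: Optional[int]    # None: may occur any number of times
    embedded: Optional[bool] = None   # True, False, or None for "unspecified"
    derived: bool = False             # derived values such as 'size'
    parent: Optional["MetadataType"] = None


def subtype(parent, uri, max_cardinality=None, embedded=None, derived=False):
    """Define a child type with exactly one parent.

    The data type is inherited unchanged. The cardinality is inherited when
    max_cardinality is None; otherwise it may only be narrowed.
    """
    limit = parent.max_cardinality
    if max_cardinality is None:
        max_cardinality = limit
    elif max_cardinality < 1:
        raise ValueError(f"{uri}: cardinality must be at least 1")
    elif limit is not None and max_cardinality > limit:
        raise ValueError(
            f"{uri}: cardinality {max_cardinality} exceeds parent limit {limit}"
        )
    return MetadataType(uri, parent.datatype, max_cardinality,
                        embedded, derived, parent)


# Example definitions. Only the xhtml1/title URI appears in the proposal;
# the others, and the parent relation, are invented for illustration.
size = MetadataType("http://example.org/metadata/size", "integer", 1,
                    derived=True)
tag = MetadataType("http://example.org/metadata/tag", "string", None)
title = MetadataType("http://example.org/metadata/title", "string", 1)
xhtml_title = subtype(
    title, "http://www.freedesktop.org/metadata/xhtml1/title")
```

Keeping to a single parent means inherited properties can be resolved by
walking one chain, with no conflicts to reconcile.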

Whether a metadata field is writeable depends on the implementation.
Using 'embedded' and 'derived' instead of 'writeable' is clearer,
because 'writeable' depends on a number of factors: can the software
write the property, is the file writeable, can the database handle
external metadata.
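
As a rough illustration of why 'writeable' is an implementation matter,
the following Python sketch combines the two spec-level flags with the
three runtime factors named above. The function and parameter names are
hypothetical, and the rule itself is an assumption: it reads 'embedded'
as "stored inside the file" and 'not embedded' as "stored externally",
which the proposal does not spell out.

```python
from typing import Optional


def is_writeable(derived: bool, embedded: Optional[bool], *,
                 software_can_write: bool,
                 file_is_writeable: bool,
                 store_accepts_external: bool) -> bool:
    """Guess whether a field can be written (illustrative rule only)."""
    if derived:
        # Derived values such as 'size' are computed, never set directly.
        return False
    in_file = software_can_write and file_is_writeable
    if embedded is True:
        # Assumed: embedded metadata has to be written into the file.
        return in_file
    if embedded is False:
        # Assumed: non-embedded metadata lives in an external store.
        return store_accepts_external
    # Unspecified: either route would do.
    return in_file or store_accepts_external
```

The spec only fixes the first two arguments; the keyword arguments vary
per implementation and per file.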

Groups are defined separately from the types. They are simple lists of
metadata type URIs. All children of these URIs also fall into the
group. Groups are also identified by a URI and they have translations
in different languages for the user interfaces. They may also have a
short keyword form. The metadata types in a group do not need to have
the same cardinality or data type.
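
Group membership as described here could be checked roughly as follows.
This is a Python sketch under assumed names: in_group, the parent_of
mapping and the example.org URIs are made up for illustration. Because
each type has at most one parent, the check is a walk up a single chain.

```python
def in_group(type_uri, group_members, parent_of):
    """Return True if type_uri or any of its ancestors is listed in the group.

    group_members: set of metadata type URIs listed in the group
    parent_of:     dict mapping a type URI to its single parent URI
    """
    seen = set()
    current = type_uri
    while current is not None and current not in seen:
        if current in group_members:
            return True
        seen.add(current)            # guards against accidental cycles
        current = parent_of.get(current)
    return False


# Hypothetical data: a 'Document' group that lists only a generic title type.
parent_of = {
    "http://www.freedesktop.org/metadata/xhtml1/title":
        "http://example.org/metadata/title",
}
document_group = {"http://example.org/metadata/title"}
```

With this data, the xhtml1 title falls into the group through its parent
even though the group does not list it explicitly.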

What do you think?

Cheers,
Jos

[1] http://www.w3.org/TR/xmlschema-2/
http://www.w3.org/TR/xmlschema-2/type-hierarchy.gif