[Wasabi] Kicking of the Metadata spec - brainstorm

Mikkel Kamstrup Erlandsen mikkel.kamstrup at gmail.com
Tue Feb 20 06:02:11 PST 2007

2007/2/20, Jos van den Oever <jvdoever at gmail.com>:
> 2007/2/19, Mikkel Kamstrup Erlandsen <mikkel.kamstrup at gmail.com>:
> > Let's get the ball rolling on the metadata spec. This first period will just
> > be *brainstorming*, so let's try and avoid the nitty gritty details for now.
> >
> >  ** What we need:
> >
> >   Fields)  Metadata field names and descriptions for *desktop* objects
> >
> >   Types) A type grouping of metadata fields to be used in user search
> > language. Example types could be "Email", "Image", "Audio", etc.
> >
> >   API) A dbus api to get/set metadata
> >
> >   ?Tag/Emblem) Tagging/Keywords/Emblems
> >
> >  ** Starting points/References:
> >  - Adobe XMP:
> >    http://partners.adobe.com/public/developer/en/xmp/sdk/XMPspecification.pdf
> >  - Shared Metadata Spec:
> >    http://freedesktop.org/wiki/Standards_2fshared_2dfilemetadata_2dspec
> >  - Tracker metadata API:
> >    http://svn.gnome.org/viewcvs/tracker/trunk/data/tracker-introspect.xml?view=markup
> >  - Spotlight Metadata Spec:
> >    http://developer.apple.com/documentation/Carbon/Reference/MetadataAttributesRef/Reference/CommonAttrs.html
> >  - Shared Emblem Spec:
> >    http://freedesktop.org/wiki/Standards_2fdesktop_2demblem_2dspec
> >  - Other ideas? Nepomuk-specs? Beagle-specs?
> >
> >  ** My thoughts:
> > Regarding Fields): To prevent death-by-1000-page-spec I suggest we keep the
> > field names to a core set of commonly used attributes, i.e. not like Apple's
> > Spotlight spec (see above), which defines every known property in the
> > universe. When things move on, teams with expert knowledge can refine
> > extensions to this spec. E.g. a Wasabi Photography Metadata spec could be
> > hashed out by people in the know (which could just be EXIF, but I'm not the
> > photography expert).
> >
> > Regarding Types): There are some suggestions at the top of the Tracker API
> > link above. Regarding these I think we should leave the VFS* types out, and
> > only use single-word type names (i.e. no spaces).
> >
> > On the API): Obviously we need getters and setters. They probably need to
> > operate on URIs. There probably needs to be some search functionality in here
> > too, since we probably shouldn't assume that the indexer and metadata server
> > are the same.
> >
> > Tagging/Emblems: If you ask me they should be "just another type of
> > metadata". When the metadata spec matures a bit we can evaluate if it needs
> > its own API to make things easier (and allow for dedicated tagging
> > services).
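
A minimal sketch of the getter/setter/search idea above, assuming an
in-memory store rather than a real D-Bus service. The class name, method
names and the "tag" field are all hypothetical; nothing here is taken from
an existing API. Tags are stored as just another multi-valued field.

```python
from collections import defaultdict


class MetadataStore:
    """In-memory stand-in for a metadata service (hypothetical names)."""

    def __init__(self):
        # object URI -> field name -> list of values
        self._data = defaultdict(dict)

    def set(self, uri, field, values):
        """Replace all values of `field` on the object identified by `uri`."""
        self._data[uri][field] = list(values)

    def get(self, uri, field):
        """Return the values of `field` for `uri`, or an empty list."""
        return list(self._data.get(uri, {}).get(field, []))

    def search(self, field, value):
        """Return the sorted URIs whose `field` contains `value`."""
        return sorted(
            uri
            for uri, fields in self._data.items()
            if value in fields.get(field, [])
        )


store = MetadataStore()
# A tag is treated as ordinary metadata with several values.
store.set("file:///home/me/photo.jpg", "tag", ["holiday", "beach"])
print(store.get("file:///home/me/photo.jpg", "tag"))
print(store.search("tag", "beach"))
```

Keeping search on the same interface is only one option; if the indexer and
the metadata server are separate, search could live in its own service.
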
> Hi All,
> First I'd like to point to the original mail I sent on this subject.
> It already contained a relatively simple spec framework. That is, not
> attribute names, but a way to define them, type them and check them.
> There was also some code attached to allow testsets to check the
> correctness of metadata extraction from files. Hence the title of the
> mail: 'mimetype standardization by testsets'. I still stand by this
> idea.
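
The code attached to that original mail is not reproduced in this thread.
As a rough illustration of the testset idea only, here is a sketch that
compares an extractor's output against expected values per file. The
function name, file path, field names and the fake extractor are all
hypothetical.

```python
def check_extractor(extract, testset):
    """Run `extract` on every file in `testset` and collect mismatches.

    `testset` maps a file path to the metadata expected for that file.
    Returns a list of (path, field, expected, actual) tuples.
    """
    failures = []
    for path, expected in testset.items():
        actual = extract(path)
        for field, want in expected.items():
            got = actual.get(field)
            if got != want:
                failures.append((path, field, want, got))
    return failures


# Hypothetical test set: one sample file with two expected fields.
TESTSET = {
    "samples/hello.html": {"title": "Hello", "size": 120},
}


def fake_extract(path):
    """Stand-in extractor that gets the size wrong on purpose."""
    return {"title": "Hello", "size": 121}


print(check_extractor(fake_extract, TESTSET))
```
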

Sorry Jos, how could I miss this? For reference, here's the original:

> Here is an idea for a simple proposal.
> - Each metadata type is identified by a URI. E.g.
> http://www.freedesktop.org/metadata/xhtml1/title.
> - For each URI there will be human readable descriptions in every
> language and keywords in every language. I will use the keyword in the
> further description mixed with the URI.

I like this idea as such. I can't readily see how it fits in with known
widespread standards such as Dublin Core (DC), though.

> - It also has a simple type: integer, string, float, binary. Or the
> more elaborate list of the Tracker spec or the XML Schema simple
> types. Personally, I prefer the XML Schema spec [1]. We'd need to
> support only a subset.
> - Each type has a maximal cardinality. This means how often a field
> may occur per file/object. For example the metadata 'size' should
> occur only once, but the metadata 'tag' may occur multiple times.
> - Each type may have one parent type. Cardinality and type of the parent
> are inherited, but may be restricted. Having multiple parents is a bad
> idea I think.
> - Each type is embedded, not embedded, or unspecified.
> - Each type is derived or not derived. E.g. 'size' is derived, but
> 'title' is not. This means that 'title' is potentially writeable.
> Whether a metadata field is writeable depends on the implementation.
> Using 'embedded' and 'derived' instead of 'writeable' is clearer,
> because 'writeable' depends on a number of factors: can the software
> write the property, is the file writeable, can the database handle
> external metadata.
> Groups are defined separately from the types. They are simple lists of
> metadata type URIs. All children of these URIs also fall into this
> group. Groups are also identified by a URI and they have translations
> in different languages for the user interfaces. They may also have a
> short keyword form. The metadata types in a group do not need to have
> the same cardinality or data type.
> What do you think?
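
To make the proposal concrete, here is a minimal sketch of the type and
group model it describes. It rests on several assumptions: the class and
method names are hypothetical, the example.org URIs are made up (only the
xhtml1/title URI comes from the mail), a child must keep its parent's data
type (restricting the data type itself is not modelled), and translations
and keywords are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MetadataType:
    uri: str
    datatype: str                    # e.g. "string", "integer", "float", "binary"
    max_cardinality: Optional[int]   # None means "may occur any number of times"
    parent: Optional[str] = None     # URI of the single parent type, if any
    embedded: Optional[bool] = None  # None means "unspecified"
    derived: bool = False


class Registry:
    def __init__(self):
        self.types = {}   # type URI -> MetadataType
        self.groups = {}  # group URI -> set of member type URIs

    def add_type(self, t):
        if t.parent is not None:
            p = self.types[t.parent]
            if t.datatype != p.datatype:
                raise ValueError("child must keep the parent's data type")
            # A child may restrict the parent's cardinality but not widen it.
            if p.max_cardinality is not None and (
                t.max_cardinality is None
                or t.max_cardinality > p.max_cardinality
            ):
                raise ValueError("child widens the parent's cardinality")
        self.types[t.uri] = t

    def add_group(self, uri, member_uris):
        self.groups[uri] = set(member_uris)

    def ancestors(self, uri):
        """Yield the type itself, then its parent, grandparent, and so on."""
        while uri is not None:
            yield uri
            uri = self.types[uri].parent

    def in_group(self, group_uri, type_uri):
        """Children of a group member also fall into the group."""
        members = self.groups[group_uri]
        return any(a in members for a in self.ancestors(type_uri))

    def check(self, type_uri, values):
        """True if the number of values respects the maximal cardinality."""
        limit = self.types[type_uri].max_cardinality
        return limit is None or len(values) <= limit


TITLE = "http://example.org/metadata/title"       # hypothetical
TAG = "http://example.org/metadata/tag"           # hypothetical
DOCUMENT = "http://example.org/groups/document"   # hypothetical
XHTML_TITLE = "http://www.freedesktop.org/metadata/xhtml1/title"

reg = Registry()
reg.add_type(MetadataType(TITLE, "string", 1))
reg.add_type(MetadataType(XHTML_TITLE, "string", 1, parent=TITLE))
reg.add_type(MetadataType(TAG, "string", None))
reg.add_group(DOCUMENT, [TITLE])

print(reg.in_group(DOCUMENT, XHTML_TITLE))  # child of a member type
print(reg.check(TITLE, ["a", "b"]))         # two titles on one object
print(reg.check(TAG, ["a", "b", "c"]))      # tags may repeat
```
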

This was somewhat close to the things I have been thinking about. I have to
give this a bit more thought when I get home...
