mime-type/application mapping spec, take #2

Dave Cridland [Home] dave at cridland.net
Tue Jul 1 20:08:04 EEST 2003

On Tue, 2003-07-01 at 13:34, David Faure wrote:
> Whichever the arguments are to call it text/perl instead of application/perl,
> please understand that we're not the ones who decide what the standard
> names are. So if the IANA decides that it shall be application/perl, and if
> we settled on the idea of basing heuristics for default viewers on text/*,
> then we'll have no way to fix the problem. That's why I call this unflexible.
> Names are one thing (out of our hands); behaviour is another thing (in our
> hands).

IANA doesn't decide that either: the IETF decides its tree, and vendors
get allocated a tree. You have every right to argue your case in the
IETF for whatever you like.

But I accept the point, although I seem to have more trust in the IETF
than you do. :-)

> > You're after a relatively high degree of complexity here, and I'm not
> > certain that's actually needed - just more thought on choice of media
> > type.
> I suppose you're referring to the inheritance idea? We need it for many
> other reasons: to be able to rename a mimetype (and install an alias
> for the older name), to have specialized folder types (like e.g. an SMB
> host is almost like a directory, but with a special icon), etc. etc.
> There are many cases of mimetype inheritance, this isn't just an idea
> to "add complexity". We've been needing this for years.

Media types do need some form of canonicalization before any application
does anything with them, hence my somewhat pie-in-the-sky suggestion of
a formal registry at XDG for them. But this is a somewhat orthogonal
issue, which just happens to be solvable by inheritance, but can be
solved separately, and much more efficiently, by a simpler aliasing

As for SMB hosts being "like" a directory, you surely mean "can be
presented to the user like", since the actual access methods used are
wildly different.

As an aside, an SMB service should surely be represented by a URI, an
access to which provides data of some media type. (Perhaps, in this
case, a multipart/mixed containing message/external-body references to
more URIs, if you really want to go mad, but I suspect a
"multipart/x-smb" would do.)

So far, it seems you'd like uniform semantics - effectively, a group of
actions which can be applied to a particular object, be it a URI or some
media typed data. Is this what you mean by inheritance?

> > Of course, it needs to be somehow agreed which non-standard media types
> > we're standardising on, if you see what I mean. 
> This is the topic of another standard discussed on this list...
> http://www.freedesktop.org/standards/shared-mime-info

I don't see where. I see that XDG will provide a basic corpus of media
type information, not that it will provide information on an agreed set
of non-standard media types. If this is the case, a semi-formal registry
does need implementing, and in turn, some effort to avoid collisions
would seem in order.

> > On this front, there's 
> > little stopping anyone from defining a namespace trick to avoid clashing
> > "X-" subtypes - just define "our" subtypes to begin with "X-XDG." for
> > instance, and setup a registry.
> Oh no, no, no. Please no.
> We have enough mimetype renaming already, when a x-foo mimetype gets
> accepted by the IANA (then we need to rename it to remove the x-) - we just
> had to do that with application/ogg... There: another example of strange mimetype
> naming. It's not audio/ogg, it's application/ogg. IIRC because this can be applied
> to more than audio. However currently, all ogg files I know of, are audio...

Then the application takes the type, and, after the canonicalization
process, has a consistent media type to look for in the database.

Yes, this implies that whatever information is used by the
canonicalization process, and the XDG's supplied corpus, is updated
regularly, but that needs to be done anyway, since we cannot predict
when IANA will complete registration.
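
A minimal sketch of what such a canonicalization step could look like,
assuming a plain alias table. The table contents and the function name
are illustrative assumptions, not part of any XDG specification:

```python
# Hypothetical alias table: older or experimental names -> canonical name.
# The entries are examples only; a real table would ship with the corpus.
ALIASES = {
    "application/x-ogg": "application/ogg",
    "audio/x-ogg": "application/ogg",
}


def canonicalize(media_type: str) -> str:
    """Return the canonical name for a media type, ignoring parameters."""
    # Drop parameters such as "; charset=utf-8" and normalise case,
    # since type and subtype names are case-insensitive.
    essence = media_type.split(";", 1)[0].strip().lower()
    # Follow alias chains, guarding against accidental cycles in the table.
    seen = set()
    while essence in ALIASES and essence not in seen:
        seen.add(essence)
        essence = ALIASES[essence]
    return essence
```

With this, a rename after an IANA registration only needs a new entry in
the alias table; the database lookup always sees the canonical name.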

I'm not clear on where inheritance of anything would fit in here.

In this particular case, incidentally, several things strike me:

1) Vorbis is the sound format of Ogg; Ogg does indeed have the stated
aim of producing a video codec, and thus should be "application", since
"audio" files can't contain anything but "audio".
2) This means that in this case, treating "application/ogg" as if it
were "audio/ogg" could lead to annoying side-effects later.
3) For now, inheritance could help - but inheritance from what, and what
4) In the future, when Ogg files do indeed carry video as well, does
this indicate we should provide for multiple inheritance?
5) Relying on anything beginning "x-" in the IETF world to stay stable
is asking for trouble, sorry. Hence my suggestion of a registry - at
least we'd have some stability there.

> So if we also have x-xdg-foo, we have another layer in there, with even more
> mimetype renaming - and incompatibility with all existing systems, including the
> current kde/gnome versions, and apache, and... everything else.

But hang on... Given that there is no standard at the moment, isn't this
going to happen anyway to an extent? (Minor point, incidentally, I'd
suggested terminating the prefix with a dot, since that's how the
current prefixing operates within IANA.)

Agreed, Apache may well tell us an object is of a certain media type
which isn't a "formal" XDG type, but equally, the canonicalization
should catch this. By specifying a prefix to the XDG
standard-but-yet-not-standard media types, we can be reasonably certain
that we're getting what we expect.

Indeed, a good canonicalization process should make all this stuff
easier, I would have thought, and a namespace that XDG traditionally
uses for its non-standard-but-standard media types helps identify those
media types which are agreed by this or future standards.

I think I'm beginning to see what you're thinking. Perhaps I'm a bit

To summarize, then, including those things that have been discussed to
death and generally agreed:

A) Semantics

1) Semantic uniformity is good.
 - Being able to treat all collection or container objects alike, as far
as the UI goes, for instance.

2) We should therefore strive to define some uniform semantics,
applicable to some classes of object we may wish to represent.
 - So collections/containers get treated the same way whatever they are,
as do text files, etc.

3) These uniform semantics consist of actions available to the user.
 - collections/containers can be opened, listed, etc. A text file may be
read, edited, etc.

4) For a given object presented to the user, there needs to be a method
of associating one or more uniform semantics.
 - Object might be something with a media type, or might be something
with a URI, or both, etc. I'm assuming we attach semantics to the media
type, or the URI, or both. Probably both.
 - Do we need to have multiple sets of semantics here? Possibly. A
filesystem object has both a URI (file scheme semantics) and a media
type (assuming we represent directories as multipart/x-directory or
something), and would presumably get semantics from both - Filesystem
semantics because it's a file scheme URI (which might include Rename,
Copy, Move), and container semantics because it's a directory.
 - We need rules to decide which action is presented to the user.

5) Uniform semantics themselves may inherit from one or more other
 - That is, if we only allow an object to get one set of semantics; it
largely depends on what people consider easier to implement.

6) Actions within Uniform Semantics will have implementation methods
associated with them, normally executing an application.
 - Which I think we all agree on.
 - Some actions are obviously silly to actually implement as an exec() -
Rename on a file scheme URI, for one. We'll have to identify these, most

7) Actions within Uniform Semantics may have a default implementation
 - So anything fitting the "audio" semantic might have a default "Play"
action of xmms.
 - Which I think we all agree on.

8) Actions within any Uniform Semantic may be redefined by the user.
 - Which I hope we're all happy with.
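
The points under A) can be sketched roughly as follows. This is only an
illustration under stated assumptions: the class, the registries, the
handler names and the conflict rule (media type beats URI scheme, user
overrides beat both) are all hypothetical choices, not agreed behaviour:

```python
from dataclasses import dataclass, field


@dataclass
class Semantic:
    """A named set of user-visible actions, each mapped to a handler."""
    name: str
    actions: dict = field(default_factory=dict)  # action name -> handler
    parents: list = field(default_factory=list)  # inherited semantics

    def resolve(self) -> dict:
        """Merge inherited actions; this semantic's own entries win."""
        merged = {}
        for parent in self.parents:
            merged.update(parent.resolve())
        merged.update(self.actions)
        return merged


# Hypothetical semantics; "builtin:" marks actions not run via exec().
FILESYSTEM = Semantic("filesystem", {"Rename": "builtin:rename",
                                     "Copy": "builtin:copy",
                                     "Move": "builtin:move"})
CONTAINER = Semantic("container", {"Open": "filemanager",
                                   "List": "filemanager"})
AUDIO = Semantic("audio", {"Play": "xmms"})

# Hypothetical registries keyed by URI scheme and by media type.
BY_SCHEME = {"file": FILESYSTEM}
BY_MEDIA_TYPE = {"multipart/x-directory": CONTAINER, "audio/mpeg": AUDIO}


def actions_for(scheme, media_type, user_overrides=None):
    """Collect actions from the URI scheme, then the media type, then
    apply any user redefinitions on top."""
    actions = {}
    for semantic in (BY_SCHEME.get(scheme), BY_MEDIA_TYPE.get(media_type)):
        if semantic is not None:
            actions.update(semantic.resolve())
    actions.update(user_overrides or {})
    return actions
```

A directory reached through a file URI would then get Rename, Copy and
Move from the scheme and Open and List from the media type, while a user
could still redefine "Play" for anything with audio semantics.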

B) Media types

1) We need some method for canonicalization of existing MIME media
types, such that all XDG conformant environments agree on the same set
of media types, modulo environment specific types.
2) Whatever method we use for canonicalization, it needs to cope with
possible media type name changes, due to IANA registrations.
3) The agreed set of non-IANA media types should be held within some
form of registry.
 - Which we may have; however, I'm not sure from the specification.
4) The agreed set of non-IANA media types should be prefixed to avoid
potential collisions.
 - I still like this idea. :-)
5) Where a specific subtype is not known to the system, the system may
choose a default based on the top level type, if one is defined.
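
Points 4 and 5 could be sketched like this. The handler tables, the
handler names and the exact spelling of the prefix are assumptions made
for illustration only:

```python
# Hypothetical handler database keyed by canonical media type.
HANDLERS = {
    "text/html": "browser",
    "image/png": "image-viewer",
}

# Hypothetical defaults for top-level types; not every type has one.
TOP_LEVEL_DEFAULTS = {
    "text": "text-editor",
    "audio": "audio-player",
}

# Assumed prefix for agreed non-IANA subtypes, terminated with a dot.
XDG_PREFIX = "x-xdg."


def is_xdg_subtype(media_type: str) -> bool:
    """True if the subtype carries the assumed XDG registry prefix."""
    subtype = media_type.partition("/")[2]
    return subtype.lower().startswith(XDG_PREFIX)


def find_handler(media_type: str):
    """Look up the exact type first, then fall back to the top-level
    type's default. Returns None when neither is defined."""
    media_type = media_type.lower()
    if media_type in HANDLERS:
        return HANDLERS[media_type]
    top_level = media_type.partition("/")[0]
    return TOP_LEVEL_DEFAULTS.get(top_level)
```

The lookup expects a type that has already been canonicalized, so the
fallback only applies to subtypes the system genuinely does not know.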

Are we starting to think along closer lines?

> > Things I haven't mentioned:
> >  - Media types have parameters, we do have to worry about these.
> > Example: "text/plain; charset=foo" should, according to the spec, be
> > treated as application/octet-stream. (ie, opaque data).
> Why shouldn't it be treated like text/plain with charset=foo, when possible?
> I'm not following.

Sorry. My fault for scribbling down random thoughts a bit too quickly.

I'm assuming there is no charset known to the system called "foo".

A "text/*" type with an unknown charset has to be treated as
"application/octet-stream" by the system. RFC 2046, section 4.1.4.
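
A minimal sketch of that rule, using Python's standard library. The
function name is hypothetical, and "known to the system" is assumed here
to mean "Python has a codec for it":

```python
import codecs
from email.message import Message


def effective_type(content_type: str) -> str:
    """Return the media type to dispatch on, downgrading text/* with an
    unknown charset to opaque data."""
    msg = Message()
    msg["Content-Type"] = content_type
    media_type = msg.get_content_type()
    if msg.get_content_maintype() != "text":
        return media_type
    # text/* defaults to us-ascii when no charset parameter is given.
    charset = msg.get_content_charset("us-ascii")
    try:
        codecs.lookup(charset)
    except LookupError:
        # No codec for this charset: treat the data as opaque.
        return "application/octet-stream"
    return media_type
```

So "text/plain; charset=foo" is dispatched as application/octet-stream,
while "text/plain; charset=utf-8" stays text/plain.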

> >[...]
> >  - Can we replace existing XML based database with filesystem based one, 
> > containing (links to) .desktop files?
> ?? The current move is rather the other way round, at least for the mimetype spec...

Whoops, again, not writing down my thoughts carefully enough.

I was thinking in terms of Christophe's draft about dispatching based on
media type and/or URI scheme, in which there's an XML database.

I'm not sure whether it's possible to get rid of it in favour of
(possibly faster) filesystem searches, but it might be worth

