proposal for file: uri standard
alexl at redhat.com
Mon Mar 29 14:03:52 EEST 2004
Here is some stuff I wrote down in order to standardized the use of
file: uris as used on the desktop. Most of it actually follows from the
various standards involved, so it should not be contentious, but its
good to have it written down plainly.
Standard for exchanging file: URIs
The use of URIs in the desktop is pervasive these days. All the major
desktops have file handling system that use URIs instead of pathnames
to be able to specify files not accessible in the normal UNIX file
The URIs used in these systems are mostly based on the RFCs specifying
the core URI mechanism and its various protocol versions. However
there are sometimes extensions for new protocols that aren't
standardized yet, and sometimes the standards aren't clear on some
Passing of URIs between applications happens in various ways such as
drag and drop, cut and paste and command line arguments. In order to be
interoperable there needs to be some standardization of such URIs. Its
the hope of many that eventually we'd have a common standard and
perhaps even a common implementation. However, at the very least, we
need a strict definition of how to specify URIs for absolute local
filenames when exchanging them between applications. This document
gives such a specification.
The specification for file: URIs, RFC2396 and RFC1738 says that
file URIs are of the form:
Where the hostname and path parts can contain a limited subset of
ASCII characters, representing their ASCII values, and any other bytes
escaped by using a % followed by a two digit hex value. As a special
case the hostname part can be "localhost" or empty meaning the machine
the URI is being interpreted on.
Given a URI like this we can unescape it into a hostname, and a string
of octets (of undefined encoding), which we wish to map to a UNIX
An absolute filename in UNIX is a string containing filenames
separated by and starting with a '/'. The filenames can contain any
byte values except 0 and '/'.
There is no specified encoding for filenames, and although we hope
that eventually all filenames will be encoded in UTF8 we can't rely on
this, because then we would be unable to e.g. rename a file with a
file: URIs on UNIX
Since each desktop has to have a way to generate displayable versions
of filenames (this generally means somehow generating Unicode for it)
we can rely on support for that in the platform. The internal form of
the file reference (the URI) must always be convertible to the
original UNIX byte-string so that we can operate of the file, so the
display form of the filename should be generated at the last moment
when displaying only.
This gives us the following definition for file: URI that are to be
exchanged with other apps:
File URIs are of the form "file://<hostname>/<path>", where hostname
can be empty, with all non-allowed bytes escaped, containing no
escaped '/' or zero bytes. The unescaped byte string is not supposed to
be interpreted in any way, and is not in a specified encoding. It
corresponds exactly to the filename as used in UNIX system calls. If
you need to display the unescaped filename, that should be handled the
same way you display normal filenames.
Some current apps generate URIs of the form "file:/<path>". These
are not correct according to RFC1738, so they should not be
generated. However, its recommended that code correctly handle such
URIs as input.
Some current apps generate file URIs by converting the filename from
whatever locale the application runs in to UTF8. This behavior means
that URIs are can't be converted to filenames without knowing the
locale of the application that produced them, and that not all valid
filenames can be converted to URIs. Such behavior is not allowed,
and should be changed.
Alexander Larsson Red Hat, Inc
alexl at redhat.com alla at lysator.liu.se
He's an old-fashioned sweet-toothed cyborg who dotes on his loving old ma.
She's an orphaned communist nun from the wrong side of the tracks. They fight
More information about the xdg