File URI Specification ?

Alexander Larsson alexl at redhat.com
Fri Jun 9 17:34:35 EEST 2006


On Thu, 2006-06-08 at 23:40 +0200, Jaap Karssenberg wrote:
> Hi all,
> 
> Can someone point me to a copy of the file uri specification ? The wiki 
> page http://www.freedesktop.org/wiki/Standards_2ffile_2duri_2dspec is 
> broken.
> 
> I'm struggling with the various versions of the file:// uri I get with 
> drag-n-drop from various applications on various platforms :( so I would 
> like to know what the Good Way to handle these is.

I'm not sure what happened to the copy on the freedesktop site. Attached
is a version I had in my homedir (I wrote the thing). Hopefully it's the
latest version. It might be possible to do some archaeological digging
and find the old one on the site too.

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
 Alexander Larsson                                            Red Hat, Inc 
                   alexl at redhat.com    alla at lysator.liu.se 
He's a leather-clad bohemian boxer with no name. She's a man-hating tomboy 
widow who don't take no shit from nobody. They fight crime! 
-------------- next part --------------
Standard for exchanging file: URIs
==================================

Rationale
---------

The use of URIs in the desktop is pervasive these days. All the major
desktops have file handling systems that use URIs instead of pathnames
to be able to specify files not accessible in the normal UNIX file
system.

The URIs used in these systems are mostly based on the RFCs specifying
the core URI mechanism and its various protocol versions. However
there are sometimes extensions for new protocols that aren't
standardized yet, and sometimes the standards aren't clear on some
details. 

Passing of URIs between applications happens in various ways such as
drag and drop, cut and paste and command line arguments. In order to be
interoperable there needs to be some standardization of such URIs. It's
the hope of many that eventually we'd have a common standard and
perhaps even a common implementation. However, at the very least, we
need a strict definition of how to specify URIs for absolute local
filenames when exchanging them between applications. This document
gives such a specification.

URI standards
-------------

The specifications for file: URIs, RFC2396[1] and RFC1738[2], say that
file URIs are of the form:
   file://<hostname>/<path>

Where the hostname and path parts can contain a limited subset of
ASCII characters, representing their ASCII values, and any other bytes
escaped by using a % followed by a two-digit hex value. As a special
case the hostname part can be "localhost" or empty, meaning the machine
the URI is being interpreted on.

Given a URI like this we can unescape it into a hostname, and a string
of octets (of undefined encoding), which maps 1:1 to a UNIX filename.
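
The unescaping step described above can be sketched in Python. This is
only an illustration, not part of the specification: the function name
parse_file_uri is hypothetical, and the sketch assumes the URI is
already correctly escaped (so it contains no literal '?' or '#').

```python
from typing import Tuple
from urllib.parse import urlsplit, unquote_to_bytes


def parse_file_uri(uri: str) -> Tuple[str, bytes]:
    """Split a file: URI into (hostname, raw path bytes).

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(uri)
    if parts.scheme != "file":
        raise ValueError("not a file: URI: %r" % uri)
    # Percent-escapes are decoded to raw bytes; no text encoding is
    # assumed for the result.
    return parts.netloc, unquote_to_bytes(parts.path)


print(parse_file_uri("file://myhost/tmp/caf%E9"))
```

The path comes back as a bytes object rather than a string, which
mirrors the "string of octets (of undefined encoding)" wording above.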

UNIX filenames
--------------

An absolute filename in UNIX is a string that starts with a '/' and
consists of filename components separated by '/'. Each component can
contain any byte value except 0 and '/'.

There is no specified encoding for filenames, and although we hope
that eventually all filenames will be encoded in UTF-8 we can't rely on
this, because then we would be unable to e.g. rename a file with a
misencoded filename.

file: URIs on UNIX
------------------

Since each desktop has to have a way to generate displayable versions
of filenames (this generally means somehow generating Unicode for it)
we can rely on support for that in the platform. The internal form of
the file reference (the URI) must always be convertible to the
original UNIX byte-string so that we can operate on the file, so the
display form of the filename should be generated only at the last
moment, when displaying.
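
As a rough illustration of keeping the byte string as the internal form
and deriving the display form late, here is a small Python sketch. The
function name display_name is hypothetical, and decoding as UTF-8 with
replacement characters is just one assumed display policy; a real
desktop would use whatever its platform provides.

```python
def display_name(path: bytes) -> str:
    """Return a lossy, human-readable form of a raw filename.

    Hypothetical helper; the UTF-8-with-replacement policy is an
    assumption, not something the specification mandates.
    """
    return path.decode("utf-8", errors="replace")


raw = b"/tmp/caf\xe9"      # not valid UTF-8; this is what we operate on
print(display_name(raw))   # only for showing to the user
```

The display string cannot be turned back into the original bytes, which
is why it should never replace the raw filename internally.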

This gives us the following definition for file: URIs that are to be
exchanged with other apps:

File URIs are of the form "file://<hostname>/<path>", where the hostname
can be empty. All non-allowed bytes must be escaped, and the path must
contain no escaped '/' or zero bytes. The unescaped byte string is not
supposed to be interpreted in any way, and is not in a specified
encoding. It corresponds exactly to the filename as used in UNIX system
calls. If you need to display the unescaped filename, that should be
handled the same way you display normal filenames.
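
A minimal Python sketch of generating a URI under this definition
follows. The function name make_file_uri is hypothetical, and the set
of characters left unescaped is whatever urllib.parse.quote treats as
safe, which is an assumption of this sketch rather than a statement of
the specification.

```python
from urllib.parse import quote


def make_file_uri(path: bytes, hostname: str = "") -> str:
    """Build a file: URI from a raw absolute UNIX filename.

    Hypothetical helper for illustration only.
    """
    if not path.startswith(b"/"):
        raise ValueError("path must be absolute")
    if b"\0" in path:
        raise ValueError("path must not contain a zero byte")
    # quote() accepts bytes and escapes each disallowed byte as %XX.
    # '/' is kept literal, so path separators are never escaped.
    return "file://" + hostname + quote(path, safe="/")


print(make_file_uri(b"/tmp/caf\xe9 1.txt"))
```

The bytes are escaped as they are, without being decoded or re-encoded
first, so the receiver gets back exactly the filename used in system
calls.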

Hostnames
---------

When generating a file: URI the hostname part, if nonempty, should be
whatever is returned from gethostname(). This means that the name is
canonical for all users on the same machine, so that you can easily
see if the referenced file is on the current machine. Note that
"localhost" or an empty hostname needs to be handled specially, always
meaning the host the URI is being interpreted on.
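
The local-machine check described above could look roughly like this in
Python. The function name is_local_host is hypothetical, and comparing
hostnames case-insensitively is an assumption of this sketch.

```python
import socket


def is_local_host(hostname: str) -> bool:
    """Tell whether the hostname part of a file: URI means this machine.

    Hypothetical helper for illustration only.
    """
    # An empty hostname and "localhost" always mean the interpreting host.
    if hostname == "" or hostname.lower() == "localhost":
        return True
    # Otherwise compare against what gethostname() returns here.
    return hostname.lower() == socket.gethostname().lower()


print(is_local_host("localhost"))
```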

Backwards compatibility
-----------------------

Some current apps generate URIs of the form "file:/<path>". These
are not correct according to RFC1738, so they should not be
generated. However for backwards compatibility, it is recommended that
such URIs are interpreted as file URIs with an empty hostname.
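
One way to accept the legacy form is to rewrite it into the standard
form before any further parsing. The Python sketch below is only an
illustration: the function name normalize_legacy_file_uri is
hypothetical, and it assumes the scheme is written in lower case.

```python
def normalize_legacy_file_uri(uri: str) -> str:
    """Rewrite "file:/<path>" as "file:///<path>" (empty hostname).

    Hypothetical helper; URIs already in the standard form are
    returned unchanged.
    """
    if uri.startswith("file:/") and not uri.startswith("file://"):
        # Insert the empty hostname part: "file:" + "//" + "/<path>".
        return "file://" + uri[len("file:"):]
    return uri


print(normalize_legacy_file_uri("file:/tmp/x"))
```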

Some current apps generate file URIs by converting the filename from
whatever locale the application runs in to UTF-8. This behavior means
that URIs can't be converted to filenames without knowing the
locale of the application that produced them, and that not all valid
filenames can be converted to URIs. Such behavior is not allowed,
and should be changed.
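
The problem can be seen in a small Python sketch. Both function names
are hypothetical, and the Latin-1 filename is just an assumed example;
uri_via_locale shows the disallowed behavior for comparison only.

```python
from urllib.parse import quote, unquote_to_bytes


def uri_from_raw_bytes(path: bytes) -> str:
    """Escape the filename bytes as they are."""
    return "file://" + quote(path, safe="/")


def uri_via_locale(path: bytes, locale_encoding: str) -> str:
    """Disallowed: convert from the locale to UTF-8 before escaping."""
    text = path.decode(locale_encoding)
    return "file://" + quote(text.encode("utf-8"), safe="/")


name = b"/tmp/caf\xe9"                    # a filename in a Latin-1 locale
good = uri_from_raw_bytes(name)           # escapes the byte 0xE9 itself
bad = uri_via_locale(name, "latin-1")     # escapes 0xC3 0xA9 instead

# Unescaping the converted URI yields different bytes, so the receiver
# would look for a file that does not exist under that name.
print(unquote_to_bytes(bad[len("file://"):]) == name)
```

Only the first form can be turned back into the original filename
without knowing the sender's locale.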
		      
[1] http://www.ietf.org/rfc/rfc2396.txt
[2] http://www.ietf.org/rfc/rfc1738.txt

