proposal for file: uri standard

Alexander Larsson alexl at redhat.com
Mon Mar 29 14:03:52 EEST 2004


Here is some stuff I wrote down in order to standardize the use of
file: URIs as used on the desktop. Most of it actually follows from the
various standards involved, so it should not be contentious, but it's
good to have it written down plainly.

Opinions?

Standard for exchanging file: URIs
==================================

Rationale
---------

The use of URIs in the desktop is pervasive these days. All the major
desktops have file-handling systems that use URIs instead of pathnames
to be able to specify files not accessible in the normal UNIX file
system.

The URIs used in these systems are mostly based on the RFCs specifying
the core URI mechanism and its various protocol versions. However
there are sometimes extensions for new protocols that aren't
standardized yet, and sometimes the standards aren't clear on some
details. 

Passing of URIs between applications happens in various ways such as
drag and drop, cut and paste and command line arguments. In order to be
interoperable there needs to be some standardization of such URIs. It's
the hope of many that eventually we'd have a common standard and
perhaps even a common implementation. However, at the very least, we
need a strict definition of how to specify URIs for absolute local
filenames when exchanging them between applications. This document
gives such a specification.

URI standards
-------------

The specifications for file: URIs, RFC2396[1] and RFC1738[2], say that
file URIs are of the form:
   file://<hostname>/<path>

Here the hostname and path parts can contain a limited subset of
ASCII characters, representing their ASCII values, with any other bytes
escaped by using a % followed by a two-digit hex value. As a special
case the hostname part can be "localhost" or empty, meaning the machine
the URI is being interpreted on.

Given a URI like this we can unescape it into a hostname, and a string
of octets (of undefined encoding), which we wish to map to a UNIX
filename. 
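
The unescaping step can be sketched in Python roughly as follows. This is
only an illustration, not part of the proposal: the function name
parse_file_uri is made up here, and treating "localhost" the same as an
empty hostname is a choice made for this example.

```python
from typing import Tuple
from urllib.parse import unquote_to_bytes


def parse_file_uri(uri: str) -> Tuple[str, bytes]:
    """Split a file://<hostname>/<path> URI into (hostname, path octets)."""
    prefix = "file://"
    if not uri.startswith(prefix):
        raise ValueError("not a file:// URI")
    rest = uri[len(prefix):]
    slash = rest.find("/")
    if slash < 0:
        raise ValueError("URI has no path part")
    hostname = rest[:slash]
    # The path is returned as raw octets; no character encoding is assumed.
    path = unquote_to_bytes(rest[slash:])
    # "localhost" and the empty hostname both mean the local machine.
    if hostname == "localhost":
        hostname = ""
    return hostname, path
```

For example, parse_file_uri("file:///tmp/a%20b") gives an empty hostname
and the octets b"/tmp/a b".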

UNIX filenames
--------------

An absolute filename in UNIX is a string that starts with a '/' and
consists of filename components separated by '/'. Each component can
contain any byte value except 0 and '/'.
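
As a small illustration of that rule, a check in Python might look like
the sketch below. The helper name is hypothetical; the point is only that
the test works on bytes and never looks at character encoding.

```python
def is_absolute_unix_filename(name: bytes) -> bool:
    """True if name starts with '/' and contains no zero byte.

    Splitting on '/' yields the components, so a component can never
    contain '/' itself; only the zero byte needs an explicit check.
    """
    return name.startswith(b"/") and b"\x00" not in name
```

Note that b"/tmp/\xe5\xff" passes this check even though it is not valid
UTF-8.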

There is no specified encoding for filenames, and although we hope
that eventually all filenames will be encoded in UTF8 we can't rely on
this, because then we would be unable to e.g. rename a file with a
misencoded filename.

file: URIs on UNIX
------------------

Since each desktop has to have a way to generate displayable versions
of filenames (this generally means somehow generating Unicode for it)
we can rely on support for that in the platform. The internal form of
the file reference (the URI) must always be convertible to the
original UNIX byte-string so that we can operate on the file, so the
display form of the filename should be generated only at the last
moment, when it is actually displayed.
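
A rough Python sketch of keeping the two forms apart is given below. The
function display_name is a stand-in for whatever routine the platform
already provides, and assuming UTF-8 with replacement characters is only
one possible policy.

```python
def display_name(filename: bytes) -> str:
    """Lossy, display-only rendering of a filename.

    Bytes that are not valid UTF-8 become U+FFFD, so the result must
    never be converted back and used to open the file.
    """
    return filename.decode("utf-8", errors="replace")


raw = b"/tmp/\xe5.txt"       # the internal form: exact bytes
label = display_name(raw)    # the display form: computed when needed
```

The program keeps working with raw and only computes label at the point
where something is shown to the user.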

This gives us the following definition for file: URIs that are to be
exchanged with other apps:

File URIs are of the form "file://<hostname>/<path>", where hostname
can be empty. All non-allowed bytes are escaped, and the path contains
no escaped '/' or zero bytes. The unescaped byte string is not supposed
to be interpreted in any way, and is not in a specified encoding. It
corresponds exactly to the filename as used in UNIX system calls. If
you need to display the unescaped filename, that should be handled the
same way you display normal filenames.
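
The definition above could be implemented along these lines in Python.
This is a sketch under assumptions: both function names are invented for
this example, and urllib.parse.quote escapes somewhat more characters
than the RFCs strictly require, which is harmless for round-tripping.

```python
import re
from urllib.parse import quote, unquote_to_bytes

# Escaped '/' (%2F) and escaped zero byte (%00) are not allowed in the path.
_FORBIDDEN_ESCAPES = re.compile(r"%(2f|00)", re.IGNORECASE)


def filename_to_file_uri(filename: bytes, hostname: str = "") -> str:
    """Build a file: URI from the exact bytes of an absolute UNIX filename."""
    if not filename.startswith(b"/"):
        raise ValueError("filename must be absolute")
    if b"\x00" in filename:
        raise ValueError("filename must not contain a zero byte")
    # quote() percent-escapes bytes outside its safe set; '/' stays
    # literal because it separates the path components.
    return "file://" + hostname + quote(filename, safe="/")


def uri_path_to_filename(escaped_path: str) -> bytes:
    """Turn the still-escaped <path> part of a URI back into filename bytes."""
    if _FORBIDDEN_ESCAPES.search(escaped_path):
        raise ValueError("path contains an escaped '/' or zero byte")
    return unquote_to_bytes(escaped_path)
```

The filename is handled as bytes throughout, so a name that is not valid
in the current locale still converts to a URI and back unchanged.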

Backwards compatibility
-----------------------

Some current apps generate URIs of the form "file:/<path>". These
are not correct according to RFC1738, so they should not be
generated. However, it is recommended that code correctly handle such
URIs as input.
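
One way to accept the single-slash form on input is to rewrite it to the
correct form before any further parsing. The Python sketch below shows
the idea; normalize_file_uri is a hypothetical helper name.

```python
def normalize_file_uri(uri: str) -> str:
    """Rewrite the legacy "file:/<path>" form as "file:///<path>"."""
    if uri.startswith("file://"):
        return uri  # already in the correct form
    if uri.startswith("file:/"):
        # Insert the empty hostname that the legacy form leaves out.
        return "file://" + uri[len("file:"):]
    raise ValueError("not a file: URI with an absolute path")
```

With this, "file:/tmp/x" becomes "file:///tmp/x", while correctly formed
URIs pass through unchanged.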

Some current apps generate file URIs by converting the filename from
whatever locale the application runs in to UTF8. This behavior means
that URIs can't be converted to filenames without knowing the
locale of the application that produced them, and that not all valid
filenames can be converted to URIs. Such behavior is not allowed,
and should be changed.
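
The problem can be seen in a short Python sketch. The filename and both
function names are made up for illustration; the file is assumed to have
been named by an application running in an ISO-8859-1 locale.

```python
from urllib.parse import quote, unquote_to_bytes


def uri_path_byte_exact(filename: bytes) -> str:
    """Escape the filename bytes as they are (the behavior proposed here)."""
    return quote(filename, safe="/")


def uri_path_via_locale(filename: bytes, locale_encoding: str) -> str:
    """Convert from the locale encoding to UTF-8 first (the disallowed way)."""
    text = filename.decode(locale_encoding)
    return quote(text.encode("utf-8"), safe="/")


raw = b"/tmp/\xe5.txt"  # one byte 0xE5, as stored on disk

# Byte-exact escaping round-trips to the original filename.
assert unquote_to_bytes(uri_path_byte_exact(raw)) == raw

# After conversion, the receiver gets different bytes and cannot open
# the file unless it knows the producer's locale.
assert unquote_to_bytes(uri_path_via_locale(raw, "iso-8859-1")) != raw
```

In a UTF-8 locale the second function fails outright on this filename
with a UnicodeDecodeError, which is the "not all valid filenames can be
converted" case.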
		      
[1] http://www.ietf.org/rfc/rfc2396.txt
[2] http://www.ietf.org/rfc/rfc1738.txt

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
 Alexander Larsson                                            Red Hat, Inc 
                   alexl at redhat.com    alla at lysator.liu.se 
He's an old-fashioned sweet-toothed cyborg who dotes on his loving old ma. 
She's an orphaned communist nun from the wrong side of the tracks. They fight 
crime! 