proposal for file: uri standard
Alexander Larsson
alexl at redhat.com
Mon Mar 29 16:01:26 EEST 2004
New proposal, adding a hostname section and incorporating Waldo's comments:
Standard for exchanging file: URIs
==================================
Rationale
---------
The use of URIs in the desktop is pervasive these days. All the major
desktops have file-handling systems that use URIs instead of pathnames,
so that they can refer to files that are not accessible through the
normal UNIX file system.
The URIs used in these systems are mostly based on the RFCs that
specify the core URI mechanism and the schemes for the various
protocols. However, there are sometimes extensions for new protocols
that aren't standardized yet, and sometimes the standards are unclear
on some details.
URIs are passed between applications in various ways, such as drag
and drop, cut and paste, and command-line arguments. For applications
to interoperate, these URIs need some standardization. Many hope that
we will eventually have a common standard and perhaps even a common
implementation. At the very least, however, we need a strict
definition of how to specify URIs for absolute local filenames when
exchanging them between applications. This document gives such a
specification.
URI standards
-------------
The specifications for file: URIs, RFC2396[1] and RFC1738[2], say
that file URIs are of the form:
file://<hostname>/<path>
where the hostname and path parts may contain a limited subset of
ASCII characters, each representing its own ASCII value; any other
byte is escaped as a '%' followed by a two-digit hex value. As a
special case, the hostname part can be "localhost" or empty, meaning
the machine the URI is being interpreted on.
Given such a URI, we can unescape it into a hostname and a string of
octets (of undefined encoding) that maps 1:1 to a UNIX filename.
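As an illustration only, the unescaping step might look like this in Python. The function name parse_file_uri is hypothetical; the sketch relies on the standard library's urllib.parse.unquote_to_bytes so that the path comes back as raw bytes with no character encoding applied, and it accepts only the strict "file://" form.

```python
from urllib.parse import unquote, unquote_to_bytes


def parse_file_uri(uri: str) -> tuple[str, bytes]:
    """Split file://<hostname>/<path> into (hostname, raw path bytes).

    Hypothetical helper: only the strict "file://" form is accepted.
    """
    prefix = "file://"
    if not uri.startswith(prefix):
        raise ValueError("not a file:// URI")
    rest = uri[len(prefix):]
    slash = rest.find("/")
    if slash < 0:
        raise ValueError("file URI has no path")
    hostname = unquote(rest[:slash])
    # Turn each %XX escape into one byte; the result is never decoded as text.
    path = unquote_to_bytes(rest[slash:])
    return hostname, path
```

For example, parse_file_uri("file:///home/user/caf%E9.txt") returns an empty hostname and the bytes b"/home/user/caf\xe9.txt", ready to pass to system calls unchanged.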
UNIX filenames
--------------
An absolute filename in UNIX is a string of filename components,
starting with and separated by '/'. Each component can contain any
byte value except 0 and '/'.
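A small Python sketch of these rules follows; both function names are hypothetical. Filenames are handled as bytes rather than str, because any byte value other than 0 and '/' may appear in a component.

```python
def is_absolute_filename(name: bytes) -> bool:
    """True if name is an absolute UNIX filename: it starts with '/' and
    contains no 0 byte (every other byte value is allowed)."""
    return name.startswith(b"/") and b"\x00" not in name


def components(name: bytes) -> list[bytes]:
    """Split an absolute filename into its components, skipping the empty
    pieces produced by the leading '/' or by repeated slashes."""
    return [part for part in name.split(b"/") if part]
```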
Filenames have no specified encoding. Although we hope that
eventually all filenames will be encoded in UTF-8, we can't rely on
this; otherwise we would be unable to, for example, rename a file
whose name is misencoded.
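Because no encoding can be assumed, one workable approach is to keep the raw bytes as the file reference and decode only when producing text for display, replacing undecodable bytes rather than failing. The Python sketch below assumes UTF-8 is merely the most likely guess; the helper display_name is hypothetical, and a real desktop would use whatever filename-display facility its platform provides.

```python
def display_name(path: bytes) -> str:
    """Return a string for showing path to the user, never for file access.

    Assumes UTF-8 as a best guess; bytes that are not valid UTF-8 are
    shown as U+FFFD instead of raising UnicodeDecodeError.
    """
    return path.decode("utf-8", errors="replace")


# The bytes remain the reference used for system calls; only the label
# shown in the UI is lossy.
path = b"/tmp/caf\xe9.txt"  # a Latin-1 encoded name, not valid UTF-8
label = display_name(path)
```

The label cannot be turned back into the original name, which is why it should only ever be produced for display.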
file: URIs on UNIX
------------------
Since each desktop already needs a way to generate displayable
versions of filenames (which generally means converting them to
Unicode somehow), we can rely on the platform to support that. The
internal form of the file reference (the URI) must always be
convertible to the original UNIX byte string so that we can operate on
the file, so the display form of the filename should be generated at
the last moment, only when displaying it.
This gives us the following definition for file: URIs that are to be
exchanged with other apps:
File URIs are of the form "file://<hostname>/<path>", where the
hostname can be empty. All bytes that are not allowed in a URI are
escaped, and there are no escaped '/' or zero bytes. The unescaped
byte string is not supposed to be interpreted in any way and is not in
any specified encoding. It corresponds exactly to the filename as used
in UNIX system calls. If you need to display the unescaped filename,
handle it the same way you display normal filenames.
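The generating side of this definition can be sketched in Python as follows. make_file_uri is a hypothetical name; it uses urllib.parse.quote_from_bytes with '/' marked safe, so separators are never escaped while every other byte outside Python's default safe set becomes %XX. This escapes somewhat more than RFC1738 strictly requires, a conservative choice of the sketch rather than something this proposal mandates.

```python
from urllib.parse import quote_from_bytes


def make_file_uri(path: bytes, hostname: str = "") -> str:
    """Build "file://<hostname>/<path>" from raw filename bytes.

    No character encoding is applied: each byte that may not appear
    literally is written as %XX, and '/' is never escaped.
    """
    if not path.startswith(b"/") or b"\x00" in path:
        raise ValueError("expected an absolute filename without 0 bytes")
    return "file://" + hostname + quote_from_bytes(path, safe="/")
```

Paired with an unescaping step such as urllib.parse.unquote_to_bytes, this round-trips every valid filename exactly.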
Hostnames
---------
When generating a file: URI, the hostname part, if non-empty, should
be whatever gethostname() returns. This makes the name canonical for
all users on the same machine, so you can easily check whether the
referenced file is on the current machine. Note that "localhost" and
an empty hostname need to be handled specially: they always mean the
host the URI is being interpreted on.
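In Python, these two rules might be expressed with socket.gethostname(), which wraps the C gethostname() call. The helper names are hypothetical, and the comparison is an exact string match, on the assumption that every producer on the machine uses the same gethostname() value as recommended above.

```python
import socket


def local_uri_hostname() -> str:
    """Hostname to put into generated file: URIs on this machine."""
    return socket.gethostname()


def is_local(hostname: str) -> bool:
    """True if a file: URI with this hostname refers to the machine on
    which the URI is being interpreted."""
    if hostname in ("", "localhost"):
        # Special cases: these always mean "this host".
        return True
    return hostname == socket.gethostname()
```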
Backwards compatibility
-----------------------
Some current apps generate URIs of the form "file:/<path>". These are
not correct according to RFC1738, so they should not be generated.
However, for backwards compatibility, it is recommended that such URIs
be interpreted as file URIs with an empty hostname.
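A lenient reader following this recommendation could look like the Python sketch below (the function name is hypothetical). It never produces the single-slash form; it only accepts it and reports an empty hostname.

```python
from urllib.parse import unquote, unquote_to_bytes


def parse_file_uri_lenient(uri: str) -> tuple[str, bytes]:
    """Accept "file://<hostname>/<path>" and the legacy "file:/<path>"."""
    if uri.startswith("file://"):
        rest = uri[len("file://"):]
        slash = rest.find("/")
        if slash < 0:
            raise ValueError("file URI has no path")
        return unquote(rest[:slash]), unquote_to_bytes(rest[slash:])
    if uri.startswith("file:/"):
        # Legacy form: treat it as having an empty hostname.
        return "", unquote_to_bytes(uri[len("file:"):])
    raise ValueError("not a file: URI")
```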
Some current apps generate file URIs by converting the filename from
the application's locale encoding to UTF-8. This means that such URIs
can't be converted back to filenames without knowing the locale of the
application that produced them, and that not all valid filenames can
be converted to URIs. This behavior is not allowed and should be
changed.
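The difference is easy to see in a short Python sketch. The filename b"/tmp/caf\xe9" is a hypothetical name created by an application running in a Latin-1 (ISO-8859-1) locale; spec_uri and locale_uri are illustrative names for the allowed and the disallowed approach.

```python
from urllib.parse import quote_from_bytes


def spec_uri(raw: bytes) -> str:
    # Allowed: escape the filename bytes exactly as they are.
    return "file://" + quote_from_bytes(raw, safe="/")


def locale_uri(raw: bytes, locale_encoding: str) -> str:
    # Not allowed: decode in the producer's locale, then re-encode as UTF-8.
    # Raises UnicodeDecodeError if raw is not valid in that locale.
    text = raw.decode(locale_encoding)
    return "file://" + quote_from_bytes(text.encode("utf-8"), safe="/")


raw_name = b"/tmp/caf\xe9"  # hypothetical file made in a Latin-1 locale
```

spec_uri(raw_name) yields file:///tmp/caf%E9, which unescapes back to the original bytes. locale_uri(raw_name, "latin-1") yields file:///tmp/caf%C3%A9, which unescapes to b"/tmp/caf\xc3\xa9", a different filename; and with "utf-8" as the locale encoding it raises UnicodeDecodeError, so that file cannot be referenced at all.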
[1] http://www.ietf.org/rfc/rfc2396.txt
[2] http://www.ietf.org/rfc/rfc1738.txt
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Alexander Larsson Red Hat, Inc
alexl at redhat.com alla at lysator.liu.se