'file' URI scheme
alexl at redhat.com
Mon Mar 31 03:49:38 PDT 2008
On Mon, 2008-03-31 at 11:41 +0100, Thiago Macieira wrote:
> On Monday 31 March 2008 12:17:01 Alexander Larsson wrote:
> > On Mon, 2008-03-31 at 00:58 +0200, Thiago Macieira wrote:
> > > [RFC 3987 requires URIs to be in UTF-8, which means that practically the
> > > only valid encoding on Linux now is UTF-8.]
> > I think there is some sort of misunderstanding here. All valid URIs are
> > ASCII (non-ASCII needs to be escaped, making it ASCII). ASCII is a
> > subset of UTF-8, so all valid URIs are UTF-8.
> > RFC 3987 isn't about URIs at all, but IRIs, and it does not *require*
> > things to be UTF-8. All it does is *allow* UTF-8 to be in an IRI without
> > having to escape it. You can still create a valid IRI for a filename
> > that has non-utf8 in the pathname, it will just contain hex escapes.
> Sorry. You're right. I was thinking of something else and it no longer
> The two consequences are:
> - file:///home/thiago/Résumé.pdf is equivalent to
> - for the same reason, if you have a non-UTF-8 encoding, a file
> named "Résumé.pdf" will NOT show as the URLs above. This includes HTML source
> files (i.e., <a href="résumé.html"> or <a href="résumé.pdf">
> point to the first URL, not the locally-encoded version of it)
When you say URL you really mean IRI (i.e. the form that allows unescaped
UTF-8). It is best to use actual URIs/URLs internally and convert to/from
IRIs only in the last step, when displaying them to a user (since apps may
not, e.g., handle getting IRIs as arguments).
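To illustrate the "URIs internally, IRIs only for display" idea, here is a minimal Python sketch. The function names path_to_file_uri and uri_to_display_iri are made up for this example and are not taken from GLib, GIO, or any other library. The sketch assumes filenames arrive as raw bytes and that only escape runs which decode as valid UTF-8 should be shown unescaped; escapes for ASCII characters such as %2F or %25 are left alone so the meaning of the URI does not change.

```python
import re
from urllib.parse import quote, unquote_to_bytes

# Runs of percent-escapes for non-ASCII bytes (0x80-0xFF) only.
_NON_ASCII_RUN = re.compile(r"(?:%[89A-Fa-f][0-9A-Fa-f])+")


def path_to_file_uri(path_bytes: bytes) -> str:
    """Build an ASCII-only file URI from raw filename bytes (hypothetical helper)."""
    # Every byte outside the unreserved set is percent-escaped, so the
    # result is valid no matter which encoding the filename uses.
    return "file://" + quote(path_bytes, safe="/")


def uri_to_display_iri(uri: str) -> str:
    """Return a form for display only; keep the URI itself for internal use."""
    def decode_run(match: re.Match) -> str:
        raw = unquote_to_bytes(match.group(0))
        try:
            return raw.decode("utf-8")
        except UnicodeDecodeError:
            # Not valid UTF-8: keep the hex escapes, still a valid IRI.
            return match.group(0)

    return _NON_ASCII_RUN.sub(decode_run, uri)


utf8_uri = path_to_file_uri("/home/thiago/Résumé.pdf".encode("utf-8"))
latin1_uri = path_to_file_uri("/home/thiago/Résumé.pdf".encode("latin-1"))

print(utf8_uri)
print(uri_to_display_iri(utf8_uri))
print(latin1_uri)
print(uri_to_display_iri(latin1_uri))
```

With a UTF-8 filename the display form shows the accented characters; with a Latin-1 filename the escapes cannot be decoded as UTF-8, so the display form stays identical to the URI.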
However, what you say is generally true: URIs/IRIs are far nicer in the
user interface when the backend uses UTF-8 for filenames.
One way we in GNOME try to handle this is to avoid showing URIs to the
user as much as possible. For instance, the Nautilus pathbar can show the
right names in your example above even when the URI can't.