'file' URI scheme

Thiago Macieira thiago at kde.org
Mon Mar 31 03:41:32 PDT 2008


On Monday 31 March 2008 12:17:01 Alexander Larsson wrote:
> On Mon, 2008-03-31 at 00:58 +0200, Thiago Macieira wrote:
> > [RFC 3987 requires URIs to be in UTF-8, which means that practically the
> > only valid encoding on Linux now is UTF-8.]
>
> I think there is some sort of misunderstanding here. All valid URIs are
> ASCII (non-ASCII needs to be escaped, making it ASCII). ASCII is a
> subset of UTF-8, so all valid URIs are UTF-8.
>
> RFC 3987 isn't about URIs at all, but IRIs, and it does not *require*
> things to be UTF-8. All it does is *allow* UTF-8 to be in an IRI without
> having to escape it. You can still create a valid IRI for a filename
> that has non-utf8 in the pathname, it will just contain hex escapes.
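As a minimal sketch (Python's standard `urllib.parse`, not any particular desktop library), a file name stored on disk as Latin-1 bytes still yields a valid, pure-ASCII URI. The non-UTF-8 bytes simply become hex escapes. The path used here is an illustrative assumption.

```python
from urllib.parse import quote

# Hypothetical file name stored on disk in Latin-1: "é" is the single byte 0xE9,
# which is not valid UTF-8 on its own.
latin1_path = "/home/thiago/Résumé.pdf".encode("latin-1")

# quote() accepts raw bytes and percent-escapes every byte outside the
# unreserved set (keeping "/"), so the result is plain ASCII.
uri = "file://" + quote(latin1_path)
print(uri)  # file:///home/thiago/R%E9sum%E9.pdf
```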

Sorry. You're right. I was thinking of something else and it no longer 
applies.

The two consequences are:
 - file:///home/thiago/Résumé.pdf is equivalent to
   file:///home/thiago/R%c3%a9sum%c3%a9.pdf
 - for the same reason, if your filesystem uses a non-UTF-8 encoding, a file 
named "Résumé.pdf" will NOT map to the URLs above. This includes links in HTML 
source files (e.g., <a href="Résumé.pdf"> or <a href="R&eacute;sum&eacute;.pdf"> 
point to the first URL, not to the locally-encoded version of it).
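A small sketch of both consequences, again using only Python's `urllib.parse` and `html` modules as stand-ins for whatever a real program would use. It shows that the UTF-8 escaped form, its lowercase-hex spelling, and an entity-encoded href all name the same text, while a Latin-1 name on disk produces a different URI.

```python
import html
from urllib.parse import quote, unquote

name = "Résumé.pdf"

# quote() encodes non-ASCII characters as UTF-8 before escaping (its default),
# which is the IRI-to-URI mapping: Résumé.pdf -> R%C3%A9sum%C3%A9.pdf
utf8_uri = "file:///home/thiago/" + quote(name)
print(utf8_uri)  # file:///home/thiago/R%C3%A9sum%C3%A9.pdf

# Hex digits in escapes are case-insensitive, so the lowercase spelling
# decodes to the same text.
assert unquote("R%c3%a9sum%c3%a9.pdf") == name

# An href written with HTML entities decodes to the same characters,
# and hence to the same UTF-8-escaped path.
assert quote(html.unescape("R&eacute;sum&eacute;.pdf")) == quote(name)

# A file whose name is stored in Latin-1 has different bytes on disk,
# so its URI does not match the UTF-8 one.
latin1_uri = "file:///home/thiago/" + quote(name.encode("latin-1"))
print(latin1_uri)  # file:///home/thiago/R%E9sum%E9.pdf
assert latin1_uri != utf8_uri
```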

When I said "the only valid encoding", I should have said that it's the only 
encoding that can currently be supported easily. To support other encodings, 
programs need special code to handle file:/// URLs, and even then some issues 
will remain.
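The "special code" might look roughly like the following sketch. The helper `file_uri_to_local_bytes` and its `fs_encoding` parameter are hypothetical, and the sketch assumes that incoming URIs carry UTF-8 escapes (as IRI-derived and HTML links do). It shows where the remaining issues come from: names the local encoding cannot represent, and URIs whose escapes are not UTF-8 at all.

```python
from urllib.parse import unquote_to_bytes, urlsplit


def file_uri_to_local_bytes(uri: str, fs_encoding: str) -> bytes:
    """Hypothetical helper: map a file:// URI to on-disk path bytes,
    assuming the URI's escapes are UTF-8 and the filesystem stores
    names in fs_encoding."""
    parts = urlsplit(uri)
    if parts.scheme != "file":
        raise ValueError("not a file URI")
    raw = unquote_to_bytes(parts.path)
    # On a UTF-8 system the unescaped bytes are already the file name.
    if fs_encoding.lower().replace("-", "") == "utf8":
        return raw
    # Otherwise reinterpret the bytes as UTF-8 text and re-encode them for
    # the local filesystem. This raises UnicodeDecodeError if the escapes
    # are not UTF-8, and UnicodeEncodeError for characters the local
    # encoding cannot represent.
    return raw.decode("utf-8").encode(fs_encoding)


uri = "file:///home/thiago/R%C3%A9sum%C3%A9.pdf"
print(file_uri_to_local_bytes(uri, "utf-8"))    # b'/home/thiago/R\xc3\xa9sum\xc3\xa9.pdf'
print(file_uri_to_local_bytes(uri, "latin-1"))  # b'/home/thiago/R\xe9sum\xe9.pdf'
```

A euro sign, for instance, has no Latin-1 byte, so `file:///tmp/%E2%82%AC.txt` cannot be mapped to a file name on a Latin-1 system at all.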

-- 
  Thiago Macieira  -  thiago (AT) macieira.info - thiago (AT) kde.org
    PGP/GPG: 0x6EF45358; fingerprint:
    E067 918B B660 DBD1 105C  966C 33F5 F005 6EF4 5358


More information about the xdg mailing list