'file' URI scheme
Thiago Macieira
thiago at kde.org
Mon Mar 31 03:41:32 PDT 2008
On Monday 31 March 2008 12:17:01 Alexander Larsson wrote:
> On Mon, 2008-03-31 at 00:58 +0200, Thiago Macieira wrote:
> > [RFC 3987 requires URIs to be in UTF-8, which means that practically the
> > only valid encoding on Linux now is UTF-8.]
>
> I think there is some sort of misunderstanding here. All valid URIs are
> ASCII (non-ASCII needs to be escaped, making it ASCII). ASCII is a
> subset of UTF-8, so all valid URIs are UTF-8.
>
> RFC 3987 isn't about URIs at all, but IRIs, and it does not *require*
> things to be UTF-8. All it does is *allow* UTF-8 to be in an IRI without
> having to escape it. You can still create a valid IRI for a filename
> that has non-utf8 in the pathname, it will just contain hex escapes.
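To illustrate the point quoted above, here is a minimal Python sketch that turns raw filename bytes into a file URI with `urllib.parse.quote`. The helper name `file_uri_from_bytes` is hypothetical, and it assumes an absolute POSIX path. A UTF-8 name and a Latin-1 name both produce valid, all-ASCII URIs. The non-UTF-8 bytes simply become hex escapes.

```python
from urllib.parse import quote

def file_uri_from_bytes(path: bytes) -> str:
    # Hypothetical helper (assumes an absolute POSIX path starting with '/').
    # quote() percent-escapes every byte outside the unreserved set, keeping
    # '/' as the path separator, so the result is pure ASCII no matter which
    # encoding produced the filename bytes.
    return "file://" + quote(path, safe="/")

utf8_uri = file_uri_from_bytes("/home/thiago/Résumé.pdf".encode("utf-8"))
latin1_uri = file_uri_from_bytes("/home/thiago/Résumé.pdf".encode("latin-1"))
print(utf8_uri)    # file:///home/thiago/R%C3%A9sum%C3%A9.pdf
print(latin1_uri)  # file:///home/thiago/R%E9sum%E9.pdf
```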
Sorry. You're right. I was thinking of something else and it no longer
applies.
The two consequences are:
- file:///home/thiago/Résumé.pdf is equivalent to
file:///home/thiago/R%c3%a9sum%c3%a9.pdf
- for the same reason, if your filesystem uses a non-UTF-8 encoding, a file
named "Résumé.pdf" will NOT be reachable through the URLs above. This includes
links in HTML files (e.g., <a href="résumé.html"> or <a href="résumé.pdf">
resolve to the UTF-8-escaped URL, not to the locally-encoded version of it)
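The first consequence can be checked with Python's `urllib.parse`. This sketch assumes the IRI-to-URI mapping is "encode as UTF-8, then percent-escape", as RFC 3987 describes. It compares decoded bytes because the hex digits in escapes are case-insensitive. It also shows why an href written as "résumé.html" does not name a Latin-1-encoded file on disk.

```python
from urllib.parse import quote, unquote_to_bytes

iri_path = "/home/thiago/Résumé.pdf"
uri_path = "/home/thiago/R%c3%a9sum%c3%a9.pdf"

# IRI -> URI mapping: encode the text as UTF-8, then percent-escape it.
# quote() uses UTF-8 for str input by default.
converted = quote(iri_path)

# Hex digits in escapes are case-insensitive, so compare the decoded bytes.
equivalent = unquote_to_bytes(converted) == unquote_to_bytes(uri_path)

# An href such as "résumé.html" is mapped the same way (UTF-8 escapes) ...
href_escaped = quote("résumé.html")
# ... so it does not match the bytes of a Latin-1-encoded file on disk.
matches_latin1_file = unquote_to_bytes(href_escaped) == "résumé.html".encode("latin-1")

print(converted, equivalent, href_escaped, matches_latin1_file)
```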
When I said "the only valid encoding", I should have said that it's the only
encoding that can currently be supported easily. To support other encodings,
programs need special code to handle file:/// URLs, and even then some issues
remain.
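A minimal sketch of the kind of special code meant here follows. It assumes a program that knows its filesystem encoding and uses a hypothetical helper, `local_path_candidates`. The helper decodes the escapes back to raw bytes. When those bytes happen to be valid UTF-8, it also offers a variant transcoded to the local encoding. Getting two candidates back is exactly the ambiguity that leaves some issues unresolved.

```python
from urllib.parse import urlsplit, unquote_to_bytes

def local_path_candidates(uri: str, local_encoding: str) -> list[bytes]:
    """Hypothetical helper: byte paths a program might try for a file URI
    on a system whose filenames are stored in `local_encoding`."""
    parts = urlsplit(uri)
    if parts.scheme != "file":
        raise ValueError(f"not a file URI: {uri!r}")
    raw = unquote_to_bytes(parts.path)
    candidates = [raw]  # the escapes may already be the on-disk bytes
    try:
        text = raw.decode("utf-8")  # or they may be UTF-8 text (IRI convention)
    except UnicodeDecodeError:
        return candidates  # not UTF-8, so only the raw bytes make sense
    try:
        local = text.encode(local_encoding)
    except UnicodeEncodeError:
        return candidates  # name cannot be represented locally
    if local != raw:
        candidates.append(local)  # transcoded guess for a legacy filesystem
    return candidates

for p in local_path_candidates("file:///home/thiago/R%C3%A9sum%C3%A9.pdf", "latin-1"):
    print(p)
```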
--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
PGP/GPG: 0x6EF45358; fingerprint:
E067 918B B660 DBD1 105C 966C 33F5 F005 6EF4 5358
More information about the xdg mailing list