recent-file-spec: possible design flaw?

Mon Jul 14 16:33:57 EEST 2003

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Monday 14 July 2003 14:51, Oliver Braun wrote:
> Hi *,
>
> we - the SUN team working on OpenOffice.org - noticed a possible design
> flaw in the recent-file-spec (or at least in the Gnome 2.2
> implementation of it) when looking at the Ximian patches for
> OpenOffice.org:
>
> It seems that Gnome 2.2 converts the full local file path to UTF-8
> before encoding the result as a file URL, using the text encoding of the
> current locale as the "from" encoding. This is not reversible if the
> path contains bytes that are not valid characters in that encoding
> (paths mixing several encodings)!
>
> As a result, the application launched by the panel will not be able to
> open such a file when the user chooses it from the "Open Recent" menu.
> Unfortunately we made the same mistake in OpenOffice.org 1.x :(. The
> only way to handle multi-encoding paths correctly seems to be to encode
> the byte sequence exactly as returned by the file system layer.
>
> The recent-file-spec says <QUOTE>All text in the file should be stored
> in the UTF-8 encoding.</QUOTE>, which IMHO can easily be (mis?)understood
> as "convert file names to UTF-8".
>
> How does KDE expect file URLs to be encoded?
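The effect described in the quoted message can be reproduced in a few lines. This is a minimal Python sketch, not Gnome's or OpenOffice.org's code: the path is hypothetical (directory name in UTF-8, file name in Latin-1), the locale encoding is assumed to be UTF-8, and `errors="replace"` is an assumed stand-in for a lossy converter (a real converter might reject the name instead). It compares "decode to Unicode, then percent-encode the UTF-8" with percent-encoding the raw bytes.

```python
from urllib.parse import quote, unquote_to_bytes

# Hypothetical multi-encoding path: the directory name was created under a
# UTF-8 locale (u-umlaut, bytes C3 BC), the file name under a Latin-1
# locale (e-acute, byte E9).
raw_path = b"/home/m\xc3\xbcller/caf\xe9.txt"

# Lossy route: decode with the (assumed) UTF-8 locale encoding, replacing
# undecodable bytes with U+FFFD, then percent-encode the UTF-8 result.
text = raw_path.decode("utf-8", errors="replace")
lossy_url = "file://" + quote(text.encode("utf-8"))

# Byte-preserving route: percent-encode exactly the bytes from the OS.
exact_url = "file://" + quote(raw_path)


def url_to_path_bytes(url: str) -> bytes:
    """Drop the "file://" prefix and undo percent-encoding, yielding raw bytes."""
    return unquote_to_bytes(url[len("file://"):])


print(lossy_url)  # file:///home/m%C3%BCller/caf%EF%BF%BD.txt
print(exact_url)  # file:///home/m%C3%BCller/caf%E9.txt
print(url_to_path_bytes(lossy_url) == raw_path)  # False
print(url_to_path_bytes(exact_url) == raw_path)  # True
```

Only the byte-preserving URL decodes back to the name that actually exists on disk.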

I'm not aware of the recent-file-spec or of KDE implementing it, but in
general KDE converts filenames from the locale encoding to 16-bit Unicode,
which is used internally and typically stored on disk as UTF-8.

URLs are handled slightly differently: when filenames are stored as URLs,
they are re-encoded using the locale encoding and then the non-ASCII part
is %-encoded. The result is a URL that consists of ASCII characters only,
and whose octets, once %-decoded, match the octets of the original filename
1:1 (assuming that decoding/encoding with the locale encoding is reversible).
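The scheme above can be sketched in Python as follows. This is an illustration only: `filename_to_url` and `url_to_filename` are hypothetical helpers, not KDE API, and `urllib.parse.quote` escapes not just non-ASCII bytes but also ASCII characters that are not allowed in a URL (such as spaces), which any valid file URL needs anyway.

```python
from urllib.parse import quote, unquote_to_bytes

FILE_PREFIX = "file://"


def filename_to_url(name: str, locale_encoding: str) -> str:
    """Hypothetical helper: re-encode a Unicode file name with the locale
    encoding, then percent-encode every byte that is not URL-safe ASCII."""
    raw = name.encode(locale_encoding)  # back to the octets on disk
    return FILE_PREFIX + quote(raw, safe="/")


def url_to_filename(url: str, locale_encoding: str) -> str:
    """Hypothetical inverse: percent-decode to octets, then decode them with
    the locale encoding. Assumes the URL starts with FILE_PREFIX."""
    raw = unquote_to_bytes(url[len(FILE_PREFIX):])
    return raw.decode(locale_encoding)


name = "/tmp/caf\u00e9.txt"  # U+00E9 is "e" with an acute accent
url = filename_to_url(name, "iso-8859-1")
print(url)                                         # file:///tmp/caf%E9.txt
print(url.isascii())                               # True
print(url_to_filename(url, "iso-8859-1") == name)  # True
```

The round trip only holds when the locale encoding can represent the name and decode its bytes, which is exactly the caveat in parentheses above.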

We did identify a problem when UTF-8 is the locale encoding. If a filename
is not a valid UTF-8 sequence, then decoding and re-encoding that filename
will change it. We intend to fix that by recording the invalid UTF-8
sequence in the 16-bit Unicode string, so that we can still convert it back
to the original byte sequence when converting back to "UTF-8" (the result
will not be valid UTF-8).
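One concrete instance of "recording the invalid bytes in the Unicode string" is Python's `surrogateescape` error handler (PEP 383). It appeared years after this message and is not what KDE implemented; it is shown only to illustrate the idea. Each undecodable byte 0xXY becomes the lone surrogate U+DCXY, which fits in a single 16-bit code unit, so the same trick would carry over to a 16-bit string. The file name below is hypothetical.

```python
# Hypothetical on-disk name: valid UTF-8 u-umlaut (C3 BC) followed by a
# stray Latin-1 e-acute (E9), which is not valid UTF-8 here.
raw = b"m\xc3\xbcller-caf\xe9"

# Strict decoding rejects the name outright.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# Replacement decoding succeeds but forgets which byte was there.
replaced = raw.decode("utf-8", errors="replace")

# surrogateescape maps each undecodable byte 0xXY to the lone surrogate
# U+DCXY, so the original byte can be recovered when encoding back.
escaped = raw.decode("utf-8", errors="surrogateescape")
restored = escaped.encode("utf-8", errors="surrogateescape")

print(strict_ok)                        # False
print(ascii(escaped))                   # 'm\xfcller-caf\udce9'
print(restored == raw)                  # True
print(replaced.encode("utf-8") == raw)  # False
```

As noted above, `restored` is the original byte string and therefore is itself not valid UTF-8.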

Are you aware of encodings other than UTF-8 where this might be a problem?

Cheers,
Waldo
- -- 
bastian at kde.org -=|[ SuSE, The Linux Desktop Experts ]|=- bastian at suse.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (GNU/Linux)
Comment: For info see http://www.gnupg.org

iD8DBQE/ErFFN4pvrENfboIRAupGAJkBGFpB16LhSxaQSA74TZXia/C6WwCbBA31
OWgDG7K3nYnh88dYKm2LD1M=
=C1Su
-----END PGP SIGNATURE-----