recent-file-spec: possible design flow ?
bastian at kde.org
Mon Jul 14 16:33:57 EEST 2003
On Monday 14 July 2003 14:51, Oliver Braun wrote:
> Hi *,
> we - the SUN team working on OpenOffice.org - noticed a possible design
> flaw in the recent-file-spec (or at least in the Gnome 2.2
> implementation of it) when looking at the Ximian patches for
> it seems that Gnome 2.2 converts the full local file path to utf-8
> before encoding the result as file url. It uses the text encoding
> matching the current locale as "from" encoding. This is not reversible
> if the path contains bytes that are not valid characters in this
> encoding (multi encoding paths) !
> The result will be that the application launched by the panel will not
> be able to open such a file when chosen by the user from the "Open
> Recent" menu. Unfortunately we made the same mistake in OpenOffice.org
> 1.x :(. The only way to handle multi encoding paths correctly seems to
> be to encode the byte sequence as returned by the file system layer.
> The recent file spec says <QUOTE> All text in the file should be stored
> in the UTF-8 encoding.</QUOTE>, which IMHO can easily be
> (mis-?)understood as "convert file names to utf-8".
> How does KDE expect file urls to be encoded ?
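
A minimal Python sketch of the round-trip problem described in the quoted text. The path, the helper names, and the use of errors="replace" are assumptions for illustration only; what Gnome 2.2 actually does with undecodable bytes is not stated above.

```python
from urllib.parse import quote, unquote_to_bytes

# Hypothetical path: ISO-8859-1 bytes on a system whose locale is UTF-8.
raw_path = b"/home/user/caf\xe9.txt"

def lossy_url(path: bytes, locale_encoding: str = "utf-8") -> str:
    """Decode with the locale encoding first, then percent-encode as UTF-8."""
    # Assumption: undecodable bytes are replaced (0xE9 becomes U+FFFD).
    text = path.decode(locale_encoding, errors="replace")
    return "file://" + quote(text)

def raw_url(path: bytes) -> str:
    """Percent-encode the bytes exactly as the file system returned them."""
    return "file://" + quote(path)

def url_to_path(url: str) -> bytes:
    """Recover the byte sequence stored in a file URL."""
    return unquote_to_bytes(url[len("file://"):])

# Only the raw-byte encoding gives back the original file name.
print(url_to_path(raw_url(raw_path)) == raw_path)
print(url_to_path(lossy_url(raw_path)) == raw_path)
```

Encoding the raw bytes never needs to know which encoding the file name was written in, which is why it survives paths that mix encodings.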
I'm not aware of any recent-file-spec or KDE implementing it, but in general
KDE converts filenames from locale-encoding to 16-bit unicode which is used
internally, and typically stored on disk as utf-8.
URLs are handled slightly differently: when storing filenames as URLs, they
are re-encoded using the locale-encoding and then the non-ascii part is
%-encoded. That results in a URL that consists of ASCII-chars only and the
octets of the URL match 1:1 with the octets of the original filename
(assuming decoding/encoding with the locale-encoding is reversible).
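
The scheme above can be sketched roughly as follows. This is Python rather than KDE code, and filename_to_url, url_to_filename and the example path are hypothetical names chosen for illustration; the locale encoding is passed in explicitly instead of being read from the environment.

```python
from urllib.parse import quote, unquote_to_bytes

def filename_to_url(name: str, locale_encoding: str) -> str:
    """Re-encode the internal unicode name with the locale encoding,
    then %-encode every octet that is not plain ASCII."""
    octets = name.encode(locale_encoding)
    return "file://" + quote(octets)

def url_to_filename(url: str, locale_encoding: str) -> str:
    """Undo the %-encoding and decode the octets with the locale encoding."""
    octets = unquote_to_bytes(url[len("file://"):])
    return octets.decode(locale_encoding)

name = "/tmp/caf\u00e9.txt"  # hypothetical file name containing e-acute

# The same name yields different URLs depending on the locale encoding,
# because the URL mirrors the on-disk octets.
print(filename_to_url(name, "iso-8859-1"))
print(filename_to_url(name, "utf-8"))
```

The round trip only holds when decoding and encoding with the locale encoding is reversible for the name in question, which is the caveat stated above.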
We did identify a problem when using utf-8 as locale-encoding. If a filename
is not a valid utf-8 sequence then decoding/encoding such filename will
change it. We intend to fix that by recording the invalid-utf8 sequence in
the 16-bit unicode string so that we can still convert it back to the
original sequence when converting back to "utf-8" (It will not be valid
Are you aware of other encodings than utf-8 where this might be a problem?
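
As an illustration of the idea of recording invalid bytes inside the unicode string, here is a sketch using Python's "surrogateescape" error handler. This is Python's own mechanism, shown only as an analogy; it is an assumption that a KDE fix would look similar, and the file name is made up.

```python
# A file name that is not valid UTF-8 (0xE9 is a lone ISO-8859-1 byte).
raw_name = b"caf\xe9.txt"

# Decoding with "surrogateescape" maps each undecodable byte 0xNN to the
# lone surrogate U+DCNN instead of failing or substituting U+FFFD.
text = raw_name.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler turns the surrogates back into the
# original bytes, so the round trip is lossless.
restored = text.encode("utf-8", errors="surrogateescape")

print(repr(text))
print(restored == raw_name)
```

The intermediate string is not well-formed unicode, so it should not be written out as ordinary text without converting it back first.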
bastian at kde.org -=|[ SuSE, The Linux Desktop Experts ]|=- bastian at suse.com