Trash Can Question

Andrea Francia andrea at andreafrancia.it
Fri Aug 21 12:23:24 PDT 2009


2009/8/21 Alexander Larsson <alexl at redhat.com>

> On Thu, 2009-08-06 at 00:05 +0200, Andrea Francia wrote:
> > 2009/8/5 David Faure <faure at kde.org>
> >         In practice I would recommend using utf8 everywhere and
> >         getting rid
> >         of the whole "filesystem encoding" mess in the first place.
> >
> >
> > Who is interested to work a new draft (a draft) of the spec which
> > solves this and the other problems emerged?
>
> This is not a "problem" that should be "solved". It was very
> deliberately added to the spec in order to allow all files to be
> trashed. How would you trash a file named some non-utf8 string if only
> utf8 is allowed in the format?
>
> Filenames on linux are zero terminated arrays of bytes. If you treat it
> like anything else you will just fail in some corner cases.


For me, filenames are lists of Unicode characters. How those filenames
are represented as arrays of bytes is a different issue.
As far as I know, it is possible on the filesystem to create a filename with
the zero character ('\0') or the newline ('\n') in it.

> Of course, we should all move towards all filenames being in UTF8, avoid
> creating non-UTF8 filenames, etc.


This sounds strange to me: UTF-8 is an encoding, not a character set.
Maybe there is a small misunderstanding about UTF-8, Unicode, and
encodings.

It seems to me that you are using the term UTF-8 as if it were a character set.

I see two different aspects:
 1) Which character set should the trash system be able to handle?
 2) How should the trash system handle it?

I think that the trash system should be able to manage filenames and paths
expressed in Unicode.
One way to encode Unicode characters is UTF-8, but there are also UTF-16 and
others.

I don't see any problem with filesystems whose filenames aren't encoded in
UTF-8.
All the pre-Unicode character sets are part of Unicode, and all Unicode
characters can be represented in UTF-8.
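
As a rough illustration of that point, here is a minimal Python sketch. The
filename and the choice of Latin-1 as the legacy encoding are made up for the
example. It shows a legacy-encoded name being decoded to Unicode and re-encoded
as UTF-8 without loss, and also that the conversion only works when the legacy
encoding is known.

```python
# Raw bytes of a filename as a Latin-1 filesystem might store it
# (hypothetical example name).
raw_latin1 = b"caf\xe9.txt"

# Decoding with the correct legacy codec yields Unicode characters.
name = raw_latin1.decode("latin-1")

# Any Unicode string can be encoded as UTF-8 and recovered unchanged.
utf8_bytes = name.encode("utf-8")
assert utf8_bytes.decode("utf-8") == name

# The same raw bytes are not valid UTF-8, so the original encoding
# has to be known (or guessed) before converting.
try:
    raw_latin1.decode("utf-8")
    valid_as_utf8 = True
except UnicodeDecodeError:
    valid_as_utf8 = False
```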


> That is a different issue, and should
> not make us limit our specifications to only work on a subset of the
> valid filenames.


That's true, but currently I see the following problems:
 - the subset of valid filenames doesn't contain filenames with '\n' or
'\0' in them
 - it isn't clear (probably only to me) which encoding should be used for
reading .trashinfo files
 - the use of character sets like latin1 for encoding .trashinfo file
contents could lead to a loss of information
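
The first and third points can be sketched in Python. This is only an
illustration under assumptions: it assumes the path is stored as raw bytes
percent-escaped as in URLs, the helper names escape_path and unescape_path
are hypothetical, and the sample paths are invented.

```python
from urllib.parse import quote, unquote_to_bytes


def escape_path(raw: bytes) -> str:
    """Percent-escape raw path bytes (hypothetical helper)."""
    return quote(raw, safe="/")


def unescape_path(escaped: str) -> bytes:
    """Recover the raw path bytes from the escaped form (hypothetical helper)."""
    return unquote_to_bytes(escaped)


# A path containing a newline, a space, and a non-ASCII character.
raw = "/home/andrea/line1\nline2 \u00e9.txt".encode("utf-8")

# The escaped form is plain ASCII on one line, so it fits a
# line-oriented file regardless of that file's own encoding.
escaped = escape_path(raw)
assert "\n" not in escaped
assert unescape_path(escaped) == raw

# By contrast, writing a Unicode name through Latin-1 loses information:
# characters outside Latin-1 are replaced by '?'.
lossy = "\u65e5\u672c\u8a9e.txt".encode("latin-1", errors="replace")
```

With escaping of this kind the newline no longer breaks the one-key-per-line
layout, while the Latin-1 case shows the kind of information loss mentioned in
the last point.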


-- 
Andrea Francia
http://andreafrancia.blogspot.com/