Trash Can Question

PCMan pcman.tw at gmail.com
Fri Aug 21 17:54:59 PDT 2009


On Sat, Aug 22, 2009 at 3:23 AM, Andrea Francia<andrea at andreafrancia.it> wrote:
>
>
> 2009/8/21 Alexander Larsson <alexl at redhat.com>
>>
>> On Thu, 2009-08-06 at 00:05 +0200, Andrea Francia wrote:
>> > 2009/8/5 David Faure <faure at kde.org>
>> >         In practice I would recommend using utf8 everywhere and
>> >         getting rid
>> >         of the whole "filesystem encoding" mess in the first place.
>> >
>> >
>> > Who is interested in working on a new draft of the spec that
>> > solves this and the other problems that have emerged?
>>
>> This is not a "problem" that should be "solved". It was very
>> deliberately added to the spec in order to allow all files to be
>> trashed. How would you trash a file named some non-utf8 string if only
>> utf8 is allowed in the format?
>>
>> Filenames on Linux are zero-terminated arrays of bytes. If you treat them
>> as anything else you will just fail in some corner cases.
>
> For me filenames are a list of unicode characters. The way those filenames
> are represented using arrays of bytes is a different issue.
> As far as I know, the filesystem makes it possible to create a filename
> with the zero character '\0' or the newline ('\n') in it.
>>
>> Of course, we should all move towards all filenames being in UTF8, avoid
>> creating non-UTF8 filenames, etc.
This is not a real solution if you're going to support remote filesystems.
On a local machine, you can use any filename encoding you want.
Remote servers, however, sometimes cannot be fully migrated to UTF-8.
So unless your VFS implementation converts the encodings and only
exposes UTF-8 to applications, handling non-UTF-8 names will always be needed.
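
To make the point concrete, here is a minimal Python sketch of handling a filename that is not valid UTF-8 without losing its bytes. The helper names (is_valid_utf8, to_text, to_bytes) and the sample name are made up for illustration; the "surrogateescape" error handler is a standard Python feature, not something any VFS or the trash spec prescribes.

```python
def is_valid_utf8(name: bytes) -> bool:
    """Return True if the raw filename bytes decode cleanly as UTF-8."""
    try:
        name.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def to_text(name: bytes) -> str:
    """Decode as UTF-8, smuggling undecodable bytes through as lone surrogates."""
    return name.decode("utf-8", errors="surrogateescape")


def to_bytes(text: str) -> bytes:
    """Reverse to_text(), restoring the original bytes exactly."""
    return text.encode("utf-8", errors="surrogateescape")


# Hypothetical name as a Latin-1 server might report it: "café.txt".
legacy_name = b"caf\xe9.txt"
restored = to_bytes(to_text(legacy_name))
```

The string returned by to_text() is not displayable Unicode, so an application still has to treat such names specially; the sketch only shows that the bytes can be carried through intact.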
> This sounds strange to me; UTF-8 is about encoding, not about character sets.
> Maybe there is a little misunderstanding about utf8, unicode and encoding
> systems.
> It seems to me that you are using the term utf-8 as a character set.
> I see two different aspects:
>  1) which character set should the trash system be able to handle?
>  2) how should the trash system handle it?
> I think that the trash system should be able to manage filenames and paths
> expressed in unicode.
> One way to encode unicode characters is UTF-8, but there are also UTF-16 and
> others.
> I don't see any problem with filesystems whose filenames aren't encoded in
> utf8.
> All the pre-unicode character sets are part of unicode and all characters of
> unicode can be represented in utf8.
No, this assumption is incorrect.
Most of the pre-Unicode characters are defined in Unicode, but some of
them are not.
So if you convert everything to UTF-8, some data might be lost.
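
One hedged way to check whether a given name would survive conversion is a round trip through Unicode. The sketch below is only an illustration: round_trips is a hypothetical helper, and the failing case uses byte 0x81, which Python's cp1252 codec leaves unmapped. That stands in for the general situation of a legacy byte sequence with no Unicode mapping; it is not a specific character from any of the encodings discussed in this thread.

```python
def round_trips(raw: bytes, encoding: str) -> bool:
    """Return True if raw survives legacy -> Unicode (UTF-8) -> legacy unchanged."""
    try:
        text = raw.decode(encoding)
        utf8 = text.encode("utf-8")
        return utf8.decode("utf-8").encode(encoding) == raw
    except (UnicodeDecodeError, UnicodeEncodeError):
        # Some byte or character has no mapping, so a conversion would be lossy.
        return False


# A forced conversion replaces the unmapped byte with U+FFFD,
# after which the original byte can no longer be recovered.
lossy = b"a\x81b".decode("cp1252", errors="replace")
```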
>> That is a different issue, and should
>> not make us limit our specifications to only work on a subset of the
>> valid filenames.
> That's true, but currently I see the following problems:
>  - the subset of valid filenames doesn't contain filenames with '\n' or
> '\0' in them
If the stored path is URL-encoded, this is not an issue since these
characters will be escaped.
Besides, even if the filesystem supports \0 inside a filename, I don't
believe any real-world file manager can handle it. In theory this should
be allowed, but it's an extremely rare use case that doesn't exist in
the real world.
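
As a rough sketch of the escaping argument, the Python below percent-encodes a raw byte path and decodes it again. The function names and the sample path are invented for illustration; urllib.parse.quote and unquote_to_bytes are standard library calls, and which characters the trash spec itself requires to be escaped is not something this sketch claims to settle.

```python
from urllib.parse import quote, unquote_to_bytes


def escape_path(raw_path: bytes) -> str:
    """Percent-encode raw path bytes, keeping '/' readable as a separator."""
    return quote(raw_path, safe="/")


def unescape_path(stored: str) -> bytes:
    """Recover the exact original bytes from the escaped form."""
    return unquote_to_bytes(stored)


# Hypothetical path containing a newline and a non-UTF-8 byte.
raw_path = b"/home/user/a\nb\xe9.txt"
stored = escape_path(raw_path)  # "/home/user/a%0Ab%E9.txt"
```

Because the escaped form is plain ASCII on a single line, it fits in a line-oriented text file regardless of how the original name was encoded.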
>  - it isn't clear (probably only to me) which encoding should be used for
> reading .trashinfo files.
>  - the use of character sets like latin1 for encoding .trashinfo file
> contents could lead to a loss of information
>
> --
> Andrea Francia
> http://andreafrancia.blogspot.com/