<br><br><div class="gmail_quote">2009/8/21 Alexander Larsson <span dir="ltr"><<a href="mailto:alexl@redhat.com">alexl@redhat.com</a>></span><br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"> <div class="im">On Thu, 2009-08-06 at 00:05 +0200, Andrea Francia wrote:<br> > 2009/8/5 David Faure <<a href="mailto:faure@kde.org">faure@kde.org</a>><br> > In practice I would recommend using utf8 everywhere and<br> > getting rid<br> > of the whole "filesystem encoding" mess in the first place.<br> ><br> ><br> > Who is interested to work a new draft (a draft) of the spec which<br> > solves this and the other problems emerged?<br> <br> </div>This is not a "problem" that should be "solved". It was very<br> deliberately added to the spec in order to allow all files to be<br> trashed. How would you trash a file named some non-utf8 string if only<br> utf8 is allowed in the format?<br> <br> Filenames on linux are zero terminated arrays of bytes. If you treat it<br> like anything else you will just fail in some corner cases.</blockquote><div><br></div><div>For me, filenames are a sequence of Unicode characters. How those filenames are represented as arrays of bytes is a different issue.</div> <div>As far as I know, the filesystem allows filenames that contain a newline ('\n'), although not the zero byte ('\0'), which terminates the name. </div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"> Of course, we should all move towards all filenames being in UTF8, avoid<br> creating non-UTF8 filenames, etc. 
</blockquote><div><br></div><div>This sounds strange to me: UTF-8 is an encoding, not a character set.</div><div><div>Maybe there is a small misunderstanding about UTF-8, Unicode, and encodings.</div> <div><br></div><div>It seems to me that you are using the term UTF-8 as if it were a character set.</div></div><div><br></div><div>I see two different aspects:</div><div> 1) which character set should the trash system be able to handle?</div> <div> 2) how should the trash system handle it?</div><div><br></div><div>I think that the trash system should be able to manage filenames and paths expressed in Unicode.</div><div>One way to encode Unicode characters is UTF-8, but there are also UTF-16 and others.</div> <div><br></div><div>I don't see any problem with filesystems whose filenames aren't encoded in UTF-8. </div><div>All the pre-Unicode character sets are part of Unicode, and every Unicode character can be represented in UTF-8.</div> <div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">That is a different issue, and should<br> not make us limit our specifications to only work on a subset of the<br> valid filenames.</blockquote><div><br></div><div>That's true, but currently I see the following problems:</div><div> - the supported subset of valid filenames doesn't include filenames containing '\n'</div> <div> - it isn't clear (perhaps only to me) which encoding should be used for reading .trashinfo files.</div><div> - using a character set like latin1 to encode the contents of .trashinfo files could lead to a loss of information</div> <div> </div></div><br>-- <br>Andrea Francia<br><a href="http://andreafrancia.blogspot.com/">http://andreafrancia.blogspot.com/</a><br>
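To make the bytes-versus-characters point in this thread concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the helper name `is_valid_utf8` and the sample filename are hypothetical, and percent-escaping the raw bytes is shown as one possible way to store an arbitrary filename losslessly, not as a statement of what the spec requires.

```python
from urllib.parse import quote, unquote_to_bytes

# A filename as the kernel stores it: raw bytes. This is "café" written
# by a Latin-1 application, so the last byte (0xE9) is not valid UTF-8.
raw_name = "café".encode("latin-1")  # b'caf\xe9'


def is_valid_utf8(name: bytes) -> bool:
    """Hypothetical helper: can these bytes be read as UTF-8 text?"""
    try:
        name.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Treating the name as UTF-8 text fails for this file.
utf8_ok = is_valid_utf8(raw_name)

# Guessing the wrong encoding silently changes the name instead:
# the UTF-8 bytes of "café" read as Latin-1 give a different string.
misread = "café".encode("utf-8").decode("latin-1")

# Percent-escaping works on bytes, so it needs no encoding guess
# and round-trips exactly, including names that contain a newline.
escaped = quote(raw_name)
restored = unquote_to_bytes(escaped)
escaped_newline = quote(b"a\nb")
```

Escaping the bytes keeps every byte-level filename storable, while a reader that wants to display the name still has to pick an encoding for presentation.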