[Libreoffice-bugs] [Bug 125995] C locale is currently broken for file handling

bugzilla-daemon at bugs.documentfoundation.org
Wed Jun 19 07:58:46 UTC 2019


https://bugs.documentfoundation.org/show_bug.cgi?id=125995

--- Comment #1 from Stephan Bergmann <sbergman at redhat.com> ---
(In reply to Jan-Marek Glogowski from comment #0)
> Description:
> This is the "extension" of bug 125971.
> 
> Something in the local file URL handling is currently broken when you use
> the C locale, at least on all unix backends. I can't test MacOS and Windows,
> but since I suspect an error in the URL handling with regard to the current
> locale setting, at least MacOS might be affected too. Has Windows some
> equivalent of C locale?

I'm not sure why you qualify this issue with "currently".  The behavior should
have been the same ever since OOo.

A traditional Unix (incl. Linux) file name is just a sequence of bytes, without
a means of specifying in what encoding to interpret those bytes.  Ever since OOo
was made Unicode-aware, it wanted to represent pathnames internally as Unicode
(UTF-16) strings (whether or not that was a good decision, its
consequences permeate the code base and it would probably be hard to change it
now).  It adopted the convention of translating between a pathname's bytes and
the internal OUString according to the system locale that OOo is run with
(i.e., LANG/LC_ALL; see osl_getThreadTextEncoding).  (That of course means that
there can be problems, e.g. when a pathname consists of a sequence of bytes
that is not valid according to osl_getThreadTextEncoding(), or when some
internal OUString has to be translated to a pathname's sequence of bytes but
contains Unicode characters that cannot be mapped to osl_getThreadTextEncoding().
OOo/LO have always been prone to such problems.  In practice, their impact is
reduced by people using a single, consistent system locale (text encoding) to
name their files and to run LO, and by many people exclusively using UTF-8
locales anyway these days.)
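
To make the two failure modes concrete, here is a minimal Python sketch.  It is
not LO code: Python's codecs merely stand in for whatever
osl_getThreadTextEncoding() returns, and the helper names to_internal and
to_pathname are made up for illustration.

```python
# Illustration only: Python codecs stand in for osl_getThreadTextEncoding().
# to_internal/to_pathname are hypothetical helpers, not LO functions.

name_bytes = "łąka.png".encode("utf-8")  # bytes as stored in the directory entry


def to_internal(raw: bytes, encoding: str) -> str:
    """Pathname bytes -> internal Unicode string, per the locale's encoding."""
    return raw.decode(encoding)


def to_pathname(text: str, encoding: str) -> bytes:
    """Internal Unicode string -> pathname bytes, per the locale's encoding."""
    return text.encode(encoding)


# Consistent UTF-8 locale: the round trip is lossless.
round_trip_ok = to_pathname(to_internal(name_bytes, "utf-8"), "utf-8") == name_bytes

# Problem 1: the pathname's bytes are not valid in the locale's encoding.
try:
    to_internal(b"\xff\xfeka.png", "utf-8")
    invalid_bytes_rejected = False
except UnicodeDecodeError:
    invalid_bytes_rejected = True

# Problem 2: the internal string contains characters the encoding cannot map.
try:
    to_pathname("łąka.png", "iso-8859-1")
    unmappable_rejected = False
except UnicodeEncodeError:
    unmappable_rejected = True
```

Python raises exceptions in both cases; how LO actually reacts in each case is
not shown here.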

> Steps to Reproduce:
> 1. Have a unicode / UTF8 file system (that's standard I guess)

Traditional Unix (incl. Linux) file systems do not have an encoding, see above.

> 2. Have a file name with non-ASCII characters (łąka.png - 'LC_ALL=C ls -b'
> will show the correct UTF8 encoding \305\202\304\205ka.png)
> 3. Start LO with LANG=C / LC_ALL=C

This is the "user mistake".  To operate well with files whose names are encoded
in UTF-8, LO should be run with a UTF-8 locale.  Otherwise, problems are
expected to occur (see above).

> 4. Open the file
> 5. Export the file
> 
> Actual Results:
> 1. The file picker for "gen" shows the wrong file names. kde5 and gtk3 are
> fine.

Arguably, according to my above explanation, the gen file picker shows the
right file name here.  With LANG=C, osl_getThreadTextEncoding() effectively is
RTL_TEXTENCODING_ISO_8859_1 (though technically it is
RTL_TEXTENCODING_ASCII_US), so you get "ÅÄka.png".
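
For illustration, a small Python sketch of what happens to the bytes from
comment #0 under the two interpretations.  Python's "iso-8859-1" codec is
assumed here as a stand-in for RTL_TEXTENCODING_ISO_8859_1; this is not how LO
itself is implemented.

```python
# The file name bytes, written with the octal escapes that 'LC_ALL=C ls -b' shows.
raw = b"\305\202\304\205ka.png"

as_utf8 = raw.decode("utf-8")          # interpretation under a UTF-8 locale
as_latin1 = raw.decode("iso-8859-1")   # interpretation assumed for LANG=C

# In ISO-8859-1 the bytes 0x82 and 0x85 are C1 control characters, which are
# not printable, so only the "Å" and "Ä" remain visible in front of "ka".
visible = "".join(ch for ch in as_latin1 if ch.isprintable())

print(repr(as_utf8))
print(repr(as_latin1))
print(repr(visible))
```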

The kde5 and gtk3 file pickers presumably use external library code that
doesn't follow LO's convention of interpreting pathnames' byte sequences
according to the system locale, but instead always interpret them as UTF-8. 
That would explain why the kde5 file picker dialog shows the file's name as
"łąka.png" instead of "ÅÄka.png".  But once the kde5 file picker has passed
the <file:///.../%C5%82%C4%85ka.png> URL (which is the same URL as the gen file
picker passes) to LO's internals, LO will again treat that as representing a
pathname whose bytes are interpreted according to osl_getThreadTextEncoding().
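
A rough Python sketch of that last step, using only the escaped stem of the
name.  urllib's unquote_to_bytes is assumed here as a stand-in for LO's own URL
handling, which it does not reproduce.

```python
from urllib.parse import unquote_to_bytes

# The percent-escaped stem of the file name as it appears in the file URL.
escaped = "%C5%82%C4%85ka"

# Unescaping the URL yields plain bytes; the URL carries no encoding information.
raw = unquote_to_bytes(escaped)

# A picker that always assumes UTF-8 displays this:
picker_view = raw.decode("utf-8")

# Code that follows the locale convention under LANG=C (taken as ISO-8859-1,
# an assumption of this sketch) ends up with this instead:
locale_view = raw.decode("iso-8859-1")

print(raw, repr(picker_view), repr(locale_view))
```

Both file pickers hand over the same bytes; only the later interpretation of
those bytes differs.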

> 2. After opening, the window title has the file name with a wrong encoding.

Again, it is the right encoding according to the above.

> 3. The recent file list has the file name with wrong encoding (which
> actually works!)

ditto...
