[Libreoffice-bugs] [Bug 125995] C locale is currently broken for file handling
bugzilla-daemon at bugs.documentfoundation.org
Wed Jun 19 10:14:44 UTC 2019
https://bugs.documentfoundation.org/show_bug.cgi?id=125995
--- Comment #2 from Stephan Bergmann <sbergman at redhat.com> ---
(In reply to Stephan Bergmann from comment #1)
> Arguably, according to my above explanation, the gen file picker shows the
> right file name here. With LANG=C, osl_getThreadTextEncoding() effectively
> is RTL_TEXTENCODING_ISO_8859_1 (though technically it is
> RTL_TEXTENCODING_ASCII_US), so you get "ÅÄka.jpg".
(Above and below, Bugzilla apparently dropped the C1 control characters \U+0082
and \U+0085 from "ÅÄka.jpg", where they should appear after "Å" and after "Ä",
respectively.)
> The kde5 and gtk3 file pickers presumably use external library code that
> doesn't follow LO's convention of interpreting pathnames' byte sequences
> according to the system locale, but instead always interpret them as UTF-8.
> That would explain why the kde5 file picker dialog shows the file's name as
> "łąka.png" instead of "ÅÄka.jpg". But once the kde5 file picker has passed
> the <file:///.../%C5%82%C4%85ka.jpg> URL (which is the same URL as the gen
> file picker passes) to LO's internals, LO will again treat that as
> representing a pathname whose bytes are interpreted according to
> osl_getThreadTextEncoding().
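To make the two interpretations concrete, here is a minimal Python sketch (not LO code) that decodes the same pathname bytes twice. The first decoding uses ISO-8859-1, standing in for what osl_getThreadTextEncoding() effectively yields under LANG=C. The second uses UTF-8, which the kde5/gtk3 pickers are presumed to use. The `.png` extension is an assumption, because the comment uses both `.jpg` and `.png` for the same file.

```python
# Raw pathname bytes as stored on disk.
# Assumption: ".png" extension (the comment mixes ".jpg" and ".png").
raw = b"\xC5\x82\xC4\x85ka.png"

# gen file picker under LANG=C: bytes interpreted per the thread text encoding,
# modelled here as ISO-8859-1 (technically ASCII_US, effectively ISO-8859-1).
gen_name = raw.decode("iso-8859-1")

# kde5/gtk3 file pickers: external library code presumed to always decode as UTF-8.
kde_name = raw.decode("utf-8")

# The gen name contains the C1 controls U+0082 and U+0085 that Bugzilla dropped.
print([f"U+{ord(c):04X}" for c in gen_name[:4]])  # ['U+00C5', 'U+0082', 'U+00C4', 'U+0085']
print(kde_name)  # łąka.png
```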
Sorry, the above "which is the same URL as the gen file picker passes" is
wrong: With LANG=C, LO interprets that file name as written with the
characters
\U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE
\U+0082 <control>
\U+00C4 LATIN CAPITAL LETTER A WITH DIAERESIS
\U+0085 <control>
\U+006B LATIN SMALL LETTER K
...
and "LO internal file URLs" always have their "payload" encoded as UTF-8 (see
udkapi/com/sun/star/uri/XExternalUriReferenceTranslator.idl), so the LO
internal file URL that the gen file picker returns is
<file:///.../%C3%85%C2%82%C3%84%C2%85ka.png>. (And when LO wants to access the
actual file and converts that URL back to a pathname byte sequence under
LANG=C, it first converts from the URL syntax "%C3%85%C2%82%C3%84%C2%85ka.png"
to an OUString containing
\U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE
\U+0082 <control>
\U+00C4 LATIN CAPITAL LETTER A WITH DIAERESIS
\U+0085 <control>
\U+006B LATIN SMALL LETTER K
...
code units, and then, using the osl_getThreadTextEncoding() value mandated by
LANG=C, to the correct byte sequence "\xC5\x82\xC4\x85ka.png".)
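The LANG=C round trip described above can be sketched in Python as follows. This is an illustration under stated assumptions, not LO's implementation. ISO-8859-1 stands in for the thread text encoding, `urllib.parse.quote`/`unquote` stand in for LO's UTF-8 encoding of the URL payload, and the two helper names are hypothetical.

```python
from urllib.parse import quote, unquote

# Assumption: under LANG=C the thread text encoding behaves like ISO-8859-1.
THREAD_ENCODING = "iso-8859-1"

def path_to_internal_url_payload(raw: bytes) -> str:
    """Hypothetical helper: pathname bytes -> UTF-8 percent-encoded URL payload."""
    chars = raw.decode(THREAD_ENCODING)    # U+00C5 U+0082 U+00C4 U+0085 'k' ...
    return quote(chars, encoding="utf-8")  # internal URL payload is always UTF-8

def internal_url_payload_to_path(payload: str) -> bytes:
    """Hypothetical helper: URL payload -> pathname bytes."""
    chars = unquote(payload, encoding="utf-8")  # back to OUString-like code units
    return chars.encode(THREAD_ENCODING)        # back to the pathname byte sequence

raw = b"\xC5\x82\xC4\x85ka.png"
payload = path_to_internal_url_payload(raw)
print(payload)  # %C3%85%C2%82%C3%84%C2%85ka.png
assert internal_url_payload_to_path(payload) == raw
```

Because every byte maps to exactly one ISO-8859-1 code point, this round trip is lossless. That is why the URL the gen picker returns still leads back to the right file.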