[Libreoffice-bugs] [Bug 125995] C locale is currently broken for file handling

bugzilla-daemon at bugs.documentfoundation.org
Wed Jun 19 10:14:44 UTC 2019


https://bugs.documentfoundation.org/show_bug.cgi?id=125995

--- Comment #2 from Stephan Bergmann <sbergman at redhat.com> ---
(In reply to Stephan Bergmann from comment #1)
> Arguably, according to my above explanation, the gen file picker shows the
> right file name here.  With LANG=C, osl_getThreadTextEncoding() effectively
> is RTL_TEXTENCODING_ISO_8859_1 (though technically it is
> RTL_TEXTENCODING_ASCII_US), so you get "ÅÄka.jpg".

(Above and below, Bugzilla apparently dropped the C1 control characters \U+0082
and \U+0085 from "ÅÄka.jpg", where they should appear after "Å" and after "Ä",
respectively.)
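
To make the dropped characters visible, here is a minimal Python sketch (not LO
code) that decodes the file name's bytes one byte per character, as
ISO-8859-1.  It assumes the name on disk is the UTF-8 encoding of "łąka"; the
variable names are made up for illustration.

```python
import unicodedata

# Assumption: the stem of the file name on disk is "łąka" encoded as UTF-8.
raw = "łąka".encode("utf-8")  # the bytes C5 82 C4 85 6B 61

# One byte per character, which is what ISO-8859-1 decoding amounts to.
as_latin1 = raw.decode("iso-8859-1")

for ch in as_latin1:
    # Category "Cc" marks control characters; they have no Unicode name,
    # so a placeholder is printed instead.
    name = unicodedata.name(ch, "<control>")
    print(f"U+{ord(ch):04X} {unicodedata.category(ch)} {name}")

# For comparison, the same bytes decoded as UTF-8.
as_utf8 = raw.decode("utf-8")
```

The second and fourth characters of as_latin1 are the C1 controls U+0082 and
U+0085, which have no visible glyph and are easily lost in transit.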

> The kde5 and gtk3 file pickers presumably use external library code that
> doesn't follow LO's convention of interpreting pathnames' byte sequences
> according to the system locale, but instead always interpret them as UTF-8. 
> That would explain why the kde5 file picker dialog shows the file's name as
> "łąka.png" instead of "ÅÄka.jpg".  But once the kde5 file picker has passed
> the <file:///.../%C5%82%C4%85ka.jpg> URL (which is the same URL as the gen
> file picker passes) to LO's internals, LO will again treat that as
> representing a pathname whose bytes are interpreted according to
> osl_getThreadTextEncoding().

Sorry, the above "which is the same URL as the gen file picker passes" is
wrong:  With LANG=C, LO interprets that file name as written with the
characters

  \U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE
  \U+0082 <control>
  \U+00C4 LATIN CAPITAL LETTER A WITH DIAERESIS
  \U+0085 <control>
  \U+006B LATIN SMALL LETTER K
  ...

and "LO internal file URLs" always have their "payload" encoded as UTF-8 (see
udkapi/com/sun/star/uri/XExternalUriReferenceTranslator.idl), so the LO
internal file URL that the gen file picker returns is
<file:///.../%C3%85%C2%82%C3%84%C2%85ka.png>.  (And when LO wants to access the
actual file and converts that URL back to a pathname byte sequence under
LANG=C, it first converts from the URL syntax "%C3%85%C2%82%C3%84%C2%85ka.png"
to an OUString containing

  \U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE
  \U+0082 <control>
  \U+00C4 LATIN CAPITAL LETTER A WITH DIAERESIS
  \U+0085 <control>
  \U+006B LATIN SMALL LETTER K
  ...

code units, and then, because of the osl_getThreadTextEncoding() mandated by
LANG=C, to the correct byte sequence "\xC5\x82\xC4\x85ka.png".)
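
The round trip described above can be sketched in Python as follows.  This is
only an illustration under stated assumptions, not LO's implementation: the two
helper functions are hypothetical, and the thread text encoding is taken to be
ISO-8859-1, as LANG=C effectively gives.

```python
from urllib.parse import quote, unquote

# Pathname bytes as they exist on disk.
raw = b"\xC5\x82\xC4\x85ka.png"


def pathname_to_url_payload(path_bytes, thread_encoding="iso-8859-1"):
    """Hypothetical helper: pathname bytes -> UTF-8 percent-encoded payload."""
    # Step 1: interpret the bytes according to the thread text encoding.
    text = path_bytes.decode(thread_encoding)
    # Step 2: encode the resulting characters as UTF-8 and percent-escape them.
    return quote(text.encode("utf-8"))


def url_payload_to_pathname(payload, thread_encoding="iso-8859-1"):
    """Hypothetical helper: UTF-8 percent-encoded payload -> pathname bytes."""
    # Step 1: undo the percent-escapes and decode the payload as UTF-8.
    text = unquote(payload, encoding="utf-8")
    # Step 2: convert the characters back to bytes in the thread text encoding.
    return text.encode(thread_encoding)


internal_payload = pathname_to_url_payload(raw)
round_tripped = url_payload_to_pathname(internal_payload)

# A picker that percent-escapes the raw bytes directly (that is, treats the
# pathname as UTF-8 already) produces a different payload.
direct_payload = quote(raw)
```

Here internal_payload is "%C3%85%C2%82%C3%84%C2%85ka.png" and round_tripped
equals the original bytes, while direct_payload starts with "%C5%82%C4%85ka".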
