(new) non-ASCII filenames break unit tests on Linux
Stephan Bergmann
stephan.bergmann at allotropia.de
Sun Dec 3 11:59:56 UTC 2023
On 12/2/23 16:38, Mike Kaganski wrote:
> On 02.12.2023 17:46, Rene Engelhard wrote:
>> In any case this is bad. My filesystem (I think from 2020 or so)
>> apparently shows it (ls -l does) but I wouldn't be sure for other, old
>> ones (like Debians build machines). The locale this fails under
>> definitely is UTF-8 though.
Before
<https://git.libreoffice.org/core/+/fbf025b4903bfcb93c3d4bbf1ebbf860cf11618d%5E%21>
"Make testHybridPDFFile Windows-only, and filenames in repo ASCII-only",
I can reproduce the failure on Linux when not using a UTF-8 locale but
explicitly specifying, e.g., an ASCII locale (and thus an
osl_getThreadTextEncoding value of RTL_TEXTENCODING_ASCII_US) with
`LC_CTYPE=C make -O CppunitTest_filter_textfilterdetect
CPPUNIT_TEST_NAME=testHybridPDFFile::TestBody`.
> But if someone has an idea why LibreOffice fails handling files that
> exist on system, with names representable in system encoding, it would
> be nice.
For better or worse, the payload of LO "internal" file URLs is always
considered to be a UTF-8 encoding of the actual system pathname. It is
*not* a byte-for-byte representation of the bytes that make up the Unix
system pathname.
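
To illustrate the difference, here is a minimal Python sketch (an analogy
only, not LibreOffice code). It assumes a shortened example name "абв.pdf"
and a hypothetical ISO-8859-5 locale, neither of which is part of the
failing test; the point is just that the UTF-8 payload carried in the URL
and the bytes of the pathname on disk need not be the same:

```python
from urllib.parse import quote, unquote

# Hypothetical short file name used only for this illustration.
name = "абв.pdf"

# The URL payload is the percent-encoded UTF-8 form of the name.
payload = quote(name, encoding="utf-8")

# Decoding the payload as UTF-8 recovers the Unicode name.
decoded = unquote(payload, encoding="utf-8")

# Bytes of the name in UTF-8 versus in an assumed ISO-8859-5 locale.
utf8_bytes = name.encode("utf-8")
iso_bytes = name.encode("iso8859_5")

print(payload)      # percent-encoded UTF-8 payload
print(utf8_bytes)   # what a UTF-8 locale would use on disk
print(iso_bytes)    # what the assumed ISO-8859-5 locale would use on disk
```

So the URL payload has to be decoded to Unicode first and then re-encoded
into whatever the thread text encoding is, and that second step is where
things can go wrong.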
What thus happens here is that the file UCP's TaskManager::getv ->
osl::DirectoryItem::get -> osl_getDirectoryItem ->
osl::detail::convertUrlToPathname -> getSystemPathFromFileUrl ->
decodeFromUtf8 -> convert -> UnicodeToTextConverter_Impl::convert ->
rtl_convertUnicodeToText tries to translate the Unicode chars of
"hybrid_writer_абв_αβγ.pdf" to osl_getThreadTextEncoding() ==
RTL_TEXTENCODING_ASCII_US, which fails because ASCII has no
representation of the Cyrillic and Greek letters.
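
The failing conversion step can be mimicked in a few lines of Python. This
is only a sketch of the principle: to_system_pathname is a hypothetical
helper, not a LibreOffice function, and Python's codecs stand in for
rtl_convertUnicodeToText with the thread text encoding passed in as a
parameter:

```python
from typing import Optional


def to_system_pathname(name: str, thread_encoding: str) -> Optional[bytes]:
    """Encode a Unicode file name into the given text encoding.

    Returns None when the encoding cannot represent every character,
    which stands in for the conversion failure described above.
    """
    try:
        return name.encode(thread_encoding)
    except UnicodeEncodeError:
        return None


name = "hybrid_writer_абв_αβγ.pdf"

# With an ASCII thread text encoding (as under LC_CTYPE=C) this is None.
ascii_result = to_system_pathname(name, "ascii")

# With a UTF-8 thread text encoding the conversion succeeds.
utf8_result = to_system_pathname(name, "utf-8")

print(ascii_result)
print(utf8_result)
```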