(new) non-ASCII filenames break unit tests on Linux

Rene Engelhard rene at debian.org
Sun Dec 3 13:13:08 UTC 2023


Hi,

On 03.12.23 at 12:59, Stephan Bergmann wrote:
> On 12/2/23 16:38, Mike Kaganski wrote:
>> On 02.12.2023 17:46, Rene Engelhard wrote:
>>> In any case this is bad. My filesystem (I think from 2020 or so) 
>>> apparently shows it (ls -l does) but I wouldn't be sure for other, 
>>> old ones (like Debian's build machines). The locale this fails under
>>> definitely is UTF-8 though.
>
> Prior to commit
> <https://git.libreoffice.org/core/+/fbf025b4903bfcb93c3d4bbf1ebbf860cf11618d%5E%21>
> "Make testHybridPDFFile Windows-only, and filenames in repo
> ASCII-only", I can reproduce the failure on Linux when not using a
> UTF-8 locale but explicitly specifying, e.g., an ASCII locale (and thus
> an osl_getThreadTextEncoding value of RTL_TEXTENCODING_ASCII_US) with
> `LC_CTYPE=C make -O CppunitTest_filter_textfilterdetect
> CPPUNIT_TEST_NAME=testHybridPDFFile::TestBody`.

But in my case it fails with the following rule:

        t=`mktemp -q -d`; \
        cd $(SOURCE_TREE) && \
                export PATH=$(BUILD_PATH); \
                export TMPDIR=$$t; \
                export HOME=$$t; \
                export LOCPATH=$(CURDIR)/debian/locales; \
                export LANG=en_US.UTF-8; \
                export TZ=UTC; \
                unset DISPLAY; \
                unset CONNECTIVITY_TEST_MYSQL_DRIVER; \
                export PARALLELISM=1; \
                if [ -x /usr/bin/gdb ]; then ulimit -c unlimited || true; fi && \
                $(TEST_TIMEOUT) $(MAKE) -k check || $(TEST_TIMEOUT) $(MAKE) check && \
        rm -rf $$t

That is, it fails with a UTF-8 locale (which is generated before that rule runs).

> For better or worse, the payload of LO "internal" file URLs is always 
> considered to be a UTF-8 encoding of the actual system pathname. It is 
> *not* a byte-for-byte representation of the bytes that make up the 
> Unix system pathname.
>
> What thus happens here is that the file UCP's TaskManager::getv -> 
> osl::DirectoryItem::get -> osl_getDirectoryItem -> 
> osl::detail::convertUrlToPathname -> getSystemPathFromFileUrl -> 
> decodeFromUtf8 -> convert -> UnicodeToTextConverter_Impl::convert -> 
> rtl_convertUnicodeToText tries to translate the Unicode chars of 
> "hybrid_writer_абв_αβγ.pdf" to osl_getThreadTextEncoding() == 
> RTL_TEXTENCODING_ASCII_US, but which doesn't work because ASCII has no 
> representation of the Cyrillic and Greek letters.
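
To make the conversion chain above concrete, here is a minimal Python sketch. It assumes Python's codecs as a stand-in for LibreOffice's rtl text converters, and `to_file_url` / `url_to_system_path` are hypothetical helpers, not LO API. It only shows the failure mode: the URL payload is always percent-encoded UTF-8, and turning it back into a system pathname works with a UTF-8 thread encoding but fails with ASCII, which cannot represent the Cyrillic and Greek letters.

```python
from urllib.parse import quote, unquote

# Hypothetical system pathname, reusing the file name from the failing test.
SYSTEM_PATH = "/tmp/hybrid_writer_абв_αβγ.pdf"
FILE_SCHEME = "file://"


def to_file_url(path: str) -> str:
    """Build a file URL whose payload is the percent-encoded UTF-8 of `path`."""
    return FILE_SCHEME + quote(path, safe="/", encoding="utf-8")


def url_to_system_path(url: str, thread_encoding: str) -> bytes:
    """Rough analogue of getSystemPathFromFileUrl -> decodeFromUtf8.

    The URL payload is always decoded as UTF-8; the resulting Unicode
    string is then re-encoded in the thread text encoding. Raises
    UnicodeEncodeError if a character has no representation in it.
    """
    if not url.startswith(FILE_SCHEME):
        raise ValueError("not a file URL: " + url)
    unicode_path = unquote(url[len(FILE_SCHEME):], encoding="utf-8", errors="strict")
    return unicode_path.encode(thread_encoding)


url = to_file_url(SYSTEM_PATH)
print(url)

# UTF-8 thread encoding (e.g. LANG=en_US.UTF-8): the conversion succeeds.
print(url_to_system_path(url, "utf-8"))

# ASCII thread encoding (e.g. LC_CTYPE=C): the Cyrillic/Greek letters cannot be represented.
try:
    url_to_system_path(url, "ascii")
except UnicodeEncodeError as err:
    print("conversion failed:", err.reason)
```

The same URL yields a valid pathname or an error depending only on the thread encoding, which is why a test run that ends up with an ASCII encoding fails even though the file exists on disk.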

I did some more tests.

In my standard local build environment (cowbuilder[1] --login chroot) it 
fails.

If I chroot into exactly the same chroot (as it is on disk), it works.

If I use a pbuilder --login chroot, it succeeds.


I remember some sal (tmpfile?) tests that once showed the very same mix
of results, too (which I never reported; I think even pbuilder --login
failed then), but not with recent LO versions.


Regards,


Rene


