(new) non-ASCII filenames break unit tests on Linux

Rene Engelhard rene at debian.org
Sat Dec 2 14:46:58 UTC 2023


Hi,

After a recent pull of master I get

[build CUT] filter_textfilterdetect
S=/home/rene/LibreOffice/git/master && I=$S/instdir && W=$S/workdir && 
mkdir -p $W/CppunitTest/ && rm -fr 
$W/CppunitTest/filter_textfilterdetect.test.user && cp -r $W/unittest 
$W/CppunitTest/filter_textfilterdetect.test.user &&    rm -fr 
$W/CppunitTest/filter_textfilterdetect.test.core && mkdir 
$W/CppunitTest/filter_textfilterdetect.test.core && cd 
$W/CppunitTest/filter_textfilterdetect.test.core && ( 
MAX_CONCURRENCY=4 MOZILLA_CERTIFICATE_FOLDER=dbm: 
SAL_DISABLE_SYNCHRONOUS_PRINTER_DETECTION=1 SAL_USE_VCLPLUGIN=svp 
LIBO_LANG=C 
LD_LIBRARY_PATH=${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}"$I/program:$I/program":$W/UnpackedTarball/cppunit/src/cppunit/.libs 
          $W/LinkTarget/Executable/cppunittester 
$W/LinkTarget/CppunitTest/libtest_filter_textfilterdetect.so --headless 
"-env:BRAND_BASE_DIR=file://$S/instdir" "-env:BRAND_SHARE_SUBDIR=share" 
"-env:BRAND_SHARE_RESOURCE_SUBDIR=program/resource" 
"-env:UserInstallation=file://$W/CppunitTest/filter_textfilterdetect.test.user" 
   "-env:CONFIGURATION_LAYERS=xcsxcu:file://$I/share/registry 
xcsxcu:file://$W/unittest/registry-common 
xcsxcu:file://$W/unittest/registry-user-ui" 
"-env:UNO_TYPES=file://$I/program/types/offapi.rdb 
file://$I/program/types.rdb" 
"-env:UNO_SERVICES=file://$W/Rdb/ure/services.rdb 
file://$W/Rdb/services.rdb" -env:URE_BIN_DIR=file://$I/program 
-env:URE_INTERNAL_LIB_DIR=file://$I/program 
-env:LO_LIB_DIR=file://$I/program 
-env:LO_JAVA_DIR=file://$I/program/classes --protector 
$W/LinkTarget/Library/unoexceptionprotector.so unoexceptionprotector 
--protector $W/LinkTarget/Library/unobootstrapprotector.so 
unobootstrapprotector   --protector 
$W/LinkTarget/Library/libvclbootstrapprotector.so vclbootstrapprotector 
  "-env:CPPUNITTESTTARGET=$W/CppunitTest/filter_textfilterdetect.test" 
) 2>&1
[_RUN_____] (anonymous namespace)::testEmptyFile::TestBody
(anonymous namespace)::testEmptyFile::TestBody finished in: 708ms
[_RUN_____] (anonymous namespace)::testHybridPDFFile::TestBody
unknown:0:(anonymous namespace)::testHybridPDFFile::TestBody
An uncaught exception of type com.sun.star.lang.IllegalArgumentException
- Unsupported URL 
<file:///home/rene/LibreOffice/git/master//filter/qa/unit/data//hybrid_writer_???_???.pdf>: 
"type detection failed" at ./framework/source/loadenv/loadenv.cxx:189

(anonymous namespace)::testHybridPDFFile::TestBody finished in: 2ms
[_RUN_____] (anonymous namespace)::testTdf114428::TestBody
(anonymous namespace)::testTdf114428::TestBody finished in: 0ms
##Failure Location unknown## : Error
Test name: (anonymous namespace)::testHybridPDFFile::TestBody
An uncaught exception of type com.sun.star.lang.IllegalArgumentException
- Unsupported URL 
<file:///home/rene/LibreOffice/git/master//filter/qa/unit/data//hybrid_writer_???_???.pdf>: 
"type detection failed" at ./framework/source/loadenv/loadenv.cxx:189

Failures !!!
Run: 3   Failure total: 1   Failures: 0   Errors: 1
make[4]: *** 
[/home/rene/LibreOffice/git/master/solenv/gbuild/CppunitTest.mk:130: 
/home/rene/LibreOffice/git/master/workdir/CppunitTest/filter_textfilterdetect.test] 
Error 1
make[4]: Leaving directory '/home/rene/LibreOffice/git/master'
make[3]: *** [Makefile:277: build] Error 2
make[3]: Leaving directory '/home/rene/LibreOffice/git/master'
make[2]: *** [/home/rene/LibreOffice/git/master/debian/rules:2381: 
check] Error 2
make[2]: Leaving directory '/home/rene/LibreOffice/git/master'
make[1]: *** [/home/rene/LibreOffice/git/master/debian/rules:2270: 
debian/stampdir/build-arch] Error 2
make[1]: Leaving directory '/home/rene/LibreOffice/git/master'
make: *** [debian/rules:2250: build] Error 2
dpkg-buildpackage: error: debian/rules build subprocess returned exit 
status 2
debuild: fatal error at line 1182:
dpkg-buildpackage -us -uc -ui -b failed

This boils down to

 From 8a0015c35f3f137e4f3a80e40616bc078e265a1c Mon Sep 17 00:00:00 2001
From: Mike Kaganski <mike.kaganski at collabora.com>
Date: Fri, 1 Dec 2023 16:43:49 +0300
Subject: Drop allownonascii check from pre-commit checks

Supposedly, at this day and age, it is OK to use non-ascii file names.
Specifically, this is intended to allow such names for bugdocs, which
allows simpler testing of problems with handling those.

An alternative would be to rename bugdocs at runtime; but that still
requires that the target filesystem supports such names, so...

Change-Id: I25da2402f311d59c5777c4cd302147d6965dea5f
Reviewed-on: https://gerrit.libreoffice.org/c/core/+/160217
Tested-by: Jenkins
Reviewed-by: Mike Kaganski <mike.kaganski at collabora.com>

and

 From a5a49657dc17609a05dca59a8521fd71d14fe76e Mon Sep 17 00:00:00 2001
From: Mike Kaganski <mike.kaganski at collabora.com>
Date: Fri, 1 Dec 2023 16:48:44 +0300
Subject: tdf#158442: fix opening hybrid PDFs on Windows

Commit 046e9545956d8ad1d69345d6b4a4c0a33714d179 (Try to revert to use
of file_iterator from boost on Windows, 2023-10-31) had introduced a
problem that pdfparse::PDFReader::read couldn't create file_iterator
for files already opened with write access: mmap_file_iterator ctor
on Windows used single FILE_SHARE_READ as dwSharedMode parameter for
CreateFileA WinAPI; and that failed, when the file was already opened
using GENERIC_WRITE in dwDesiredAccess - which happens when opening
stream in TypeDetection::impl_detectTypeFlatAndDeep.

Fix this by patching boosts' mmap_file_iterator constructor to use
FILE_SHARE_READ | FILE_SHARE_WRITE, like we do in osl_openFile.

But there was a pre-existing problem of using char-based CreateFileA
API, which disallows opening any files with names not representable
in current Windows codepage. Such hybrid PDF files would still fail
creation of the file_iterator, and open as PDF.

Fix that by further patching boost to have wstring-based constructors
for file_iterator and mmap_file_iterator on Windows, which would call
CreateFileW.

Change-Id: Ib190bc090636159ade390b3dd120957d06d7b89b
Reviewed-on: https://gerrit.libreoffice.org/c/core/+/160218
Tested-by: Jenkins
Reviewed-by: Mike Kaganski <mike.kaganski at collabora.com>

The latter adding

-rw-r--r--	filter/qa/unit/data/hybrid_calc_абв_αβγ.pdf	bin	0 -> 10420 bytes
-rw-r--r--	filter/qa/unit/data/hybrid_impress_абв_αβγ.pdf	bin	0 -> 21055 
bytes
-rw-r--r--	filter/qa/unit/data/hybrid_writer_абв_αβγ.pdf	bin	0 -> 10732 
bytes

aka

  create mode 100644 
"filter/qa/unit/data/hybrid_calc_\320\260\320\261\320\262_\316\261\316\262\316\263.pdf"
  create mode 100644 
"filter/qa/unit/data/hybrid_impress_\320\260\320\261\320\262_\316\261\316\262\316\263.pdf"
  create mode 100644 
"filter/qa/unit/data/hybrid_writer_\320\260\320\261\320\262_\316\261\316\262\316\263.pdf"

in git's view. (No idea what encoding this is; UTF-8?)
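
For illustration: the quoted form above matches git's default path quoting, where each byte outside ASCII is printed as a three-digit octal escape. A minimal sketch that undoes this for the names shown, assuming the escaped bytes are UTF-8 (the helper name unquote_git_path is hypothetical, and only octal escapes are handled, not git's other escapes such as \t or \"):

```python
import re


def unquote_git_path(quoted: str) -> str:
    """Turn a git-quoted path with octal escapes back into text.

    Assumption: the escaped bytes are UTF-8. Only \\NNN octal escapes are
    handled here; other C-style escapes are left untouched.
    """
    inner = quoted.strip('"').encode("ascii")
    # Replace each \NNN with the single byte it stands for.
    raw = re.sub(rb"\\([0-7]{3})", lambda m: bytes([int(m.group(1), 8)]), inner)
    return raw.decode("utf-8")


quoted = r'"filter/qa/unit/data/hybrid_writer_\320\260\320\261\320\262_\316\261\316\262\316\263.pdf"'
decoded = unquote_git_path(quoted)
print(decoded)
```

If the assumption holds, this prints the same hybrid_writer_абв_αβγ.pdf path as in the diffstat. Setting git's core.quotePath option to false should make git print such names unescaped.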

In any case this is bad. My filesystem (I think from 2020 or so) 
apparently displays these names (ls -l does), but I wouldn't be sure 
about other, older ones (like Debian's build machines). The locale this 
fails under is definitely UTF-8, though.

Claiming that Windows and macOS work may be true (I still doubt it works 
in older versions, but anyway), but leaving Linux people out of the 
unit tests (and at least on amd64 and arm64 all of those *are* run on 
package builds, every time) is *extremely* bad.

Googling tells me that there's no "filesystem encoding" on Linux, just 
bytes that the application needs to handle. Apparently LibreOffice does not?
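
To make the "just bytes" point concrete, here is a small sketch. It stores one of the new names as raw UTF-8 bytes in a temporary directory and reads it back as bytes, then shows that forcing the name through an ASCII-only conversion yields the same ???_??? pattern seen in the error message. This only illustrates one way such question marks can arise; it is an assumption, not a diagnosis of what LibreOffice actually does. The function names are hypothetical.

```python
import os
import tempfile

NAME = "hybrid_writer_абв_αβγ.pdf"


def roundtrip_bytes_name(name: str) -> bytes:
    """Create a file named with raw UTF-8 bytes and list it back as bytes."""
    raw = name.encode("utf-8")
    with tempfile.TemporaryDirectory() as tmp:
        tmp_bytes = os.fsencode(tmp)
        # Passing bytes paths makes the OS calls work on bytes directly.
        with open(os.path.join(tmp_bytes, raw), "wb"):
            pass
        (entry,) = os.listdir(tmp_bytes)
    return entry


def lossy_ascii(raw: bytes) -> str:
    """Simulate a conversion that cannot represent non-ASCII characters."""
    return raw.decode("utf-8").encode("ascii", errors="replace").decode("ascii")


stored = roundtrip_bytes_name(NAME)
print(stored)
print(lossy_ascii(stored))
```

On a typical Linux filesystem the first line printed is the untouched byte string, and the second is hybrid_writer_???_???.pdf.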

And then there's the URL. URLs are defined in RFC 1738 to be ASCII. 
Anything else needs to be escaped. (That's why percent-encoding exists, 
and punycode for host names.)
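
As a sketch of what an escaped URL for such a file would look like, the following percent-encodes the UTF-8 bytes of the path using Python's standard library. The shortened path and the helper name to_file_url are made up for illustration; this is not a claim about how LibreOffice builds its URLs.

```python
from urllib.parse import quote, unquote


def to_file_url(path: str) -> str:
    """Build an ASCII-only file URL by percent-encoding the UTF-8 path bytes."""
    # "/" is kept as the path separator; every non-ASCII byte becomes %XX.
    return "file://" + quote(path, safe="/")


url = to_file_url("/filter/qa/unit/data/hybrid_writer_абв_αβγ.pdf")
print(url)
print(unquote(url))
```

The first line printed contains only ASCII characters, and unquote() recovers the original name from it.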

I think this change should be reverted.

Regards,

Rene

