(new) non-ASCII filenames break unit tests on Linux
Rene Engelhard
rene at debian.org
Sat Dec 2 14:46:58 UTC 2023
Hi,
After a recent pull of master I get
[build CUT] filter_textfilterdetect
S=/home/rene/LibreOffice/git/master && I=$S/instdir && W=$S/workdir &&
mkdir -p $W/CppunitTest/ && rm -fr
$W/CppunitTest/filter_textfilterdetect.test.user && cp -r $W/unittest
$W/CppunitTest/filter_textfilterdetect.test.user && rm -fr
$W/CppunitTest/filter_textfilterdetect.test.core && mkdir
$W/CppunitTest/filter_textfilterdetect.test.core && cd
$W/CppunitTest/filter_textfilterdetect.test.core && (
MAX_CONCURRENCY=4 MOZILLA_CERTIFICATE_FOLDER=dbm:
SAL_DISABLE_SYNCHRONOUS_PRINTER_DETECTION=1 SAL_USE_VCLPLUGIN=svp
LIBO_LANG=C
LD_LIBRARY_PATH=${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}"$I/program:$I/program":$W/UnpackedTarball/cppunit/src/cppunit/.libs
$W/LinkTarget/Executable/cppunittester
$W/LinkTarget/CppunitTest/libtest_filter_textfilterdetect.so --headless
"-env:BRAND_BASE_DIR=file://$S/instdir" "-env:BRAND_SHARE_SUBDIR=share"
"-env:BRAND_SHARE_RESOURCE_SUBDIR=program/resource"
"-env:UserInstallation=file://$W/CppunitTest/filter_textfilterdetect.test.user"
"-env:CONFIGURATION_LAYERS=xcsxcu:file://$I/share/registry
xcsxcu:file://$W/unittest/registry-common
xcsxcu:file://$W/unittest/registry-user-ui"
"-env:UNO_TYPES=file://$I/program/types/offapi.rdb
file://$I/program/types.rdb"
"-env:UNO_SERVICES=file://$W/Rdb/ure/services.rdb
file://$W/Rdb/services.rdb" -env:URE_BIN_DIR=file://$I/program
-env:URE_INTERNAL_LIB_DIR=file://$I/program
-env:LO_LIB_DIR=file://$I/program
-env:LO_JAVA_DIR=file://$I/program/classes --protector
$W/LinkTarget/Library/unoexceptionprotector.so unoexceptionprotector
--protector $W/LinkTarget/Library/unobootstrapprotector.so
unobootstrapprotector --protector
$W/LinkTarget/Library/libvclbootstrapprotector.so vclbootstrapprotector
"-env:CPPUNITTESTTARGET=$W/CppunitTest/filter_textfilterdetect.test"
) 2>&1
[_RUN_____] (anonymous namespace)::testEmptyFile::TestBody
(anonymous namespace)::testEmptyFile::TestBody finished in: 708ms
[_RUN_____] (anonymous namespace)::testHybridPDFFile::TestBody
unknown:0:(anonymous namespace)::testHybridPDFFile::TestBody
An uncaught exception of type com.sun.star.lang.IllegalArgumentException
- Unsupported URL
<file:///home/rene/LibreOffice/git/master//filter/qa/unit/data//hybrid_writer_???_???.pdf>:
"type detection failed" at ./framework/source/loadenv/loadenv.cxx:189
(anonymous namespace)::testHybridPDFFile::TestBody finished in: 2ms
[_RUN_____] (anonymous namespace)::testTdf114428::TestBody
(anonymous namespace)::testTdf114428::TestBody finished in: 0ms
##Failure Location unknown## : Error
Test name: (anonymous namespace)::testHybridPDFFile::TestBody
An uncaught exception of type com.sun.star.lang.IllegalArgumentException
- Unsupported URL
<file:///home/rene/LibreOffice/git/master//filter/qa/unit/data//hybrid_writer_???_???.pdf>:
"type detection failed" at ./framework/source/loadenv/loadenv.cxx:189
Failures !!!
Run: 3 Failure total: 1 Failures: 0 Errors: 1
make[4]: ***
[/home/rene/LibreOffice/git/master/solenv/gbuild/CppunitTest.mk:130:
/home/rene/LibreOffice/git/master/workdir/CppunitTest/filter_textfilterdetect.test]
Error 1
make[4]: Leaving directory '/home/rene/LibreOffice/git/master'
make[3]: *** [Makefile:277: build] Error 2
make[3]: Leaving directory '/home/rene/LibreOffice/git/master'
make[2]: *** [/home/rene/LibreOffice/git/master/debian/rules:2381:
check] Error 2
make[2]: Leaving directory '/home/rene/LibreOffice/git/master'
make[1]: *** [/home/rene/LibreOffice/git/master/debian/rules:2270:
debian/stampdir/build-arch] Error 2
make[1]: Leaving directory '/home/rene/LibreOffice/git/master'
make: *** [debian/rules:2250: build] Error 2
dpkg-buildpackage: error: debian/rules build subprocess returned exit
status 2
debuild: fatal error at line 1182:
dpkg-buildpackage -us -uc -ui -b failed
This boils down to
From 8a0015c35f3f137e4f3a80e40616bc078e265a1c Mon Sep 17 00:00:00 2001
From: Mike Kaganski <mike.kaganski at collabora.com>
Date: Fri, 1 Dec 2023 16:43:49 +0300
Subject: Drop allownonascii check from pre-commit checks
Supposedly, at this day and age, it is OK to use non-ascii file names.
Specifically, this is intended to allow such names for bugdocs, which
allows simpler testing of problems with handling those.
An alternative would be to rename bugdocs at runtime; but that still
requires that the target filesystem supports such names, so...
Change-Id: I25da2402f311d59c5777c4cd302147d6965dea5f
Reviewed-on: https://gerrit.libreoffice.org/c/core/+/160217
Tested-by: Jenkins
Reviewed-by: Mike Kaganski <mike.kaganski at collabora.com>
and
From a5a49657dc17609a05dca59a8521fd71d14fe76e Mon Sep 17 00:00:00 2001
From: Mike Kaganski <mike.kaganski at collabora.com>
Date: Fri, 1 Dec 2023 16:48:44 +0300
Subject: tdf#158442: fix opening hybrid PDFs on Windows
Commit 046e9545956d8ad1d69345d6b4a4c0a33714d179 (Try to revert to use
of file_iterator from boost on Windows, 2023-10-31) had introduced a
problem that pdfparse::PDFReader::read couldn't create file_iterator
for files already opened with write access: mmap_file_iterator ctor
on Windows used single FILE_SHARE_READ as dwSharedMode parameter for
CreateFileA WinAPI; and that failed, when the file was already opened
using GENERIC_WRITE in dwDesiredAccess - which happens when opening
stream in TypeDetection::impl_detectTypeFlatAndDeep.
Fix this by patching boosts' mmap_file_iterator constructor to use
FILE_SHARE_READ | FILE_SHARE_WRITE, like we do in osl_openFile.
But there was a pre-existing problem of using char-based CreateFileA
API, which disallows opening any files with names not representable
in current Windows codepage. Such hybrid PDF files would still fail
creation of the file_iterator, and open as PDF.
Fix that by further patching boost to have wstring-based constructors
for file_iterator and mmap_file_iterator on Windows, which would call
CreateFileW.
Change-Id: Ib190bc090636159ade390b3dd120957d06d7b89b
Reviewed-on: https://gerrit.libreoffice.org/c/core/+/160218
Tested-by: Jenkins
Reviewed-by: Mike Kaganski <mike.kaganski at collabora.com>
The latter adding
-rw-r--r-- filter/qa/unit/data/hybrid_calc_абв_αβγ.pdf bin 0 -> 10420 bytes
-rw-r--r-- filter/qa/unit/data/hybrid_impress_абв_αβγ.pdf bin 0 -> 21055 bytes
-rw-r--r-- filter/qa/unit/data/hybrid_writer_абв_αβγ.pdf bin 0 -> 10732 bytes
aka
create mode 100644 "filter/qa/unit/data/hybrid_calc_\320\260\320\261\320\262_\316\261\316\262\316\263.pdf"
create mode 100644 "filter/qa/unit/data/hybrid_impress_\320\260\320\261\320\262_\316\261\316\262\316\263.pdf"
create mode 100644 "filter/qa/unit/data/hybrid_writer_\320\260\320\261\320\262_\316\261\316\262\316\263.pdf"
in git's view. (No idea what encoding this is. UTF-8?)
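The escapes in git's output are C-style octal bytes, so the question can be checked mechanically. A minimal sketch, assuming the bytes are meant as UTF-8; the helper name decode_git_quoted is made up for this illustration and is not part of git or LibreOffice:

```python
import re


def decode_git_quoted(path: str) -> str:
    """Turn git's octal escapes (\\320\\260 ...) back into text.

    Hypothetical helper: it only handles three-digit octal escapes and
    assumes the resulting bytes are UTF-8.
    """
    raw = re.sub(
        rb"\\([0-3][0-7]{2})",                    # one escaped byte, \000 to \377
        lambda m: bytes([int(m.group(1), 8)]),    # octal digits -> single byte
        path.encode("ascii"),
    )
    return raw.decode("utf-8")


quoted = r"filter/qa/unit/data/hybrid_writer_\320\260\320\261\320\262_\316\261\316\262\316\263.pdf"
decoded = decode_git_quoted(quoted)
print(decoded)
```

The bytes decode cleanly as UTF-8 and give back the Cyrillic and Greek letters from the diffstat above. Setting git's core.quotePath option to false should make git print such names unescaped.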
In any case this is bad. My filesystem (I think from 2020 or so)
apparently shows it (ls -l does) but I wouldn't be sure for other, older
ones (like Debian's build machines). The locale this fails under
definitely is UTF-8 though.
Claiming that Windows and macOS work may be fine (I still doubt it works
in older versions, but anyway), but leaving Linux people out of the
unit tests (and at least on amd64 and arm64 all of those *are* run on
package builds, every time) is *extremely* bad.
Googling tells me that there's no "filesystem encoding" in Linux, just
bytes that the application needs to handle. Apparently LibreOffice does not
do that?
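To illustrate the "just bytes" point, here is a small sketch of how a name can survive or get mangled depending on how an application converts it. It is only an assumption that something like this produces the question marks in the log above; the sketch does not touch LibreOffice code at all:

```python
name = "hybrid_writer_абв_αβγ.pdf"

# What ends up on disk is a byte string; here the UTF-8 encoding of the name.
on_disk = name.encode("utf-8")

# Lossless round trip when the application treats those bytes as UTF-8.
assert on_disk.decode("utf-8") == name

# Lossy conversion when the name is squeezed through an ASCII-only encoding:
# every character that cannot be represented is replaced by "?".
lossy = name.encode("ascii", errors="replace").decode("ascii")
print(lossy)  # hybrid_writer_???_???.pdf
```

The lossy result has the same shape as the name in the "Unsupported URL" message, which would be consistent with a conversion step somewhere that assumes an ASCII-only encoding (the test is run with LIBO_LANG=C).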
And then there's the URL. URLs are defined in RFC 1738 to be ASCII.
Anything else needs to be escaped. (That's why punycode even exists.)
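For the path part of a file URL, the usual escaping is percent-encoding of the UTF-8 bytes. A minimal sketch with Python's standard library, shown only to make the expected form concrete; how LibreOffice builds the URL internally is not shown here:

```python
from urllib.parse import quote, unquote

name = "hybrid_writer_абв_αβγ.pdf"

# quote() percent-encodes the UTF-8 bytes of every non-ASCII character.
encoded = quote(name)
print(encoded)

# The escaping is reversible, so the original name is not lost.
assert unquote(encoded) == name
```

This prints hybrid_writer_%D0%B0%D0%B1%D0%B2_%CE%B1%CE%B2%CE%B3.pdf, a pure-ASCII form of the same name.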
I think this change should be reverted.
Regards,
Rene