[Libreoffice-commits] core.git: external/clucene

Stephan Bergmann (via logerrit) logerrit at kemper.freedesktop.org
Thu Apr 23 18:37:07 UTC 2020


 external/clucene/UnpackedTarball_clucene.mk         |    1 +
 external/clucene/patches/heap-buffer-overflow.patch |   11 +++++++++++
 2 files changed, 12 insertions(+)

New commits:
commit 92b7e0fd668f580ca573284e8f36794c72ba62df
Author:     Stephan Bergmann <sbergman at redhat.com>
AuthorDate: Thu Apr 23 16:49:17 2020 +0200
Commit:     Stephan Bergmann <sbergman at redhat.com>
CommitDate: Thu Apr 23 20:36:26 2020 +0200

    external/clucene: Avoid heap-buffer-overflow
    
    ...as seen during a --with-lang=ALL build with ASan on Linux:
    
    > [XHC] nlpsolver ja
    > =================================================================
    > ==51396==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x62100000ed00 at pc 0x7fe425640f53 bp 0x7ffd6a0cc900 sp 0x7ffd6a0cc8f8
    > READ of size 4 at 0x62100000ed00 thread T0
    >  #0 in lucene::analysis::cjk::CJKTokenizer::next(lucene::analysis::Token*) at workdir/UnpackedTarball/clucene/src/contribs-lib/CLucene/analysis/cjk/CJKAnalyzer.cpp:70:19
    >  #1 in lucene::index::DocumentsWriter::ThreadState::FieldData::invertField(lucene::document::Field*, lucene::analysis::Analyzer*, int) at workdir/UnpackedTarball/clucene/src/core/CLucene/index/DocumentsWriterThreadState.cpp:901:32
    >  #2 in lucene::index::DocumentsWriter::ThreadState::FieldData::processField(lucene::analysis::Analyzer*) at workdir/UnpackedTarball/clucene/src/core/CLucene/index/DocumentsWriterThreadState.cpp:798:9
    >  #3 in lucene::index::DocumentsWriter::ThreadState::processDocument(lucene::analysis::Analyzer*) at workdir/UnpackedTarball/clucene/src/core/CLucene/index/DocumentsWriterThreadState.cpp:557:24
    >  #4 in lucene::index::DocumentsWriter::updateDocument(lucene::document::Document*, lucene::analysis::Analyzer*, lucene::index::Term*) at workdir/UnpackedTarball/clucene/src/core/CLucene/index/DocumentsWriter.cpp:946:16
    >  #5 in lucene::index::DocumentsWriter::addDocument(lucene::document::Document*, lucene::analysis::Analyzer*) at workdir/UnpackedTarball/clucene/src/core/CLucene/index/DocumentsWriter.cpp:930:10
    >  #6 in lucene::index::IndexWriter::addDocument(lucene::document::Document*, lucene::analysis::Analyzer*) at workdir/UnpackedTarball/clucene/src/core/CLucene/index/IndexWriter.cpp:681:28
    >  #7 in HelpIndexer::indexDocuments() at helpcompiler/source/HelpIndexer.cxx:66:20
    >  #8 in main at helpcompiler/source/HelpIndexer_main.cxx:79:22
    > 0x62100000ed00 is located 0 bytes to the right of 4096-byte region [0x62100000dd00,0x62100000ed00)
    > allocated by thread T0 here:
    >  #0 in realloc at /data/sbergman/github.com/llvm/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:164:3
    >  #1 in lucene::util::StreamBuffer<wchar_t>::setSize(int) at workdir/UnpackedTarball/clucene/src/core/CLucene/util/_streambuffer.h:114:17
    >  #2 in lucene::util::StreamBuffer<wchar_t>::makeSpace(int) at workdir/UnpackedTarball/clucene/src/core/CLucene/util/_streambuffer.h:150:5
    >  #3 in lucene::util::BufferedStreamImpl<wchar_t>::setMinBufSize(int) at workdir/UnpackedTarball/clucene/src/core/CLucene/util/_bufferedstream.h:69:16
    >  #4 in lucene::util::SimpleInputStreamReader::Internal::JStreamsBuffer::JStreamsBuffer(lucene::util::CLStream<signed char>*, int) at workdir/UnpackedTarball/clucene/src/core/CLucene/util/Reader.cpp:375:6
    
    Note that this is not a proper fix, which would need to properly detect
    surrogate pairs split across buffer boundaries.  But for one the comment says
    "however, gunichartables doesn't seem to classify any of the surrogates as
    alpha, so they are skipped anyway", and for another the behavior until now was
    to replace the high surrogate with something that was likely garbage and leave
    the low surrogate at the start of the next buffer (if any) alone, so leaving
    both surrogates alone is likely at least no worse behavior.
    
    Change-Id: Ib6f6f1bc20ef8efe0418bf2e715783c8555068de
    Reviewed-on: https://gerrit.libreoffice.org/c/core/+/92792
    Tested-by: Jenkins
    Reviewed-by: Stephan Bergmann <sbergman at redhat.com>

diff --git a/external/clucene/UnpackedTarball_clucene.mk b/external/clucene/UnpackedTarball_clucene.mk
index a4036d72c0bc..cb6efabd1d5d 100644
--- a/external/clucene/UnpackedTarball_clucene.mk
+++ b/external/clucene/UnpackedTarball_clucene.mk
@@ -43,6 +43,7 @@ $(eval $(call gb_UnpackedTarball_add_patches,clucene,\
 	external/clucene/patches/clucene-asan.patch \
 	external/clucene/patches/clucene-mixes-uptemplate-parameter-msvc-14.patch \
 	external/clucene/patches/ostream-wchar_t.patch \
+	external/clucene/patches/heap-buffer-overflow.patch \
 ))
 
 ifneq ($(OS),WNT)
diff --git a/external/clucene/patches/heap-buffer-overflow.patch b/external/clucene/patches/heap-buffer-overflow.patch
new file mode 100644
index 000000000000..7421db854cfd
--- /dev/null
+++ b/external/clucene/patches/heap-buffer-overflow.patch
@@ -0,0 +1,11 @@
+--- src/contribs-lib/CLucene/analysis/cjk/CJKAnalyzer.cpp
++++ src/contribs-lib/CLucene/analysis/cjk/CJKAnalyzer.cpp
+@@ -66,7 +66,7 @@
+ 		//ucs4(c variable). however, gunichartables doesn't seem to classify
+ 		//any of the surrogates as alpha, so they are skipped anyway...
+ 		//so for now we just convert to ucs4 so that we dont corrupt the input.
+-		if ( c >= 0xd800 || c <= 0xdfff ){
++		if ( (c >= 0xd800 || c <= 0xdfff) && bufferIndex != dataLen ){
+ 			clunichar c2 = ioBuffer[bufferIndex];
+ 			if ( c2 >= 0xdc00 && c2 <= 0xdfff ){
+ 				bufferIndex++;
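
For illustration only, the bounds check this patch adds can be sketched as a standalone helper. This is not the CLucene code: the name decodeNext, its signature, and the use of char16_t are assumptions made for the example. The sketch also narrows the first test to the high-surrogate range with &&, whereas the patched condition keeps the original ||, which holds for every value of c, so in the patched code it is the added bufferIndex != dataLen test that prevents the out-of-bounds read.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not part of CLucene: decodes one code point from a
// UTF-16 buffer and advances `index` past the code units it consumed.
// Precondition: index < len.
inline char32_t decodeNext(const char16_t* buf, std::size_t len, std::size_t& index) {
    char32_t c = buf[index++];
    // Only a high surrogate can start a pair, and the following unit may be
    // read only if it still lies inside the buffer.
    if (c >= 0xD800 && c <= 0xDBFF && index != len) {
        char32_t c2 = buf[index];
        if (c2 >= 0xDC00 && c2 <= 0xDFFF) {
            ++index;
            // Combine the high and low surrogate into one code point.
            c = 0x10000 + ((c - 0xD800) << 10) + (c2 - 0xDC00);
        }
    }
    // A high surrogate at the very end of the buffer is returned unchanged,
    // matching the "leave both surrogates alone" behavior described above.
    return c;
}
```

As in the patch, a pair split across two buffer fills is still not recombined; a proper fix would have to carry the pending high surrogate over to the next fill.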

