[EasyHack] #44681 port to CLucene from java/Lucene

Gert van Valkenhoef g.h.m.van.valkenhoef at rug.nl
Tue Feb 14 13:27:30 PST 2012


Hi all,

Thanks for all the replies and comments.

Attached is a new bunch of patches against master. I've reworked the 
helpindexer.cpp code so that it can be used as a library, and I changed 
xmlhelp/source/cxxhelp/provider/databases.cxx to call it.

The good news is that I think this gets rid of the Java invocation on 
startup. The bad news is that this breaks the build, as I explain below. 
I attach these work-in-progress patches anyway, because I won't get 
around to working on this for a few days at least.

1. I converted the HelpIndexer from C++'s std::string and std::wstring 
to rtl::OUString. This created a new problem (HelpIndexer.cxx:106): 
how to convert the rtl::OUString to the TCHAR* that CLucene needs. How 
can I convert an OUString to a TCHAR* (wchar_t*) in a way that won't 
break platform independence? The current conversion garbles the "path" 
field in the index.

2. In xmlhelp/source/cxxhelp/provider/makefile.mk, I've hacked the 
include path to include l10ntools/source/help, which is probably not a 
good idea. I also don't know how to link in the HelpIndexer.o file from 
xmlhelp (or how to create a .so for it that can be found by xmlhelp).

3. The conversion from using UNIX dirent.h and friends to using 'sal' 
still needs to happen, and I think that will help get rid of some 
awkward string conversions too.

4. The patch assumes both libclucene-core and libclucene-contribs-lib 
are available from pkg-config. Disable the '#define TODO' and the 
relevant line in the Makefile to only depend on libclucene-core.
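
On point 1, one possible direction, as a rough sketch only: OUString 
stores UTF-16 code units, while wchar_t is 16 bits wide on some 
platforms and 32 bits on others, so a portable conversion probably has 
to combine surrogate pairs when wchar_t is wider. The helper name 
utf16ToWide below is hypothetical (it is not in the patches), and the 
sketch takes a plain char16_t buffer as a stand-in for the OUString 
contents so that it compiles without the sal headers.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of the attached patches.
// Converts UTF-16 code units (the representation assumed here for the
// OUString buffer) into a std::wstring whose c_str() can be passed
// where a wchar_t-based TCHAR* is expected.
std::wstring utf16ToWide(const char16_t* src, std::size_t len)
{
    std::wstring out;
    out.reserve(len);
    for (std::size_t i = 0; i < len; ++i)
    {
        char32_t c = src[i];
        // Where wchar_t is wider than 16 bits, merge a high/low
        // surrogate pair into a single code point. Where wchar_t is
        // 16 bits, the code units are copied through unchanged.
        if (sizeof(wchar_t) > 2 && c >= 0xD800 && c <= 0xDBFF && i + 1 < len)
        {
            char32_t lo = src[i + 1];
            if (lo >= 0xDC00 && lo <= 0xDFFF)
            {
                c = 0x10000 + ((c - 0xD800) << 10) + (lo - 0xDC00);
                ++i; // the low surrogate has been consumed
            }
        }
        out.push_back(static_cast<wchar_t>(c));
    }
    return out;
}
```

With a real OUString the call would presumably pass getStr() and 
getLength(), with a cast if sal_Unicode is not char16_t in the tree; 
that part is an assumption and untested.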

Cheers,

Gert

On 02/14/2012 05:24 PM, Caolán McNamara wrote:
> On Tue, 2012-02-14 at 17:04 +0100, G.H.M.Valkenhoef, van wrote:
>
>> I noticed that CJK-based indexing is only enabled for the Japanese
>> language. Maybe this can be fixed by adding more languages to be
>> CJK-indexed.
> Indeed, opengrok for "CJKAnalyzer" and see if running zh-* (and possibly
> ko) through org.apache.lucene.analysis.cjk.CJKAnalyzer makes a
> difference.
>
> Which sadly might mean we need the clucene version of that too :-)
>
> C.
>


