[EasyHack] #44681 port to CLucene from java/Lucene
Gert van Valkenhoef
g.h.m.van.valkenhoef at rug.nl
Tue Feb 14 13:27:30 PST 2012
Hi all,
Thanks for all the replies and comments.
Attached is a new bunch of patches against master. I've reworked the
helpindexer.cpp code so that it can be used as a library, and I changed
xmlhelp/source/cxxhelp/provider/databases.cxx to call it.
The good news is that I think this gets rid of the Java invocation on
startup. The bad news is that this breaks the build, as I explain below.
I attach these work-in-progress patches anyway, because I won't get
around to working on this for a few days at least.
1. I converted the HelpIndexer from C++'s std::string and std::wstring
to rtl::OUString. This created a new problem (HelpIndexer.cxx:106):
how do I convert the rtl::OUString to the TCHAR* (wchar_t*) that
CLucene needs without breaking platform independence? The conversion
as it stands garbles the "path" field in the index.
2. In xmlhelp/source/cxxhelp/provider/makefile.mk, I've hacked the
include path to include l10ntools/source/help, which is probably not a
good idea. I also don't know how to link in the HelpIndexer.o file from
xmlhelp (or how to create a .so for it that can be found by xmlhelp).
3. The conversion from using UNIX dirent.h and friends to using 'sal'
still needs to happen, and I think that will help get rid of some
awkward string conversions too.
4. The patch assumes both libclucene-core and libclucene-contribs-lib
are available from pkg-config. To depend only on libclucene-core,
disable the '#define TODO' and the relevant line in the Makefile.
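
On point 1, here is a minimal sketch of the kind of conversion I have in
mind. It is not what the patch does. It assumes the OUString contents
can be read as UTF-16 code units (via getStr() and getLength()) and
that TCHAR is wchar_t. The function name utf16ToWide is made up for
illustration, and char16_t stands in for sal_Unicode so the snippet
compiles without the sal headers.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: copy UTF-16 code units into a std::wstring.
// Where wchar_t is 16 bits wide (e.g. Windows) the units are copied as-is.
// Where wchar_t is 32 bits wide (e.g. Linux) surrogate pairs are combined
// into a single code point.
std::wstring utf16ToWide(const char16_t* str, std::size_t len)
{
    std::wstring out;
    out.reserve(len);
    for (std::size_t i = 0; i < len; ++i)
    {
        char32_t c = str[i];
        const bool isHighSurrogate = c >= 0xD800 && c <= 0xDBFF;
        if (sizeof(wchar_t) >= 4 && isHighSurrogate && i + 1 < len
            && str[i + 1] >= 0xDC00 && str[i + 1] <= 0xDFFF)
        {
            // Combine the high and low surrogate into one code point.
            c = 0x10000 + ((c - 0xD800) << 10) + (str[i + 1] - 0xDC00);
            ++i; // the low surrogate has been consumed
        }
        out.push_back(static_cast<wchar_t>(c));
    }
    return out;
}
```

With an OUString s, the call would presumably look something like
utf16ToWide(reinterpret_cast<const char16_t*>(s.getStr()), s.getLength()),
with c_str() on the result passed to CLucene. The cast is itself an
assumption about how sal_Unicode is defined on each platform.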
Cheers,
Gert
On 02/14/2012 05:24 PM, Caolán McNamara wrote:
> On Tue, 2012-02-14 at 17:04 +0100, G.H.M.Valkenhoef, van wrote:
>
>> I noticed that CJK-based indexing is only enabled for the Japanese
>> language. Maybe this can be fixed by adding more languages to be
>> CJK-indexed.
> Indeed, opengrok for "CJKAnalyzer" and see if running zh-* (and possibly
> ko) through org.apache.lucene.analysis.cjk.CJKAnalyzer makes a
> difference.
>
> Which sadly might mean we need the clucene version of that too :-)
>
> C.
>