[EasyHack] #44681 port to CLucene from java/Lucene

Gert van Valkenhoef g.h.m.van.valkenhoef at rug.nl
Fri Feb 10 14:11:31 PST 2012


Dear LibreOffice developers,

Bug: https://bugs.freedesktop.org/show_bug.cgi?id=44681

Attached are initial implementations of the HelpIndexer and HelpSearch 
in C++ using CLucene, to replace the Java implementations using Lucene.

The code that interfaces with Lucene to do the indexing and searching is 
complete. I have a test set up where I create an index with both the 
HelpIndexerTool.jar and the C++ indexer, and search it using the C++ 
searcher. These give identical results. Thus, luckily, the index format 
is compatible between CLucene and Java Lucene.

I've also looked into where the HelpIndexerTool is currently used, and 
found these:

  - xmlhelp/source/cxxhelp/provider/databases.cxx:

     * In extension mode (enabled by HelpIndexer), through XInvocation

     * Does not ZIP the result

  - helpcontent2/util/target.pmk

     * Called as a command-line tool

     * ZIPs the result, but already has an alternative code path to do
       it (the final .ELSE)

Based on this, it looks like the Java HelpIndexerTool is a lot more 
complex than it needs to be, and does a few things that are better 
handled by other tools. Especially the "extension mode" seems to be a 
relic of the convoluted code path (through XInvocation etc.) and doesn't 
do much more than suppress error messages. In addition, couldn't the 
ZIP creation always be replaced by this alternative code path? It is 
quite possible that I missed a few things here.

If "extension mode" and ZIP archiving are not needed, the implementation 
is complete, and the remaining work would be integrating with the build 
process. Here are a couple of caveats and/or questions related to that:

  * This implementation is using the master branch of CLucene's git, 
with clucene-contribs-lib enabled (for CJK support). The released 
version of CLucene is compatible with Lucene 1.9.x, whereas LibreOffice 
uses Lucene 2.3.

  * Can someone help to figure out how to make CLucene part of the LO 
build process? CLucene is using CMake and there seems to be no way to 
'make install' the clucene-contribs-lib, so this might be tricky.

  * I'm not sure exactly how to make my code build as part of the LO 
build, but could probably figure it out as long as the previous point is 
addressed.

  * CLucene (like Java) uses wide characters throughout, and defines 
its own TCHAR type for that. Can we make this play nice with how LO 
handles strings?

  * I'm using some Unix headers. Are these available on Windows, or 
should I use some kind of LO equivalent of them?

  * I tried replacing the HelpIndexerTool in 
helpcontent2/util/target.pmk, which seems to work fine, except that I'm 
returning an error code when the content/caption directory doesn't exist 
(unlike HelpIndexerTool), which breaks on "shared".

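On the wide-character question, one possible bridge is sketched below. 
It assumes that CLucene's TCHAR maps to wchar_t and that the LO side 
supplies 16-bit UTF-16 code units; std::u16string stands in for the LO 
string type here, and the function name utf16ToWide is hypothetical:

```cpp
#include <cstddef>
#include <string>

// Convert UTF-16 code units to a std::wstring.
// Where wchar_t is 16 bits wide, the units are copied unchanged.
// Where it is wider, each valid surrogate pair is combined into a
// single code point. Unpaired surrogates are copied through as-is.
std::wstring utf16ToWide(const std::u16string& in)
{
    std::wstring out;
    out.reserve(in.size());

    for (std::size_t i = 0; i < in.size(); ++i)
    {
        char32_t c = in[i];
        const bool isHigh = c >= 0xD800 && c <= 0xDBFF;
        if (sizeof(wchar_t) > 2 && isHigh && i + 1 < in.size())
        {
            const char32_t low = in[i + 1];
            if (low >= 0xDC00 && low <= 0xDFFF)
            {
                c = 0x10000 + ((c - 0xD800) << 10) + (low - 0xDC00);
                ++i; // the low surrogate has been consumed
            }
        }
        out.push_back(static_cast<wchar_t>(c));
    }
    return out;
}
```

The reverse direction (search results coming back from CLucene) would 
need the matching split of code points above 0xFFFF into surrogate 
pairs.
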
I hope this is useful (and not too verbose :-P).

Best regards,

Gert van Valkenhoef
-------------- next part --------------
A non-text attachment was scrubbed...
Name: helpindexer.cxx
Type: text/x-c++src
Size: 6467 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/libreoffice/attachments/20120210/ca89b033/attachment.cxx>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: helpsearch.cxx
Type: text/x-c++src
Size: 3652 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/libreoffice/attachments/20120210/ca89b033/attachment-0001.cxx>

