[EasyHack] #44681 port to CLucene from java/Lucene

Radek Doulik rodo at novell.com
Mon Feb 13 07:17:49 PST 2012


Hi Gert,

thanks for looking into this.

On Fri, 2012-02-10 at 23:11 +0100, Gert van Valkenhoef wrote:
> Dear LibreOffice developers,
> 
> Bug: https://bugs.freedesktop.org/show_bug.cgi?id=44681
> 
> Attached are initial implementations of the HelpIndexer and HelpSearch 
> in C++ using CLucene, to replace the Java implementations using Lucene.
> 
> The code that interfaces with Lucene to do the indexing and searching is 
> complete. I have a test set up where I create an index with both the 
> HelpIndexerTool.jar and the C++ indexer, and search it using the C++ 
> searcher. These give identical results. Thus, luckily, the index format 
> is compatible between CLucene and Java Lucene.
> 
> I've also looked into where the HelpIndexerTool is currently used, and 
> found these:
> 
>   - xmlhelp/source/cxxhelp/provider/databases.cxx:
> 
>      * In extension mode (enabled by HelpIndexer), through XInvocation
> 
>      * Does not ZIP the result
> 
>   - helpcontent2/util/target.pmk
> 
>      * Called as a command-line tool
> 
>      * ZIPs the result, but already has an alternative code path to do 
> it (the final .ELSE)
> 
> Based on this, it looks like the Java HelpIndexerTool is a lot more 
> complex than it needs to be, and does a few things that are better 
> handled by other tools. Especially the "extension mode" seems to be a 
> relic of the convoluted code path (through XInvocation etc.) and doesn't 
> do much more than suppressing error messages. In addition, couldn't the 
> ZIP creation just always be replaced by this alternative code path? It's 
> quite possible that I missed a few things here.

I'm not sure. It is probably best if you try it, or maybe someone else
who knows that part will answer.

> If "extension mode" and ZIP archiving are not needed, the implementation 
> is complete, and the remaining work would be integrating with the build 
> process. Here are a couple of caveats and/or questions related to that:
> 
>   * This implementation is using the master branch of CLucene's git, 
> with clucene-contribs-lib enabled (for CJK support). The released 
> version of CLucene is compatible with Lucene 1.9.x, whereas LibreOffice 
> uses Lucene 2.3.
> 
>   * Can someone help to figure out how to make CLucene part of the LO 
> build process? CLucene is using CMake and there seems to be no way to 
> 'make install' the clucene-contribs-lib, so this might be tricky.

This is usually done as follows: you either use the system library if it
is available, or build the package (CLucene in this case) inside the LO
build tree. Look into configure.in and search for cairo, for example.
Cairo is a graphics library that we either link against from the system
or build inside LO. I'm Cc'ing _rene_ and pmladek, who know a lot about
the build process.

Cheers
Radek

>   * I'm not sure exactly how to make my code build as part of the LO 
> build, but could probably figure it out as long as the previous point is 
> addressed.
> 
>   * CLucene (like Java) uses wide characters throughout, and defines 
> its own TCHAR type for that. Can we make this play nicely with how LO 
> handles strings?
> 
>   * I'm using some Unix headers, are these available on Windows or 
> should I use some kind of LO equivalent of them?
> 
>   * I tried replacing the HelpIndexerTool in 
> helpcontent2/util/target.pmk, which seems to work fine, except that I'm 
> returning an error code when the content/caption directory doesn't exist 
> (unlike HelpIndexerTool), which breaks on "shared".
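
One way to avoid the breakage on "shared" described in the last point is to treat a missing content/caption directory as "nothing to index" rather than as an error. The following is only a minimal sketch of that idea, not the attached implementation: listFilesToIndex is a hypothetical helper name, and it assumes a C++17 standard library with std::filesystem, which would also sidestep the Unix-header question.

```cpp
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper (not part of the attached code): collect the regular
// files directly under `dir`. A missing directory yields an empty list
// instead of an error, so the caller can simply index nothing.
std::vector<std::string> listFilesToIndex(const fs::path& dir) {
    std::vector<std::string> files;
    std::error_code ec;
    if (!fs::is_directory(dir, ec))
        return files;  // directory absent: nothing to index, not a failure
    for (const auto& entry : fs::directory_iterator(dir, ec)) {
        if (entry.is_regular_file(ec))
            files.push_back(entry.path().string());
    }
    return files;
}
```

With this shape, the indexer would return a non-zero exit code only for real failures (for example, an index that cannot be written), and an empty list would just produce an empty index.
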
> 
> I hope this is useful (and not too verbose :-P).
> 
> Best regards,
> 
> Gert van Valkenhoef
> _______________________________________________
> LibreOffice mailing list
> LibreOffice at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/libreoffice
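
On the TCHAR question in the quoted message: LO strings are UTF-16, while a wide-character type such as wchar_t is 16 bits on Windows and usually 32 bits on Unix. A conversion at the boundary might look like the sketch below. It rests on assumptions: std::u16string stands in for an LO string, std::wstring stands in for a CLucene TCHAR string (assuming TCHAR maps to wchar_t in the build in question), and utf16ToWide is a hypothetical helper name.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: convert UTF-16 text to a wide string.
// Assumption: std::u16string stands in for LO's UTF-16 string and
// std::wstring for a CLucene TCHAR string.
std::wstring utf16ToWide(const std::u16string& in) {
    std::wstring out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        bool isHigh = cp >= 0xD800 && cp <= 0xDBFF;
        // With a 32-bit wchar_t, merge a surrogate pair into one code point.
        // With a 16-bit wchar_t, the code units are copied unchanged.
        if (sizeof(wchar_t) >= 4 && isHigh && i + 1 < in.size()) {
            char32_t low = in[i + 1];
            if (low >= 0xDC00 && low <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                ++i;  // the low surrogate has been consumed
            }
        }
        out.push_back(static_cast<wchar_t>(cp));
    }
    return out;
}
```

Keeping such conversions in one small helper at the CLucene boundary would let the rest of the code continue to use LO's own string types.
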



