Re: thesaurus.dic Workday 1, background

Alex taosubmarines at
Thu Jun 29 05:28:16 UTC 2023

I have no affiliation with GSoC or any other code program.
I am looking at the thesaurus files (as I am not a coder), with a view to providing an updated technical.dic thesaurus with many new terms and a clear upgrade/merge/integrate path from an external data source (Wikipedia).
It has 603 lines of references as opposed to the current 378 lines, each term (possibly) searchable directly on Wikipedia. Then maybe I will look at a concept/design for integrating a (hypothetical) web search into the XML (help, thesaurus) interface: right-click on an item in help or in Bayram Cicek's search interface and you get an option to search the internet or a Wikipedia article (as a concept).
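A minimal sketch of what such a merge path could look like, assuming the word list is treated as plain text with one term per line. Any header lines a real .dic file may carry are not handled here. The names merge_terms and wikipedia_url are hypothetical, and the URL pattern only assumes that an English Wikipedia article title matches the term.

```python
from urllib.parse import quote


def merge_terms(current_lines, new_lines):
    """Merge two term lists, dropping blank lines and exact duplicates."""
    seen = set()
    merged = []
    for line in list(current_lines) + list(new_lines):
        term = line.strip()
        if term and term not in seen:
            seen.add(term)
            merged.append(term)
    # Sort case-insensitively so diffs between versions stay readable.
    return sorted(merged, key=str.casefold)


def wikipedia_url(term):
    """Build a candidate article URL (assumption: the title equals the term)."""
    return "https://en.wikipedia.org/wiki/" + quote(term.replace(" ", "_"))
```

Duplicates are matched exactly (case-sensitive) so that distinct spellings of a technical term are not silently dropped; whether that is the right rule for technical.dic is an open question.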
Anyway, I deleted the two foreign-language thesaurus files hu_AkH11.dic and sl.dic after looking at the bug reports mentioned below, but I quickly realized the dic files were needed when compiling. Blank placeholders (empty text files) work too, though.
I have since edited the code references to remove hu_AkH11.dic, and it now compiles OK without even a placeholder (empty) file.
Aside from Linguistic.backup..xcs, line 217, where else is hu_AkH11.dic referenced? I looked at the references below and I believe it is bloat. I will try compiling with HU language support if people think that is useful.
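A rough sketch of how the remaining references could be listed from a local checkout. It only approximates what "git grep hu_AkH11.dic" does, since it also looks at untracked files. The function name find_references and the list of skipped directories are assumptions, not anything taken from the LibreOffice tree.

```python
import os


def find_references(root, needle, skip_dirs=(".git", "workdir", "instdir")):
    """Return (path, line_number, line) for every text line containing needle."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk does not descend into skipped directories.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="replace") as handle:
                    for number, line in enumerate(handle, start=1):
                        if needle in line:
                            hits.append((path, number, line.rstrip("\n")))
            except OSError:
                continue  # unreadable file, skip it
    return hits
```

Calling find_references(".", "hu_AkH11.dic") from the top of the checkout returns one tuple per matching line, which can then be checked by hand before removing anything.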
I would expect that even with HU language and dictionary support installed, the (original) hu_AkH11.dic thesaurus will not exist or be called for. I don't speak Hungarian, though.
sl.dic is used by a unit test, I can see. I am not touching it today. I will put the text back into the placeholder file for my next build.
"en_US or other language builds get these files unnecessarily, the only task is fixing our packaging." OK, how can I help with packaging?
Laszlo, do you have a local repo for your LO code, the en_US spelling dictionary? Your language code is different from this specific technical.dic thesaurus, yes?
Alex Tao
Tao Submarines and Systems
Chios, Aegean Sea
>Thursday, June 29, 2023 1:44 AM +03:00 from Németh László < nemeth at >:
>Andras Timar < timar74 at > wrote (on Wed, Jun 28, 2023, 17:55):
>>Hi Alex,  
>>On Wed, Jun 28, 2023 at 5:15PM Alex < taosubmarines at > wrote:
>>>Hi everyone
>>>Today I am trying to determine how to remove two unwanted wordbook files from libreoffice/extras/source/wordbook:
>>>hu_AkH11.dic and sl.dic.
>>>These foreign-language (incomplete) dics should be removed, unless they are used in some unit test.
>>>Bugs 139961, 68576, etc.
>>>Can they be removed? OK?
>>I'm not sure if it's OK. We added these dictionaries for a reason. It's better to ask the maintainers first (I CC-ed them).
>>From the technical point of view, if you remove the files from the source, and all references to them, the build should pass. Maybe you need a clean build from scratch. Use the "git grep sl.dic" and "git grep hu_AkH11.dic" commands; they are more reliable than opengrok.
>You can remove hu_AkH11.dic with the following git command:
>$ git revert 6247c966942a0e43320a234302a67c1f92c2eea7
> Because this was added with that commit:
>  $ git log libreoffice/extras/source/wordbook/hu_AkH11.dic
>commit 6247c966942a0e43320a234302a67c1f92c2eea7
>But these are not unwanted dictionaries, as András wrote.
>In theory, they are packaged only with their language builds, sl-SI and hu-HU. If not, i.e. en_US or other language builds get these files unnecessarily, the only task is fixing our packaging. If the packaging problem is related to some Linux distributions, I believe, our task is only to report that in their bug trackers.
>Is this a GSoC project? I haven't found information about the planned improvement of the (en_US?) thesaurus or the thesaurus code base.
>(By the way, I had an interesting improvement here: English stemming and affixation during thesaurus usage by adding extra language data to the en_US spelling dictionary. Unfortunately, by accident this was removed by the recent maintainer.)
>Best regards,
>>Best regards,