Selection of dictionaries per installation

Stephan Bergmann sbergman at redhat.com
Thu Aug 30 04:16:47 PDT 2012


The recent removal of the extension prereg mechanism revealed a problem 
with how we select which dictionaries (which come in the form of bundled 
extensions) are included in a given installation.

At least with the "official" (<http://download.libreoffice.org>) Linux 
and Mac OS X installation sets, the base installation set contains en-US 
localization and only contains dictionaries "related" to that locale 
(dict-en, dict-es, dict-fr; see below for details of what "related" 
means).  The additional per-language langpacks contain dictionaries 
"related" to the given langpack (e.g., langpack_de contains dict-de).

However, on Windows, the base installation set contains all available 
localizations and all available dictionaries.  During msi installation, 
some code apparently determines a default selection of only a subset of 
the "Additional user interface languages" entries (presumably based on 
the current system locale settings), but all of the available "Optional 
Components - Dictionaries" entries are selected by default.  This now 
causes per-user generation of data about all those bundled dictionary 
extensions at per-user first-start of LO, leading to noticeable time and 
space requirements (see 
<https://bugs.freedesktop.org/show_bug.cgi?id=53009> "Large 
UserInstallation's user/extensions/bundled/ tree").

Hence, one suggestion to address that problem would be to reduce the 
number of "Optional Components - Dictionaries" entries selected by 
default during Windows msi installation, similar to how a certain 
combination of base installation set plus langpack(s) on the other 
platforms also only installs a subset of all the available dictionaries. 
  (That is, the code that apparently now determines a default selection 
of "Additional user interface languages" entries would need to be 
extended to also determine a default selection of "related" "Optional 
Components - Dictionaries" entries.)

Initial reactions on IRC (see below) were (a) that the status quo on 
Windows exists to avoid "political issues" (though that is 
inconsistent with the status quo on the other platforms), and (b) that 
we should rethink having dictionaries as bundled extensions (though I 
would prefer to keep things simple, solving the problem by harmonizing 
behavior across platforms now and leaving anything more ambitious for 
the future).

Any further thoughts?

Stephan

PS1: The way dictionaries "related" to a given locale are determined 
appears to be the list at 
setup_native/source/packinfo/spellchecker_selection.txt.  That's why the 
en-US base installation set for Linux and Mac OS X contains dict-en, 
dict-es, and dict-fr, for example.  However, an apparent inconsistency 
is that langpack_de only contains dict-de, and not also dict-fr and 
dict-it, as that list would suggest.
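
A minimal sketch of such a lookup, in Python for brevity.  The RELATED 
table is hypothetical and holds only the two entries mentioned in this 
mail; it does not reproduce the syntax of spellchecker_selection.txt. 
The function name and the fallback rule for unlisted languages are 
assumptions too, not what the msi code or the langpack build does.

```python
# Hypothetical table: UI language -> "related" dictionary names.
# Only the two entries mentioned in this mail are included.
RELATED = {
    "en-US": ["en", "es", "fr"],
    "de": ["de", "fr", "it"],
}


def default_dictionaries(ui_langs, related=RELATED):
    """Return the sorted dict-* names to preselect for the given UI languages."""
    selected = set()
    for lang in ui_langs:
        # Assumption: a language missing from the table falls back to the
        # dictionary named after its primary subtag (e.g. "pt-BR" -> "pt").
        for name in related.get(lang, [lang.split("-")[0]]):
            selected.add("dict-" + name)
    return sorted(selected)


print(default_dictionaries(["en-US", "de"]))
```

With a default selection of en-US plus de, this preselects the union 
of both rows instead of every available dictionary.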

PS2: At least the Mac OS X LO 3.6.1 en-US base installation set contains 
share/extension/dict-* directories for all available dictionaries, not 
just dict-en, dict-es, dict-fr, but the additional ones are effectively 
empty and their existence is a bug.

PS3: For the record, the relevant log of yesterday's #libreoffice-dev:

> Aug 29 12:50:57 <sberg> timar, do you know anything about our msi by default installing all "Optional Components - Dictionaries" entries, but only selected (at installation time, I presume?) "Additional user interface languages"?
> Aug 29 12:51:59 <timar> sberg: yes, we always install all dictionaries on Windows in order to avoid "political issues"
> Aug 29 12:52:26 <tml_> is this the old "omg, I waste SEVERAL MEGABYTES on dictionaries for languages I don't even like" discussion?
> Aug 29 12:53:41 <sberg> timar, but that causes one part of the problems of fdo#53009, so I had hoped we could fix that
> Aug 29 12:53:44 <IZBot> LibreOffice-Libreoffice normal/medium ASSIGNED Large UserInstallation's user/extensions/bundled/ tree https://bugs.freedesktop.org/show_bug.cgi?id=53009
> Aug 29 12:54:41 <tml_> wouldn't the best solution then be to stop treating these as "extensions"?
> Aug 29 12:55:12 <tml_> don't we have too much optionality in the installer anyway?
> Aug 29 12:55:40 <tml_> hmm, those are orthogonal issues, sorry
> Aug 29 12:58:36 <timar> sberg: what is your suggestion?
> Aug 29 13:02:55 <sberg> timar, assuming that there is code in our msi to default-enable some subset X of "Additional user interface languages" entries: extend that code to also default-enable only a "matching" subset of "Optional Components - Dictionaries" entries
> Aug 29 13:03:44 <tml_> that assumes people would prefer to use software (including the OS) in the same language as they write/edit documents it. not true
> Aug 29 13:03:46 <sberg> ...for some suitable definition of "matching"
> Aug 29 13:05:01 <timar> sberg: tml_ there is http://opengrok.libreoffice.org/xref/core/setup_native/source/packinfo/spellchecker_selection.txt that we still use for creating Linux langpacks IMHO (not sure)
> Aug 29 13:05:11 <sberg> tml_, no, but it might be a better approximation to typical users' needs than the current "install everything" approach (after all, users /can/ install additional dics -- its only about the defaults)
> Aug 29 13:06:45 <sberg> timar, yes, that list I had on my mind
> Aug 29 13:06:56 <tml_> sberg: one person's good approximation is another person's grave insult to the XXX people ;)
> Aug 29 13:07:26 <sberg> tml_, we already use that approximation on other platforms
> Aug 29 13:07:45 <tml_> so that is broken, then? ;)
> Aug 29 13:09:16 <sberg> tml_, do you have a better suggestion?
> Aug 29 13:10:01 <tml_> sberg: is that there are lots of *extensions* that is causing problems, or lots of *dictionaries* ?
> Aug 29 13:11:03 <tml_> or, wait, am I smoking crack with this talk about extensions?
> Aug 29 13:11:25 <tml_> (I somehow had the impression that many dictionaires are technically packaged as "extensions", are they?)
> Aug 29 13:11:51 <timar> tml_: dictionaries are extensions
> Aug 29 13:12:15 <sberg> tml_, dictionaries come as bundled extensions, and every bundled extension increases the per-user space reqs and per-user--first-start time reqs (though some do more than others)
> Aug 29 13:12:20 <tml_> ok, so then the question above to sberg still holds
> Aug 29 13:12:52 <tml_> sberg: ok, so wouldn't the solution then be to stop packaging dictionaries as extensions? or do they *have* to be such for some obscure technical reason?
> Aug 29 13:13:05 <tml_> I mean, they could still be optional in the installer even if they weren't extensions
> Aug 29 13:13:29 <tml_> just like lots of other things are optional but aren't extensions
> Aug 29 13:16:28 <sberg> tml_, I think the origin of having dicts as exts is so that (a) people can install additional ones (OOo traditionally did not come with such a large number of bundled dicts as LO does at least on Windows, IIUC), and (b) people can update dicts independently from updating the app itself (as the dicts were traditionally provided by 3rd parties, IIUC)
> Aug 29 13:17:38 <tml_> but having the bundled ones not be extensions wouldn't stop (a), and (b) is made unnecessary by our time-based frequent releases
> Aug 29 13:22:54 <sberg> tml_, I'm not arguing that having dicts as exts is necessarily good; what I'm not sure about is whether turning a given dict from ext to non-ext could cause technical problems, if a user installed an ext variant of that dict into a LO that contains that dict as non-ext
> Aug 29 13:24:24 <tml_> that is something to check (and fix) then, if the bundled dictionaries would not be extensions any more
> Aug 29 13:24:31 <sberg> maybe makes sense to put this on the ESC agenda
> Aug 29 13:27:11 <caolan> some of the code for the old pre-extension mechanism for dictionaries still exists in lingucomponent/source/lingutil/lingutil.cxx now used for the system dictionary case
> Aug 29 13:27:30 <caolan> its *supposed* to prefer extensions IIRC over system dicts
> Aug 29 13:27:41 <caolan> *shrug*
> Aug 29 13:28:43 <caolan> the removed pre-extension code had a dictionary.lst in some dir or other that listed the dicts and languages they were for
> Aug 29 13:29:47 <caolan> but that was back in pre language tool days, not sure if that makes some of our bundled dicts no longer just simple hunspell/hyphen/mythes containers
> Aug 29 13:30:10 <tml_> sberg: but anyway, I am not opposed to making the installer by default select only a (somewhat arbitrary) subset of dictionaries to install, if that fixes a problem for most people
> Aug 29 13:30:37 <tml_> and even if I was opposed, that could be ignored;)
> Aug 29 13:32:23 <caolan> throw the net wide enough, dict for langpack + top X languages always installed + langs also in use in territory + Y neighbouring langs :-)
> Aug 29 13:36:46 <tml_> caolan: but isn't it so that exactly selecting "neighbouring langs" (but not langs from some country a few borders away) can cause immense irritation. "why would we proud Freedonians want to write in the language of those dogs of Elbonia. what we need is the language of our beloved friends from Bulvania"
> Aug 29 13:37:36 <tml_> but whatever
> Aug 29 13:40:20 <caolan> including Russian in a shortlist of dicts for the Latvian langpack is a potential contender for that problem
> Aug 29 13:41:58 <tml_> which is why when including *all* one can always say "we don't make any judgements"
> Aug 29 13:42:29 <caolan> Bosnian/Serbian/Croatian, *shudder*
> Aug 29 13:45:12 <tml_> caolan: Serbian/Albanian/Russian was the real-world example I had in mind. even if Albanian seems to be a "recognized minority language" in Serbia, so at least officially they couldn't oppose it that heavily
> Aug 29 13:46:33 <tml_> caolan: and what do I know, maybe I am too pessimistic, and only a very small minority of people would take stuff like this so seriously
> Aug 29 13:46:43 <tml_> caolan: after all, it isn't *maps* ;)
> Aug 29 13:47:34 <caolan> tml_: RH has a utility to search for possible maps in software packages :-)

