gettext, boost::locale, binary resource format and translations
Caolán McNamara
caolanm at redhat.com
Wed May 31 14:23:27 UTC 2017
I've been experimenting with replacing the resource src/res system with
a more standard gettext style solution.
Here's what I'm thinking...
1. All bitmap resources are moved out of the resource files and just
referenced by name from source. This is done.
2. All strings that are *not* supposed to be translated are moved out
of the resource files and into source. This is mostly done.
3. All .src files go away and the US English (key) source strings are
folded into the .hrc as N_("source string")
so we would go from
vcl/inc/svids.hrc
#define SV_RESID_STRING_NOSELECTIONPOSSIBLE 2001
and
vcl/source/src/menu.src
String SV_RESID_STRING_NOSELECTIONPOSSIBLE
{
Text [ en-US ] = "No selection";
};
to
vcl/inc/strings.hrc
#define N_(String) (u8##String)
#define SV_RESID_STRING_NOSELECTIONPOSSIBLE N_("No selection")
So adding a translatable string no longer means finding a free slot in
some number range, or appending another number to a sequence and
adapting some other "end of range" number to match, or discovering that
the list of numbers is out of sequence and there's a collision, etc.
4. All .ui files go from <interface> to <interface domain="MODULE">,
e.g. "vcl", to identify which translations they use.
5. ResMgr is dropped in favour of std::locale created by
boost::locale::generator pointed at matching MODULE .mo files.
boost::locale provides a suitably licensed implementation of loading
standard gettext-format .mo files. It's used in e.g. blender. We
currently build the necessary boost library in master.
6. UIConfig translations are folded into the module .mo, so e.g.
UIConfig_cui goes from a l10n target to a normal one. All the
res/lang.zips of UI files go away and there's just one .mo file per
translated module. Strings and .ui files share the same solution, so
the special translation loading code for .ui files goes away.
Points 4 and 6 should let deckard or native gtk3 code load our .uis and
match them against our .mos out of the box.
7. Translation via Translation::get(hrc-define-key, std::locale)
instead of ResId::toString(hrc-define-key, ResMgr). Typically the
application code remains the same, e.g. SvxResId(KEY) remains as
SvxResId(KEY); it's just that the KEY has changed from a number to a
const char* of the same name and the underlying layer that fetches the
translations changes.
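To make point 7 a bit more concrete, here is a minimal sketch of how a
key-based lookup through a std::locale could look. It uses only the
standard library: CatalogFacet and makeSampleGermanLocale are
hypothetical names invented for illustration, standing in for whatever
boost::locale attaches to the locale when it loads a .mo file, and the
return type is std::string only to keep the example self-contained. The
one translation used is the "OS: " example from this mail.

```cpp
#include <cassert>
#include <locale>
#include <string>
#include <unordered_map>
#include <utility>

// The mail's macro is (u8##String); a plain literal is used here so the
// sketch also compiles under C++20, where u8 literals are char8_t.
#define N_(String) (String)
#define SV_APP_OSVERSION N_("OS: ")
#define SV_RESID_STRING_NOSELECTIONPOSSIBLE N_("No selection")

// Hypothetical facet holding one module's translations. This is NOT
// boost::locale's API, just a stand-in with the same overall shape:
// the catalog travels inside the std::locale object.
class CatalogFacet : public std::locale::facet
{
public:
    static std::locale::id id; // every facet type needs its own id

    explicit CatalogFacet(std::unordered_map<std::string, std::string> messages)
        : m_messages(std::move(messages))
    {
    }

    // Return the translation, or the en-US key itself if there is none.
    std::string lookup(const char* key) const
    {
        assert(key != nullptr);
        auto it = m_messages.find(key);
        return it == m_messages.end() ? std::string(key) : it->second;
    }

private:
    std::unordered_map<std::string, std::string> m_messages;
};

std::locale::id CatalogFacet::id;

namespace Translation
{
// Sketch of the proposed entry point: key plus locale in, text out.
std::string get(const char* key, const std::locale& loc)
{
    if (std::has_facet<CatalogFacet>(loc))
        return std::use_facet<CatalogFacet>(loc).lookup(key);
    return key; // no catalog installed: the key is the en-US text
}
}

// Build a locale carrying a one-entry sample catalog.
std::locale makeSampleGermanLocale()
{
    std::unordered_map<std::string, std::string> messages;
    messages.emplace("OS: ", "Betriebssystem:");
    // The locale takes ownership of the facet.
    return std::locale(std::locale::classic(), new CatalogFacet(std::move(messages)));
}
```

Calling Translation::get(SV_APP_OSVERSION, makeSampleGermanLocale())
yields the German string, while a key with no entry, or any key looked
up in a locale without a catalog, comes back as the en-US text itself.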
8. Python can now be translated with its inbuilt gettext support (we
keep the name strings.hrc there to keep finding the .hrc files uniform
via localize in l10ntools to extract strings to translate) so magic
numbers can go away there too.
9. Java and StarBasic components can be translated via the pre-existing
css.resource.StringResourceWithLocation java-like translation mechanism
instead of looking things up by fragile numbers. This is done and was a
useful exercise in finding missing strings for translation, wrong
translation ids and other weirdness in wizards.
10. The en-US .res files go away, their strings are now the .hrc keys
in the source code. A side effect is that unit tests no longer need
dependencies on .res files, since the en-US strings they rely on are
now built in.
11. The remaining .res files are replaced by .mo files generated by
gettext's msgfmt directly from the matching .po in the translations git
submodule.
12. In .res and .ui-lang-zip files, the old scheme inserts copies of
the en-US strings when the destination translation is missing a
translation. In the gettext scheme the translation is just omitted, so
"poor" translations shrink dramatically in size.
13. Translations can be extracted from .hrc by xgettext with
xgettext --add-comments --keyword=N_
and can be extracted from .ui with gettext's inbuilt support for that.
The .mo files can be generated with msgfmt so various home grown
utilities could be replaced in favor of those.
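As a rough illustration of what the N_ keyword buys the extraction
step in point 13, here is a toy extractor. It is only a sketch: the
function names are made up, and real xgettext does far more (comments,
source locations, merging duplicate msgids, string concatenation). It
shows that marking strings with N_("...") is enough to find them
mechanically.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Collect the literal inside every N_("...") occurrence, in order of
// appearance. Escaped characters inside the literal are tolerated and
// returned exactly as written in the source.
std::vector<std::string> extractMarkedStrings(const std::string& source)
{
    static const std::regex marker(R"re(N_\(\s*"((?:[^"\\]|\\.)*)"\s*\))re");
    std::vector<std::string> found;
    for (std::sregex_iterator it(source.begin(), source.end(), marker), end;
         it != end; ++it)
    {
        assert(it->size() == 2); // whole match plus one capture group
        found.push_back((*it)[1].str());
    }
    return found;
}

// Render the strings as template-style entries with an empty msgstr.
std::string toPotEntries(const std::vector<std::string>& msgids)
{
    std::string out;
    for (const auto& id : msgids)
        out += "msgid \"" + id + "\"\nmsgstr \"\"\n\n";
    return out;
}
```

The macro definition line itself, #define N_(String) (u8##String), is
skipped because its argument is not a string literal.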
14. StringArrays turn into native C++ structures in .hrc files visible
to the translation extraction machinery, e.g.
vcl/source/src/print.src
StringArray SV_PRINT_NATIVE_STRINGS
{
ItemList [en-US] =
{
< "Preview"; >;
< "Page number"; >;
< "Number of pages"; >;
< "More"; >;
< "Print selection only"; >;
};
};
vcl/inc/printaccessoryview.hrc
const char* SV_PRINT_NATIVE_STRINGS[] =
{
N_("Preview"),
N_("Page number"),
N_("Number of pages"),
N_("More"),
N_("Print selection only")
};
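For what it's worth, consuming such an array could look like the
sketch below. The Catalog type, translate, translateAll and the
_COUNT constant are hypothetical helpers for illustration only, with a
plain std::map standing in for a loaded .mo file.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Plain literal instead of the mail's (u8##String) so the sketch also
// compiles under C++20, where u8 literals are char8_t.
#define N_(String) (String)

const char* SV_PRINT_NATIVE_STRINGS[] =
{
    N_("Preview"),
    N_("Page number"),
    N_("Number of pages"),
    N_("More"),
    N_("Print selection only")
};

// Hypothetical helper constant: number of entries in the array.
const std::size_t SV_PRINT_NATIVE_STRINGS_COUNT
    = sizeof(SV_PRINT_NATIVE_STRINGS) / sizeof(SV_PRINT_NATIVE_STRINGS[0]);

// Stand-in for a loaded catalog: en-US key -> translated text.
using Catalog = std::map<std::string, std::string>;

// Look up one key, falling back to the en-US key when untranslated.
std::string translate(const Catalog& catalog, const char* key)
{
    assert(key != nullptr);
    auto it = catalog.find(key);
    return it == catalog.end() ? std::string(key) : it->second;
}

// Translate every entry of a string array, preserving order.
std::vector<std::string> translateAll(const char* const* keys, std::size_t count,
                                      const Catalog& catalog)
{
    std::vector<std::string> result;
    result.reserve(count);
    for (std::size_t i = 0; i < count; ++i)
        result.push_back(translate(catalog, keys[i]));
    return result;
}
```

The array index plays the role that the position in the old ItemList
did, so callers that pick "entry 3" keep working unchanged.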
15. [API CHANGE] remove deprecated binary .res resource loader related
uno apis
com::sun::star::resource::OfficeResourceLoader
com::sun::star::resource::XResourceBundleLoader
com::sun::star::resource::XResourceBundle
when translating strings via uno apis
com.sun.star.resource.StringResourceWithLocation
can (and should) continue to be used
16. msgctxt gets dropped
In the scheme I'm currently using we go from a .po of...
msgctxt ""
"app.src\n"
"SV_APP_OSVERSION\n"
"string.text"
msgid "OS: "
msgstr "Betriebssystem:"
to
msgid "OS: "
msgstr "Betriebssystem:"
i.e. no msgctxt. I haven't found any place where we seem to need it to
disambiguate translations yet, but it's not hard to support it if
necessary.
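If msgctxt did turn out to be needed, supporting it is mostly a matter
of how the lookup key is built: gettext joins the context and the msgid
with an EOT byte (0x04). A minimal sketch, with a std::map standing in
for the loaded catalog, hypothetical function names, and "app.src" as
an arbitrary example context:

```cpp
#include <cassert>
#include <map>
#include <string>

// Stand-in for a loaded catalog: lookup key -> translated text.
using Catalog = std::map<std::string, std::string>;

// Build the lookup key. With a context, gettext uses
// msgctxt + EOT (0x04) + msgid. Treating an empty context as
// "no context" is a simplification made by this sketch.
std::string contextKey(const std::string& msgctxt, const std::string& msgid)
{
    return msgctxt.empty() ? msgid : msgctxt + '\x04' + msgid;
}

// Look up msgid, optionally within a context, falling back to the
// en-US msgid when there is no translation.
std::string translate(const Catalog& catalog, const std::string& msgctxt,
                      const std::string& msgid)
{
    assert(!msgid.empty()); // the empty msgid is the catalog header entry
    auto it = catalog.find(contextKey(msgctxt, msgid));
    return it == catalog.end() ? msgid : it->second;
}
```

Two identical en-US strings that need different translations would then
simply carry different contexts and land on different keys.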
17. size concerns ?
We always bundle en-US resources in our install, these go away and the
binaries increase by a somewhat smaller amount than the size of the
removed en-US res file.
Translations which, either by design (en-GB, en-ZA) or because of
limited translations (brx), share a lot of strings with en-US shrink a
lot.
Translations which are basically full, e.g. de, appear to be around the
same size for a sample large translated module, e.g. cui, when
comparing the gettext prototype (uncompressed .mo) against the 5-3 .res
+ zip-compressed .ui translation file, despite the .mo containing the
full en-US keys vs the .res using numerical keys and the .ui using
widget_id keys.
This is a bit surprising, maybe it's because (even zipped) in general
the simple en-US key strings are shorter than the .ui zip widget id key
strings currently used or perhaps the .res format has more overhead
than I thought.
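The shrinking effect from points 12 and 17 can be modelled with a small
sketch. Everything here is illustrative: the function names are made
up, both catalogs are keyed by the en-US string for simplicity (the old
.res used numeric ids), and payloadBytes ignores all file-format
overhead, so it says nothing about real .res or .mo sizes.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using Catalog = std::map<std::string, std::string>;

// Old scheme: every source string gets an entry, and a missing
// translation is filled with a copy of the en-US text.
Catalog buildPaddedCatalog(const std::vector<std::string>& sources,
                           const Catalog& translated)
{
    Catalog out;
    for (const auto& s : sources)
    {
        auto it = translated.find(s);
        out[s] = (it != translated.end()) ? it->second : s;
    }
    assert(out.size() <= sources.size()); // duplicate sources collapse
    return out;
}

// gettext scheme: only strings that really have a translation are stored.
Catalog buildSparseCatalog(const std::vector<std::string>& sources,
                           const Catalog& translated)
{
    Catalog out;
    for (const auto& s : sources)
    {
        auto it = translated.find(s);
        if (it != translated.end())
            out[s] = it->second;
    }
    return out;
}

// Lookup with fallback to the en-US key, as in the gettext scheme.
std::string lookup(const Catalog& catalog, const std::string& key)
{
    auto it = catalog.find(key);
    return it == catalog.end() ? key : it->second;
}

// Very rough size model: bytes of all keys plus all values.
std::size_t payloadBytes(const Catalog& catalog)
{
    std::size_t total = 0;
    for (const auto& entry : catalog)
        total += entry.first.size() + entry.second.size();
    return total;
}
```

Both catalogs answer every lookup identically. The sparse one just
leaves out the entries whose answer would have been the key anyway.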
18. Trashing existing translations ?
This is my unknown, if we make a change like this, which is reasonably
small from the coding perspective, do we have a means to mass convert
everything in pootle to make it a trivial change for translators too ?
It's reasonably easy to script changing the .po files in translations
to match the proposed layout, and my prototype does this in order to
reuse the old translation for my experimental builds, but do we have
the means to e.g. reimport those modified translations into pootle ?
Or do we have other means to manipulate the pootle translations to
avoid ~all translations becoming invalid/fuzzy and avoid forcing manual
inspection of practically every string ? Ideally I'd like everything
translated to remain translated so there's no angry translator
pitchfork-wielding mob.