[Libreoffice] l10n based on PO files

Christian Lohmaier lohmaier+libreoffice at googlemail.com
Mon Mar 14 15:06:51 PDT 2011


Hi Andras,

On Mon, Mar 14, 2011 at 6:28 PM, Andras Timar <timar74 at gmail.com> wrote:
> 2011/3/14 Christian Lohmaier <lohmaier+libreoffice at googlemail.com>:
>> On Mon, Mar 14, 2011 at 2:26 PM, Andras Timar <timar74 at gmail.com> wrote:
>>>>
>>> Exactly. :) Most translators use po files to translate LibreOffice and
>>> storing those po files in git has more advantages.
>>> * Many small files vs. one big file / language -> smaller changesets in git
>>> * English - translated string pairs -> translations can be always in
>>> sync with English
>>
>> This is a huge contradiction in itself, isn't it? Having the english
>> string in each and every file also means that you have to update
>> *every* file when a string changes, thus your "small changesets" don't
>
> I don't understand what you mean. When a string changes for example in
> svx/source/dialog/sdstring.src, then only the small
> svx/source/dialog.po will be changed not the whole SDF file.

But you will have to change that po for /every/ language. So you get
about 100 changes, not just one (and then another 100 individual
changes when the translations are updated).

And you don't change the whole SDF file either. You replace one line
in the SDF with another line. The rest of the SDF stays the same.
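
To illustrate the "one line replaced, the rest stays the same" point,
here is a minimal sketch using Python's difflib. The file contents and
the line format are made up for the example (they are not the real SDF
layout), and a line-based diff is only an approximation of what git
actually stores:

```python
import difflib

# Synthetic stand-in for a large per-language file; the line format is invented.
old_lines = [f"string_{i}\tOracle text {i}\n" for i in range(2000)]
new_lines = list(old_lines)
new_lines[1234] = "string_1234\tThe Document Foundation text 1234\n"

# Line-based diff between the two versions of the file.
diff = list(difflib.unified_diff(old_lines, new_lines, "old.sdf", "new.sdf"))

# Keep only the real change lines, skipping the "---" / "+++" file headers.
changed = [
    line for line in diff
    if line[0] in "+-" and not line.startswith(("+++", "---"))
]
print(len(changed))  # one removed line plus one added line
```

The size of the diff depends on how many lines changed, not on how
large the file around them is.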

>> provide any benefit at all. (not to mention who will be responsible
>> for updating all the files once an English string changes?)
>
> The proposed l10n workflow is the following. I provide en-US.sdf and
> pot files regularly (let say bi-weekly). en-US.sdf and pot files can
> be generated with the makefile.mk of the 'translations' module. I also
> update Pootle. Before release, I get translations from Pootle (or from
> external sources) and commit them to git.

OK, so Pootle is the tool used, and you don't actually commit the po(t)
files to the repository after each change. If Pootle is the mediator
anyway, I don't see why the English string should be part of every po
file in the source tree.
You can easily add it while processing the files, can't you?
(i.e. only have the id in the file in the source tree as the original
string. The copy in the source tree is not meant for
editors/translators; translators instead get the processed version
from Pootle or from weekly generated snapshots or similar)
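
A rough sketch of what I mean by "processing", assuming the source tree
stores translations keyed only by string id. The ids, the dictionaries
and the function name export_for_translators are hypothetical, not
existing tooling:

```python
# English strings live in one place only (hypothetical ids and texts).
english = {
    "svx.dialog.cancel": "Cancel",
    "svx.dialog.help": "Help",
}

# Per-language data in the source tree: id -> translation, no English copy.
german = {
    "svx.dialog.cancel": "Abbrechen",
}

def export_for_translators(english, translated):
    """Join on the id to build (id, English, translation) rows.

    Strings without a translation yet get an empty translation.
    """
    return [
        (string_id, text, translated.get(string_id, ""))
        for string_id, text in sorted(english.items())
    ]

rows = export_for_translators(english, german)
for row in rows:
    print(row)
```

With that layout, a changed English string touches only the English
file; the pairs that translators see are generated on export.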

>> (Besides I doubt there is any benefit wrt changesets when using many
>> small files compared to one big file)
>
> Let's imagine I change 1 line in all sdf files (~13 MB each), because
> I change Oracle to The Document Foundation or whatever. It means that
> git will add 104 times 13 Megabytes compressed.

No, that's just not true. You will add 100 times that one line change.

And again whether you change that line 1000 times in 100 files, or 10
times in 10000 files doesn't make a difference - in total that is
100000 changed lines.
Put that into one single commit, one single changeset and cgit will
probably have the same timeout/memory or whatever problems.

If you'd instead commit per language, you get 1000 changes in each
commit, one time all within a single file, and another time split
across 100 files.

Again, not really a big difference. Whatever difference there is, it is
not caused by having one large file instead of multiple small ones.
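
The arithmetic above, written out as a small sketch. The numbers are
the illustrative ones from this mail, not measurements, and the figure
of 100 languages is the rough count used above:

```python
def total_changed_lines(files, changes_per_file):
    """Total number of changed lines across all touched files."""
    return files * changes_per_file

# Few large files versus many small files, same overall amount of change.
few_large = total_changed_lines(files=100, changes_per_file=1000)
many_small = total_changed_lines(files=10_000, changes_per_file=10)

# Splitting the same work into one commit per language (about 100 languages).
languages = 100
per_language_commit = few_large // languages

print(few_large, many_small, per_language_commit)
```

Both layouts end up with the same total, and the per-commit size only
depends on how the commits are split, not on the file layout.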

Single large files are more uncomfortable for editors/translators, but
then again they don't use those files directly anyway.

> That's the size of the
> changeset. It can be ~100 MB.

I really, really doubt that git is that bad at handling changes. After
all it even does binary-diffing instead of storing a fresh copy of a
binary file each time.
Git surely doesn't store a full new copy with each commit.

> Have you tried to see a log or a diff at
> http://cgit.freedesktop.org/libreoffice/l10n/? It does not work,
> because session is timed out before git provides the result. git is
> bad at handling large files.

But that's a completely different topic from the storage space needed
for the commits.

The time for getting the changesets from the repo is probably that bad
because the individual changesets are just huge.

There's no problem in displaying a regular commit, like this one for
example (an update of one single language):
http://cgit.freedesktop.org/libreoffice/l10n/commit/?id=a78af9ff93ee90900ed9adde00397d46412b67aa

In the problematic commits, all languages are updated at once, and each
language updates many strings, so it is not one line that changed, but
thousands.

ciao
Christian

