[Libreoffice] [PATCH] speed up localized builds by introducing po2lo

Miklos Vajna vmiklos at frugalware.org
Wed Sep 7 14:50:53 PDT 2011


Hi,

We discussed with Andras that one of the bottlenecks of a build with many
languages is the slowness of the po2oo script from translate-toolkit.

I tried rewriting it as a standalone script, and it seems to be quite
fast here:

$ time po2oo --skipsource -i hu/ -o hu.sdf -l hu -t en-US.sdf > hu.sdf
real    0m32.842s
user    0m29.205s
sys     0m1.439s

$ time ./po2lo --skipsource -i hu/ -o hu.sdf -l hu -t en-US.sdf > hu.sdf

real    0m3.756s
user    0m3.344s
sys     0m0.143s

The two localized sdf files are not byte-identical, but once their lines
are sorted, the contents are the same for me.
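
A minimal sketch of that sorted comparison, in case someone wants to repeat
it for other languages. This is not part of the patch; the function names
are made up for illustration, and it assumes each output was saved under
its own file name first:

```python
def sdf_lines_sorted(path):
    """Return the lines of an SDF file, sorted, without line endings."""
    with open(path, "rb") as f:
        # Strip \r\n so that differing line endings do not count as a diff.
        return sorted(line.rstrip(b"\r\n") for line in f)


def same_content(path_a, path_b):
    """True if both files hold the same set of lines, ignoring their order."""
    return sdf_lines_sorted(path_a) == sdf_lines_sorted(path_b)
```

For example, same_content("hu-po2oo.sdf", "hu-po2lo.sdf") should return True
if the two tools agree (both file names are hypothetical).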

Note that this does not remove the dependency on translate-toolkit: the
oo2po script is still used (though it is not invoked during a normal
build).

I'm attaching two patches:

- one for core.git, which adds the new po2lo script (localize was
  already in solenv/bin, so I put this one there as well).
- one for translations.git to actually use the new script: here I took a
  look at how python is invoked in filters and did the same.
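
For review purposes, the lookup that po2lo does for each template line is
roughly the following. This is a simplified sketch of the Entry class in the
first patch, not the patch itself: po_lookup() and its parameters are names
invented here, and the regular expression stands in for the translate table
used by sdf2po().

```python
import re


def po_lookup(fields, input_dir, kind="text"):
    """Return the (po file path, key) pair for one tab-split SDF line."""
    # Field 1 is the source file path, using backslashes as separators.
    path = fields[1].split("\\")
    # The .po file is named after the directory containing the source file.
    po = "%s/%s/%s.po" % (input_dir, fields[0], "/".join(path[:-1]))
    prefix = ""
    if fields[5]:
        prefix += fields[5] + "."
    if fields[3]:
        prefix += fields[3] + "."
    key = "%s#%s.%s%s" % (path[-1], fields[4], prefix, kind)
    # Anything outside "/#.", digits and ASCII letters is replaced by "_".
    return po, re.sub(r"[^/#.0-9a-zA-Z]", "_", key)
```

With made-up field values ["sw", "source\\ui\\app\\app.src", "0", "string",
"STR_FOO", ""] and input_dir "hu", this gives the file
"hu/sw/source/ui/app.po" and the key "app.src#STR_FOO.string.text".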

Andras, could you please give it some testing? I didn't want to push it
without your review, just in case I missed something. :)

Thanks,

Miklos
-------------- next part --------------
From 93d0db294db97a5815b4d2f1267826a59b756985 Mon Sep 17 00:00:00 2001
From: Miklos Vajna <vmiklos at frugalware.org>
Date: Wed, 7 Sep 2011 23:39:15 +0200
Subject: [PATCH] Add po2lo tool

---
 solenv/bin/po2lo |  202 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 202 insertions(+), 0 deletions(-)
 create mode 100755 solenv/bin/po2lo

diff --git a/solenv/bin/po2lo b/solenv/bin/po2lo
new file mode 100755
index 0000000..272d6f9
--- /dev/null
+++ b/solenv/bin/po2lo
@@ -0,0 +1,202 @@
+#!/usr/bin/env python
+# Version: MPL 1.1 / GPLv3+ / LGPLv3+
+#
+# The contents of this file are subject to the Mozilla Public License Version
+# 1.1 (the "License"); you may not use this file except in compliance with
+# the License or as specified alternatively below. You may obtain a copy of
+# the License at http://www.mozilla.org/MPL/
+#
+# Software distributed under the License is distributed on an "AS IS" basis,
+# WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
+# for the specific language governing rights and limitations under the
+# License.
+#
+# The Initial Developer of the Original Code is
+#       Miklos Vajna <vmiklos at frugalware.org>
+# Portions created by the Initial Developer are Copyright (C) 2011 the
+# Initial Developer. All Rights Reserved.
+#
+# Major Contributor(s):
+#
+# For minor contributions see the git repository.
+#
+# Alternatively, the contents of this file may be used under the terms of
+# either the GNU General Public License Version 3 or later (the "GPLv3+"), or
+# the GNU Lesser General Public License Version 3 or later (the "LGPLv3+"),
+# in which case the provisions of the GPLv3+ or the LGPLv3+ are applicable
+# instead of those above.
+
+import getopt, sys, os, re
+
+class Options:
+    """Options of this script."""
+
+    def __init__(self):
+        self.input = None
+        self.output = None
+        self.language = None
+        self.template = None
+
+class Entry:
+    """Represents a single line in an SDF file."""
+
+    def __init__(self, items):
+        self.items = items # list of 15 fields
+        path = self.items[1].split('\\')
+        self.po = "%s/%s/%s.po" % (options.input, self.items[0], "/".join(path[:-1]))
+        prefix = ""
+        if len(self.items[5]):
+            prefix += "%s." % self.items[5]
+        if len(self.items[3]):
+            prefix += "%s." % self.items[3]
+        self.keys = []
+        # fields 10, 12 and 13 are the translation types handled here
+        for idx in (10, 12, 13):
+            if len(self.items[idx]):
+                t = {10:'text', 12:'quickhelptext', 13:'title'}[idx]
+                self.keys.append((idx, self.sdf2po("%s#%s.%s%s" % (path[-1], self.items[4], prefix, t))))
+
+    def translate(self, translations):
+        """Translates text in the entry based on translations."""
+
+        self.items[9] = options.language
+        for idx, key in self.keys:
+            try:
+                self.items[idx] = translations.data[(self.po, key)]
+
+                self.items[14] = "2002-02-02 02:02:02"
+            except KeyError:
+                pass
+        self.items[14] = self.items[14].strip()
+
+    def sdf2po(self, s):
+        """Escapes special chars in po key names."""
+
+        return s.translate(normalizetable)
+
+class Template:
+    """Represents a reference template in SDF format."""
+
+    def __init__(self, path):
+        sock = open(path)
+        self.lines = []
+        for line in sock:
+            self.lines.append(Entry(line.split('\t')))
+
+    def translate(self, translations):
+        """Translates entries in the template based on translations."""
+
+        sock = open(options.output, "w")
+        for line in self.lines:
+            line.translate(translations)
+            sock.write("\t".join(line.items)+"\r\n")
+        sock.close()
+
+class Translations:
+    """Represents a set of .po files, containing translations."""
+
+    def __init__(self):
+        key = None
+        self.data = {}
+        for root, dirs, files in os.walk(options.input):
+            for file in files:
+                path = "%s/%s" % (root, file)
+                sock = open(path)
+                buf = []
+                multiline = False
+                fuzzy = False
+                for line in sock:
+                    if line.startswith("#: "):
+                        key = line.strip()[3:]
+                    elif line.startswith("#, fuzzy"):
+                        fuzzy = True
+                    elif line.startswith("msgstr "):
+                        trans = line.strip()[8:-1]
+                        if len(trans):
+                            if fuzzy:
+                                fuzzy = False
+                            else:
+                                self.setdata(path, key, trans)
+                        else:
+                            buf = []
+                            buf.append(trans)
+                            multiline = True
+                    elif multiline and line.startswith('"'):
+                        buf.append(line.strip()[1:-1])
+                    elif multiline and not len(line.strip()) and len("".join(buf)):
+                        if fuzzy:
+                            fuzzy = False
+                        else:
+                            self.setdata(path, key, "".join(buf))
+                        buf = []
+                        multiline = False
+                if multiline and len("".join(buf)) and not fuzzy:
+                    self.setdata(path, key, "".join(buf))
+
+    def setdata(self, path, key, s):
+        """Sets the translation for a given path and key, handling (un)escaping
+        as well."""
+        if key:
+            # unescape the po special chars
+            s = s.replace('\\"', '"')
+            if key.split('#')[0].endswith(".xhp"):
+                s = self.escape_help_text(s)
+            else:
+                s = s.replace('\\\\', '\\')
+            self.data[(path, key)] = s
+
+    def escape_help_text(self, text):
+        """Escapes the help text as it would be in an SDF file."""
+
+        for tag in helptagre.findall(text):
+            # <, >, " are only escaped in <[[:lower:]]> tags. Some HTML tags make it in in
+            # lowercase so those are dealt with. Some LibreOffice help tags are not
+            # escaped.
+            escapethistag = False
+            for escape_tag in ["ahelp", "link", "item", "emph", "defaultinline", "switchinline", "caseinline", "variable", "bookmark_value", "image", "embedvar", "alt"]:
+                if tag.startswith("<%s" % escape_tag) or tag == "</%s>" % escape_tag:
+                    escapethistag = True
+            if tag in ["<br/>", "<help-id-missing/>"]:
+                escapethistag = True
+            if escapethistag:
+                escaped_tag = ("\\<" + tag[1:-1] + "\\>").replace('"', '\\"')
+                text = text.replace(tag, escaped_tag)
+        return text
+
+def main():
+    """Main function of this script."""
+
+    opts, args = getopt.getopt(sys.argv[1:], "si:o:l:t:", ["skipsource", "input=", "output=", "language=", "template="])
+    for opt, arg in opts:
+        if opt in ("-s", "--skipsource"):
+            pass
+        elif opt in ("-i", "--input"):
+            options.input = arg.strip('/')
+        elif opt in ("-o", "--output"):
+            options.output = arg
+        elif opt in ("-l", "--language"):
+            options.language = arg
+        elif opt in ("-t", "--template"):
+            options.template = arg
+    template = Template(options.template)
+    translations = Translations()
+    template.translate(translations)
+
+# used by escape_help_text()
+helptagre = re.compile('''<[/]??[a-z_\-]+?(?:| +[a-z]+?=".*?") *[/]??>''')
+
+options = Options()
+
+# used by sdf2po()
+normalfilenamechars = "/#.0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
+normalizetable = ""
+for i in map(chr, range(256)):
+    if i in normalfilenamechars:
+        normalizetable += i
+    else:
+        normalizetable += "_"
+
+if __name__ == "__main__":
+    main()
+
+# vim:set filetype=python shiftwidth=4 softtabstop=4 expandtab:
-- 
1.7.6

-------------- next part --------------
From b246906288deab65699d3982e2a3b8f65dcdcf56 Mon Sep 17 00:00:00 2001
From: Miklos Vajna <vmiklos at frugalware.org>
Date: Wed, 7 Sep 2011 23:39:18 +0200
Subject: [PATCH] Add po2lo tool

---
 translations/makefile.mk |   10 +++++++---
 1 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/translations/makefile.mk b/translations/makefile.mk
index 1a858c3..b0ada9d 100644
--- a/translations/makefile.mk
+++ b/translations/makefile.mk
@@ -48,15 +48,19 @@ TARGET=translations_merge
 
 .INCLUDE : target.mk
 
+.IF "$(OS_FOR_BUILD)"=="WNT" && "$(SYSTEM_PYTHON)"!="YES"
+PYTHON=$(AUGMENT_LIBRARY_PATH) $(WRAPCMD) $(SOLARBINDIR)/python
+.ELSE
+PYTHON=$(WRAPCMD) python
+.ENDIF
+
 .IF "$(SYSTEM_TRANSLATE_TOOLKIT)" == "YES"
 
 OO2PO=oo2po
-PO2OO=po2oo
 
 .ELSE                   # "$(SYSTEM_TRANSLATE_TOOLKIT)" == "YES"
 
 OO2PO=$(AUGMENT_LIBRARY_PATH) $(WRAPCMD) $(SOLARBINDIR)/oo2po
-PO2OO=$(AUGMENT_LIBRARY_PATH) $(WRAPCMD) $(SOLARBINDIR)/po2oo
 
 TRANSLATE_TOOLKIT_PYTHONPATH=$(SOLARLIBDIR)$/translate_toolkit
 .IF "$(SYSTEM_PYTHON)" == "YES" || "$(OS)" == "MACOSX"
@@ -94,7 +98,7 @@ $(MISC)/sdf-l10n/%.sdf : $(MISC)/sdf-template/en-US.sdf
     sed -e "s/\ten-US\t/\tkid\t/" < $@.tmp > $@
     rm -f $@.tmp
 .ELSE
-    $(PO2OO) --skipsource -i $(PRJ)/source/$(@:b) -t $(MISC)/sdf-template/en-US.sdf -o $@ -l $(@:b)
+    $(PYTHON) $(SOLARSRC)/solenv/bin/po2lo --skipsource -i $(PRJ)/source/$(@:b) -t $(MISC)/sdf-template/en-US.sdf -o $@ -l $(@:b)
 .ENDIF
 
 $(MISC)/merge.done : $(foreach,i,$(all_languages) $(MISC)/sdf-l10n/$i.sdf)
-- 
1.7.6
