[PATCH libxkbcommon v2 1/2] makekeys: use GNU gperf to generate perfect hashtables

Ran Benita ran234 at gmail.com
Wed Oct 3 01:18:32 PDT 2012


Hi David,

On Tue, Oct 02, 2012 at 07:51:53PM +0200, David Herrmann wrote:
> Instead of using a home-brew hashtable generator, we should instead use
> the gperf program which is known to work.
> 
> This removes the "makekeys" programs and instead replaces it by a file
> that can generate input files for gperf. Gperf then generates hashtables
> for all of these input files and writes them concatenated into
> ks_tables.h which then can be used from keysym.c
> 
> Unfortunately, gperf does not support integer keys but only strings or
> binary data. Therefore, we have to make the keysym->name lookup
> little-endian to avoid errors during cross compilation.
> 
> Signed-off-by: David Herrmann <dh.herrmann at googlemail.com>
> ---

The code is really nice, I have no comments on it.

I noticed though that it really blows up our binary size. Here's size -A
(only the relevant sections, CFLAGS=-O2) of current master:

section                size     addr
.text                110708    10608
.rodata               95684   121344
.data.rel.ro           2192   237728
Total                238568

Here it is after adding the third table to old makekeys, in the v1 patch
you sent:

section                size     addr
.text                110724    10608
.rodata              122340   121376
.data.rel.ro           2192   264416
Total                265240

And here it is with these patches:

section                size     addr
.text                112788    28912
.rodata              716478   141728
.data.rel.ro          55824   879136
Total                933614

So gperf is clearly doing... something here. In the gperf manual they
mention:
    The size of the generate static keyword array can get extremely large
    if the input keyword file is large or if the keywords are quite
    similar. This tends to slow down the compilation of the generated C
    code, and greatly inflates the object code size. If this situation
    occurs, consider using the ‘-S’ option to reduce data size, potentially
    increasing keyword recognition time a negligible amount. Since many C
    compilers cannot correctly generate code for large switch statements it
    is important to qualify the -S option with an appropriate numerical
    argument that controls the number of switch statements generated.
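
One plausible reading of that passage: a table indexed directly by hash
value needs a slot for every value up to the largest hash, not one slot
per keyword, so the gaps are paid for in .rodata. The toy sketch below
illustrates the effect only. It is not gperf output; toy_hash, toy_table
and the four-keysym subset are made up for the example.

```c
#include <assert.h>   /* only needed for the usage checks */
#include <stddef.h>
#include <string.h>

struct slot {
    const char *name;
    unsigned keysym;
};

/* Largest value toy_hash() yields for the four keywords below. */
#define TOY_MAX_HASH 120

/* Hypothetical hash: string length plus first character.
 * It happens to be collision-free for this keyword set. */
static unsigned toy_hash(const char *s)
{
    return (unsigned) strlen(s) + (unsigned char) s[0];
}

/* Indexed by hash value: 121 slots to hold 4 keywords.
 * Unlisted slots are zero-initialized (name == NULL). */
static const struct slot toy_table[TOY_MAX_HASH + 1] = {
    [75]  = { "Escape", 0xff1b },
    [87]  = { "Tab",    0xff09 },
    [88]  = { "Return", 0xff0d },
    [120] = { "space",  0x0020 },
};

/* One hash, one strcmp: fast, but the table is mostly empty. */
static const struct slot *toy_lookup(const char *s)
{
    unsigned h = toy_hash(s);

    if (h > TOY_MAX_HASH)
        return NULL;
    if (toy_table[h].name == NULL || strcmp(toy_table[h].name, s) != 0)
        return NULL;
    return &toy_table[h];
}
```

Here sizeof(toy_table) / sizeof(toy_table[0]) is 121 for 4 keywords. How
much of the growth measured above comes from this kind of sparseness is
an assumption, not something the numbers prove.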

To reduce the size I tried removing %compare-lengths from the name-to-key
tables (which helped a bit). I also tried a few %switch values (letting
gcc build the lookup tables on its own), which reduced the size to about
550000, but then the compilation takes about a minute :)

So to be honest, the hashing that gperf and makekeys do is nice, but I
don't see why we do it at all if it complicates things. For example, I
just took 15 minutes to do it the obvious way: create simple sorted
{name, keysym} arrays and binary-search them, replacing the current
makekeys code (see attached patch - just a proof of concept, a
hacked-up Python script and a couple of bsearch calls). I don't see any
performance issues, and the size is:

section                size     addr
.text                110516    48016
.rodata               66366   158560
.data.rel.ro          39568   241696
Total                283860

With the third table added it is:

section                size     addr
.text                110548    67184
.rodata               80278   177760
.data.rel.ro          58768   278752
Total                336180

So since makekeys is ugly and gperf is a bit excessive, maybe we should
just keep it simple, what do you think?

Ran
-------------- next part --------------
From 8fb5efb045b7207b010c979cbeae8f8222759961 Mon Sep 17 00:00:00 2001
From: Ran Benita <ran234 at gmail.com>
Date: Wed, 3 Oct 2012 10:09:48 +0200
Subject: [PATCH libxkbcommon] Replace makekeys with python script + binary
 search

Signed-off-by: Ran Benita <ran234 at gmail.com>
---
 Makefile.am          |   9 +-
 configure.ac         |  14 +--
 makekeys.py          |  23 ++++
 makekeys/.gitignore  |   1 -
 makekeys/Makefile.am |  10 --
 makekeys/makekeys.c  | 302 ---------------------------------------------------
 src/keysym.c         |  97 ++++++-----------
 7 files changed, 62 insertions(+), 394 deletions(-)
 create mode 100644 makekeys.py
 delete mode 100644 makekeys/.gitignore
 delete mode 100644 makekeys/Makefile.am
 delete mode 100644 makekeys/makekeys.c

diff --git a/Makefile.am b/Makefile.am
index 26646fb..dfbd8d9 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -1,7 +1,5 @@
 ACLOCAL_AMFLAGS = -I m4
 
-SUBDIRS = makekeys
-
 pkgconfigdir = $(libdir)/pkgconfig
 pkgconfig_DATA = xkbcommon.pc
 
@@ -92,11 +90,8 @@ src/xkbcomp/parser.c: $(top_builddir)/src/$(am__dirstamp) $(top_builddir)/src/xk
 src/xkbcomp/parser.h: $(top_builddir)/src/$(am__dirstamp) $(top_builddir)/src/xkbcomp/$(am__dirstamp)
 src/xkbcomp/scanner.c: $(top_builddir)/src/$(am__dirstamp) $(top_builddir)/src/xkbcomp/$(am__dirstamp)
 
-src/ks_tables.h: $(top_builddir)/makekeys/makekeys$(EXEEXT)
-	$(AM_V_GEN)$(top_builddir)/makekeys/makekeys $(top_srcdir)/xkbcommon/xkbcommon-keysyms.h > $@
-
-$(top_builddir)/makekeys/makekeys$(EXEEXT): $(top_srcdir)/makekeys/makekeys.c
-	$(MAKE) -C makekeys
+src/ks_tables.h: makekeys.py
+	$(AM_V_GEN)$(PYTHON) $(top_srcdir)/makekeys.py $(top_srcdir)/xkbcommon/xkbcommon-keysyms.h > $@
 
 # Documentation
 
diff --git a/configure.ac b/configure.ac
index df8a99e..570dda0 100644
--- a/configure.ac
+++ b/configure.ac
@@ -61,6 +61,9 @@ if test ! -f "src/xkbcomp/parser.c"; then
       AC_MSG_ERROR([yacc not found - unable to compile src/xkbcomp/parser.y])
    fi
 fi
+AM_PATH_PYTHON([2.6], [], [
+    AC_MSG_ERROR([python not found - unable to run makekeys])
+])
 
 # Checks for library functions.
 AC_CHECK_FUNCS([strcasecmp strncasecmp])
@@ -71,16 +74,6 @@ fi
 
 AC_CHECK_FUNCS([eaccess euidaccess])
 
-# Build native compiler needed for makekeys
-AC_ARG_VAR([CC_FOR_BUILD], [Build native C compiler program])
-if test "x$CC_FOR_BUILD" = x; then
-    if test "$cross_compiling" != no; then
-        AC_PATH_PROGS([CC_FOR_BUILD], [gcc cc], [cc])
-    else
-        CC_FOR_BUILD="$CC"
-    fi
-fi
-
 XORG_TESTSET_CFLAG([CFLAGS], [-fvisibility=hidden])
 
 # Define a configuration option for the XKB config root
@@ -121,7 +114,6 @@ AC_DEFINE_UNQUOTED([DEFAULT_XKB_LAYOUT], ["$DEFAULT_XKB_LAYOUT"],
 
 AC_CONFIG_FILES([
     Makefile
-    makekeys/Makefile
     xkbcommon-uninstalled.pc
     xkbcommon.pc
     doc/Doxyfile
diff --git a/makekeys.py b/makekeys.py
new file mode 100644
index 0000000..4466256
--- /dev/null
+++ b/makekeys.py
@@ -0,0 +1,23 @@
+#!/usr/bin/env python
+
+import re, sys, itertools
+
+pattern = re.compile(r'^#define\s+XKB_KEY_(?P<name>\w+)\s+(?P<value>0x[0-9a-fA-F]+)\s')
+matches = [pattern.match(line) for line in open(sys.argv[1])]
+entries = [(m.group("name"), int(m.group("value"), 16)) for m in matches if m]
+
+print('''struct name_keysym {
+    const char *name;
+    xkb_keysym_t keysym;
+};\n''')
+
+print('static const struct name_keysym name_to_keysym[] = {');
+for (name, _) in sorted(entries, key=lambda e: e[0]):
+    print('    {{ "{name}", XKB_KEY_{name} }},'.format(name=name))
+print('};\n')
+
+# Only keep the first name given to each keysym value. Hey, it works.
+print('static const struct name_keysym keysym_to_name[] = {');
+for (name, _) in (next(g[1]) for g in itertools.groupby(sorted(entries, key=lambda e: e[1]), key=lambda e: e[1])):
+    print('    {{ "{name}", XKB_KEY_{name} }},'.format(name=name))
+print('};')
diff --git a/makekeys/.gitignore b/makekeys/.gitignore
deleted file mode 100644
index 2bdb5e0..0000000
--- a/makekeys/.gitignore
+++ /dev/null
@@ -1 +0,0 @@
-makekeys
diff --git a/makekeys/Makefile.am b/makekeys/Makefile.am
deleted file mode 100644
index 5d9a441..0000000
--- a/makekeys/Makefile.am
+++ /dev/null
@@ -1,10 +0,0 @@
-AM_CFLAGS = $(BASE_CFLAGS) -I$(top_srcdir)
-
-# need to use build-native compiler
-
-CC = $(CC_FOR_BUILD)
-CPPFLAGS = $(CPPFLAGS_FOR_BUILD)
-CFLAGS = $(CFLAGS_FOR_BUILD)
-LDFLAGS = $(LDFLAGS_FOR_BUILD)
-noinst_PROGRAMS = makekeys
-
diff --git a/makekeys/makekeys.c b/makekeys/makekeys.c
deleted file mode 100644
index 62d7255..0000000
--- a/makekeys/makekeys.c
+++ /dev/null
@@ -1,302 +0,0 @@
-/*
- *
- * Copyright 1990, 1998  The Open Group
- *
- * Permission to use, copy, modify, distribute, and sell this software and its
- * documentation for any purpose is hereby granted without fee, provided that
- * the above copyright notice appear in all copies and that both that
- * copyright notice and this permission notice appear in supporting
- * documentation.
- *
- * The above copyright notice and this permission notice shall be included
- * in all copies or substantial portions of the Software.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
- * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
- * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
- * IN NO EVENT SHALL THE OPEN GROUP BE LIABLE FOR ANY CLAIM, DAMAGES OR
- * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
- * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
- * OTHER DEALINGS IN THE SOFTWARE.
- *
- * Except as contained in this notice, the name of The Open Group shall
- * not be used in advertising or otherwise to promote the sale, use or
- * other dealings in this Software without prior written authorization
- * from The Open Group.
- *
- */
-
-/*
- * Constructs hash tables for xkb_keysym_to_string and
- * xkb_string_from_keysym.
- */
-
-#include "xkbcommon/xkbcommon.h"
-
-#include <inttypes.h>
-#include <stdio.h>
-#include <stdlib.h>
-#include <string.h>
-
-typedef uint32_t Signature;
-
-#define KTNUM 4000
-
-static struct info {
-    char        *name;
-    xkb_keysym_t val;
-} info[KTNUM];
-
-#define MIN_REHASH 15
-#define MATCHES    10
-
-static char tab[KTNUM];
-static unsigned short offsets[KTNUM];
-static unsigned short indexes[KTNUM];
-static xkb_keysym_t values[KTNUM];
-static int ksnum = 0;
-
-static int
-parse_line(const char *buf, char *key, xkb_keysym_t *val, char *prefix)
-{
-    int i;
-    char alias[128];
-
-    /* See if we can catch a straight XK_foo 0x1234-style definition first;
-     * the trickery around tmp is to account for prefices. */
-    i = sscanf(buf, "#define %127s 0x%" SCNx32, key, val);
-    if (i == 2 && strncmp(key, "XKB_KEY_", 8) == 0) {
-        prefix[0] = '\0';
-        memmove(key, key + 8, strlen(key + 8) + 1);
-        return 1;
-    }
-
-    i = sscanf(buf, "#define %127s %127s", key, alias);
-    if (i == 2)
-        fprintf(stderr, "can't parse keysym definition: %s", buf);
-
-    return 0;
-}
-
-int
-main(int argc, char *argv[])
-{
-    FILE *fptr;
-    int max_rehash;
-    Signature sig;
-    int i, j, k, l, z;
-    char *name;
-    char c;
-    int first;
-    int best_max_rehash;
-    int best_z = 0;
-    int num_found;
-    xkb_keysym_t val;
-    char key[128], prefix[128];
-    char buf[1024];
-
-    for (l = 1; l < argc; l++) {
-        fptr = fopen(argv[l], "r");
-        if (!fptr) {
-            fprintf(stderr, "couldn't open %s\n", argv[l]);
-            continue;
-        }
-
-        while (fgets(buf, sizeof(buf), fptr)) {
-            if (!parse_line(buf, key, &val, prefix))
-                continue;
-
-            if (val == XKB_KEY_VoidSymbol)
-                val = 0;
-            if (val > 0x1fffffff) {
-                fprintf(stderr, "ignoring illegal keysym (%s, %" PRIx32 ")\n",
-                        key,
-                        val);
-                continue;
-            }
-
-            name = malloc(strlen(prefix) + strlen(key) + 1);
-            if (!name) {
-                fprintf(stderr, "makekeys: out of memory!\n");
-                exit(1);
-            }
-            sprintf(name, "%s%s", prefix, key);
-            info[ksnum].name = name;
-            info[ksnum].val = val;
-            ksnum++;
-            if (ksnum == KTNUM) {
-                fprintf(stderr, "makekeys: too many keysyms!\n");
-                exit(1);
-            }
-        }
-
-        fclose(fptr);
-    }
-
-    printf("/* This file is generated from keysymdef.h. */\n");
-    printf("/* Do not edit. */\n");
-    printf("\n");
-
-    best_max_rehash = ksnum;
-    num_found = 0;
-    for (z = ksnum; z < KTNUM; z++) {
-        max_rehash = 0;
-        for (name = tab, i = z; --i >= 0; )
-            *name++ = 0;
-        for (i = 0; i < ksnum; i++) {
-            name = info[i].name;
-            sig = 0;
-            while ((c = *name++))
-                sig = (sig << 1) + c;
-            first = j = sig % z;
-            for (k = 0; tab[j]; k++) {
-                j += first + 1;
-                if (j >= z)
-                    j -= z;
-                if (j == first)
-                    goto next1;
-            }
-            tab[j] = 1;
-            if (k > max_rehash)
-                max_rehash = k;
-        }
-        if (max_rehash < MIN_REHASH) {
-            if (max_rehash < best_max_rehash) {
-                best_max_rehash = max_rehash;
-                best_z = z;
-            }
-            num_found++;
-            if (num_found >= MATCHES)
-                break;
-        }
-next1:;
-    }
-
-    z = best_z;
-    printf("#ifndef KS_TABLES_H\n");
-    printf("#define KS_TABLES_H\n\n");
-    printf("static const unsigned char _XkeyTable[] = {\n");
-    if (z == 0) {
-        fprintf(stderr, "makekeys: failed to find small enough hash!\n"
-                "Try increasing KTNUM in makekeys.c\n");
-        exit(1);
-    }
-    printf("0,\n");
-    k = 1;
-    for (i = 0; i < ksnum; i++) {
-        name = info[i].name;
-        sig = 0;
-        while ((c = *name++))
-            sig = (sig << 1) + c;
-        first = j = sig % z;
-        while (offsets[j]) {
-            j += first + 1;
-            if (j >= z)
-                j -= z;
-        }
-        offsets[j] = k;
-        indexes[i] = k;
-        val = info[i].val;
-        printf("0x%.2" PRIx32 ", 0x%.2" PRIx32 ", 0x%.2" PRIx32 ", "
-               "0x%.2" PRIx32 ", 0x%.2" PRIx32 ", 0x%.2" PRIx32 ", ",
-               (sig >> 8) & 0xff, sig & 0xff, (val >> 24) & 0xff,
-               (val >> 16) & 0xff, (val >> 8) & 0xff, val & 0xff);
-        for (name = info[i].name, k += 7; (c = *name++); k++)
-            printf("'%c',", c);
-        printf((i == (ksnum - 1)) ? "0\n" : "0,\n");
-    }
-    printf("};\n");
-    printf("\n");
-    printf("#define KTABLESIZE %d\n", z);
-    printf("#define KMAXHASH %d\n", best_max_rehash + 1);
-    printf("\n");
-    printf("static const unsigned short hashString[KTABLESIZE] = {\n");
-    for (i = 0; i < z; ) {
-        printf("0x%.4x", offsets[i]);
-        i++;
-        if (i == z)
-            break;
-        printf((i & 7) ? ", " : ",\n");
-    }
-    printf("\n");
-    printf("};\n");
-
-    best_max_rehash = ksnum;
-    num_found = 0;
-    for (z = ksnum; z < KTNUM; z++) {
-        max_rehash = 0;
-        for (name = tab, i = z; --i >= 0; )
-            *name++ = 0;
-        for (i = 0; i < ksnum; i++) {
-            val = info[i].val;
-            first = j = val % z;
-            for (k = 0; tab[j]; k++) {
-                if (values[j] == val)
-                    goto skip1;
-                j += first + 1;
-                if (j >= z)
-                    j -= z;
-                if (j == first)
-                    goto next2;
-            }
-            tab[j] = 1;
-            values[j] = val;
-            if (k > max_rehash)
-                max_rehash = k;
-skip1:;
-        }
-        if (max_rehash < MIN_REHASH) {
-            if (max_rehash < best_max_rehash) {
-                best_max_rehash = max_rehash;
-                best_z = z;
-            }
-            num_found++;
-            if (num_found >= MATCHES)
-                break;
-        }
-next2:;
-    }
-
-    z = best_z;
-    if (z == 0) {
-        fprintf(stderr, "makekeys: failed to find small enough hash!\n"
-                "Try increasing KTNUM in makekeys.c\n");
-        exit(1);
-    }
-    for (i = z; --i >= 0; )
-        offsets[i] = 0;
-    for (i = 0; i < ksnum; i++) {
-        val = info[i].val;
-        first = j = val % z;
-        while (offsets[j]) {
-            if (values[j] == val)
-                goto skip2;
-            j += first + 1;
-            if (j >= z)
-                j -= z;
-        }
-        offsets[j] = indexes[i] + 2;
-        values[j] = val;
-skip2:;
-    }
-    printf("\n");
-    printf("#define VTABLESIZE %d\n", z);
-    printf("#define VMAXHASH %d\n", best_max_rehash + 1);
-    printf("\n");
-    printf("static const unsigned short hashKeysym[VTABLESIZE] = {\n");
-    for (i = 0; i < z; ) {
-        printf("0x%.4x", offsets[i]);
-        i++;
-        if (i == z)
-            break;
-        printf((i & 7) ? ", " : ",\n");
-    }
-    printf("\n");
-    printf("};\n");
-    printf("\n#endif /* KS_TABLES_H */\n");
-
-    for (i = 0; i < ksnum; i++)
-        free(info[i].name);
-
-    exit(0);
-}
diff --git a/src/keysym.c b/src/keysym.c
index d659354..61de5ba 100644
--- a/src/keysym.c
+++ b/src/keysym.c
@@ -49,47 +49,43 @@
 
 #include "xkbcommon/xkbcommon.h"
 #include "utils.h"
-#include "ks_tables.h"
 #include "keysym.h"
+#include "ks_tables.h"
+
+static int compare_by_keysym(const void *a, const void *b)
+{
+    const struct name_keysym *key = a, *entry = b;
+    if (key->keysym < entry->keysym)
+        return -1;
+    if (key->keysym > entry->keysym)
+        return 1;
+    return 0;
+}
+
+static int compare_by_name(const void *a, const void *b)
+{
+    const struct name_keysym *key = a, *entry = b;
+    return strcmp(key->name, entry->name);
+}
+
 
 XKB_EXPORT int
 xkb_keysym_get_name(xkb_keysym_t ks, char *buffer, size_t size)
 {
-    int i, n, h, idx;
-    const unsigned char *entry;
-    unsigned char val1, val2, val3, val4;
+    const struct name_keysym search = { .name = NULL, .keysym = ks };
+    const struct name_keysym *entry;
 
     if ((ks & ((unsigned long) ~0x1fffffff)) != 0) {
         snprintf(buffer, size, "Invalid");
         return -1;
     }
 
-    /* Try to find it in our hash table. */
-    if (ks <= 0x1fffffff) {
-        val1 = ks >> 24;
-        val2 = (ks >> 16) & 0xff;
-        val3 = (ks >> 8) & 0xff;
-        val4 = ks & 0xff;
-        i = ks % VTABLESIZE;
-        h = i + 1;
-        n = VMAXHASH;
-
-        while ((idx = hashKeysym[i])) {
-            entry = &_XkeyTable[idx];
-
-            if ((entry[0] == val1) && (entry[1] == val2) &&
-                (entry[2] == val3) && (entry[3] == val4)) {
-                return snprintf(buffer, size, "%s", entry + 4);
-            }
-
-            if (!--n)
-                break;
-
-            i += h;
-            if (i >= VTABLESIZE)
-                i -= VTABLESIZE;
-        }
-    }
+    entry = bsearch(&search, keysym_to_name,
+                    sizeof(keysym_to_name) / sizeof(*keysym_to_name),
+                    sizeof(*keysym_to_name),
+                    compare_by_keysym);
+    if (entry)
+        return snprintf(buffer, size, "%s", entry->name);
 
     if (ks >= 0x01000100 && ks <= 0x0110ffff)
         /* Unnamed Unicode codepoint. */
@@ -102,42 +98,17 @@ xkb_keysym_get_name(xkb_keysym_t ks, char *buffer, size_t size)
 XKB_EXPORT xkb_keysym_t
 xkb_keysym_from_name(const char *s)
 {
-    int i, n, h, c, idx;
-    uint32_t sig = 0;
-    const char *p = s;
+    const struct name_keysym search = { .name = s, .keysym = 0 };
+    const struct name_keysym *entry;
     char *tmp;
-    const unsigned char *entry;
-    unsigned char sig1, sig2;
     xkb_keysym_t val;
 
-    while ((c = *p++))
-        sig = (sig << 1) + c;
-
-    i = sig % KTABLESIZE;
-    h = i + 1;
-    sig1 = (sig >> 8) & 0xff;
-    sig2 = sig & 0xff;
-    n = KMAXHASH;
-
-    while ((idx = hashString[i])) {
-        entry = &_XkeyTable[idx];
-
-        if ((entry[0] == sig1) && (entry[1] == sig2) &&
-            streq(s, (const char *) entry + 6)) {
-            val = (entry[2] << 24) | (entry[3] << 16) |
-                  (entry[4] << 8) | entry[5];
-            if (!val)
-                val = XKB_KEY_VoidSymbol;
-            return val;
-        }
-
-        if (!--n)
-            break;
-
-        i += h;
-        if (i >= KTABLESIZE)
-            i -= KTABLESIZE;
-    }
+    entry = bsearch(&search, name_to_keysym,
+                    sizeof(name_to_keysym) / sizeof(*name_to_keysym),
+                    sizeof(*name_to_keysym),
+                    compare_by_name);
+    if (entry)
+        return entry->keysym;
 
     if (*s == 'U') {
         val = strtoul(&s[1], &tmp, 16);
-- 
1.7.12.2


