[PATCH libxkbcommon v2 1/2] makekeys: use GNU gperf to generate perfect hashtables

David Herrmann dh.herrmann at googlemail.com
Thu Oct 4 07:58:50 PDT 2012


Hi Ran

On Wed, Oct 3, 2012 at 10:18 AM, Ran Benita <ran234 at gmail.com> wrote:

[snip]

> So since makekeys is ugly and gperf is a bit excessive, maybe we should
> just keep it simple, what do you think?

Indeed. Thanks a lot for reviewing though. Just some comments on your
patch below:

> From 8fb5efb045b7207b010c979cbeae8f8222759961 Mon Sep 17 00:00:00 2001
> From: Ran Benita <ran234 at gmail.com>
> Date: Wed, 3 Oct 2012 10:09:48 +0200
> Subject: [PATCH libxkbcommon] Replace makekeys with python script + binary
>  search

Great empty commit-message ;) Maybe mention that we used to have a
home-brew hash-table (old makekeys) and gperf is not worth it as it
generates huge tables. So we just go with bsearch here. If anyone
wants to change that, they should provide proper benchmarks first.

> Signed-off-by: Ran Benita <ran234 at gmail.com>
> ---
>  Makefile.am          |   9 +-
>  configure.ac         |  14 +--
>  makekeys.py          |  23 ++++
>  makekeys/.gitignore  |   1 -
>  makekeys/Makefile.am |  10 --
>  makekeys/makekeys.c  | 302 ---------------------------------------------------
>  src/keysym.c         |  97 ++++++-----------
>  7 files changed, 62 insertions(+), 394 deletions(-)
>  create mode 100644 makekeys.py
>  delete mode 100644 makekeys/.gitignore
>  delete mode 100644 makekeys/Makefile.am
>  delete mode 100644 makekeys/makekeys.c
>
> diff --git a/Makefile.am b/Makefile.am
> index 26646fb..dfbd8d9 100644
> --- a/Makefile.am
> +++ b/Makefile.am
> @@ -1,7 +1,5 @@
>  ACLOCAL_AMFLAGS = -I m4
>
> -SUBDIRS = makekeys
> -
>  pkgconfigdir = $(libdir)/pkgconfig
>  pkgconfig_DATA = xkbcommon.pc
>
> @@ -92,11 +90,8 @@ src/xkbcomp/parser.c:
> $(top_builddir)/src/$(am__dirstamp) $(top_builddir)/src/xk
>  src/xkbcomp/parser.h: $(top_builddir)/src/$(am__dirstamp)
> $(top_builddir)/src/xkbcomp/$(am__dirstamp)
>  src/xkbcomp/scanner.c: $(top_builddir)/src/$(am__dirstamp)
> $(top_builddir)/src/xkbcomp/$(am__dirstamp)
>
> -src/ks_tables.h: $(top_builddir)/makekeys/makekeys$(EXEEXT)
> -       $(AM_V_GEN)$(top_builddir)/makekeys/makekeys
> $(top_srcdir)/xkbcommon/xkbcommon-keysyms.h > $@
> -
> -$(top_builddir)/makekeys/makekeys$(EXEEXT): $(top_srcdir)/makekeys/makekeys.c
> -       $(MAKE) -C makekeys
> +src/ks_tables.h: makekeys.py
> +       $(AM_V_GEN)$(PYTHON) $(top_srcdir)/makekeys.py
> $(top_srcdir)/xkbcommon/xkbcommon-keysyms.h > $@
>
>  # Documentation
>
> diff --git a/configure.ac b/configure.ac
> index df8a99e..570dda0 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -61,6 +61,9 @@ if test ! -f "src/xkbcomp/parser.c"; then
>        AC_MSG_ERROR([yacc not found - unable to compile src/xkbcomp/parser.y])
>     fi
>  fi
> +AM_PATH_PYTHON([2.6], [], [
> +    AC_MSG_ERROR([python not found - unable to run makekeys])
> +])
>
>  # Checks for library functions.
>  AC_CHECK_FUNCS([strcasecmp strncasecmp])
> @@ -71,16 +74,6 @@ fi
>
>  AC_CHECK_FUNCS([eaccess euidaccess])
>
> -# Build native compiler needed for makekeys
> -AC_ARG_VAR([CC_FOR_BUILD], [Build native C compiler program])
> -if test "x$CC_FOR_BUILD" = x; then
> -    if test "$cross_compiling" != no; then
> -        AC_PATH_PROGS([CC_FOR_BUILD], [gcc cc], [cc])
> -    else
> -        CC_FOR_BUILD="$CC"
> -    fi
> -fi
> -
>  XORG_TESTSET_CFLAG([CFLAGS], [-fvisibility=hidden])
>
>  # Define a configuration option for the XKB config root
> @@ -121,7 +114,6 @@ AC_DEFINE_UNQUOTED([DEFAULT_XKB_LAYOUT],
> ["$DEFAULT_XKB_LAYOUT"],
>
>  AC_CONFIG_FILES([
>      Makefile
> -    makekeys/Makefile
>      xkbcommon-uninstalled.pc
>      xkbcommon.pc
>      doc/Doxyfile
> diff --git a/makekeys.py b/makekeys.py
> new file mode 100644
> index 0000000..4466256
> --- /dev/null
> +++ b/makekeys.py
> @@ -0,0 +1,23 @@
> +#!/usr/bin/env python
> +
> +import re, sys, itertools
> +
> +pattern = re.compile(r'^#define\s+XKB_KEY_(?P<name>\w+)\s+(?P<value>0x[0-9a-fA-F]+)\s')
> +matches = [pattern.match(line) for line in open(sys.argv[1])]
> +entries = [(m.group("name"), int(m.group("value"), 16)) for m in matches if m]
> +
> +print('''struct name_keysym {
> +    const char *name;
> +    xkb_keysym_t keysym;
> +};\n''')
> +
> +print('static const struct name_keysym name_to_keysym[] = {');
> +for (name, _) in sorted(entries, key=lambda e: e[0]):
> +    print('    {{ "{name}", XKB_KEY_{name} }},'.format(name=name))
> +print('};\n')
> +
> +# Only keep the first name given to each keysym value. Hey, it works.

Yeah, that's how it always worked. Just ignore all duplicates and use
the first keysym.

> +print('static const struct name_keysym keysym_to_name[] = {');
> +for (name, _) in (next(g[1]) for g in
> itertools.groupby(sorted(entries, key=lambda e: e[1]), key=lambda e:
> e[1])):
> +    print('    {{ "{name}", XKB_KEY_{name} }},'.format(name=name))
> +print('};')

Unfortunately, I don't know python so I cannot comment on this at all.
But it looks all reasonable and works for me. Good thing is, I can use
the exact same arrays for case-insensitive search, too. That saves a
lot of space.

> diff --git a/makekeys/.gitignore b/makekeys/.gitignore
> deleted file mode 100644
> index 2bdb5e0..0000000
> --- a/makekeys/.gitignore
> +++ /dev/null
> @@ -1 +0,0 @@
> -makekeys
> diff --git a/makekeys/Makefile.am b/makekeys/Makefile.am
> deleted file mode 100644
> index 5d9a441..0000000
> --- a/makekeys/Makefile.am
> +++ /dev/null
> @@ -1,10 +0,0 @@
> -AM_CFLAGS = $(BASE_CFLAGS) -I$(top_srcdir)
> -
> -# need to use build-native compiler
> -
> -CC = $(CC_FOR_BUILD)
> -CPPFLAGS = $(CPPFLAGS_FOR_BUILD)
> -CFLAGS = $(CFLAGS_FOR_BUILD)
> -LDFLAGS = $(LDFLAGS_FOR_BUILD)
> -noinst_PROGRAMS = makekeys
> -
> diff --git a/makekeys/makekeys.c b/makekeys/makekeys.c
> deleted file mode 100644
> index 62d7255..0000000
> --- a/makekeys/makekeys.c
> +++ /dev/null
> @@ -1,302 +0,0 @@
> -/*
> - *
> - * Copyright 1990, 1998  The Open Group
> - *
> - * Permission to use, copy, modify, distribute, and sell this software and its
> - * documentation for any purpose is hereby granted without fee, provided that
> - * the above copyright notice appear in all copies and that both that
> - * copyright notice and this permission notice appear in supporting
> - * documentation.
> - *
> - * The above copyright notice and this permission notice shall be included
> - * in all copies or substantial portions of the Software.
> - *
> - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
> - * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> - * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
> - * IN NO EVENT SHALL THE OPEN GROUP BE LIABLE FOR ANY CLAIM, DAMAGES OR
> - * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> - * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> - * OTHER DEALINGS IN THE SOFTWARE.
> - *
> - * Except as contained in this notice, the name of The Open Group shall
> - * not be used in advertising or otherwise to promote the sale, use or
> - * other dealings in this Software without prior written authorization
> - * from The Open Group.
> - *
> - */
> -
> -/*
> - * Constructs hash tables for xkb_keysym_to_string and
> - * xkb_string_from_keysym.
> - */
> -
> -#include "xkbcommon/xkbcommon.h"
> -
> -#include <inttypes.h>
> -#include <stdio.h>
> -#include <stdlib.h>
> -#include <string.h>
> -
> -typedef uint32_t Signature;
> -
> -#define KTNUM 4000
> -
> -static struct info {
> -    char        *name;
> -    xkb_keysym_t val;
> -} info[KTNUM];
> -
> -#define MIN_REHASH 15
> -#define MATCHES    10
> -
> -static char tab[KTNUM];
> -static unsigned short offsets[KTNUM];
> -static unsigned short indexes[KTNUM];
> -static xkb_keysym_t values[KTNUM];
> -static int ksnum = 0;
> -
> -static int
> -parse_line(const char *buf, char *key, xkb_keysym_t *val, char *prefix)
> -{
> -    int i;
> -    char alias[128];
> -
> -    /* See if we can catch a straight XK_foo 0x1234-style definition first;
> -     * the trickery around tmp is to account for prefices. */
> -    i = sscanf(buf, "#define %127s 0x%" SCNx32, key, val);
> -    if (i == 2 && strncmp(key, "XKB_KEY_", 8) == 0) {
> -        prefix[0] = '\0';
> -        memmove(key, key + 8, strlen(key + 8) + 1);
> -        return 1;
> -    }
> -
> -    i = sscanf(buf, "#define %127s %127s", key, alias);
> -    if (i == 2)
> -        fprintf(stderr, "can't parse keysym definition: %s", buf);
> -
> -    return 0;
> -}
> -
> -int
> -main(int argc, char *argv[])
> -{
> -    FILE *fptr;
> -    int max_rehash;
> -    Signature sig;
> -    int i, j, k, l, z;
> -    char *name;
> -    char c;
> -    int first;
> -    int best_max_rehash;
> -    int best_z = 0;
> -    int num_found;
> -    xkb_keysym_t val;
> -    char key[128], prefix[128];
> -    char buf[1024];
> -
> -    for (l = 1; l < argc; l++) {
> -        fptr = fopen(argv[l], "r");
> -        if (!fptr) {
> -            fprintf(stderr, "couldn't open %s\n", argv[l]);
> -            continue;
> -        }
> -
> -        while (fgets(buf, sizeof(buf), fptr)) {
> -            if (!parse_line(buf, key, &val, prefix))
> -                continue;
> -
> -            if (val == XKB_KEY_VoidSymbol)
> -                val = 0;
> -            if (val > 0x1fffffff) {
> -                fprintf(stderr, "ignoring illegal keysym (%s, %" PRIx32 ")\n",
> -                        key,
> -                        val);
> -                continue;
> -            }
> -
> -            name = malloc(strlen(prefix) + strlen(key) + 1);
> -            if (!name) {
> -                fprintf(stderr, "makekeys: out of memory!\n");
> -                exit(1);
> -            }
> -            sprintf(name, "%s%s", prefix, key);
> -            info[ksnum].name = name;
> -            info[ksnum].val = val;
> -            ksnum++;
> -            if (ksnum == KTNUM) {
> -                fprintf(stderr, "makekeys: too many keysyms!\n");
> -                exit(1);
> -            }
> -        }
> -
> -        fclose(fptr);
> -    }
> -
> -    printf("/* This file is generated from keysymdef.h. */\n");
> -    printf("/* Do not edit. */\n");
> -    printf("\n");
> -
> -    best_max_rehash = ksnum;
> -    num_found = 0;
> -    for (z = ksnum; z < KTNUM; z++) {
> -        max_rehash = 0;
> -        for (name = tab, i = z; --i >= 0; )
> -            *name++ = 0;
> -        for (i = 0; i < ksnum; i++) {
> -            name = info[i].name;
> -            sig = 0;
> -            while ((c = *name++))
> -                sig = (sig << 1) + c;
> -            first = j = sig % z;
> -            for (k = 0; tab[j]; k++) {
> -                j += first + 1;
> -                if (j >= z)
> -                    j -= z;
> -                if (j == first)
> -                    goto next1;
> -            }
> -            tab[j] = 1;
> -            if (k > max_rehash)
> -                max_rehash = k;
> -        }
> -        if (max_rehash < MIN_REHASH) {
> -            if (max_rehash < best_max_rehash) {
> -                best_max_rehash = max_rehash;
> -                best_z = z;
> -            }
> -            num_found++;
> -            if (num_found >= MATCHES)
> -                break;
> -        }
> -next1:;
> -    }
> -
> -    z = best_z;
> -    printf("#ifndef KS_TABLES_H\n");
> -    printf("#define KS_TABLES_H\n\n");
> -    printf("static const unsigned char _XkeyTable[] = {\n");
> -    if (z == 0) {
> -        fprintf(stderr, "makekeys: failed to find small enough hash!\n"
> -                "Try increasing KTNUM in makekeys.c\n");
> -        exit(1);
> -    }
> -    printf("0,\n");
> -    k = 1;
> -    for (i = 0; i < ksnum; i++) {
> -        name = info[i].name;
> -        sig = 0;
> -        while ((c = *name++))
> -            sig = (sig << 1) + c;
> -        first = j = sig % z;
> -        while (offsets[j]) {
> -            j += first + 1;
> -            if (j >= z)
> -                j -= z;
> -        }
> -        offsets[j] = k;
> -        indexes[i] = k;
> -        val = info[i].val;
> -        printf("0x%.2" PRIx32 ", 0x%.2" PRIx32 ", 0x%.2" PRIx32 ", "
> -               "0x%.2" PRIx32 ", 0x%.2" PRIx32 ", 0x%.2" PRIx32 ", ",
> -               (sig >> 8) & 0xff, sig & 0xff, (val >> 24) & 0xff,
> -               (val >> 16) & 0xff, (val >> 8) & 0xff, val & 0xff);
> -        for (name = info[i].name, k += 7; (c = *name++); k++)
> -            printf("'%c',", c);
> -        printf((i == (ksnum - 1)) ? "0\n" : "0,\n");
> -    }
> -    printf("};\n");
> -    printf("\n");
> -    printf("#define KTABLESIZE %d\n", z);
> -    printf("#define KMAXHASH %d\n", best_max_rehash + 1);
> -    printf("\n");
> -    printf("static const unsigned short hashString[KTABLESIZE] = {\n");
> -    for (i = 0; i < z; ) {
> -        printf("0x%.4x", offsets[i]);
> -        i++;
> -        if (i == z)
> -            break;
> -        printf((i & 7) ? ", " : ",\n");
> -    }
> -    printf("\n");
> -    printf("};\n");
> -
> -    best_max_rehash = ksnum;
> -    num_found = 0;
> -    for (z = ksnum; z < KTNUM; z++) {
> -        max_rehash = 0;
> -        for (name = tab, i = z; --i >= 0; )
> -            *name++ = 0;
> -        for (i = 0; i < ksnum; i++) {
> -            val = info[i].val;
> -            first = j = val % z;
> -            for (k = 0; tab[j]; k++) {
> -                if (values[j] == val)
> -                    goto skip1;
> -                j += first + 1;
> -                if (j >= z)
> -                    j -= z;
> -                if (j == first)
> -                    goto next2;
> -            }
> -            tab[j] = 1;
> -            values[j] = val;
> -            if (k > max_rehash)
> -                max_rehash = k;
> -skip1:;
> -        }
> -        if (max_rehash < MIN_REHASH) {
> -            if (max_rehash < best_max_rehash) {
> -                best_max_rehash = max_rehash;
> -                best_z = z;
> -            }
> -            num_found++;
> -            if (num_found >= MATCHES)
> -                break;
> -        }
> -next2:;
> -    }
> -
> -    z = best_z;
> -    if (z == 0) {
> -        fprintf(stderr, "makekeys: failed to find small enough hash!\n"
> -                "Try increasing KTNUM in makekeys.c\n");
> -        exit(1);
> -    }
> -    for (i = z; --i >= 0; )
> -        offsets[i] = 0;
> -    for (i = 0; i < ksnum; i++) {
> -        val = info[i].val;
> -        first = j = val % z;
> -        while (offsets[j]) {
> -            if (values[j] == val)
> -                goto skip2;
> -            j += first + 1;
> -            if (j >= z)
> -                j -= z;
> -        }
> -        offsets[j] = indexes[i] + 2;
> -        values[j] = val;
> -skip2:;
> -    }
> -    printf("\n");
> -    printf("#define VTABLESIZE %d\n", z);
> -    printf("#define VMAXHASH %d\n", best_max_rehash + 1);
> -    printf("\n");
> -    printf("static const unsigned short hashKeysym[VTABLESIZE] = {\n");
> -    for (i = 0; i < z; ) {
> -        printf("0x%.4x", offsets[i]);
> -        i++;
> -        if (i == z)
> -            break;
> -        printf((i & 7) ? ", " : ",\n");
> -    }
> -    printf("\n");
> -    printf("};\n");
> -    printf("\n#endif /* KS_TABLES_H */\n");
> -
> -    for (i = 0; i < ksnum; i++)
> -        free(info[i].name);
> -
> -    exit(0);
> -}
> diff --git a/src/keysym.c b/src/keysym.c
> index d659354..61de5ba 100644
> --- a/src/keysym.c
> +++ b/src/keysym.c
> @@ -49,47 +49,43 @@
>
>  #include "xkbcommon/xkbcommon.h"
>  #include "utils.h"
> -#include "ks_tables.h"
>  #include "keysym.h"
> +#include "ks_tables.h"

I would include <stdlib.h> explicitely if you use bsearch() below.

> +
> +static int compare_by_keysym(const void *a, const void *b)
> +{
> +    const struct name_keysym *key = a, *entry = b;
> +    if (key->keysym < entry->keysym)
> +        return -1;
> +    if (key->keysym > entry->keysym)
> +        return 1;
> +    return 0;
> +}

Drop that and use (bsearch just require <0,0,>0, it doesn't require -1,0,1):
  return key->keysym - (int32_t)entry->keysym;

> +
> +static int compare_by_name(const void *a, const void *b)
> +{
> +    const struct name_keysym *key = a, *entry = b;
> +    return strcmp(key->name, entry->name);
> +}
> +
>
>  XKB_EXPORT int
>  xkb_keysym_get_name(xkb_keysym_t ks, char *buffer, size_t size)
>  {
> -    int i, n, h, idx;
> -    const unsigned char *entry;
> -    unsigned char val1, val2, val3, val4;
> +    const struct name_keysym search = { .name = NULL, .keysym = ks };
> +    const struct name_keysym *entry;
>
>      if ((ks & ((unsigned long) ~0x1fffffff)) != 0) {
>          snprintf(buffer, size, "Invalid");
>          return -1;
>      }
>
> -    /* Try to find it in our hash table. */
> -    if (ks <= 0x1fffffff) {
> -        val1 = ks >> 24;
> -        val2 = (ks >> 16) & 0xff;
> -        val3 = (ks >> 8) & 0xff;
> -        val4 = ks & 0xff;
> -        i = ks % VTABLESIZE;
> -        h = i + 1;
> -        n = VMAXHASH;
> -
> -        while ((idx = hashKeysym[i])) {
> -            entry = &_XkeyTable[idx];
> -
> -            if ((entry[0] == val1) && (entry[1] == val2) &&
> -                (entry[2] == val3) && (entry[3] == val4)) {
> -                return snprintf(buffer, size, "%s", entry + 4);
> -            }
> -
> -            if (!--n)
> -                break;
> -
> -            i += h;
> -            if (i >= VTABLESIZE)
> -                i -= VTABLESIZE;
> -        }
> -    }
> +    entry = bsearch(&search, keysym_to_name,
> +                    sizeof(keysym_to_name) / sizeof(*keysym_to_name),
> +                    sizeof(*keysym_to_name),
> +                    compare_by_keysym);
> +    if (entry)
> +        return snprintf(buffer, size, "%s", entry->name);
>
>      if (ks >= 0x01000100 && ks <= 0x0110ffff)
>          /* Unnamed Unicode codepoint. */
> @@ -102,42 +98,17 @@ xkb_keysym_get_name(xkb_keysym_t ks, char
> *buffer, size_t size)
>  XKB_EXPORT xkb_keysym_t
>  xkb_keysym_from_name(const char *s)
>  {
> -    int i, n, h, c, idx;
> -    uint32_t sig = 0;
> -    const char *p = s;
> +    const struct name_keysym search = { .name = s, .keysym = 0 };
> +    const struct name_keysym *entry;
>      char *tmp;
> -    const unsigned char *entry;
> -    unsigned char sig1, sig2;
>      xkb_keysym_t val;
>
> -    while ((c = *p++))
> -        sig = (sig << 1) + c;
> -
> -    i = sig % KTABLESIZE;
> -    h = i + 1;
> -    sig1 = (sig >> 8) & 0xff;
> -    sig2 = sig & 0xff;
> -    n = KMAXHASH;
> -
> -    while ((idx = hashString[i])) {
> -        entry = &_XkeyTable[idx];
> -
> -        if ((entry[0] == sig1) && (entry[1] == sig2) &&
> -            streq(s, (const char *) entry + 6)) {
> -            val = (entry[2] << 24) | (entry[3] << 16) |
> -                  (entry[4] << 8) | entry[5];
> -            if (!val)
> -                val = XKB_KEY_VoidSymbol;
> -            return val;
> -        }
> -
> -        if (!--n)
> -            break;
> -
> -        i += h;
> -        if (i >= KTABLESIZE)
> -            i -= KTABLESIZE;
> -    }
> +    entry = bsearch(&search, name_to_keysym,
> +                    sizeof(name_to_keysym) / sizeof(*name_to_keysym),
> +                    sizeof(*name_to_keysym),
> +                    compare_by_name);
> +    if (entry)
> +        return entry->keysym;
>
>      if (*s == 'U') {
>          val = strtoul(&s[1], &tmp, 16);

Works all very nice here! Thanks, I am all in favor of this approach.
Reviewed-by: David Herrmann <dh.herrmann at googlemail.com>
No need to resend for the small fixes I suggested, though.

I am working on a case-insensitive search based on this patch. I will
send it when done. The approach I am using is:
Use bsearch with strcasecmp() to find _one_ of the possible solutions
(not necessarily the first). Then I check the entries _before_ and
_after_ the one I found whether they are equal and then I use some
heuristic to return the lower-case keysym (that is, KEY_a, not KEY_A).

Thanks!
David


More information about the wayland-devel mailing list