[PATCH libxkbcommon 1/2] makekeys: add all symbols converted to lower-case

Tue Oct 2 00:37:38 PDT 2012

Hi David,

On Mon, Oct 01, 2012 at 07:29:58PM +0200, David Herrmann wrote:
> xkb_keysym_from_name() uses a big lookup table generated by "makekeys"
> to find keysyms. It does this case-sensitive because we have keys like
> XKB_KEY_A and XKB_KEY_a. So if a user searches for "a" we must always
> return the case-sensitive match, which is XKB_KEY_a.
> However, lets look at XKB_KEY_XF86HomePage. The user currently _must_
> enter "XF86HomePage" correctly to get this keysym. If they enter
> "Xf86HomePage" or "XF86Homepage" or anything else, xkb_keysym_from_name()
> will fail. This is unreasonable and bogus. Instead we should allow an
> application to have a case-insensitive search as fallback. This removes
> policy from xkbcommon and moves it into the application. They can now
> decide, whether they use a possibly unsafe fallback or stay with the
> case-sensitive search.
> 
> Therefore, this patch modifies "makekeys" to convert all symbols to
> lower-case and add them into the lookup table, too. The case-sensitive
> symbols are unchanged and stay in the table, but a user-application can
> now fall back to lower-case search if a symbol cannot be resolved.
> 
> This patch does take care that any case-sensitive key overwrites any
> lower-case key. That is, if "makekeys" adds XKB_KEY_A, it will add it as
> "A" and "a". If XKB_KEY_a is added later, it will overwrite the previous
> "a" because this is a case-sensitive match.
> This guarantees that looking up for "a" always returns KEY_a.
> 
> This patch also makes sure that case-sensitive keys are inserted into the
> lookup table first. This guarantees that xkb_keysym_get_name() always
> returns the first match which is the case-sensitive entry.
> 
> There are several theoretical pitfalls here:
>  - Assume there is a key Abcd and abcd. If the user searches for ABCD and
>    does not find anything, the fallback will return "abcd", although The
>    user might have preferred "Abcd".
>    An application can circumwent it by simply not using the fallback
>    lower-case search, so this does not affect existing applications.
>  - Assume there is a key AbCd but no other key "abcd" in any case. If the
>    user searches for "abcd", it will return "AbCd" even though they didn't
>    request a case-insensitive match. This _does_ affect existing
>    applications.
>    However, there is currently no key where a user wouldn't be happy about
>    this, because every key where case really changes semantics, is always
>    available in both, upper-case and lower-case and so this doesn't
>    happen.
>    An application can catch this by converting the result back to a name
>    and checking whether the names match. However, I really don't know why
>    anyone would bother?
> 
> I haven't found any other cases where this changes semantics. There are
> some pitfalls when doing the case-insensitive fallback match, however,
> this is optional!
> The only case were existing application's behavior changes are listed
> above. However, considering that nearly everyone wants lower-case keys,
> anyway, this is in my eyes the best solution.
> 
> Furthermore, new keysyms should be avoided and users should use Unicode
> symbols to identify keys. This is always unambigious and isn't affected at
> all by this patch.
> 
> Signed-off-by: David Herrmann <dh.herrmann at googlemail.com>
> ---
>  makekeys/makekeys.c | 71 ++++++++++++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 67 insertions(+), 4 deletions(-)

I like the idea, and it seems to work.

First, one thing that's easy to miss, this should work:
    assert(test_string("xf86_switch_vt_5", XKB_KEY_XF86Switch_VT_5));
Have a look at src/keysym.c, it treats the "XF86_" case specially.

Second, I think that in the keymap files themselves (xkb_symbols etc.)
we should always do case-sensitive lookup, this would otherwise lead to
ambiguity, and the original xkbcomp won't be able to parse these files,
for no particularly good reason. So LookupKeysym() in xkbcomp/symbols.c
needs to adapt as well, e.g. by comparing to keysym_get_name. If it can
be made more efficient such that it doesn't need the extra lookup +
compare, that's even better.

Also, this probably warrants a note in the xkb_keysym_from_name()
documentation, explaining how to use it for a case-sensitive and
case-insesitive lookup. I actually think that adding a flag to control
this would be best (which would break ABI/API if done directly), but
don't mind much.

Finally, some minor comments below.

> diff --git a/makekeys/makekeys.c b/makekeys/makekeys.c
> index 62d7255..b48e889 100644
> --- a/makekeys/makekeys.c
> +++ b/makekeys/makekeys.c
> @@ -33,6 +33,7 @@
>  
>  #include "xkbcommon/xkbcommon.h"
>  
> +#include <ctype.h>
>  #include <inttypes.h>
>  #include <stdio.h>
>  #include <stdlib.h>
> @@ -40,7 +41,7 @@
>  
>  typedef uint32_t Signature;
>  
> -#define KTNUM 4000
> +#define KTNUM (4000 * 2)
>  
>  static struct info {
>      char        *name;
> @@ -115,19 +116,81 @@ main(int argc, char *argv[])
>                  continue;
>              }
>  
> +            /* Every key is added to the list case-sensitive. That is, all
> +             * character case is preserved. But every key is also added with
> +             * every character converted to lower-case.
> +             * This allows an application to search for a key case-sensitive and
> +             * always get a the correct match. But if nothing is matched, an
> +             * application can search for all lower-case as fallback. This might
> +             * yield another result as expected, so it should only be used as
> +             * fallback.
> +             * It has the side-effect that KEY_A is added as "A" and "a" and
> +             * if we parse KEY_a afterwards, there is already an "a". Therefore,
> +             * a case-sensitive insert _always_ overwrites previous inserts and
> +             * a lower-case entry is only added if the _exact_ match does not
> +             * exist, yet. This guarantees that "a" will always point to KEY_a
> +             * and never to KEY_A. But if there are keys KEY_Abcd and KEY_abcd,
> +             * then KEY_ABCD converted to lower-case will point to KEY_abcd,
> +             * even though the user might have looked for Key_Abcd.
> +             * There is one other side-effect if we have KEY_SomeThing and
> +             * KEY_Something. Both do not collide with their true case, but they
> +             * collide if both are converted to lower-case. In this case, only
> +             * one of both is added in lower-case and it's the one that is
> +             * first parsed. However, such keysyms do not exist, yet, and the
> +             * chance that they are added is low.
> +             * In fact, all collisions that currently exist are unambigious so
> +             * we shouldn't bother too much. */
> +
> +            /* add key case-sensitive */
>              name = malloc(strlen(prefix) + strlen(key) + 1);
>              if (!name) {
>                  fprintf(stderr, "makekeys: out of memory!\n");
>                  exit(1);
>              }
>              sprintf(name, "%s%s", prefix, key);

This looks like asprintf to me. Though if I'm following the code
correctly, 'prefix' doesn't serve any purpose any more, and is always
the empty string; so it should just be removed, making this a strdup.

> -            info[ksnum].name = name;
> -            info[ksnum].val = val;
> -            ksnum++;
> +
> +            for (i = 0; i < ksnum; ++i) {
> +                if (!strcmp(info[i].name, name))
> +                    break;
> +            }
> +
> +            info[i].name = name;
> +            info[i].val = val;
> +            if (i == ksnum)
> +                ksnum++;
> +
>              if (ksnum == KTNUM) {
>                  fprintf(stderr, "makekeys: too many keysyms!\n");
>                  exit(1);
>              }
> +
> +            /* add key as lower-case */
> +            name = malloc(strlen(prefix) + strlen(key) + 1);
> +            if (!name) {
> +                fprintf(stderr, "makekeys: out of memory!\n");
> +                exit(1);
> +            }
> +
> +            sprintf(name, "%s%s", prefix, key);

Same here.

> +            for (i = 0; name[i]; ++i)
> +                name[i] = tolower(name[i]);
> +
> +            for (i = 0; i < ksnum; ++i) {
> +                if (!strcmp(info[i].name, name))
> +                    break;
> +            }

If you're going to change the above, might as well use streq() here from
src/utils.h, which we usually do.

Thanks,
Ran

> +            if (i == ksnum) {
> +                info[ksnum].name = name;
> +                info[ksnum].val = val;
> +                ksnum++;
> +                if (ksnum == KTNUM) {
> +                    fprintf(stderr, "makekeys: too many keysyms!\n");
> +                    exit(1);
> +                }
> +            } else {
> +                free(name);
> +            }
>          }
>  
>          fclose(fptr);
> -- 
> 1.7.12.2
>