[PATCH libxkbcommon 1/4] compose: add xkbcommon-compose - API

Ran Benita ran234 at gmail.com
Mon Sep 15 04:48:42 PDT 2014


On Mon, Sep 15, 2014 at 08:41:37AM +0200, David Herrmann wrote:
> Hi

Hi David

> On Sun, Sep 14, 2014 at 11:05 PM, Ran Benita <ran234 at gmail.com> wrote:
[snip]
> > +/**
> > + * @page compose-cancellation Cancellation Behavior
> > + * @parblock
> > + *
> > + * What should happen when a sequence is cancelled?  For example, consider
> > + * there are only the above sequences, and the input keysyms are
> > + * \<dead_acute\> \<b\>.  There are a few approaches:
> > + *
> > + * 1. Swallow the cancelling keysym; that is, no keysym is produced.
> > + *    This is the approach taken by libX11.
> > + * 2. Let the cancelling keysym through; that is, \<b\> is produced.
> > + * 3. Replay the entire sequence; that is, \<dead_acute\> \<b\> is produced.
> > + *    This is the approach taken by Microsoft Windows (approximately;
> > + *    instead of \<dead_acute\>, the underlying key is used.  This is
> > + *    difficult to simulate with XKB keymaps).
> > + *
> > + * You can program whichever approach best fits users' expectations.
> 
> Hm, implementing 3) is a pain as we have to track the keysyms
> separately. Your compose-API does not provide a way to retrieve the
> parsed/failed sequence. 

I think alternative 3 is really the nicest, so I want to make it
possible without too much work.

Tracking the sequence is possible - I've added a return value to
xkb_compose_state_feed() for this purpose. It can be done with e.g. a
wrapper over xkb_compose_state_feed(), something like:

    /* sequence, sequence_len is a persisting array of keysyms. */
    enum xkb_compose_feed_result result;
    enum xkb_compose_status status;

    result = xkb_compose_state_feed(compose_state, keysym);
    if (result == XKB_COMPOSE_FEED_IGNORED)
        return;

    status = xkb_compose_state_get_status(compose_state);
    if ((status == XKB_COMPOSE_COMPOSING || status == XKB_COMPOSE_COMPOSED) &&
        sequence_len < ARRAY_SIZE(sequence))
        sequence[sequence_len++] = keysym;

    /* You probably want to do something with the sequence if COMPOSED or
       CANCELLED. In these cases the sequence also needs to be reset
       (this can be a wrapper over xkb_compose_state_reset()). */

I considered adding a function for getting the current sequence, i.e.
doing the above internally. But that's problematic because when a
sequence is cancelled for example, I should reset the sequence, but
that's exactly when the user wants the sequence.

Also, one reasonable way to do #3 is to replay events for the cancelled
sequence. But then you need to remember other stuff, like the keycodes,
timestamps, etc., so the user needs to do the tracking himself anyway.
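
To make the tracking concrete, here is a minimal, self-contained sketch
of what the user-side bookkeeping could look like. Everything in it is
hypothetical: struct seq_event, struct seq_tracker, MAX_SEQ and
seq_tracker_update() are made-up names, and enum compose_status is only
a stand-in for xkb_compose_status so that the sketch compiles without
libxkbcommon. Unlike the snippet above, it also records the cancelling
keysym, since a replay would need it:

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for enum xkb_compose_status (assumption: same four states). */
enum compose_status { NOTHING, COMPOSING, COMPOSED, CANCELLED };

#define MAX_SEQ 16

/* One recorded key press; keycode and time are what a replay would need. */
struct seq_event {
    uint32_t keysym;
    uint32_t keycode;
    uint32_t time_ms;
};

struct seq_tracker {
    struct seq_event events[MAX_SEQ];
    size_t len;
};

/*
 * Call after every feed that was not ignored, passing the status
 * reported afterwards.  Returns the number of events copied to `out`
 * when the sequence just ended (COMPOSED or CANCELLED), 0 otherwise.
 * The tracker is reset only after the events have been handed over.
 */
static size_t
seq_tracker_update(struct seq_tracker *t, struct seq_event ev,
                   enum compose_status status,
                   struct seq_event out[MAX_SEQ])
{
    size_t n, i;

    /* Not part of any sequence: nothing to remember. */
    if (status == NOTHING) {
        t->len = 0;
        return 0;
    }

    /* Events beyond MAX_SEQ are silently dropped in this sketch. */
    if (t->len < MAX_SEQ)
        t->events[t->len++] = ev;

    if (status == COMPOSING)
        return 0;

    /* COMPOSED or CANCELLED: hand over the sequence, then reset. */
    n = t->len;
    for (i = 0; i < n; i++)
        out[i] = t->events[i];
    t->len = 0;
    return n;
}
```

The caller would invoke this after each feed that was accepted, with the
status from xkb_compose_state_get_status(), and replay or discard the
returned events as it sees fit.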

> But given that we have no dead-key =>
> normal-key conversion right now, it's probably fine. If we want it, we
> can add an API for both later on (assuming a trivial keysym conversion
> from dead_key => normal is possible).

Yes, maybe we can add such a function, once we have more experience.
Another option, which relies entirely on convention, is to feed the
dead key twice, and see what comes out:

    $ grep -P '^<dead_(.*)> <dead_\1>' /usr/share/X11/locale/en_US.UTF-8/Compose

    <dead_tilde> <dead_tilde>               : "~"  asciitilde # TILDE
    <dead_acute> <dead_acute>               : "´"  acute # ACUTE ACCENT
    <dead_grave> <dead_grave>               : "`"  grave # GRAVE ACCENT
    <dead_circumflex> <dead_circumflex>     : "^"  asciicircum # CIRCUMFLEX ACCENT
    <dead_abovering> <dead_abovering>       : "°"  degree # DEGREE SIGN
    <dead_macron> <dead_macron>             : "¯"  macron # MACRON
    <dead_breve> <dead_breve>               : "˘"  breve # BREVE
    <dead_abovedot> <dead_abovedot>         : "˙"  abovedot # DOT ABOVE
    <dead_diaeresis> <dead_diaeresis>       : "¨"  diaeresis # DIAERESIS
    <dead_doubleacute> <dead_doubleacute>   : "˝"  U2dd # DOUBLE ACUTE ACCENT
    <dead_caron> <dead_caron>               : "ˇ"  caron # CARON
    <dead_cedilla> <dead_cedilla>           : "¸"  cedilla # CEDILLA
    <dead_ogonek> <dead_ogonek>             : "˛"  ogonek # OGONEK
    <dead_iota> <dead_iota>                 : "ͺ"  U37a # GREEK YPOGEGRAMMENI
    <dead_belowdot> <dead_belowdot>         : "̣"   U0323 # COMBINING DOT BELOW
    <dead_belowcomma> <dead_belowcomma>     : ","  comma # COMMA
    <dead_currency> <dead_currency>         : "¤"  currency # CURRENCY SIGN
    <dead_greek> <dead_greek>               : "µ"  U00B5 # MICRO SIGN
    <dead_hook> <dead_hook>                 : "̉"   U0309 # COMBINING HOOK ABOVE
    <dead_horn> <dead_horn>                 : "̛"   U031B # COMBINING HORN
    <dead_stroke> <dead_stroke>             : "/"  slash # SOLIDUS

Probably not a good idea..

> I also don't understand why dead-keys are related to keymaps? I mean,
> yeah, the "nodeadkey" variant is part of the keymap, but the keymap
> never defines behavior of dead-keys. We could simply provide a lookup
> table that converts XKB_KEY_dead_xyz to XKB_KEY_xyz, right?

Right, this is what I meant with "This is difficult to simulate with XKB
keymaps". But I need to research this a bit more..
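
For illustration, such a lookup table could be sketched as below,
assuming a plain one-to-one mapping is good enough. The table is
deliberately partial, and struct dead_map, dead_to_plain and
dead_keysym_to_plain() are hypothetical names, not part of libxkbcommon.
The numeric values are the keysym values from xkbcommon-keysyms.h,
written out so that the sketch stands alone:

```c
#include <stddef.h>
#include <stdint.h>

struct dead_map {
    uint32_t dead;   /* XKB_KEY_dead_xyz */
    uint32_t plain;  /* corresponding spacing keysym */
};

/* Partial table; a real one would need an entry per dead keysym. */
static const struct dead_map dead_to_plain[] = {
    { 0xfe50, 0x0060 }, /* dead_grave      -> grave       */
    { 0xfe51, 0x00b4 }, /* dead_acute      -> acute       */
    { 0xfe52, 0x005e }, /* dead_circumflex -> asciicircum */
    { 0xfe53, 0x007e }, /* dead_tilde      -> asciitilde  */
    { 0xfe57, 0x00a8 }, /* dead_diaeresis  -> diaeresis   */
    { 0xfe5b, 0x00b8 }, /* dead_cedilla    -> cedilla     */
};

/* Returns the plain keysym, or 0 (XKB_KEY_NoSymbol) if there is no entry. */
static uint32_t
dead_keysym_to_plain(uint32_t keysym)
{
    size_t i;

    for (i = 0; i < sizeof(dead_to_plain) / sizeof(dead_to_plain[0]); i++)
        if (dead_to_plain[i].dead == keysym)
            return dead_to_plain[i].plain;

    return 0;
}
```

Whether every dead keysym has a sensible plain counterpart is exactly
the part that still needs research.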

Btw, when I wrote that this is what Windows does, it was from vague
memory and some internet searching, so maybe false advertising :) If
someone can confirm that this is what it does, that'd be great.

> > +/**
> > + * Create a new compose table from a memory buffer.
> > + *
> > + * This is just like xkb_compose_table_new_from_file(), but instead of
> > + * a file, gets the table as one enormous string.
> > + *
> > + * @see xkb_compose_table_new_from_file()
> > + * @memberof xkb_compose_table
> > + */
> > +struct xkb_compose_table *
> > +xkb_compose_table_new_from_buffer(struct xkb_context *context,
> > +                                  const char *buffer, size_t length,
> > +                                  const char *locale,
> > +                                  enum xkb_compose_format format,
> > +                                  enum xkb_compose_compile_flags flags);
> 
> _from_buffer() right from the beginning, yey!

Hehe. Yea, _from_string() is entirely redundant.

> > +/** Status of the Compose sequence state machine. */
> > +enum xkb_compose_status {
> > +    /** The initial state; no sequence has started yet. */
> > +    XKB_COMPOSE_NOTHING,
> > +    /** In the middle of a sequence. */
> > +    XKB_COMPOSE_COMPOSING,
> > +    /** A complete sequence has been matched. */
> > +    XKB_COMPOSE_COMPOSED,
> > +    /** The last sequence was cancelled due to an invalid keysym. */
> > +    XKB_COMPOSE_CANCELLED
> 
> It is unclear what happens if a keysym is pressed that is _not_ part
> of a compose sequence (that is, most keys). 'context' is 0 but no
> matching compose node is found. I assume it generates
> XKB_COMPOSE_NOTHING, but the comment here is unclear. Maybe the
> _feed() or _get_state() description should mention how keys are
> treated that are not part of compose sequences (and which are fed
> while no compose sequence is active). I assume we do *not* return
> XKB_COMPOSE_COMPOSED in those cases?

Exactly, it is NOTHING, not COMPOSED. I'll make this more clear.
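
To spell out what that means for a caller, here is a small sketch of
choosing the keysym to deliver after a feed. It is not part of the
patch: keysym_to_deliver() and enum cancel_policy are hypothetical, and
enum compose_status is a stand-in for xkb_compose_status so the sketch
compiles on its own. The two policies correspond to approaches 1 and 2
from the cancellation discussion; approach 3 additionally needs the
tracked sequence.

```c
#include <stdint.h>

/* Stand-in for enum xkb_compose_status (assumption: same four states). */
enum compose_status { NOTHING, COMPOSING, COMPOSED, CANCELLED };

/* Hypothetical knob for what to do with a cancelling keysym. */
enum cancel_policy { CANCEL_SWALLOW, CANCEL_PASS_THROUGH };

/*
 * `fed` is the keysym that was just fed, `composed` is the result
 * keysym of a completed sequence.  Returns the keysym to deliver, or
 * 0 (XKB_KEY_NoSymbol) to deliver nothing.
 */
static uint32_t
keysym_to_deliver(enum compose_status status, uint32_t fed,
                  uint32_t composed, enum cancel_policy policy)
{
    switch (status) {
    case NOTHING:
        /* Not part of any sequence: behave as if Compose did not exist. */
        return fed;
    case COMPOSING:
        /* In the middle of a sequence: produce nothing yet. */
        return 0;
    case COMPOSED:
        return composed;
    case CANCELLED:
        return policy == CANCEL_PASS_THROUGH ? fed : 0;
    }
    return 0;
}
```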

> > +};
> > +
> > +/** The effect of a keysym fed to xkb_compose_state_feed(). */
> > +enum xkb_compose_feed_result {
> > +    /** The keysym had no effect. */
> > +    XKB_COMPOSE_FEED_IGNORED,
> > +    /** The keysym started, advanced or cancelled a sequence. */
> > +    XKB_COMPOSE_FEED_ACCEPTED
> > +};
> > +
> > +/**
> > + * Feed one keysym to the Compose sequence state machine.
> > + *
> > + * This function advances into a compose sequence, cancels it, or has no
> > + * effect (e.g. for modifier keysyms).  The resulting status may be
> > + * observed with xkb_compose_state_get_status().
> > + *
> > + * @param state
> > + *     The compose state object.
> > + * @param keysym
> > + *     A keysym, usually obtained after a key-press event, with a
> > + *     function such as xkb_state_key_get_one_sym().
> 
> If a keypress generates multiple keysyms, are we supposed to call this
> function in a loop? Or are we supposed to not feed such data into the
> compose state? I remember we had this discussion before, but I'm not
> sure what we agreed on. I think the conclusion was to always treat
> multiple keysyms as an array of normal syms, not as an atomic
> keypress, right?

If you get multiple keysyms, you should not feed them, because this is
not something that the current Compose format expects or supports. I'll
mention that.

The way to use multiple keysyms is still up to interpretation I guess,
since it hasn't been used yet. But my notion is that it *should* be
treated as atomic, the use case being to support Unicode combining
characters instead of requiring precomposed characters, which are not
always available.

> > +/**
> > + * Get the result keysym for a composed sequence.
> > + *
> > + * See @ref compose-overview for more details.  This function is only
> > + * useful when the status is XKB_COMPOSE_COMPOSED.
> > + *
> > + * @returns The result keysym.  If the sequence is not complete, or does
> > + * not specify a result keysym, returns XKB_KEY_NoSymbol.
> > + *
> > + * @memberof xkb_compose_state
> > + **/
> > +xkb_keysym_t
> > +xkb_compose_state_get_one_sym(struct xkb_compose_state *state);
> 
> Why _one_sym() and not _get_syms()? Yeah, the current format only
> allows one symbol, but I don't see why we restrict the API in such
> ways.

When we initially discussed this I was against it, since I thought it
would be a bit too flexible/crazy - essentially mapping one sequence to
another sequence. But considering what I just said above, it actually
makes perfect sense - Compose is exactly the place where being able to
output, say, 2 keysyms would be useful. E.g., base letter e (U+0065)
followed by combining acute accent (U+0301) (stolen from Wikipedia -
though a better example would be something without a precomposed
codepoint).

This can be implemented in a backward-compatible way, even without
needing a V2 (just extending the V1 like we do in the keymaps), and
using some hackery, also without incurring any memory bloat for the
common case.

But due to lack of time I'll leave it as a TODO for now.
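
For the e + combining acute example, a caller receiving two result
keysyms could turn them into text roughly as sketched below. This
assumes a hypothetical multi-keysym result (no such getter exists in
the patch), and all three function names are made up. The keysym to
code point conversion only covers Latin-1 keysyms and direct Unicode
keysyms (0x01000000 + code point); everything else is treated as
unconvertible.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Encode one code point as UTF-8; returns bytes written, 0 if invalid. */
static size_t
utf8_encode(uint32_t cp, char out[4])
{
    if (cp < 0x80) {
        out[0] = (char) cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (char) (0xC0 | (cp >> 6));
        out[1] = (char) (0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0; /* surrogates are not valid code points */
    if (cp < 0x10000) {
        out[0] = (char) (0xE0 | (cp >> 12));
        out[1] = (char) (0x80 | ((cp >> 6) & 0x3F));
        out[2] = (char) (0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp < 0x110000) {
        out[0] = (char) (0xF0 | (cp >> 18));
        out[1] = (char) (0x80 | ((cp >> 12) & 0x3F));
        out[2] = (char) (0x80 | ((cp >> 6) & 0x3F));
        out[3] = (char) (0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Latin-1 and direct Unicode keysyms only; returns 0 for anything else. */
static uint32_t
keysym_to_codepoint(uint32_t ks)
{
    if ((ks >= 0x20 && ks <= 0x7e) || (ks >= 0xa0 && ks <= 0xff))
        return ks;
    if (ks >= 0x01000100 && ks <= 0x0110ffff)
        return ks - 0x01000000;
    return 0;
}

/* Writes a NUL-terminated string; stops at the first unconvertible
 * keysym or when the buffer is full.  Returns the length written. */
static size_t
keysyms_to_utf8(const uint32_t *syms, size_t nsyms, char *buf, size_t size)
{
    size_t used = 0, i;

    for (i = 0; i < nsyms; i++) {
        char tmp[4];
        uint32_t cp = keysym_to_codepoint(syms[i]);
        size_t n = cp ? utf8_encode(cp, tmp) : 0;

        if (n == 0 || used + n + 1 > size)
            break;
        memcpy(buf + used, tmp, n);
        used += n;
    }
    if (size > 0)
        buf[used] = '\0';
    return used;
}
```

Fed the keysyms 0x0065 and 0x01000301, this yields the three bytes
65 CC 81, i.e. "e" followed by the combining acute accent.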

> I mean, the UTF-8 fallback is kinda ugly right now and we might
> be able to fix it in a V2 format if we allow multiple syms.

Which UTF-8 fallback do you mean?

> Thanks a lot for the work, Ran!
> David

Thanks for the comments!
Ran

