[PATCH libxkbcommon 1/4] compose: add xkbcommon-compose - API

Mon Sep 15 05:01:50 PDT 2014

Hi

On Mon, Sep 15, 2014 at 1:48 PM, Ran Benita <ran234 at gmail.com> wrote:
> On Mon, Sep 15, 2014 at 08:41:37AM +0200, David Herrmann wrote:
>> Hi
>
> Hi David
>
>> On Sun, Sep 14, 2014 at 11:05 PM, Ran Benita <ran234 at gmail.com> wrote:
> [snip]
>> > +/**
>> > + * @page compose-cancellation Cancellation Behavior
>> > + * @parblock
>> > + *
>> > + * What should happen when a sequence is cancelled?  For example, consider
>> > + * there are only the above sequences, and the input keysyms are
>> > + * \<dead_acute\> \<b\>.  There are a few approaches:
>> > + *
>> > + * 1. Swallow the cancelling keysym; that is, no keysym is produced.
>> > + *    This is the approach taken by libX11.
>> > + * 2. Let the cancelling keysym through; that is, \<b\> is produced.
>> > + * 3. Replay the entire sequence; that is, \<dead_acute\> \<b\> is produced.
>> > + *    This is the approach taken by Microsoft Windows (approximately;
>> > + *    instead of \<dead_acute\>, the underlying key is used.  This is
>> > + *    difficult to simulate with XKB keymaps).
>> > + *
>> > + * You can program whichever approach best fits users' expectations.
>>
>> Hm, implementing 3) is a pain as we have to track the keysyms
>> separately. Your compose-API does not provide a way to retrieve the
>> parsed/failed sequence.
>
> I think alternative 3 is the nicest really, so I want to make it
> possible, without too much work if possible.

I agree!
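
To make the three alternatives concrete, here is a rough sketch of what a
client could do once it has the cancelled sequence in hand. Everything in it
is made up for illustration (keysym_t, enum cancel_policy, on_cancel()); none
of it is part of the proposed API, and it assumes the caller already tracked
the sequence, with the cancelling keysym stored last.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t keysym_t;   /* hypothetical stand-in for xkb_keysym_t */

/* Hypothetical names for the three approaches in the doc comment. */
enum cancel_policy {
    CANCEL_SWALLOW,       /* 1: produce nothing (libX11) */
    CANCEL_PASS_THROUGH,  /* 2: produce only the cancelling keysym */
    CANCEL_REPLAY         /* 3: produce the whole sequence */
};

/*
 * seq[0..n-1] is the cancelled sequence, cancelling keysym last.
 * Writes the keysyms to produce into out (capacity >= n) and returns
 * how many were written.
 */
static size_t
on_cancel(enum cancel_policy policy, const keysym_t *seq, size_t n,
          keysym_t *out)
{
    size_t i;

    assert(n == 0 || (seq != NULL && out != NULL));
    if (n == 0)
        return 0;

    switch (policy) {
    case CANCEL_SWALLOW:
        return 0;
    case CANCEL_PASS_THROUGH:
        out[0] = seq[n - 1];
        return 1;
    case CANCEL_REPLAY:
        for (i = 0; i < n; i++)
            out[i] = seq[i];
        return n;
    }
    return 0;
}
```

Note that CANCEL_REPLAY here replays <dead_acute> itself, not the underlying
key as Windows does, since we have no such conversion.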

> Tracking the sequence is possible - I've added a return value to
> xkb_compose_state_feed() for this purpose. It can be done with e.g. a
> wrapper over xkb_compose_state_feed(), something like:

Yes, sure, but I wanted to avoid tracking it separately. We could just
climb up the compose-trie after we cancelled it and recreate the list
of keysyms?
If that's not possible, tracking it separately should be fine. It
isn't that much work..
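
For the record, this is roughly what I imagine the separate tracking to look
like. It is only a sketch under assumptions: toy_feed() and struct toy_state
are a made-up one-sequence state machine (<dead_acute> <a>) standing in for
the real compose state, the enum merely mirrors the status values from the
patch, and struct seq_tracker / tracked_feed() are hypothetical names. The
one thing it relies on from your description is that feed reports whether the
keysym was accepted or ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t keysym_t;            /* hypothetical stand-in for xkb_keysym_t */
enum status { NOTHING, COMPOSING, COMPOSED, CANCELLED };  /* mirrors the patch */

#define SYM_A          0x0061u
#define SYM_DEAD_ACUTE 0xfe51u
#define SYM_SHIFT_L    0xffe1u
#define MAX_SEQ        16

/* Toy compose state: the only sequence is <dead_acute> <a>. */
struct toy_state { enum status status; };

/* Returns 1 if the keysym was accepted, 0 if ignored (modifier). */
static int
toy_feed(struct toy_state *s, keysym_t sym)
{
    if (sym == SYM_SHIFT_L)
        return 0;
    if (s->status == COMPOSING)
        s->status = (sym == SYM_A) ? COMPOSED : CANCELLED;
    else
        s->status = (sym == SYM_DEAD_ACUTE) ? COMPOSING : NOTHING;
    return 1;
}

struct seq_tracker {
    keysym_t syms[MAX_SEQ];
    size_t len;
    int finished;   /* the previous feed ended a sequence */
};

/* Feed a keysym and remember it if it became part of a sequence. */
static enum status
tracked_feed(struct seq_tracker *t, struct toy_state *s, keysym_t sym)
{
    assert(t != NULL && s != NULL);

    if (t->finished) {           /* start over after COMPOSED/CANCELLED */
        t->len = 0;
        t->finished = 0;
    }
    if (!toy_feed(s, sym))       /* ignored keysyms are not recorded */
        return s->status;
    if (s->status == NOTHING)    /* not part of any sequence */
        return NOTHING;
    if (t->len < MAX_SEQ)
        t->syms[t->len++] = sym;
    t->finished = (s->status != COMPOSING);
    return s->status;
}
```

On CANCELLED, t->syms[0..len-1] holds the whole sequence including the
cancelling keysym, so the caller can replay it. Sequences longer than MAX_SEQ
are silently truncated in this sketch.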

>> But given that we have no dead-key =>
>> normal-key conversion right now, it's probably fine. If we want it, we
>> can add an API for both later on (assuming a trivial keysym conversion
>> from dead_key => normal is possible).
>
> Yes, maybe we can add such a function, once we have more experience.
> Another option, which relies entirely on convention, is to feed the
> dead key twice, and see what comes out:
>
>     $ grep -P '^<dead_(.*)> <dead_\1>' /usr/share/X11/locale/en_US.UTF-8/Compose
>
>     <dead_tilde> <dead_tilde>               : "~"  asciitilde # TILDE
>     <dead_acute> <dead_acute>               : "´"  acute # ACUTE ACCENT
>     <dead_grave> <dead_grave>               : "`"  grave # GRAVE ACCENT
>     <dead_circumflex> <dead_circumflex>     : "^"  asciicircum # CIRCUMFLEX ACCENT
>     <dead_abovering> <dead_abovering>       : "°"  degree # DEGREE SIGN
>     <dead_macron> <dead_macron>             : "¯"  macron # MACRON
>     <dead_breve> <dead_breve>               : "˘"  breve # BREVE
>     <dead_abovedot> <dead_abovedot>         : "˙"  abovedot # DOT ABOVE
>     <dead_diaeresis> <dead_diaeresis>       : "¨"  diaeresis # DIAERESIS
>     <dead_doubleacute> <dead_doubleacute>   : "˝"  U2dd # DOUBLE ACUTE ACCENT
>     <dead_caron> <dead_caron>               : "ˇ"  caron # CARON
>     <dead_cedilla> <dead_cedilla>           : "¸"  cedilla # CEDILLA
>     <dead_ogonek> <dead_ogonek>             : "˛"  ogonek # OGONEK
>     <dead_iota> <dead_iota>                 : "ͺ"  U37a # GREEK YPOGEGRAMMENI
>     <dead_belowdot> <dead_belowdot>         : "̣"   U0323 # COMBINING DOT BELOW
>     <dead_belowcomma> <dead_belowcomma>     : ","  comma # COMMA
>     <dead_currency> <dead_currency>         : "¤"  currency # CURRENCY SIGN
>     <dead_greek> <dead_greek>               : "µ"  U00B5 # MICRO SIGN
>     <dead_hook> <dead_hook>                 : "̉"   U0309 # COMBINING HOOK ABOVE
>     <dead_horn> <dead_horn>                 : "̛"   U031B # COMBINING HORN
>     <dead_stroke> <dead_stroke>             : "/"  slash # SOLIDUS
>
> Probably not a good idea..

Ewww!

>> > +/** Status of the Compose sequence state machine. */
>> > +enum xkb_compose_status {
>> > +    /** The initial state; no sequence has started yet. */
>> > +    XKB_COMPOSE_NOTHING,
>> > +    /** In the middle of a sequence. */
>> > +    XKB_COMPOSE_COMPOSING,
>> > +    /** A complete sequence has been matched. */
>> > +    XKB_COMPOSE_COMPOSED,
>> > +    /** The last sequence was cancelled due to an invalid keysym. */
>> > +    XKB_COMPOSE_CANCELLED
>>
>> It is unclear what happens if a keysym is pressed that is _not_ part
>> of a compose sequence (that is, most keys). 'context' is 0 but no
>> matching compose node is found. I assume it generates
>> XKB_COMPOSE_NOTHING, but the comment here is unclear. Maybe the
>> _feed() or _get_state() description should mention how keys are
>> treated that are not part of compose sequences (and which are fed
>> while no compose sequence is active). I assume we do *not* return
>> XKB_COMPOSE_COMPOSED in those cases?
>
> Exactly, it is NOTHING, not COMPOSED. I'll make this more clear.

Thanks!

> If you get multiple keysyms, you should not feed them, because this is
> not something that the current Compose format expects or supports. I'll
> mention that.
>
> The way to use multiple keysyms is still up to interpretation I guess,
> since it hasn't been used yet. But my notion is that it *should* be
> treated as atomic, the use case being to support Unicode combining
> characters instead of requiring precomposed characters, which are not
> always available.

Right, I remember again.

>> > +/**
>> > + * Get the result keysym for a composed sequence.
>> > + *
>> > + * See @ref compose-overview for more details.  This function is only
>> > + * useful when the status is XKB_COMPOSE_COMPOSED.
>> > + *
>> > + * @returns The result keysym.  If the sequence is not complete, or does
>> > + * not specify a result keysym, returns XKB_KEY_NoSymbol.
>> > + *
>> > + * @memberof xkb_compose_state
>> > + **/
>> > +xkb_keysym_t
>> > +xkb_compose_state_get_one_sym(struct xkb_compose_state *state);
>>
>> Why _one_sym() and not _get_syms()? Yeah, the current format only
>> allows one symbol, but I don't see why we restrict the API in such
>> ways.
>
> When we initially discussed this I was against it, since I thought it
> would be a bit too flexible/crazy - essentially mapping one sequence to
> another sequence. But considering what I just said above, it actually
> makes perfect sense - Compose is exactly the place where being able to
> output say 2 keysyms would be useful. E.g., base letter e (U+0065)
> followed by combining acute accent (U+0301) (stolen from Wikipedia -
> though a better example would be something without a precomposed
> codepoint).
>
> This can be implemented in a backward-compatible way, even without
> needing a V2 (just extending the V1 like we do in the keymaps), and
> using some hackery, also without incurring any memory bloat for the
> common case.
>
> But due to the lack of time I'll leave it as a TODO for now.

Fair enough.
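
Just to spell out what such a two-symbol result means on the text side: the
base letter and the combining accent are simply two code points encoded back
to back. A minimal sketch, with hypothetical helper names (utf8_encode(),
encode_result()) that are not part of the patch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Encode one code point as UTF-8 into buf (at least 4 bytes).
 * Returns the number of bytes written, or 0 for surrogates and
 * values above U+10FFFF.
 */
static size_t
utf8_encode(uint32_t cp, unsigned char *buf)
{
    if (cp < 0x80) {
        buf[0] = (unsigned char) cp;
        return 1;
    }
    if (cp < 0x800) {
        buf[0] = (unsigned char) (0xC0 | (cp >> 6));
        buf[1] = (unsigned char) (0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;
    if (cp < 0x10000) {
        buf[0] = (unsigned char) (0xE0 | (cp >> 12));
        buf[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        buf[2] = (unsigned char) (0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        buf[0] = (unsigned char) (0xF0 | (cp >> 18));
        buf[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
        buf[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        buf[3] = (unsigned char) (0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/*
 * Encode a result made of n code points into out, NUL-terminated.
 * Returns the length without the NUL, or 0 if a code point is invalid
 * or the buffer is too small.
 */
static size_t
encode_result(const uint32_t *cps, size_t n, char *out, size_t size)
{
    size_t len = 0, i, j;

    assert(out != NULL && size > 0);
    for (i = 0; i < n; i++) {
        unsigned char tmp[4];
        size_t k = utf8_encode(cps[i], tmp);

        if (k == 0 || len + k >= size) {
            out[0] = '\0';
            return 0;
        }
        for (j = 0; j < k; j++)
            out[len++] = (char) tmp[j];
    }
    out[len] = '\0';
    return len;
}
```

For { 0x0065, 0x0301 } this yields the three bytes 0x65 0xCC 0x81, which a
text renderer displays as a single accented letter.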

>> I mean, the UTF-8 fallback is kinda ugly right now and we might
>> be able to fix it in a V2 format if we allow multiple syms.
>
> Which UTF-8 fallback do you mean?

Forget that.. I was somehow annoyed that compose returns UTF-8 strings
instead of keysyms. But I noticed that compose is limited to text
input by design. I mean, we don't even generate key-up/down events for
composed sequences (which wouldn't make any sense), so a text-string
as result is totally fine.

Thanks
David