[PATCH libxkbcommon 1/4] compose: add xkbcommon-compose - API
David Herrmann
dh.herrmann at gmail.com
Mon Sep 15 05:01:50 PDT 2014
Hi
On Mon, Sep 15, 2014 at 1:48 PM, Ran Benita <ran234 at gmail.com> wrote:
> On Mon, Sep 15, 2014 at 08:41:37AM +0200, David Herrmann wrote:
>> Hi
>
> Hi David
>
>> On Sun, Sep 14, 2014 at 11:05 PM, Ran Benita <ran234 at gmail.com> wrote:
> [snip]
>> > +/**
>> > + * @page compose-cancellation Cancellation Behavior
>> > + * @parblock
>> > + *
>> > + * What should happen when a sequence is cancelled? For example, consider
>> > + * there are only the above sequences, and the input keysyms are
>> > + * \<dead_acute\> \<b\>. There are a few approaches:
>> > + *
>> > + * 1. Swallow the cancelling keysym; that is, no keysym is produced.
>> > + * This is the approach taken by libX11.
>> > + * 2. Let the cancelling keysym through; that is, \<b\> is produced.
>> > + * 3. Replay the entire sequence; that is, \<dead_acute\> \<b\> is produced.
>> > + * This is the approach taken by Microsoft Windows (approximately;
>> > + * instead of \<dead_acute\>, the underlying key is used. This is
>> > + * difficult to simulate with XKB keymaps).
>> > + *
>> > + * You can program whichever approach best fits users' expectations.
>>
>> Hm, implementing 3) is a pain as we have to track the keysyms
>> separately. Your compose-API does not provide a way to retrieve the
>> parsed/failed sequence.
>
> I think alternative 3 is the nicest really, so I want to make it
> possible, without too much work if possible.
I agree!
> Tracking the sequence is possible - I've added a return value to
> xkb_compose_state_feed() for this purpose. It can be done with e.g. a
> wrapper over xkb_compose_state_feed(), something like:
Yes, sure, but I wanted to avoid tracking it separately. We could just
climb up the compose-trie after we cancelled it and recreate the list
of keysyms?
If that's not possible, tracking it separately should be fine. It
isn't that much work..
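To make the "climb up the trie" idea concrete, here is a minimal sketch. It is a toy, not libxkbcommon code: the node layout (in particular the parent pointer), the keysym_t/status stand-ins and the feed()/replay() helpers are all assumptions made for illustration, and the real compose table may not store parent links at all. The only sequence in the toy table is <dead_acute> <a>.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins, NOT the libxkbcommon API or its internal table layout. */
typedef uint32_t keysym_t;
enum { SYM_DEAD_ACUTE = 0xfe51, SYM_A = 0x61, SYM_B = 0x62, SYM_AACUTE = 0xe1 };
enum status { NOTHING, COMPOSING, COMPOSED, CANCELLED };

/* One trie node, reached from `parent` by feeding `sym` (hypothetical layout). */
struct node {
    const struct node *parent;
    keysym_t sym;
    keysym_t result; /* nonzero only on leaves */
};

/* Toy table with a single sequence: <dead_acute> <a> : aacute */
static const struct node root = { NULL, 0, 0 };
static const struct node n_acute = { &root, SYM_DEAD_ACUTE, 0 };
static const struct node n_acute_a = { &n_acute, SYM_A, SYM_AACUTE };
static const struct node *const nodes[] = { &n_acute, &n_acute_a };

struct state {
    const struct node *cur;    /* current position in the trie */
    const struct node *failed; /* node we were at when cancelled, else NULL */
    enum status status;
};

static void state_init(struct state *s)
{
    s->cur = &root;
    s->failed = NULL;
    s->status = NOTHING;
}

static const struct node *find_child(const struct node *parent, keysym_t sym)
{
    for (size_t i = 0; i < sizeof(nodes) / sizeof(nodes[0]); i++)
        if (nodes[i]->parent == parent && nodes[i]->sym == sym)
            return nodes[i];
    return NULL;
}

static enum status feed(struct state *s, keysym_t sym)
{
    const struct node *next = find_child(s->cur, sym);

    s->failed = NULL;
    if (next == NULL) {
        if (s->cur == &root) {
            s->status = NOTHING;   /* keysym starts no sequence */
        } else {
            s->failed = s->cur;    /* remember where the sequence broke */
            s->status = CANCELLED;
        }
        s->cur = &root;
    } else if (next->result != 0) {
        s->status = COMPOSED;
        s->cur = &root;
    } else {
        s->status = COMPOSING;
        s->cur = next;
    }
    return s->status;
}

/*
 * Rebuild the cancelled sequence by climbing parent links, then append the
 * cancelling keysym. Returns the number of keysyms written, or 0 if the
 * state is not cancelled or `out` is too small.
 */
static size_t replay(const struct state *s, keysym_t cancelling,
                     keysym_t *out, size_t cap)
{
    size_t depth = 0;

    assert(out != NULL);
    if (s->failed == NULL)
        return 0;
    for (const struct node *p = s->failed; p != &root; p = p->parent)
        depth++;
    if (depth + 1 > cap)
        return 0;

    size_t n = depth + 1;
    out[depth] = cancelling;
    for (const struct node *p = s->failed; p != &root; p = p->parent)
        out[--depth] = p->sym;
    return n;
}
```

With this, feeding <dead_acute> <b> ends in CANCELLED and replay() hands back both keysyms in order, so the caller needs no separate list. If the table has no parent links, the separate tracking wrapper is the fallback.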
>> But given that we have no dead-key =>
>> normal-key conversion right now, it's probably fine. If we want it, we
>> can add an API for both later on (assuming a trivial keysym conversion
>> from dead_key => normal is possible).
>
> Yes, maybe we can add such a function, once we have more experience.
> Another option, which relies entirely on convention, is to feed the
> dead key twice, and see what comes out:
>
> $ grep -P '^<dead_(.*)> <dead_\1>' /usr/share/X11/locale/en_US.UTF-8/Compose
>
> <dead_tilde> <dead_tilde> : "~" asciitilde # TILDE
> <dead_acute> <dead_acute> : "´" acute # ACUTE ACCENT
> <dead_grave> <dead_grave> : "`" grave # GRAVE ACCENT
> <dead_circumflex> <dead_circumflex> : "^" asciicircum # CIRCUMFLEX ACCENT
> <dead_abovering> <dead_abovering> : "°" degree # DEGREE SIGN
> <dead_macron> <dead_macron> : "¯" macron # MACRON
> <dead_breve> <dead_breve> : "˘" breve # BREVE
> <dead_abovedot> <dead_abovedot> : "˙" abovedot # DOT ABOVE
> <dead_diaeresis> <dead_diaeresis> : "¨" diaeresis # DIAERESIS
> <dead_doubleacute> <dead_doubleacute> : "˝" U2dd # DOUBLE ACUTE ACCENT
> <dead_caron> <dead_caron> : "ˇ" caron # CARON
> <dead_cedilla> <dead_cedilla> : "¸" cedilla # CEDILLA
> <dead_ogonek> <dead_ogonek> : "˛" ogonek # OGONEK
> <dead_iota> <dead_iota> : "ͺ" U37a # GREEK YPOGEGRAMMENI
> <dead_belowdot> <dead_belowdot> : "̣" U0323 # COMBINING DOT BELOW
> <dead_belowcomma> <dead_belowcomma> : "," comma # COMMA
> <dead_currency> <dead_currency> : "¤" currency # CURRENCY SIGN
> <dead_greek> <dead_greek> : "µ" U00B5 # MICRO SIGN
> <dead_hook> <dead_hook> : "̉" U0309 # COMBINING HOOK ABOVE
> <dead_horn> <dead_horn> : "̛" U031B # COMBINING HORN
> <dead_stroke> <dead_stroke> : "/" slash # SOLIDUS
>
> Probably not a good idea..
Ewww!
>> > +/** Status of the Compose sequence state machine. */
>> > +enum xkb_compose_status {
>> > + /** The initial state; no sequence has started yet. */
>> > + XKB_COMPOSE_NOTHING,
>> > + /** In the middle of a sequence. */
>> > + XKB_COMPOSE_COMPOSING,
>> > + /** A complete sequence has been matched. */
>> > + XKB_COMPOSE_COMPOSED,
>> > + /** The last sequence was cancelled due to an invalid keysym. */
>> > + XKB_COMPOSE_CANCELLED
>>
>> It is unclear what happens if a keysym is pressed that is _not_ part
>> of a compose sequence (that is, most keys). 'context' is 0 but no
>> matching compose node is found. I assume it generates
>> XKB_COMPOSE_NOTHING, but the comment here is unclear. Maybe the
>> _feed() or _get_state() description should mention how keys are
>> treated that are not part of compose sequences (and which are fed
>> while no compose sequence is active). I assume we do *not* return
>> XKB_COMPOSE_COMPOSED in those cases?
>
> Exactly, it is NOTHING, not COMPOSED. I'll make this more clear.
Thanks!
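For the record, this is how I would expect a client to dispatch on the status. It is a self-contained toy sketch under stated assumptions: the state struct, feed() and handle_key() are hypothetical stand-ins rather than the proposed API, the table holds only <dead_acute> <a>, and cancellation uses approach 1 (swallow).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins, NOT the libxkbcommon API. */
typedef uint32_t keysym_t;
enum {
    NO_SYMBOL = 0,
    SYM_DEAD_ACUTE = 0xfe51,
    SYM_A = 0x61,
    SYM_B = 0x62,
    SYM_AACUTE = 0xe1
};
enum status { NOTHING, COMPOSING, COMPOSED, CANCELLED };

struct state {
    keysym_t pending; /* first keysym of an active sequence, or NO_SYMBOL */
    keysym_t result;  /* valid only while status == COMPOSED */
    enum status status;
};

/* Toy state machine for a single sequence: <dead_acute> <a> : aacute */
static enum status feed(struct state *s, keysym_t sym)
{
    assert(s != NULL);
    s->result = NO_SYMBOL;
    if (s->pending == NO_SYMBOL) {
        if (sym == SYM_DEAD_ACUTE) {
            s->pending = sym;
            s->status = COMPOSING;
        } else {
            /* Not part of any sequence and none active: NOTHING. */
            s->status = NOTHING;
        }
    } else {
        s->pending = NO_SYMBOL;
        if (sym == SYM_A) {
            s->result = SYM_AACUTE;
            s->status = COMPOSED;
        } else {
            s->status = CANCELLED;
        }
    }
    return s->status;
}

/* Returns the keysym the client should act on, or NO_SYMBOL to swallow it. */
static keysym_t handle_key(struct state *s, keysym_t sym)
{
    switch (feed(s, sym)) {
    case NOTHING:
        return sym;        /* ordinary key: pass through untouched */
    case COMPOSED:
        return s->result;  /* sequence finished: use its result */
    case COMPOSING:
    case CANCELLED:
        return NO_SYMBOL;  /* approach 1: swallow */
    }
    return NO_SYMBOL;
}
```

The point is the NOTHING branch: an ordinary key fed while no sequence is active goes straight through and never shows up as COMPOSED.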
> If you get multiple keysyms, you should not feed them, because this is
> not something that the current Compose format expects or supports. I'll
> mention that.
>
> The way to use multiple keysyms is still up to interpretation I guess,
> since it hasn't been used yet. But my notion is that it *should* be
> treated as atomic, the use case being to support Unicode combining
> characters instead of requiring precomposed characters, which are not
> always available.
Right, I remember again.
>> > +/**
>> > + * Get the result keysym for a composed sequence.
>> > + *
>> > + * See @ref compose-overview for more details. This function is only
>> > + * useful when the status is XKB_COMPOSE_COMPOSED.
>> > + *
>> > + * @returns The result keysym. If the sequence is not complete, or does
>> > + * not specify a result keysym, returns XKB_KEY_NoSymbol.
>> > + *
>> > + * @memberof xkb_compose_state
>> > + **/
>> > +xkb_keysym_t
>> > +xkb_compose_state_get_one_sym(struct xkb_compose_state *state);
>>
>> Why _one_sym() and not _get_syms()? Yeah, the current format only
>> allows one symbol, but I don't see why we restrict the API in such
>> ways.
>
> When we initially discussed this I was against it, since I thought it
> would be a bit too flexible/crazy - essentially mapping one sequence to
> another sequence. But considering what I just said above, it actually
> makes perfect sense - Compose is exactly the place where being able to
> output say 2 keysyms would be useful. E.g., base letter e (U+0065)
> followed by combining acute accent (U+0301) (stolen from wikipedia -
> though a better example would be something without a precomposed
> codepoint).
>
> This can be implemented in a backward-compatible way, even without
> needing a V2 (just extending the V1 like we do in the keymaps), and
> using some hackery, also without incurring any memory bloat for the
> common case.
>
> But due to the lack of time I'll leave it as a TODO for now.
Fair enough.
>> I mean, the UTF-8 fallback is kinda ugly right now and we might
>> be able to fix it in a V2 format if we allow multiple syms.
>
> Which UTF-8 fallback do you mean?
Forget that.. I was somehow annoyed that compose returns UTF-8 strings
instead of keysyms. But I noticed that compose is limited to text
input by design. I mean, we don't even generate key-up/down events for
composed sequences (which wouldn't make any sense), so a text-string
as result is totally fine.
Thanks
David