[PATCH libxkbcommon 1/4] compose: add xkbcommon-compose - API

David Herrmann dh.herrmann at gmail.com
Sun Sep 14 23:41:37 PDT 2014


Hi

On Sun, Sep 14, 2014 at 11:05 PM, Ran Benita <ran234 at gmail.com> wrote:
> xkbcommon-compose is a Compose implementation for xkbcommon. It mostly
> behaves like libX11's Compose, but the support is somewhat low-level and
> is not transparent like in libX11. The user must add some supporting code
> in order to utilize it.
>
> The intended audience are users who use xkbcommon but not a full-blown
> input method. With this they can add Compose support in a straightforward
> manner, so they have a fairly complete keyboard input for Latin-like
> languages at least.
>
> See the header documentation for details.
>
> Signed-off-by: Ran Benita <ran234 at gmail.com>
> ---
>  xkbcommon/xkbcommon-compose.h | 457 ++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 457 insertions(+)
>  create mode 100644 xkbcommon/xkbcommon-compose.h
>
> diff --git a/xkbcommon/xkbcommon-compose.h b/xkbcommon/xkbcommon-compose.h
> new file mode 100644
> index 0000000..ed35250
> --- /dev/null
> +++ b/xkbcommon/xkbcommon-compose.h
> @@ -0,0 +1,457 @@
> +/*
> + * Copyright © 2013 Ran Benita
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
> + * DEALINGS IN THE SOFTWARE.
> + */
> +
> +#ifndef _XKBCOMMON_COMPOSE_H
> +#define _XKBCOMMON_COMPOSE_H
> +
> +#include <xkbcommon/xkbcommon.h>
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +/**
> + * @file
> + * libxkbcommon Compose API - support for Compose and dead-keys.
> + */
> +
> +/**
> + * @defgroup compose Compose and dead-keys support
> + * Support for Compose and dead-keys.
> + * @since TBD
> + *
> + * @{
> + */
> +
> +/**
> + * @page compose-overview Overview
> + * @parblock
> + *
> + * Compose and dead-keys are a common feature of many keyboard input
> + * systems.  They extend the range of the keysyms that can be produced
> + * directly from a keyboard by using a sequence of key strokes, instead
> + * of just one.
> + *
> + * Here are some example sequences, in the libX11 Compose file format:
> + *
> + *     <dead_acute> <a>         : "á"   aacute # LATIN SMALL LETTER A WITH ACUTE
> + *     <Multi_key> <A> <T>      : "@"   at # COMMERCIAL AT
> + *
> + * When the user presses a key which produces the \<dead_acute\> keysym,
> + * nothing initially happens (thus the key is dubbed a "dead-key").  But
> + * when the user enters \<a\>, "á" is "composed", in place of "a".  If
> + * instead the user had entered a keysym which does not follow
> + * \<dead_acute\> in any compose sequence, the sequence is said to be
> + * "cancelled".
> + *
> + * Compose files define many such sequences.  For a description of the
> + * common file format for Compose files, see the Compose(5) man page.
> + *
> + * A successfully-composed sequence has two results: a keysym and a UTF-8
> + * string.  At least one of the two is defined for each sequence.  If only
> + * a keysym is given, the keysym's string representation is used for the
> + * result string (using xkb_keysym_to_utf8()).
> + *
> + * This library provides low-level support for Compose file parsing and
> + * processing.  Higher-level APIs (such as libX11's Xutf8LookupString(3))
> + * may be built upon it, or it can be used directly.
> + *
> + * @endparblock
> + */
> +
> +/**
> + * @page compose-conflicting Conflicting Sequences
> + * @parblock
> + *
> + * To avoid ambiguity, a sequence is not allowed to be a prefix of another.
> + * In such a case, the conflict is resolved thus:
> + *
> + * 1. A longer sequence overrides a shorter one.
> + * 2. An equal sequence overrides an existing one.
> + * 3. A shorter sequence does not override a longer one.
> + *
> + * Sequences of length 1 are allowed, although they are not common.
> + *
> + * @endparblock
> + */
> +
> +/**
> + * @page compose-cancellation Cancellation Behavior
> + * @parblock
> + *
> + * What should happen when a sequence is cancelled?  For example, consider
> + * there are only the above sequences, and the input keysyms are
> + * \<dead_acute\> \<b\>.  There are a few approaches:
> + *
> + * 1. Swallow the cancelling keysym; that is, no keysym is produced.
> + *    This is the approach taken by libX11.
> + * 2. Let the cancelling keysym through; that is, \<b\> is produced.
> + * 3. Replay the entire sequence; that is, \<dead_acute\> \<b\> is produced.
> + *    This is the approach taken by Microsoft Windows (approximately;
> + *    instead of \<dead_acute\>, the underlying key is used.  This is
> + *    difficult to simulate with XKB keymaps).
> + *
> + * You can program whichever approach best fits users' expectations.

Hm, implementing 3) is a pain as we have to track the keysyms
separately. Your compose-API does not provide a way to retrieve the
parsed/failed sequence. But given that we have no dead-key =>
normal-key conversion right now, it's probably fine. If we want it, we
can add an API for both later on (assuming a trivial keysym conversion
from dead_key => normal is possible).

I also don't understand how dead-keys are related to keymaps. I mean,
yeah, the "nodeadkeys" variant is part of the keymap, but the keymap
never defines the behavior of dead-keys. We could simply provide a lookup
table that converts XKB_KEY_dead_xyz to XKB_KEY_xyz, right?
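
To make the lookup-table idea concrete, here is a rough, stand-alone
sketch. Nothing in it is part of the patch: keysym_t is a stand-in for
xkb_keysym_t, dead_key_to_spacing() and undead_sequence() are
hypothetical names, and the table covers only a handful of dead keys.
The numeric values are the ones I believe xkbcommon-keysyms.h uses for
the keysyms named in the comments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for xkb_keysym_t so the sketch compiles without libxkbcommon. */
typedef uint32_t keysym_t;

#define KEY_NoSymbol 0x000000u

/* (dead keysym, spacing keysym) pairs; deliberately incomplete. */
static const struct {
    keysym_t dead;
    keysym_t spacing;
} dead_key_map[] = {
    { 0xfe50, 0x0060 }, /* dead_grave      -> grave       */
    { 0xfe51, 0x00b4 }, /* dead_acute      -> acute       */
    { 0xfe52, 0x005e }, /* dead_circumflex -> asciicircum */
    { 0xfe53, 0x007e }, /* dead_tilde      -> asciitilde  */
    { 0xfe57, 0x00a8 }, /* dead_diaeresis  -> diaeresis   */
    { 0xfe5b, 0x00b8 }, /* dead_cedilla    -> cedilla     */
};

/* Return the spacing keysym for a dead key, or KEY_NoSymbol if unknown. */
static keysym_t
dead_key_to_spacing(keysym_t sym)
{
    size_t n = sizeof(dead_key_map) / sizeof(dead_key_map[0]);

    for (size_t i = 0; i < n; i++)
        if (dead_key_map[i].dead == sym)
            return dead_key_map[i].spacing;
    return KEY_NoSymbol;
}

/*
 * Rewrite a cancelled sequence in place so that it can be replayed
 * (approach 3): dead keys become spacing keysyms, the rest is untouched.
 */
static void
undead_sequence(keysym_t *seq, size_t len)
{
    assert(seq != NULL || len == 0);

    for (size_t i = 0; i < len; i++) {
        keysym_t spacing = dead_key_to_spacing(seq[i]);
        if (spacing != KEY_NoSymbol)
            seq[i] = spacing;
    }
}
```

The caller would still have to record the sequence itself while
feeding keysyms, since the compose API does not expose it.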

> + *
> + * @endparblock
> + */
> +
> +/**
> + * @struct xkb_compose_table
> + * Opaque Compose table object.
> + *
> + * The compose table holds the definitions of the Compose sequences, as
> + * gathered from Compose files.  It is immutable.
> + */
> +struct xkb_compose_table;
> +
> +/**
> + * @struct xkb_compose_state
> + * Opaque Compose state object.
> + *
> + * The compose state maintains state for compose sequence matching, such
> + * as which possible sequences are being matched, and the position within
> + * these sequences.  It acts as a simple state machine wherein keysyms are
> + * the input, and composed keysyms and strings are the output.
> + *
> + * The compose state is usually associated with a keyboard device.
> + */
> +struct xkb_compose_state;
> +
> +/** Flags affecting Compose file compilation. */
> +enum xkb_compose_compile_flags {
> +    /** Do not apply any flags. */
> +    XKB_COMPOSE_COMPILE_NO_FLAGS = 0
> +};
> +
> +/** The recognized Compose file formats. */
> +enum xkb_compose_format {
> +    /** The classic libX11 Compose text format, described in Compose(5). */
> +    XKB_COMPOSE_FORMAT_TEXT_V1 = 1
> +};
> +
> +/**
> + * @page compose-locale Compose Locale
> + * @parblock
> + *
> + * Compose files are locale dependent:
> + * - Compose files are written for a locale, and the locale is used when
> + *   searching for the appropriate file to use.
> + * - Compose files may reference the locale internally, with directives
> + *   such as %L.
> + * As such, functions like xkb_compose_table_new_from_locale() require
> + * a @p locale parameter.  This will usually be the current locale (see
> + * locale(7) for more details).  You may also want to allow the user to
> + * explicitly configure it, so he can use the Compose file of a given
> + * locale, but not use that locale for other things.
> + *
> + * You may query the current locale as follows:
> + * @code
> + *     const char *locale;
> + *     locale = setlocale(LC_CTYPE, NULL);
> + * @endcode
> + *
> + * This will only give useful results if the program had previously set
> + * the current locale using setlocale(3), with LC_CTYPE or LC_ALL and a
> + * non-NULL argument.
> + *
> + * If you prefer not to use the locale system of the C runtime library,
> + * you may nevertheless obtain the user's locale directly using
> + * environment variables, as described in locale(7).  For example,
> + * @code
> + *     locale = getenv("LC_ALL");
> + *     if (!locale)
> + *         locale = getenv("LC_CTYPE");
> + *     if (!locale)
> + *         locale = getenv("LANG");
> + *     if (!locale)
> + *         locale = "C";
> + * @endcode
> + *
> + * Note that some locales supported by the C standard library may not
> + * have a Compose file assigned.
> + *
> + * @endparblock
> + */
> +
> +/**
> + * Create a compose table for a given locale.
> + *
> + * The locale is used for searching the file-system for an appropriate
> + * Compose file.  The search order is described in Compose(5).  It is
> + * affected by the following environment variables:
> + * XCOMPOSEFILE, HOME, XLOCALEDIR.
> + *
> + * @param context
> + *     The library context in which to create the compose table.
> + * @param locale
> + *     The current locale.  See @ref compose-locale.
> + * @param flags
> + *     Optional flags for the compose table, or 0.
> + *
> + * @returns A compose table for the given locale, or NULL if the
> + * compilation failed or a Compose file was not found.
> + *
> + * @memberof xkb_compose_table
> + */
> +struct xkb_compose_table *
> +xkb_compose_table_new_from_locale(struct xkb_context *context,
> +                                  const char *locale,
> +                                  enum xkb_compose_compile_flags flags);
> +
> +/**
> + * Create a new compose table from a Compose file.
> + *
> + * @param context
> + *     The library context in which to create the compose table.
> + * @param file
> + *     The Compose file to compile.
> + * @param locale
> + *     The current locale.  See @ref compose-locale.
> + * @param format
> + *     The text format of the Compose file to compile.
> + * @param flags
> + *     Optional flags for the compose table, or 0.
> + *
> + * @returns A compose table compiled from the given file, or NULL if
> + * the compilation failed.
> + *
> + * @memberof xkb_compose_table
> + */
> +struct xkb_compose_table *
> +xkb_compose_table_new_from_file(struct xkb_context *context,
> +                                FILE *file,
> +                                const char *locale,
> +                                enum xkb_compose_format format,
> +                                enum xkb_compose_compile_flags flags);
> +
> +/**
> + * Create a new compose table from a memory buffer.
> + *
> + * This is just like xkb_compose_table_new_from_file(), but instead of
> + * a file, gets the table as one enormous string.
> + *
> + * @see xkb_compose_table_new_from_file()
> + * @memberof xkb_compose_table
> + */
> +struct xkb_compose_table *
> +xkb_compose_table_new_from_buffer(struct xkb_context *context,
> +                                  const char *buffer, size_t length,
> +                                  const char *locale,
> +                                  enum xkb_compose_format format,
> +                                  enum xkb_compose_compile_flags flags);

_from_buffer() right from the beginning, yay!

> +
> +/**
> + * Take a new reference on a compose table.
> + *
> + * @returns The passed in object.
> + *
> + * @memberof xkb_compose_table
> + */
> +struct xkb_compose_table *
> +xkb_compose_table_ref(struct xkb_compose_table *table);
> +
> +/**
> + * Release a reference on a compose table, and possibly free it.
> + *
> + * @param table The object.  If it is NULL, this function does nothing.
> + *
> + * @memberof xkb_compose_table
> + */
> +void
> +xkb_compose_table_unref(struct xkb_compose_table *table);
> +
> +/** Flags for compose state creation. */
> +enum xkb_compose_state_flags {
> +    /** Do not apply any flags. */
> +    XKB_COMPOSE_STATE_NO_FLAGS = 0
> +};
> +
> +/**
> + * Create a new compose state object.
> + *
> + * @param table
> + *     The compose table the state will use.
> + * @param flags
> + *     Optional flags for the compose state, or 0.
> + *
> + * @returns A new compose state, or NULL on failure.
> + *
> + * @memberof xkb_compose_state
> + */
> +struct xkb_compose_state *
> +xkb_compose_state_new(struct xkb_compose_table *table,
> +                      enum xkb_compose_state_flags flags);
> +
> +/**
> + * Take a new reference on a compose state object.
> + *
> + * @returns The passed in object.
> + *
> + * @memberof xkb_compose_state
> + */
> +struct xkb_compose_state *
> +xkb_compose_state_ref(struct xkb_compose_state *state);
> +
> +/**
> + * Release a reference on a compose state object, and possibly free it.
> + *
> + * @param state The object.  If NULL, do nothing.
> + *
> + * @memberof xkb_compose_state
> + */
> +void
> +xkb_compose_state_unref(struct xkb_compose_state *state);
> +
> +/**
> + * Get the compose table which a compose state object is using.
> + *
> + * @returns The compose table which was passed to xkb_compose_state_new()
> + * when creating this state object.
> + *
> + * This function does not take a new reference on the compose table; you
> + * must explicitly reference it yourself if you plan to use it beyond the
> + * lifetime of the state.
> + *
> + * @memberof xkb_compose_state
> + */
> +struct xkb_compose_table *
> +xkb_compose_state_get_compose_table(struct xkb_compose_state *state);
> +
> +/** Status of the Compose sequence state machine. */
> +enum xkb_compose_status {
> +    /** The initial state; no sequence has started yet. */
> +    XKB_COMPOSE_NOTHING,
> +    /** In the middle of a sequence. */
> +    XKB_COMPOSE_COMPOSING,
> +    /** A complete sequence has been matched. */
> +    XKB_COMPOSE_COMPOSED,
> +    /** The last sequence was cancelled due to an invalid keysym. */
> +    XKB_COMPOSE_CANCELLED

It is unclear what happens if a keysym is pressed that is _not_ part
of a compose sequence (that is, most keys). 'context' is 0 but no
matching compose node is found. I assume it generates
XKB_COMPOSE_NOTHING, but the comment here is unclear. Maybe the
_feed() or _get_state() description should mention how keys are
treated that are not part of compose sequences (and which are fed
while no compose sequence is active). I assume we do *not* return
XKB_COMPOSE_COMPOSED in those cases?
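
If that assumption holds, the caller-side handling after each _feed()
would boil down to something like the sketch below. The status enum is
copied from the patch so the snippet compiles on its own; enum
key_action and action_after_feed() are hypothetical names of mine, and
the XKB_COMPOSE_NOTHING case encodes my guess, not documented behavior.

```c
#include <assert.h>

/* Copied from the patch so the sketch compiles without the new header. */
enum xkb_compose_status {
    XKB_COMPOSE_NOTHING,
    XKB_COMPOSE_COMPOSING,
    XKB_COMPOSE_COMPOSED,
    XKB_COMPOSE_CANCELLED
};

/* Hypothetical caller-side decision; not part of the proposed API. */
enum key_action {
    KEY_ACTION_FORWARD,  /* deliver the keysym from the keymap unchanged */
    KEY_ACTION_SWALLOW,  /* deliver nothing for this key press */
    KEY_ACTION_COMPOSED  /* deliver the composed keysym / UTF-8 string */
};

static enum key_action
action_after_feed(enum xkb_compose_status status)
{
    switch (status) {
    case XKB_COMPOSE_NOTHING:
        /* Assumed: a plain key fed while no sequence is active ends here. */
        return KEY_ACTION_FORWARD;
    case XKB_COMPOSE_COMPOSING:
        /* Mid-sequence: nothing visible happens yet. */
        return KEY_ACTION_SWALLOW;
    case XKB_COMPOSE_COMPOSED:
        return KEY_ACTION_COMPOSED;
    case XKB_COMPOSE_CANCELLED:
        /* Approach 1 from the cancellation page (what libX11 does). */
        return KEY_ACTION_SWALLOW;
    }

    assert(0 && "unknown compose status");
    return KEY_ACTION_FORWARD;
}
```

If plain keys instead produced XKB_COMPOSE_COMPOSED, every caller
would need an extra check to tell them apart from real compositions,
which is why I'd like the documentation to spell this out.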

> +};
> +
> +/** The effect of a keysym fed to xkb_compose_state_feed(). */
> +enum xkb_compose_feed_result {
> +    /** The keysym had no effect. */
> +    XKB_COMPOSE_FEED_IGNORED,
> +    /** The keysym started, advanced or cancelled a sequence. */
> +    XKB_COMPOSE_FEED_ACCEPTED
> +};
> +
> +/**
> + * Feed one keysym to the Compose sequence state machine.
> + *
> + * This function advances into a compose sequence, cancels it, or has no
> + * effect (e.g. for modifier keysyms).  The resulting status may be
> + * observed with xkb_compose_state_get_status().
> + *
> + * @param state
> + *     The compose state object.
> + * @param keysym
> + *     A keysym, usually obtained after a key-press event, with a
> + *     function such as xkb_state_key_get_one_sym().

If a keypress generates multiple keysyms, are we supposed to call this
function in a loop? Or are we supposed to not feed such data into the
compose state? I remember we had this discussion before, but I'm not
sure what we agreed on. I think the conclusion was to always treat
multiple keysyms as an array of normal syms, not as an atomic
keypress, right?
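
To spell out the two readings, here is a small stand-alone sketch.
Everything in it is hypothetical: keysym_t stands in for xkb_keysym_t,
feed_fn stands in for a thin wrapper around xkb_compose_state_feed(),
and the policy enum only exists to contrast "feed each keysym in a
loop" with "keep multi-keysym presses away from the compose state".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t keysym_t;  /* stand-in for xkb_keysym_t */

/* Stand-in for a wrapper around xkb_compose_state_feed(). */
typedef void (*feed_fn)(void *state, keysym_t sym);

enum multi_sym_policy {
    MULTI_SYM_FEED_EACH,  /* treat the keysyms as an array of normal syms */
    MULTI_SYM_BYPASS      /* do not feed multi-keysym presses at all */
};

/* Feed the keysyms of one key press; returns how many were fed. */
static int
feed_key_press(void *state, feed_fn feed,
               const keysym_t *syms, int nsyms,
               enum multi_sym_policy policy)
{
    int fed = 0;

    assert(feed != NULL);

    if (nsyms > 1 && policy == MULTI_SYM_BYPASS)
        return 0;

    for (int i = 0; i < nsyms; i++) {
        feed(state, syms[i]);
        fed++;
    }
    return fed;
}

/* Toy sink used only to exercise the loop: counts the keysyms it sees. */
static void
count_feed(void *state, keysym_t sym)
{
    (void) sym;
    ++*(int *) state;
}
```

Whichever policy we pick, I think the _feed() documentation should
state it explicitly.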

> + *
> + * @returns Whether the keysym had any effect on the compose state.  This
> + * is useful, for example, if you want to keep a record of the current
> + * sequence, but not for much else.
> + *
> + * @memberof xkb_compose_state
> + */
> +enum xkb_compose_feed_result
> +xkb_compose_state_feed(struct xkb_compose_state *state,
> +                       xkb_keysym_t keysym);
> +
> +/**
> + * Reset the Compose sequence state machine.
> + *
> + * The status is set to XKB_COMPOSE_NOTHING, and the current sequence
> + * is discarded.
> + *
> + * @memberof xkb_compose_state
> + */
> +void
> +xkb_compose_state_reset(struct xkb_compose_state *state);
> +
> +/**
> + * Get the current status of the compose state machine.
> + *
> + * @see xkb_compose_status
> + * @memberof xkb_compose_state
> + **/
> +enum xkb_compose_status
> +xkb_compose_state_get_status(struct xkb_compose_state *state);
> +
> +/**
> + * Get the result Unicode/UTF-8 string for a composed sequence.
> + *
> + * See @ref compose-overview for more details.  This function is only
> + * useful when the status is XKB_COMPOSE_COMPOSED.
> + *
> + * @param[in] state
> + *     The compose state.
> + * @param[out] buffer
> + *     A buffer to write the string into.
> + * @param[in] size
> + *     Size of the buffer.
> + *
> + * @warning If the buffer passed is too small, the string is truncated
> + * (though still NUL-terminated).
> + *
> + * @returns
> + *   The number of bytes required for the string, excluding the NUL byte.
> + *   If the sequence is not complete, or does not have a viable result
> + *   string, returns 0, and sets @p buffer to the empty string (if
> + *   possible).
> + * @returns
> + *   You may check if truncation has occurred by comparing the return value
> + *   with the size of @p buffer, similarly to the snprintf(3) function.
> + *   You may safely pass NULL and 0 to @p buffer and @p size to find the
> + *   required size (without the NUL-byte).
> + *
> + * @memberof xkb_compose_state
> + **/
> +int
> +xkb_compose_state_get_utf8(struct xkb_compose_state *state,
> +                           char *buffer, size_t size);
> +
> +/**
> + * Get the result keysym for a composed sequence.
> + *
> + * See @ref compose-overview for more details.  This function is only
> + * useful when the status is XKB_COMPOSE_COMPOSED.
> + *
> + * @returns The result keysym.  If the sequence is not complete, or does
> + * not specify a result keysym, returns XKB_KEY_NoSymbol.
> + *
> + * @memberof xkb_compose_state
> + **/
> +xkb_keysym_t
> +xkb_compose_state_get_one_sym(struct xkb_compose_state *state);

Why _one_sym() and not _get_syms()? Yeah, the current format only
allows one symbol, but I don't see why we restrict the API in such
ways. I mean, the UTF-8 fallback is kinda ugly right now and we might
be able to fix it in a V2 format if we allow multiple syms.

Thanks a lot for the work, Ran!
David

> +
> +/** @} */
> +
> +#ifdef __cplusplus
> +} /* extern "C" */
> +#endif
> +
> +#endif /* _XKBCOMMON_COMPOSE_H */
> --
> 2.1.0
>
> _______________________________________________
> wayland-devel mailing list
> wayland-devel at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/wayland-devel

