Can't track flow of characters in from Input Method Editor

Richard Wordingham richard.wordingham at ntlworld.com
Tue Oct 6 14:51:57 PDT 2015


On Sunday I raised bug report 94753 about the apparent generation of
lone surrogates when Keyman for Linux (KMfL) is used under ibus as the
input method editor (IME).  To facilitate my investigation I have
compiled Version 4.4.4.3.0+ with debugging enabled; I think my compiler
(gcc version 4.6.3) is too old to compile Version 5.0, which is where I
noticed the problem.
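Here is a minimal Python sketch of what a lone surrogate is and how a buffer might be checked for one. It is not LibreOffice code, and the helper names are hypothetical. A supplementary-plane character is stored in UTF-16 as a high surrogate (0xD800 to 0xDBFF) followed by a low surrogate (0xDC00 to 0xDFFF). A lone surrogate is either half without its partner.

```python
def to_utf16_units(text):
    """Encode text as a list of 16-bit UTF-16 code units."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def find_lone_surrogates(units):
    """Return the indices of unpaired surrogates in a list of UTF-16 code units."""
    lone = []
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # A high surrogate is valid only if a low surrogate follows it.
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2
                continue
            lone.append(i)
        elif 0xDC00 <= u <= 0xDFFF:
            # A low surrogate reached here has no preceding high surrogate.
            lone.append(i)
        i += 1
    return lone


good = to_utf16_units("\U0001148F\U000114BF")
print([hex(u) for u in good])      # ['0xd805', '0xdc8f', '0xd805', '0xdcbf']
print(find_lone_surrogates(good))  # []
# The same text with a stray high surrogate at index 2:
bad = [0xD805, 0xDC8F, 0xD805, 0xD805, 0xDCBF]
print(find_lone_surrogates(bad))   # [2]
```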

I use Emacs as an IDE for debugging, but Emacs Version 24 does not seem
able to cope with Version 4.4.4.3.0+.  The debugger gdb, run from the
terminal, does appear to cope.  I have been trying to narrow down
the source of the error by inserting fprintf() calls.  However, I cannot
find where characters enter the program from the IME.  I am running
Ubuntu 12.04 with the default desktop.  The IME is KMfL running under
ibus.

I set up fprintf() and abort() calls to monitor what appears to be the
only call of XmbLookupString() (there are no visible calls of
XwcLookupString()), and also inside
SalKDEDisplay::checkdirectInputEvent().  However, using the IME to
input characters from the Supplementary Multilingual Plane produces
neither output from the fprintf() calls nor a core dump from abort().
Have I overlooked another route by which characters reach the program?

My current suspicion is that Qt does not properly handle KMfL's
replacement of one supplementary character by another, but I cannot
demonstrate that.  My test input is the three-character sequence dYH,
which, when applied to an instrumented X program, generates the
characters U+1148F, U+114C0, U+0008 (also as a symbol), U+114BF.  I
suspect that U+0008 cancels only the low surrogate of U+114C0, and that
this happens in Qt code.  I have seen similar behaviour with Konsole,
which I believe is a Qt application.  Claws Mail, GNOME Terminal, Emacs
Version 24, gedit, AbiWord and even LibreOffice Calc all receive the
correct sequence of characters, namely <U+1148F, U+114BF>.  (Some of
these do not display it properly, but that is another issue.)
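The suspected failure can be modelled roughly as follows. This is a hypothetical sketch, not Qt's actual code, and the function names are invented. It assumes that the receiving widget keeps its text as UTF-16 code units and that U+0008 deletes the last code unit rather than the last code point. It replays the characters generated by dYH under both policies.

```python
BACKSPACE = 0x0008


def encode_utf16(cp):
    """Encode one code point as a list of UTF-16 code units."""
    if cp < 0x10000:
        return [cp]
    v = cp - 0x10000
    return [0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)]


def apply_events(events, per_code_point):
    """Replay IME output against a UTF-16 buffer (hypothetical model).

    If per_code_point is False, backspace deletes one code unit (the suspected bug);
    if True, it deletes a whole surrogate pair when the buffer ends in one.
    """
    buf = []
    for cp in events:
        if cp == BACKSPACE:
            if not buf:
                continue
            last = buf.pop()
            if (per_code_point and 0xDC00 <= last <= 0xDFFF
                    and buf and 0xD800 <= buf[-1] <= 0xDBFF):
                buf.pop()  # remove the matching high surrogate too
        else:
            buf.extend(encode_utf16(cp))
    return buf


# Characters generated for the keystrokes dYH.
events = [0x1148F, 0x114C0, BACKSPACE, 0x114BF]
print([hex(u) for u in apply_events(events, per_code_point=False)])
# ['0xd805', '0xdc8f', '0xd805', '0xd805', '0xdcbf']
print([hex(u) for u in apply_events(events, per_code_point=True)])
# ['0xd805', '0xdc8f', '0xd805', '0xdcbf']
```

Under the per-code-unit policy the high surrogate of U+114C0 (0xD805) is left stranded before U+114BF. Under the per-code-point policy the buffer ends up holding <U+1148F, U+114BF>, which is what the well-behaved applications receive.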

Richard.
