Can't track flow of characters in from Input Method Editor
Richard Wordingham
richard.wordingham at ntlworld.com
Tue Oct 6 14:51:57 PDT 2015
On Sunday I raised bug report 94753 about the apparent generation of
lone surrogates in response to the use of Keyman for Linux under ibus
as the input method editor. I have compiled Version 4.4.4.3.0+ with
debugging enabled to facilitate my investigation; I think my compiler
(gcc Version 4.6.3) is too old to compile Version 5.0, which is where I
first noticed the problem.
I use Emacs as an IDE for debugging, but Emacs Version 24 does not seem
able to cope with Version 4.4.4.3.0+. Running gdb directly from a
terminal does appear to cope. I have been trying to narrow down the
source of the error by inserting fprintf() calls. However, I cannot
find where characters enter the program from the IME. I am running
Ubuntu 12.04 with the default desktop. The IME is KMfL running under
ibus.
I added fprintf() and abort() calls around what appears to be the sole
call of XmbLookupString (I can find no calls of XwcLookupString) and
also inside SalKDEDisplay::checkDirectInputEvent(). However, using the
IME to input text from the Supplementary Multilingual Plane produces
neither output from the fprintf() calls nor a core dump from abort().
Have I overlooked another route by which characters are reaching the
program?
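For anyone wanting to repeat the experiment, the following is a minimal
sketch of this kind of instrumentation. The helper names hexDump() and
logInput() are hypothetical and are not part of the LibreOffice source;
the sketch assumes the candidate entry point hands back a byte buffer
and a length, as XmbLookupString does. It prints the raw bytes in hex so
that a supplementary character or a stray U+0008 is easy to spot.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: render a byte buffer as space-separated hex pairs.
std::string hexDump(const char* buf, int len) {
    std::string out;
    char tmp[4];
    for (int i = 0; i < len; ++i) {
        // Cast through unsigned char so bytes >= 0x80 are not sign-extended.
        std::snprintf(tmp, sizeof tmp, "%02X",
                      static_cast<unsigned char>(buf[i]));
        if (i != 0) out += ' ';
        out += tmp;
    }
    return out;
}

// Hypothetical helper: log where the bytes were seen and what they were.
// stderr is flushed so the output survives a later abort().
void logInput(const char* where, const char* buf, int len) {
    std::fprintf(stderr, "%s: %d byte(s): %s\n",
                 where, len, hexDump(buf, len).c_str());
    std::fflush(stderr);
}
```

A call such as logInput("XmbLookupString", buffer, count) placed
immediately after the lookup would show, for example, F0 91 92 8F if
U+1148F arrived there as UTF-8. No output at all would suggest that the
text is arriving by some other route.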
My current suspicion is that Qt is not handling KMfL's replacement of
one supplementary character by another properly, but I cannot
demonstrate that. My test input text sequence is the three characters
dYH, which when applied to an instrumented program using X generates
the characters U+1148F, U+114C0, U+0008 (also as symbol), U+114BF. I
suspect that U+0008 is only cancelling the low surrogate of U+114C0,
and that this is happening in Qt code. I have seen similar behaviour
with Konsole, which I believe is a Qt application. Claws Mail,
Gnome-terminal, Emacs Version 24, gedit, AbiWord and even LibreOffice
Calc all receive the correct sequence of characters, namely
<U+1148F, U+114BF>. (Some of these do not display it properly, but
that is another issue.)
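The suspected failure can be illustrated with a small, self-contained
sketch. It is not Qt or LibreOffice code, and all function names in it
are hypothetical. It assumes the receiving text buffer is UTF-16 and
compares a backspace that removes a single code unit with one that
removes a whole code point, replaying the sequence U+1148F, U+114C0,
U+0008, U+114BF.

```cpp
#include <cstddef>
#include <string>

bool isHigh(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
bool isLow(char16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

// Append one code point as UTF-16, using a surrogate pair above U+FFFF.
void appendCodePoint(std::u16string& buf, char32_t cp) {
    if (cp < 0x10000) {
        buf.push_back(static_cast<char16_t>(cp));
    } else {
        cp -= 0x10000;
        buf.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
        buf.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
    }
}

// Assumed faulty behaviour: U+0008 deletes exactly one UTF-16 code unit.
void naiveBackspace(std::u16string& buf) {
    if (!buf.empty()) buf.pop_back();
}

// Expected behaviour: U+0008 deletes a whole code point.
void codePointBackspace(std::u16string& buf) {
    if (buf.empty()) return;
    char16_t last = buf.back();
    buf.pop_back();
    if (isLow(last) && !buf.empty() && isHigh(buf.back())) buf.pop_back();
}

// Count surrogates that are not part of a well-formed high+low pair.
std::size_t countLoneSurrogates(const std::u16string& buf) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < buf.size(); ++i) {
        if (isHigh(buf[i])) {
            if (i + 1 < buf.size() && isLow(buf[i + 1])) ++i;  // skip the pair
            else ++n;
        } else if (isLow(buf[i])) {
            ++n;
        }
    }
    return n;
}

// Replay U+1148F, U+114C0, U+0008, U+114BF with the chosen backspace.
std::u16string replay(bool naive) {
    std::u16string buf;
    appendCodePoint(buf, 0x1148F);
    appendCodePoint(buf, 0x114C0);
    if (naive) naiveBackspace(buf); else codePointBackspace(buf);
    appendCodePoint(buf, 0x114BF);
    return buf;
}
```

With the code-unit backspace, replay(true) yields the five code units
D805 DC8F D805 D805 DCBF, in which the third unit is a lone high
surrogate. With the code-point backspace, replay(false) yields
D805 DC8F D805 DCBF, which is <U+1148F, U+114BF>. The first result is
consistent with the lone surrogates in the bug report, but the sketch
only shows that the hypothesis is self-consistent, not that this is
what Qt actually does.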
Richard.