[Uim] Japanese input
TOKUNAGA Hiroyuki
tkng at xem.jp
Thu Jul 21 09:51:31 EEST 2005
This mail is a continuation of the previous one.
We can categorize Japanese input method engines into four types.
1. Multisegment
Anthy, Canna
2. Unisegment
SKK
3. Predictive
PRIME
4. Others
t-code, tut-code
What is multisegment
====================
A multisegment engine converts a hiragana sentence into a sentence of
mixed kana and kanji. A sentence ordinarily consists of several
segments, which is why these engines are called multisegment. This
requires complex analysis of the sentence.
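To give a rough feel for what "several segments" means, here is a minimal
sketch in Python. It is only a toy: the dictionary TOY_DICT and the
functions segment() and convert() are hypothetical names made up for this
illustration, and the greedy longest-match rule is a simplification, not
how Anthy or Canna actually analyze a sentence.

```python
# Toy reading -> surface dictionary. The entries are illustrative only.
TOY_DICT = {
    "わたし": "私",
    "なまえ": "名前",
    "なかの": "中野",
    "の": "の",
    "は": "は",
    "です": "です",
}


def segment(reading):
    """Split a hiragana string into segments by greedy longest match."""
    segments = []
    pos = 0
    while pos < len(reading):
        # Try the longest candidate first, then shorter ones.
        for end in range(len(reading), pos, -1):
            if reading[pos:end] in TOY_DICT:
                segments.append(reading[pos:end])
                pos = end
                break
        else:
            # Nothing matched: keep a single character as its own segment.
            segments.append(reading[pos])
            pos += 1
    return segments


def convert(reading):
    """Convert every segment of the sentence in one pass."""
    return "".join(TOY_DICT.get(seg, seg) for seg in segment(reading))


print(segment("わたしのなまえはなかのです"))
print(convert("わたしのなままえはなかのです".replace("まま", "ま")))
```

A real engine has to choose among many possible segmentations and many
candidates per segment, which is where the complex analysis comes in.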
Anthy
=====
Anthy is a multisegment kana-kanji conversion engine. Most of Anthy
was written by Yusuke TABATA, who is also the originator of uim.
Uim has glue code for Anthy, called uim-anthy.
Canna
=====
Canna is also a multisegment kana-kanji conversion server, developed by
NEC. Uim has glue code for Canna, called uim-canna.
What is unisegment
==================
A unisegment engine converts only one segment or word at a time. It
cannot convert a whole sentence at once.
SKK
===
The original implementation of SKK is written in Emacs Lisp. Uim has
its own implementation of SKK, written in C and Scheme and called
uim-skk. The original uim-skk was written by Yusuke, and other
developers have improved it since. These days, Etsushi Kato is the most
active developer of uim-skk. Unlike the other glue code, uim-skk is
self-contained.
SKK is a unisegment kana-kanji conversion engine. In addition, SKK
doesn't convert kana text to kanji by default. To convert to kanji,
you have to start a word with a capital letter. This strategy assumes
that most of a mixed kana-kanji text is katakana or hiragana. SKK has
devoted admirers, but it's a minor input method.
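The capital-letter rule can be sketched roughly as follows. This is a
simplified illustration, not uim-skk's code: TOY_JISYO and process_word()
are hypothetical names, the dictionary is keyed by romaji only for
brevity, and the romaji-to-kana step that a real SKK performs is left out.

```python
# Hypothetical one-entry dictionary, keyed by romaji for brevity.
TOY_JISYO = {"kanji": "漢字"}


def process_word(word):
    """Convert a word only when it starts with a capital letter."""
    if word[:1].isupper():
        # A leading capital marks the word as a conversion target.
        return TOY_JISYO.get(word.lower(), word)
    # Otherwise the word is passed through without conversion.
    return word


print(process_word("Kanji"))
print(process_word("kanji"))
```

The point is that the user, not the engine, decides where a word to be
converted begins, so no sentence analysis is needed.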
PRIME
=====
PRIME is a prediction-based input method. Uim has glue code for
PRIME, called uim-prime. Since PRIME predicts what you want to input,
you don't need to type the whole text. PRIME is applicable not only to
Japanese input but also to English input.
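As a rough illustration of prediction-based input, the sketch below
completes a typed prefix from a small frequency table. It uses English
words only to keep it readable. TOY_FREQ and predict() are hypothetical
names, and ranking by a fixed frequency count is an assumption for this
example, not a description of PRIME's actual algorithm.

```python
# Hypothetical word -> usage count table.
TOY_FREQ = {"input": 5, "information": 3, "install": 2, "method": 4}


def predict(prefix, limit=3):
    """Return up to `limit` words starting with `prefix`, most used first."""
    matches = [word for word in TOY_FREQ if word.startswith(prefix)]
    # Sort by descending count, then alphabetically to break ties.
    matches.sort(key=lambda word: (-TOY_FREQ[word], word))
    return matches[:limit]


print(predict("in"))
print(predict("m"))
```

The user picks a candidate from the returned list instead of typing the
rest of the word.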
Others
======
The other types are difficult to explain, but I can say they are very
minor. Non-native speakers of Japanese don't need to know these very
minor input methods.
t-code
======
I don't know how to describe t-code exactly. T-code is a way to
generate kanji characters from key combinations rather than an input
method.
For example, 'aa' generates '種' with t-code. Since it's not a real
input method, it is possible to use t-code together with another input
method. Uim doesn't have such an implementation, but PRIME does.
The difference between t-code and tut-code is the conversion table.
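The table lookup can be sketched like this. Only the 'aa' entry comes
from the example above. The names TOY_TABLE and decode() are hypothetical,
and reading the keys strictly two at a time and passing unknown pairs
through unchanged are assumptions of this sketch.

```python
# Only the 'aa' entry is taken from the example in this mail.
TOY_TABLE = {"aa": "種"}


def decode(keys, table=TOY_TABLE):
    """Look up each two-key combination in the conversion table."""
    out = []
    for i in range(0, len(keys), 2):
        pair = keys[i:i + 2]
        # Pairs missing from the table are passed through unchanged.
        out.append(table.get(pair, pair))
    return "".join(out)


print(decode("aa"))
```

Because the table is a parameter, switching between t-code and tut-code
in this sketch would only mean passing a different table.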
Regards,
--
TOKUNAGA Hiroyuki
tkng at xem.jp