[Uim] Japanese input

Thu Jul 21 09:51:31 EEST 2005

This mail is a continuation of my previous one.

Japanese input method engines can be categorized into four types.

1. Multisegment
  Anthy, Canna

2. Unisegment
  SKK

3. Predictive
  PRIME

4. Others
  t-code, tut-code

What is multisegment
====================

A multisegment engine converts a whole hiragana sentence into mixed
kana-kanji text. A sentence ordinarily contains several segments, hence
the name "multisegment". This requires complex analysis of the sentence.

Anthy
=====

Anthy is a multisegment kana-kanji conversion engine. Most of Anthy was
written by Yusuke TABATA, who is also the originator of uim. Uim has
glue code for Anthy, called uim-anthy.

Canna
=====

Canna is also a multisegment kana-kanji conversion server, developed by
NEC. Uim has glue code for Canna, called uim-canna.

What is unisegment
==================

A unisegment engine converts only one segment, or word, at a time. It
cannot convert a whole sentence at once.

SKK
===

The original implementation of SKK is written in Emacs Lisp. Uim has
another implementation of SKK, written in C and Scheme, called uim-skk.
The original uim-skk was written by Yusuke, and other developers have
improved it since. These days, Etsushi Kato is the most active developer
of uim-skk. Unlike the other glue code, uim-skk is self-contained: it
implements SKK itself instead of wrapping an external engine.

SKK is a unisegment kana-kanji conversion engine. In addition, SKK does
not convert kana to kanji by default: to get kanji, you have to start
the word with a capital letter. This strategy assumes that most of the
text in mixed kana-kanji writing is hiragana or katakana. SKK has
devoted admirers, but it is a minor input method.
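
The capital-letter rule can be sketched in Python as follows. This is a
simplified, non-interactive model under stated assumptions: words are
separated by spaces, the romaji table (`ROMAJI`) and kanji dictionary
(`KANJI`) are hypothetical fragments covering only the demo input, and
the function names are illustrative. Real SKK processes input keystroke
by keystroke and offers candidate selection, which this sketch ignores.

```python
# Hypothetical fragment of a romaji-to-hiragana table, just enough for the demo.
ROMAJI = {"ni": "に", "ho": "ほ", "n": "ん", "go": "ご", "wo": "を",
          "be": "べ", "kyo": "きょ", "u": "う", "su": "す", "ru": "る"}

# Hypothetical toy dictionary: hiragana reading -> kanji.
KANJI = {"にほんご": "日本語", "べんきょう": "勉強"}


def to_hiragana(romaji: str) -> str:
    """Transliterate lowercase romaji by greedy longest match (max 3 letters)."""
    out, i = [], 0
    while i < len(romaji):
        for size in (3, 2, 1):
            chunk = romaji[i:i + size]
            if chunk in ROMAJI:
                out.append(ROMAJI[chunk])
                i += len(chunk)
                break
        else:
            raise ValueError(f"no kana for {romaji[i:]!r}")
    return "".join(out)


def skk_convert(typed: str) -> str:
    """Convert space-separated romaji words, SKK style (simplified).

    Only words starting with a capital letter are looked up in the kanji
    dictionary; all other words stay as hiragana.
    """
    result = []
    for word in typed.split():
        kana = to_hiragana(word.lower())
        if word[0].isupper():
            result.append(KANJI.get(kana, kana))  # fall back to kana if no entry
        else:
            result.append(kana)
    return "".join(result)


print(skk_convert("Nihongo wo Benkyou suru"))  # 日本語を勉強する
```

Typing the same words in lowercase yields plain hiragana, which is the
default behaviour described above.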

PRIME
=====

PRIME is a prediction-based input method. Uim has glue code for PRIME,
called uim-prime. Since PRIME predicts what you want to input, you don't
need to type all of the text. PRIME works not only for Japanese input
but also for English input.
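
As a rough illustration of prediction, the sketch below completes a
typed prefix from a word list ranked by usage count. The counts in
`USAGE` are hypothetical, and plain prefix matching is an assumed
stand-in; it is not PRIME's actual prediction logic. English words are
used since PRIME also handles English input.

```python
# Hypothetical usage counts; a real engine would learn these from what you type.
USAGE = {"input": 12, "inputs": 3, "insert": 7, "install": 5, "method": 9}


def predict(prefix: str, usage: dict[str, int], limit: int = 3) -> list[str]:
    """Return up to `limit` known words starting with `prefix`, most used first."""
    matches = [w for w in usage if w.startswith(prefix)]
    # Sort by descending count, then alphabetically for a stable order.
    matches.sort(key=lambda w: (-usage[w], w))
    return matches[:limit]


print(predict("in", USAGE))  # ['input', 'insert', 'install']
```

After typing only "in", the user can pick a whole word from the
candidates instead of typing the rest of it.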

Other
=====

The other types are difficult to explain, but they are all very minor.
Non-native Japanese speakers don't need to know about these very, very
minor input methods.

t-code
======

I don't know how to describe t-code exactly. T-code is less an input
method than a way of generating kanji characters from key combinations.

For example, typing 'aa' generates '種' in t-code. Since t-code is not
a real input method, it can be used together with another input method.
Uim doesn't have such an implementation, but PRIME does.

The difference between t-code and tut-code is the conversion table.
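
A table-driven lookup captures the idea. The sketch assumes every
character is typed as exactly one two-key pair, as in the 'aa' example;
the table `TCODE_FRAGMENT` holds only that one entry, and a real t-code
table has many more. Under the same simplification, a tut-code variant
would pass a different table to the same (hypothetical) function.

```python
# Fragment of a two-stroke table. Only the 'aa' -> '種' entry comes from the
# text above; a full t-code table covers many more key pairs.
TCODE_FRAGMENT = {"aa": "種"}


def decode_strokes(keys: str, table: dict[str, str]) -> str:
    """Map each consecutive pair of keystrokes to a character via `table`."""
    if len(keys) % 2:
        raise ValueError("this sketch expects key pairs; got an odd count")
    return "".join(table[keys[i:i + 2]] for i in range(0, len(keys), 2))


print(decode_strokes("aa", TCODE_FRAGMENT))  # 種
```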

Regards,

-- 
TOKUNAGA Hiroyuki
tkng at xem.jp