[Uim] Korean input

Park Jae-hyeon jhpark at kias.re.kr
Thu Jul 21 18:37:53 EEST 2005


David Oftedal <david at start.no> writes:

> >1. SPACE does not complete a syllable, but it is just inserted.  For
> >   example, "b a b SPACE m eo g j a" translates to "밥 먹자."  A
> >   Korean person would find that this was more natural.  However,
> >   there are cases where the end of a syllable is ambiguous, and a key
> >   is needed to signal the end or beginning of a syllable.  HWP does
> >   not seem to have a key for this.  In my opinion, ' may be an
> >   option.  Similarly, RETURN is entered when it is pressed.  In
> >   general, if a key is not for a jamo, it is committed along with the
> >   syllable that was being composed.
> >  
> >
> Hmm. I've actually been unable to reproduce these problems on my system.
> Both space and enter function as commit keys as long as there's anything
> at all in the preedit. These issues could arise if you call romaja.scm
> from loader.scm instead of hangul.scm, since all the rules for its
> behavior, including the behavior of the space and enter keys, are
> defined in hangul.scm.

My notation was inconsistent.  I should have said that HWP translates
"b a b SPACE m eo g j a" to "밥 SPACE 먹 자", whereas romaja.scm
translates it to "밥 먹 자".  Here, ignore all blanks inside double
quotes.  romaja.scm treats SPACE as a commit key: it commits "밥",
but it does not commit the SPACE itself.  Therefore, with romaja.scm
one has to press SPACE twice after "b a b" to insert a space after
"밥", while with HWP one presses SPACE only once.  In other words, in
HWP, SPACE, RETURN, ESC, or any other key that cannot be interpreted
as a Korean jamo commits itself as well as the preedit string.  HWP
works this way because that is how a Korean keyboard layout (such as
2-beolsik) usually works.  Although HWP's behavior appears natural to
Koreans, romaja.scm's behavior may look more natural to a foreigner
who, for example, is familiar with a Japanese input method.  IMHO, an
explicit commit key is redundant for Korean input.  The reason a
Japanese input method has an explicit commit key, I think, is that it
has to perform kana-kanji conversion, and a conversion step of that
kind is rarely needed for Korean.
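
To make the difference concrete, here is a rough Python sketch of the
two behaviors as I observe them.  It is only a model: press_key and
run are names I made up for illustration, the syllables are given
already composed rather than built from jamo, and none of this is
taken from HWP or uim.

```python
def press_key(preedit, key, commit_self):
    """Handle a non-jamo key; return (committed_text, new_preedit).

    commit_self=True  models the HWP-like behavior,
    commit_self=False models what I see with romaja.scm.
    """
    if preedit and not commit_self:
        # The key only ends the composition and is then swallowed.
        return preedit, ""
    # Nothing is being composed, or the key commits itself as well.
    return preedit + key, ""


def run(events, commit_self):
    """events: list of ("syllable", text) or ("key", text) pairs."""
    out, preedit = "", ""
    for kind, text in events:
        if kind == "syllable":
            out += preedit  # starting a new syllable commits the old one
            preedit = text
        else:
            committed, preedit = press_key(preedit, text, commit_self)
            out += committed
    return out + preedit


one_space = [("syllable", "밥"), ("key", " "),
             ("syllable", "먹"), ("syllable", "자")]
two_spaces = [("syllable", "밥"), ("key", " "), ("key", " "),
              ("syllable", "먹"), ("syllable", "자")]

print(run(one_space, commit_self=True))    # HWP-like: 밥 먹자
print(run(one_space, commit_self=False))   # romaja.scm-like: 밥먹자
print(run(two_spaces, commit_self=False))  # romaja.scm-like: 밥 먹자
```

With commit_self=False the first SPACE is used up by the commit, which
is why a second one is needed to get an actual space.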

I replaced the existing /usr/share/uim/romaja.scm with your version
and did not touch /usr/share/uim/loader.scm from uim 0.4.6final1 on
Debian etch.  The part of loader.scm responsible for loading
romaja.scm looks like this:

(if (memq 'romaja enabled-im-list)
    (if enable-lazy-loading?
        (register-stub-im
         'romaja
         "ko"
         "UTF-8"
         "Hangul (Romaja)"
         "Romaja input style hangul input method"
         "hangul")
        (require-module "hangul")))

> >2. A few different Latin representations are accepted for a single
> >   jamo.  For example, "ㄹ" can be entered as "r" or "l", and "ㅙ" can
> >   be entered as "wae", "uae", or "oae".  I think an input method
> >   should be as generous as possible provided that it does not cause
> >   further ambiguities.
> >  
> >
> This is an interesting issue which I wish I'd given more consideration.
> The old version actually contained some duplicated entries to allow the
> user to enter either "r" or "l", but when I generated the new romaja.scm
> table, I simply figured that "ㄹ" should become "r" before any vowel and
> "l" otherwise. In turn that means that "신라" will have to be entered
> as "sinra", and can't be entered as "sinla" or "shinra" or "shinla". It
> will require a substantial amount of extra entries to add these changes,
> but I have the tools to make the necessary changes available, so it
> should be relatively simple.
> 
> Unfortunately, though, I hardly speak a word of Korean, so I can only
> add the equivalent forms that I know of... Which currently amounts to
> "ua" and "oa" for "wa", "shi" for "si", and "r" for "l". I'll try to
> read up on it before I make the changes, though, and I'm always open to
> suggestions. Thanks for your input! :)

FYI, let me enumerate the Latin letter assignments used in HWP.

ㄱ  g
ㄲ  gg, kk, qq, c (not cc)
ㄴ  n
ㄷ  d
ㄸ  dd, tt
ㄹ  r, l
ㅁ  m
ㅂ  b, v
ㅃ  bb, pp, ff, vv
ㅅ  s
ㅆ  ss
ㅇ  (optional) x
ㅈ  j, z
ㅉ  jj, zz
ㅊ  ch
ㅋ  k, q
ㅌ  t
ㅍ  p, f
ㅎ  h

ㅏ  a
ㅐ  ae
ㅑ  ya, ia
ㅒ  yae, iae
ㅓ  eo
ㅔ  e
ㅕ  yeo, ieo
ㅖ  ye, ie
ㅗ  o
ㅘ  wa, ua, oa
ㅙ  wae, uae, oae
ㅚ  woe, uoe, oi
ㅛ  yo, io
ㅜ  u, w, oo
ㅝ  wo, uo
ㅞ  we, ue
ㅟ  wi
ㅠ  yu, iu
ㅡ  eu
ㅢ  ui, eui
ㅣ  i, y, ee
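
A table like this can be treated as a many-to-one map from Latin
spellings to jamo.  The Python sketch below is just an illustration,
not what HWP or romaja.scm actually do: build_table and to_jamo are
hypothetical names, and greedy longest-match is my own assumption for
splitting the input.  It only yields a jamo sequence and does not
decide syllable boundaries, so it will, for instance, always read
"gg" as ㄲ.

```python
# The HWP assignments listed above, as "jamo spelling spelling ...".
SPEC = (
    "ㄱ g; ㄲ gg kk qq c; ㄴ n; ㄷ d; ㄸ dd tt; ㄹ r l; ㅁ m; ㅂ b v; "
    "ㅃ bb pp ff vv; ㅅ s; ㅆ ss; ㅇ x; ㅈ j z; ㅉ jj zz; ㅊ ch; "
    "ㅋ k q; ㅌ t; ㅍ p f; ㅎ h; "
    "ㅏ a; ㅐ ae; ㅑ ya ia; ㅒ yae iae; ㅓ eo; ㅔ e; ㅕ yeo ieo; "
    "ㅖ ye ie; ㅗ o; ㅘ wa ua oa; ㅙ wae uae oae; ㅚ woe uoe oi; "
    "ㅛ yo io; ㅜ u w oo; ㅝ wo uo; ㅞ we ue; ㅟ wi; ㅠ yu iu; "
    "ㅡ eu; ㅢ ui eui; ㅣ i y ee"
)


def build_table(spec):
    """Map every accepted Latin spelling to its jamo."""
    table = {}
    for entry in spec.split(";"):
        jamo, *spellings = entry.split()
        for spelling in spellings:
            if spelling in table:
                # Two jamo sharing one spelling would be ambiguous.
                raise ValueError("duplicate spelling: " + spelling)
            table[spelling] = jamo
    return table


TABLE = build_table(SPEC)
MAX_LEN = max(len(spelling) for spelling in TABLE)


def to_jamo(latin):
    """Split latin greedily (longest spelling first) into jamo."""
    out = []
    i = 0
    while i < len(latin):
        for n in range(MAX_LEN, 0, -1):
            chunk = latin[i:i + n]
            if chunk in TABLE:
                out.append(TABLE[chunk])
                i += len(chunk)
                break
        else:
            raise ValueError("no jamo for input at: " + latin[i:])
    return "".join(out)


print(to_jamo("sinra"))  # ㅅㅣㄴㄹㅏ
print(to_jamo("sinla"))  # ㅅㅣㄴㄹㅏ
```

Because "r" and "l" map to the same jamo, "sinra" and "sinla" give
the same result, and likewise "wae", "uae", and "oae".  build_table
raises an error if one spelling is assigned to two jamo, which is a
cheap way to check that being generous has not made the table itself
ambiguous.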


