[Uim] uim-py: Adding idioms to PY.scm
Jon Babcock
jon at kanji.com
Tue Mar 30 19:45:50 EEST 2004
Yukiko-san,
Sorry, I don't have the answers to your questions and I, too, look
forward to hearing from the developers regarding them.
I am very interested in pursuing a similar project --- turning PY.scm
(or pyunihan.scm) into a practical IME for Chinese, combining both
traditional and simplified forms. Unfortunately, I will have no time to
work on this until the end of May. Until then, I can only chat about it.
Basically, there are two main approaches to narrowing down the list of
candidates: tones and multisyllabic entries. Tones have the great
disadvantage that they are easy for non-native Chinese writers to
forget. (Older Chinese who didn't go to school in the last 30 or 40
years sometimes find them difficult to nail down, too.) And unless you
get them exactly right, their usefulness is greatly diminished.
Furthermore, many kanji/hanzi have more than one tone, and there are
disagreements about what the tone is, at least with regard to the
less common or rarely used kanji. Nevertheless, for common single Chinese
syllables, tones can narrow the candidate list significantly. Maybe the
best method would allow tones for single syllables (and for multisyllabic
entries whose syllables are separated by spaces or some other delimiter)
and then use multisyllabic entries, without tone marking, for everything
else.
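To make the idea concrete, here is a rough sketch in Python rather than Scheme. It is only an illustration of the lookup described above, not how PY.scm or uim-py actually works. The tables SINGLE and WORDS, the helper names, and the "ma3" style of writing a tone as a trailing digit are all assumptions made up for this example, and the handful of entries are toy data.

```python
# Hypothetical toy tables; the entries are illustrative, not taken from PY.scm.
# Single syllables are keyed by (pinyin, tone).
SINGLE = {
    ("ma", 1): ["妈"],
    ("ma", 3): ["马", "码"],
    ("ma", 4): ["骂"],
    ("hao", 3): ["好"],
}

# Multisyllabic entries are keyed by toneless syllables only.
WORDS = {
    ("ma", "shang"): ["马上"],
    ("ni", "hao"): ["你好"],
}


def parse_syllable(token):
    """Split 'ma3' into ('ma', 3); a bare 'ma' gives ('ma', None)."""
    if token and token[-1] in "12345":
        return token[:-1], int(token[-1])
    return token, None


def candidates(text):
    """Return candidates for space-separated pinyin input.

    One syllable: filter by tone if a tone digit was typed,
    otherwise return every reading of that syllable.
    Several syllables: look up the toneless sequence as one entry
    (any tone digits typed are simply ignored in this sketch).
    """
    parsed = [parse_syllable(t) for t in text.split()]
    if len(parsed) == 1:
        syllable, tone = parsed[0]
        result = []
        for (s, t), chars in SINGLE.items():
            if s == syllable and (tone is None or t == tone):
                result.extend(chars)
        return result
    key = tuple(s for s, _ in parsed)
    return list(WORDS.get(key, []))
```

With this toy data, candidates("ma") offers all four characters, candidates("ma3") narrows that to the two third-tone ones, and candidates("ma shang") goes straight to the two-syllable entry without needing any tones.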
Yukiko Bando wrote:
> 4. Is there a plan to implement a dictionary (something like Anthy)
> for uim-py in the near future? I wonder if I should wait for that
> rather than edit PY.scm manually. It'll be good for practice but
> time-consuming.
Note that the Hanyu Da Cidian (HDC) contains 347,426 multisyllabic
entries, although an entry system would start to be useful with as
little as 1% of that (roughly 3,500 entries), I think. Some sort of
automatic method of inputting these multisyllabic entries is needed. I
have some ideas, but haven't had time to try them out yet.
Jon
--
Jon Babcock <jon at kanji.com>