[Uim] uim-py: Adding idioms to PY.scm

Jon Babcock jon at kanji.com
Tue Mar 30 19:45:50 EEST 2004


Sorry, I don't have the answers to your questions and I, too, look 
forward to hearing from the developers regarding them.

I am very interested in pursuing a similar project: turning PY.scm 
(or pyunihan.scm) into a practical IME for Chinese, combining both 
traditional and simplified forms. Unfortunately, I will have no time to 
work on this until the end of May. Until then, I can only chat about it.

Basically, there are two main approaches to narrowing down the list of 
candidates: tones and multisyllabic entries. Tones have the great 
disadvantage that non-native writers of Chinese easily forget them. 
(Older Chinese who didn't go to school in the last 30 or 40 
years sometimes find them difficult to nail down, too.) And unless you 
get them exactly right, their usefulness is greatly diminished. 
Furthermore, many kanji/hanzi have more than one tone, and there are 
disagreements about which tone is correct, at least with regard to the 
lesser-known or rarely used kanji. Nevertheless, for common single Chinese 
syllables, tones can narrow the candidate list significantly. Maybe the 
best method would allow tones for single syllables, and for multisyllabic 
entries whose syllables are separated by spaces (or something), and 
then use multisyllabic entries, without tone marking, for everything else.
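
To make the hybrid idea concrete, here is a minimal sketch in Python 
(not uim's Scheme, and not how PY.scm actually works). The ENTRIES 
table, its layout, and the candidates() function are all hypothetical, 
and the handful of readings are toy data. Tone digits 1-4 may follow 
any space-separated syllable; a string typed without spaces is looked 
up with no tone filtering at all.

```python
import re

# Hypothetical toy table: toneless pinyin key -> (text, tone per syllable).
# A real table would be generated from PY.scm or a dictionary, not typed in.
ENTRIES = {
    "ma": [("妈", (1,)), ("麻", (2,)), ("马", (3,)), ("骂", (4,))],
    "shijian": [("时间", (2, 1)), ("事件", (4, 4)), ("实践", (2, 4))],
    "mashang": [("马上", (3, 4))],
}

# One typed syllable: lowercase letters with an optional tone digit.
SYLLABLE = re.compile(r"([a-z]+)([1-5]?)")


def candidates(typed):
    """Return candidate strings for the typed pinyin.

    Syllables separated by spaces may each carry a tone digit, which
    filters the list. Input without spaces is matched toneless.
    """
    parts = typed.split()
    letters, tones = [], []
    for part in parts:
        m = SYLLABLE.fullmatch(part)
        if m is None:
            return []
        letters.append(m.group(1))
        tones.append(int(m.group(2)) if m.group(2) else None)

    results = []
    for text, entry_tones in ENTRIES.get("".join(letters), []):
        if len(parts) == len(entry_tones):
            # Typed syllables line up with the entry: every tone the
            # user supplied must match; omitted tones match anything.
            if any(t is not None and t != e
                   for t, e in zip(tones, entry_tones)):
                continue
        elif any(t is not None for t in tones):
            # Tone digits cannot be aligned with an unsegmented string.
            continue
        results.append(text)
    return results
```

With this table, "ma" lists all four characters while "ma3" leaves 
only one, and "shi2 jian1" picks a single word out of the three that 
plain "shijian" returns. A wrong tone returns nothing at all, which is 
exactly the weakness of tones mentioned above.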

Yukiko Bando wrote:

> 4. Is there a plan to implement a dictionary (something like Anthy) 
> for uim-py in the near future? I wonder if I should wait for that 
> rather than edit PY.scm manually. It'll be good for practice but 
> time-consuming.

Note that the Hanyu Da Cidian (HDC) contains 347,426 multisyllabic 
entries, although an entry system would start to be useful with as 
few as 1% of those (roughly 3,500 entries), I think. Some sort of 
automatic method of inputting these multisyllabic entries is needed. I 
have some ideas, but haven't had time to try them out yet.


Jon Babcock <jon at kanji.com>
