[Uim] uim-py: Adding idioms to PY.scm

Yukiko Bando ybando at k6.dion.ne.jp
Tue Apr 6 16:40:52 EEST 2004


On Sun, 04 Apr 2004 11:07:33 -0600, Jon Babcock <jon at kanji.com> wrote:

> You may want to add tsi.src, which contains about 130,000 entries. The  
> first step is to convert the Zhuyin entries to Pinyin and then to  
> eliminate duplicates with your CEDICT file. I plan to do this  
> eventually, but would be *delighted* if I didn't have to. <g> And then  
> the file must be converted to UTF-8 Unicode, I assume.

Traditional Chinese characters look more familiar to me than simplified  
ones, but it seems impossible to convert Zhuyin to Pinyin properly without  
a good knowledge of the language.  I have only just started learning standard  
Chinese and would like to keep my PY.scm pure Mandarin to avoid confusion.  
;-)  I would rather improve it by reordering the words that appear in  
candidate lists and by adding missing words one by one.  Sorry if I  
have disappointed you...  But if you send me a list of pinyin and the  
corresponding words in a spreadsheet, I think I can convert it to  
TSI_PY.scm using my tool, although that is probably the easiest part of  
your project.
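
For illustration only, the spreadsheet-to-rules step could look roughly like the sketch below. It is not the tool mentioned above. It assumes the spreadsheet is exported as tab-separated text with pinyin in the first column and the word in the second, and the one-rule-per-line output layout is a guess rather than the confirmed PY.scm format. The function names and the `known` set (pairs already present, for example from the CEDICT-derived file) are hypothetical.

```python
import csv
import io
from collections import OrderedDict


def read_rows(text):
    """Parse tab-separated text into (pinyin, word) pairs, skipping short rows."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return [(row[0], row[1]) for row in reader if len(row) >= 2]


def group_by_pinyin(rows, known=frozenset()):
    """Group words under their pinyin in first-seen order.

    Exact duplicates are dropped, as are pairs already listed in `known`
    (a hypothetical set of (pinyin, word) pairs from an existing file).
    """
    table = OrderedDict()
    for pinyin, word in rows:
        pinyin, word = pinyin.strip().lower(), word.strip()
        if not pinyin or not word or (pinyin, word) in known:
            continue
        words = table.setdefault(pinyin, [])
        if word not in words:
            words.append(word)
    return table


def to_scheme_rules(table):
    """Emit one rule per line; this layout is an assumption, not the real format."""
    lines = []
    for pinyin, words in table.items():
        quoted = " ".join('"%s"' % w for w in words)
        lines.append('((("%s")) (%s))' % (pinyin, quoted))
    return "\n".join(lines)
```

Keeping the words in first-seen order means the order of rows in the spreadsheet would decide the order of the candidate list, so frequent words can simply be placed first.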

> Nevertheless, a professional C-E translator encounters many words that  
> are not included in any of the above 7 or 8 major dictionaries of  
> Chinese words. (Just check the daily traffic on the fanyi mailing list  
> for an endless stream of examples.)

The same is true for Japanese.  I often encounter unfamiliar words,  
especially on the internet.  For better or worse, people continue to coin  
new words.

Yukiko




