[Uim] traditional Chinese py input?

Jon Babcock jon at kanji.com
Tue Jan 13 03:04:28 EET 2004


David Oftedal wrote:
> Ah, so traditional and simplified characters should generally be kept
> in the same file?

So far, there has been a strong tendency to keep all things related to
each language (Chinese simplified, Chinese traditional, Korean, and
Japanese) quite separate. Most current CJK texts don't use a mixture.
They are content to restrict themselves to kanji/hanzi/hanja from the
character sets that have been selected for use within their own
respective milieu.

Actually, there has been some opposition to Unicode's attempt to assign
one code point to characters that have two, three or more glyph
variants, depending on time (historical) and mostly place (China, Japan,
N. Korea, S. Korea, Taiwan, Hong Kong, Singapore, Vietnam, etc.). The
arguments against doing so were strong enough to curtail any full-scale
Han unification effort. This is especially true of the simplified glyphs
currently used in China. Generally, they were assigned their own code
points, even though strictly speaking nearly all of them were merely
variations of the "same character" and thereby contradicted the general
guiding principle of Unicode. But there were very strong practical (and
political, social, cultural) reasons for this and the result of
Unicode's effort at making everyone happy, I think, is not bad.

BUT, from the perspective of someone such as myself, who pretends to
study kanji, and would like to include various forms of the glyphs in a
single written document, the best input method would provide all
options, regardless of whether these glyphs were usually considered to
"belong to" Japanese, Chinese (simplified or traditional) or Korean. If
I were writing *in* Chinese (either one) or Japanese, then I would not
want or need all these candidates; I would probably consider them
superfluous. So one input method that offers all the variations of the
"same character" that exist in Unicode would be nice. And I'm willing to
do some of the grunt work to make this a reality.
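
A minimal sketch of the "offer every variant" idea, in Python rather than
uim's own Scheme. The variant groups below are hand-picked for illustration
and the names VARIANT_GROUPS and candidates are hypothetical; a real table
would presumably be generated from variant data such as the Unihan
database's kSimplifiedVariant, kTraditionalVariant and kZVariant fields.

```python
# Hypothetical variant groups, hand-picked for illustration only.
# Each list holds forms of the "same character" that have their own
# code points in Unicode.
VARIANT_GROUPS = [
    ["國", "国"],
    ["學", "学"],
    ["氣", "气", "気"],
]

# Map every character to the full group it belongs to.
VARIANTS = {ch: group for group in VARIANT_GROUPS for ch in group}


def candidates(ch):
    """Return every known form of ch, ordered by code point.

    A character with no recorded variants is returned on its own.
    """
    return sorted(VARIANTS.get(ch, [ch]))


print(candidates("気"))
print(candidates("山"))
```

Whichever form is typed or selected first, the candidate list then contains
all of its recorded variants, regardless of which language they are usually
associated with.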

> Now that you mention it, I've heard that most texts use a mixture of
> both, and that the distinction isn't actually as clear as one would
> think.

> Anyway, here's the file in UTF-8 (I think):

Thanks! I'll start playing with this.

> Now that you mention kanji, does anyone know of some tables for
> creating hanja? It would be great if we could parse that and put it
> into an scm file. For instance, it could depend on hangul.scm, so
> that one could input the hangul first and convert them later.

Well, in theory, the characters within the "CJK Unified Ideographs" of
Unicode include all those needed for Korean.

If there is switching to be done, perhaps it should be simply between 
the different ways that the Unihan characters can be read/pronounced, 
i.e., in py, in ON, in KUN, in the Korean way. Then one could access 
"all" the kanji with whatever reading happened to be most convenient.
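
As a rough sketch of switching between readings, again in Python and not
in uim's Scheme: one table of characters, indexed by several reading
systems at once. The sample data is tiny and hand-written (one reading per
system per character), and the names READINGS, INDEX and lookup are
hypothetical; real data could come from the Unihan database's kMandarin,
kJapaneseOn, kJapaneseKun and kKorean fields.

```python
from collections import defaultdict

# Hypothetical sample data: one romanized reading per system.
# "py" = pinyin (tones omitted), "on"/"kun" = Japanese, "ko" = Korean.
READINGS = {
    "三": {"py": "san", "on": "san", "kun": "mi", "ko": "sam"},
    "山": {"py": "shan", "on": "san", "kun": "yama", "ko": "san"},
    "水": {"py": "shui", "on": "sui", "kun": "mizu", "ko": "su"},
}

# Inverted index: (system, reading) -> characters with that reading.
INDEX = defaultdict(list)
for ch, readings in READINGS.items():
    for system, reading in readings.items():
        INDEX[(system, reading)].append(ch)


def lookup(system, reading):
    """Return the characters that have `reading` in `system`, by code point."""
    return sorted(INDEX.get((system, reading), []))


print(lookup("on", "san"))
print(lookup("py", "san"))
print(lookup("kun", "yama"))
```

The same romanized input gives different candidates depending on the
selected system: "san" matches both characters as an ON reading, but only
one of them as pinyin.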

<aside>It may be useful to recall that not too long ago (a century or
so) the differences in the glyphs used to represent the Chinese
characters throughout the entire 漢字文化, "kanji culture", realm were
insignificant, and the new characters added in Japan (国字, "kokuji"),
for example, numbered only a couple hundred or so. (Only about 20 of
these are still commonly used, afaik.)</aside>

Jon

-- 
Jon Babcock <jonatkanjidotcom>



