[SCIM] Unsupported languages on Linux and SCIM

Mon Sep 6 04:01:51 PDT 2004

>I'm one of the developers of m17n-lib.  At least the vi-viqr
>input method is working well with an example program "medit"
>(a simple Unicode editor included in the distribution of
>m17n-lib).
>
>  
>
The problem turned out to be elsewhere :-)

>It is fairly easy to add an input method for the m17n
>database (m17n-db) which is used by scim-m17n. 
>
Thank you for your explanations - they are most useful to me. I would 
like to ask you one more question, which is still not quite clear to me. 
It is not directly SCIM or IME related, but affects the way some IME 
problems can be solved.

Is Linux software capable of directly displaying Unicode characters
outside the Basic Multilingual Plane (plane 0), i.e. in the supplementary
planes? I did a test in the console by running
$ perl -e 'print "\x{1200B}\n"'
and I got one square (suggesting one character to me). But if I redirect
the character to a file and open it with KDE editors or OpenOffice, they
display two squares. I understand this to mean that those applications did
not read the Unicode string correctly and split the code into two
characters. For me it means that those applications will not support
characters encoded outside the BMP.
Please note that I may be totally wrong, as I do not know much about
Linux yet. I am just looking for a way to answer my question and this is
the way I tried. I just did not find a webpage on the Internet explaining
these things directly.
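
For reference, here is a small Python sketch (separate from the Perl test above) of how the code point U+1200B is encoded. It lies in plane 1, so UTF-8 needs four bytes for it and UTF-16 needs a surrogate pair of two 16-bit units. One possible explanation for the two squares, offered only as an assumption, is that an application drew each UTF-16 unit as a separate glyph. The helper name `describe` is made up for this illustration.

```python
def describe(cp: int) -> dict:
    """Report the plane and the UTF-8 / UTF-16 encodings of one code point."""
    ch = chr(cp)
    utf8 = ch.encode("utf-8")
    utf16 = ch.encode("utf-16-be")  # big-endian, no byte order mark
    # Split the UTF-16 byte string into 16-bit code units.
    units = [int.from_bytes(utf16[i:i + 2], "big") for i in range(0, len(utf16), 2)]
    return {
        "plane": cp >> 16,  # plane 0 is the BMP
        "utf8": " ".join(f"{b:02x}" for b in utf8),
        "utf16_units": [f"{u:04X}" for u in units],
    }


info = describe(0x1200B)
print(info)
```

A code point inside the BMP, such as U+00E9, would come back as plane 0 with a single UTF-16 unit.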

A few words that should shed some light on what I want to achieve.

As a subject for an exercise in SCIM programming I chose an
implementation of a cuneiform input method. It is perfect because IMO it
is somewhere in the middle of the difficulty scale from the technical
point of view (I think Egyptian hieroglyphics is the most challenging
:-) ). It is also fun and may become useful for scholars.
To achieve the goal I first have to add Linux support for the script - I
need to add language code stuff etc. How do I do that?
Next I need to create a font. Unicode chose plane 1 (the Supplementary
Multilingual Plane), not the BMP, for cuneiform. Can displaying such
characters be achieved in Linux applications? How?
If it is not possible, I will shift the encoding to the Private Use Area,
which I guess is supported (it is on Windows :-) )
Then I need SCIM input. The input of cuneiform will be similar to
Chinese Pinyin: the user will choose the language (Akkadian, Hittite,
Sumerian) and input the pronunciation, then choose the right
candidate from the list. If there is only one candidate, it will be
input to the application directly (the way Japanese kana input works).
Ideally there should be some switches in the input method for a given
language (Akkadian, for example) to set the "time" (I lack the correct
term) for the text - in different "times" the pronunciations of cuneiform
characters changed, so setting it correctly can dramatically speed up
input (at least this is what I was told by someone for whom I developed
such a solution in MS Word macros).
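
The lookup flow described above can be sketched roughly as follows. This is only an illustration in Python, not SCIM code: the table contents, the period labels "old" and "neo", the placeholder sign names and the function names `lookup` and `process` are all hypothetical, and a real table would have to come from the scholarly sign lists.

```python
from typing import Dict, List, Optional, Tuple

# HYPOTHETICAL data: (language, period, reading) -> candidate signs.
# The sign names are placeholders, not real cuneiform readings.
TABLE: Dict[Tuple[str, str, str], List[str]] = {
    ("akkadian", "old", "an"): ["<sign-A>", "<sign-B>"],
    ("akkadian", "neo", "an"): ["<sign-A>"],
    ("sumerian", "old", "an"): ["<sign-C>"],
}


def lookup(language: str, period: str, reading: str) -> List[str]:
    """Return the candidates for a reading, narrowed by language and period."""
    return list(TABLE.get((language, period, reading), []))


def process(language: str, period: str, reading: str) -> Tuple[Optional[str], List[str]]:
    """Return (committed_sign, candidates_to_show).

    A single candidate is committed at once; otherwise nothing is
    committed and the list is returned for the user to choose from.
    """
    candidates = lookup(language, period, reading)
    if len(candidates) == 1:
        return candidates[0], []
    return None, candidates


print(process("akkadian", "neo", "an"))
print(process("akkadian", "old", "an"))
```

Keying the table on the period as well as the language is what would let the "time" switch shorten the candidate list.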
In addition to cuneiform input I would like to add an input method for
the transcription of Middle East languages. It could be done by simply
remapping the keyboard, but I do not want the user to learn strange
keyboard layouts. I think the VIQR approach is ideal for this purpose.
Users will not input huge amounts of text in transcription, so the
inefficiency of the method is not a problem.
This is why I want to understand how VIQR works.
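
As I understand the VIQR idea, a plain ASCII mark typed after a letter modifies that letter (for example a^ gives â). A minimal Python sketch of the same principle for transcription follows. The mark table here is a made-up assumption for illustration, not the real VIQR table and not any existing m17n or SCIM definition, and the function name `transcribe` is hypothetical.

```python
import unicodedata

# HYPOTHETICAL mark table: ASCII key typed after a letter -> combining mark.
MARKS = {
    "^": "\u0302",  # combining circumflex accent
    "<": "\u030C",  # combining caron
    "_": "\u0304",  # combining macron
}


def transcribe(text: str) -> str:
    """Turn letter+mark sequences into precomposed characters where possible."""
    out = []
    prev_is_letter = False
    for ch in text:
        if ch in MARKS and prev_is_letter:
            # Attach the combining mark to the preceding letter.
            out.append(MARKS[ch])
        else:
            out.append(ch)
            prev_is_letter = ch.isalpha()
    # NFC composes base letter + combining mark into one code point if one exists.
    return unicodedata.normalize("NFC", "".join(out))


print(transcribe("s<arrum"))
```

A mark key that does not follow a letter is passed through unchanged, so ordinary text such as "1 < 2" is not affected.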

Yesterday evening I read the SCIM header files and the introduction to
the SCIM architecture provided by James in the "design.zh_CN" document. I
have some general picture of how to achieve my goals in SCIM (of course
I will surely have problems with compiling and testing the code - I
have not programmed in C++ so far. And the makefile framework used in SCIM
is absolutely new to me, though it looks standard for Linux folks :-( ).
However, I have no idea yet how to correctly add a locale for the language
"cuneiform akkadian" in Linux :-), or whether I will have the right tools
to prepare the fonts etc.

Your comments and criticism of my approach would be most interesting and 
helpful for me.

Best regards,
David