[SCIM] How to create a new IME on Linux in about 15 minutes with SCIM and m17n

David david at plm11.pl
Mon Oct 4 23:54:46 UTC 2004


I would like to share with everybody the result of my test: making a new 
input method with m17n and using it with SCIM.
This has been explained on this list before, but not very clearly, and I 
was still a little confused. I think bringing it all together will help.

It is so simple that you can really have a new input method in a matter 
of minutes. Simple as it is, this method can be very useful.

In the near future I will need to input some IPA (International Phonetic 
Alphabet). I decided it would be useful to have a simple way to write 
IPA with a standard English keyboard. There is a standard for 
transliterating IPA with ASCII symbols: X-SAMPA. Let's create an x-sampa IME.

I would like to quote Mr Kenichi Handa, who explained a few days ago how 
to add a new input method to m17n:

BEGIN QUOTE

It is fairly easy to add an input method for the m17n
database (m17n-db) which is used by scim-m17n.  (...)

This is the main page of the m17n-lib and m17n-db:
	http://www.m17n.org/m17n-lib/

This page describes the format of input method data:
	http://www.m17n.org/m17n-lib/m17n-docs/m17nDBFormat.html#mdbIM

This page provides short explanations for each input method:
	http://www.m17n.org/m17n-lib/m17n-docs/m17nDBData.html#mim-list

When you create a new XXX.mim file, put it under the
directory where the m17n-db is installed (usually,
/usr/share/m17n or /usr/local/share/m17n).  Edit the file
mdb.dir in the same directory and add this line:
(input-method LL NAME "XXX.mim")
LL is a two letter language code of ISO639-1 (e.g. "vi" for
Vietnamese), NAME is a name to identify the input method.

That's all.

END QUOTE

This is really all you need to know to successfully add a new input 
method. It is enough to write a correct *.mim file, register it in 
mdb.dir and voila...
er... almost.  I had to reboot twice before it really worked.  Read on.

I guess you noticed the words "correct *.mim file".  If you make a 
mistake, SCIM will not load your new input table. How will you know 
whether the problem is some misconfiguration or an error in your file? 
How can you check whether your IME can be loaded at all?

I did not know that either. To find out, I copied a few lines from 
James Su's code (scim_m17n_imengine.cpp), the part responsible for 
loading m17n-lib IMEs. Please copy these lines to a file; let's call it 
"imelist.cpp":

-- BEGIN CUT --
#include <iostream>
#include <m17n.h>

int main()
{
    MPlist *imlist, *elm;

    // Initialize m17n before calling any other library function.
    M17N_INIT();

    // Ask the database for every registered input method.
    imlist = mdatabase_list(msymbol("input-method"), Mnil, Mnil, Mnil);
    for (elm = imlist; elm && mplist_key(elm) != Mnil; elm = mplist_next(elm)) {
        MDatabase *mdb = (MDatabase *) mplist_value(elm);
        MSymbol *tag = mdatabase_tag(mdb);   // tag[1] = language, tag[2] = name
        if (tag[1] != Mnil) {
            // Opening returns NULL if the *.mim file cannot be loaded.
            MInputMethod *im = minput_open_im(tag[1], tag[2], NULL);
            if (im) {
                std::cout << msymbol_name(im->language) << "-"
                          << msymbol_name(im->name) << "\n";
                minput_close_im(im);
            }
        }
    }
    if (imlist)
        m17n_object_unref(imlist);
    M17N_FINI();
    return 0;
}
-- END CUT --

Compile it with "g++ -o imelist imelist.cpp -lm17n" (the library goes 
after the source file, otherwise some linkers do not resolve its symbols).

Now, if you run the program "imelist", it will list all the IMEs that 
SCIM will be able to load. If you make a mistake in your table, your IME 
will not be listed, so you know you have to correct it.

Let us continue our example. I decided that my input will be called x-sampa.

So we create a file for the new input method:
cd /usr/share/m17n
touch x-sampa.mim

... and register it in the database. This is how it looks in "mdb.dir" 
in my case:
(input-method t x-sampa "x-sampa.mim")

Now we add a few entries to the table. It looks like this at the moment:

-- BEGIN CUT --

;;; <li> x-sampa.mim
;;;
;;; Input method for IPA script with ASCII letters.

(title "x-sampa")

(map
  (trans
    ("&" "æ")
    ("C" "ç")
    ("D" "ð")
    ("2" "ø")
    ("X\\" "ħ")
    ("N" "ŋ")
    ("9" "œ")
    ("B" "ƀ")
    ("p_<" "ƥ")
    ("t_j" "ƫ")
    ("t_<" "ƭ")
    ("dz)" "ƻ")
    ("|\\" "ǀ")
    ("|\\|\\" "ǁ")
    ("=\\" "ǂ")
    ("!\\" "ǃ")))

(state
  (init
    (trans)))

-- END CUT --

The existing *.mim files in the m17n directory are good examples to 
learn from.

Well, let's check:
./imelist

(...)
zh-py
t-latin-post
t-unicode
t-x-sampa


OK, it's listed. Now, in my case, even after a reboot I could not use it 
in skim. It was not listed in the "Other" group. I had to go to the 
configuration panel (it was listed there), disable it, save, and 
re-enable it. It still did not work. I closed skim, which immediately 
caused the GTK panel to start. The GTK configuration panel did not list 
x-sampa at all. I rebooted again, and this time everything was OK. I 
could select my new IME and write a few "words" with it. It works, and 
creating it was simple.

I think this description lacks a lot, and I probably made mistakes, but 
I was looking for something like this on the net and did not find 
anything. I believe it can help many people, as (you can see for 
yourself) you do not have to be a programmer to be able to quickly 
create quite sophisticated IMEs (even for languages like Chinese or 
Korean).

Best regards,
David

PS: Critical comments most welcome :-)


