[Uim] Japanese input

Thu Jul 21 16:27:44 EEST 2005

> I can see how it works for identifying word classes, I can't picture how it
> could do kana->kanji conversion though. The odds of any given kanji compound
> appearing after any other?
Good question!

> Very simply, for English:
> 
> Fruit flies like an apple
> noun verb conjunction noun
> noun noun verb noun

Consider a similar problem: decoding the following letter sequence.

`fruitflieslikeanapple'

The answer is obviously `fruit flies like an apple'.
However, there is a very small probability that it is
 (a) `fru it flis li kean app le',
 (b) `f r u i t f l i e s l i k e a n a p p l e'
or something like that.

The probability of (a) is
 P(fru,Noun) * P(fru,Noun->it,Noun) * P(it,Noun) * ...
and that of (b) is
 P(f,Noun) * P(f,Noun->r,Noun) * P(r,Noun) * ...

In any case, the following obvious paths will get a higher score:
 P(fruit,Noun) * P(fruit,Noun->flies,Verb) ..
or
 P(fruit,Noun) * P(fruit,Noun->flies,Noun) ..
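To make the comparison concrete, here is a minimal Python sketch that scores a tagged split by summing log-probabilities (logs avoid floating-point underflow when many small factors are multiplied). Everything in it is an assumption for illustration: the numbers are invented, the classes `Conj` and `Det` and the names `EMIT`, `TRANS`, `FLOOR` and `log_score` are hypothetical, and the transition term P(w1,C1->w2,C2) is simplified to a class bigram P(C1->C2), as in a textbook HMM. It is not anthy's model or data.

```python
import math

# Hypothetical toy model: every probability below is invented for illustration.
# EMIT[(word, cls)] stands in for P(word, Class) in the formulas above.
EMIT = {
    ("fruit", "Noun"): 0.01,
    ("flies", "Verb"): 0.005,
    ("flies", "Noun"): 0.004,
    ("like", "Conj"): 0.02,
    ("an", "Det"): 0.05,
    ("apple", "Noun"): 0.01,
}
# Assumption: the transition P(w1,C1 -> w2,C2) is simplified here to a
# class bigram P(C1 -> C2), as in a textbook HMM.
TRANS = {
    ("Noun", "Verb"): 0.3,
    ("Noun", "Noun"): 0.2,
    ("Verb", "Conj"): 0.1,
    ("Conj", "Det"): 0.4,
    ("Det", "Noun"): 0.8,
}
FLOOR = 1e-6  # probability given to unseen words and unseen transitions


def log_score(tokens):
    """Log of P(w1,C1) * P(C1->C2) * P(w2,C2) * ... for [(word, cls), ...]."""
    total = 0.0
    prev_cls = None
    for word, cls in tokens:
        if prev_cls is not None:
            total += math.log(TRANS.get((prev_cls, cls), FLOOR))
        total += math.log(EMIT.get((word, cls), FLOOR))
        prev_cls = cls
    return total


obvious = [("fruit", "Noun"), ("flies", "Verb"), ("like", "Conj"),
           ("an", "Det"), ("apple", "Noun")]
split_a = [(w, "Noun") for w in "fru it flis li kean app le".split()]
split_b = [(ch, "Noun") for ch in "fruitflieslikeanapple"]
print(log_score(obvious) > log_score(split_a) > log_score(split_b))  # → True
```

Every extra piece multiplies in another factor below 1, and pieces missing from the dictionary only get the floor value, so (a) and especially (b) fall far behind the obvious split.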

This is how anthy uses an HMM to split a kana sequence.
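Scoring every possible split directly does not scale: a string of n characters has 2^(n-1) ways to be cut. A standard way to find the best path is a Viterbi search, which keeps only the best-scoring path for each (position, class) pair. The Python sketch below shows that idea under the same kind of assumptions as before: the lexicon, the probabilities, the class-bigram transitions, the single-character `Unk` fallback and the names `LEXICON`, `TRANS`, `candidates` and `segment` are all hypothetical, and this is not anthy's actual code.

```python
import math

# Hypothetical lexicon: word -> {class: P(word, class)}. Numbers are invented.
LEXICON = {
    "fruit": {"Noun": 0.01},
    "flies": {"Verb": 0.005, "Noun": 0.004},
    "like": {"Conj": 0.02, "Verb": 0.01},
    "an": {"Det": 0.05},
    "apple": {"Noun": 0.01},
    "app": {"Noun": 0.001},
    "le": {"Noun": 0.0001},
}
# Assumption: class-bigram transitions P(C1 -> C2); "BOS" marks the start.
TRANS = {
    ("BOS", "Noun"): 0.5,
    ("Noun", "Verb"): 0.3,
    ("Noun", "Noun"): 0.2,
    ("Verb", "Conj"): 0.1,
    ("Verb", "Det"): 0.3,
    ("Conj", "Det"): 0.4,
    ("Det", "Noun"): 0.8,
}
FLOOR = 1e-6   # probability of an unseen transition
UNK = 1e-6     # any single character may be an unknown word, as in split (b)
MAX_WORD = 8   # longest dictionary entry to try


def candidates(text, i):
    """Yield (end, word, cls, prob) for every word that may start at text[i]."""
    for j in range(i + 1, min(len(text), i + MAX_WORD) + 1):
        for cls, p in LEXICON.get(text[i:j], {}).items():
            yield j, text[i:j], cls, p
    yield i + 1, text[i], "Unk", UNK


def segment(text):
    """Viterbi search for the highest-scoring (word, class) path over text."""
    # best[i][cls] = (log score, backpointer) of the best path over text[:i]
    # whose last word has class cls.
    best = [{} for _ in range(len(text) + 1)]
    best[0]["BOS"] = (0.0, None)
    for i in range(len(text)):
        for prev_cls, (prev_score, _) in best[i].items():
            for j, word, cls, p in candidates(text, i):
                score = (prev_score
                         + math.log(TRANS.get((prev_cls, cls), FLOOR))
                         + math.log(p))
                if cls not in best[j] or score > best[j][cls][0]:
                    best[j][cls] = (score, (i, prev_cls, word))
    # Trace back from the best state at the end of the string.
    i = len(text)
    cls = max(best[i], key=lambda c: best[i][c][0])
    tokens = []
    while best[i][cls][1] is not None:
        prev_i, prev_cls, word = best[i][cls][1]
        tokens.append((word, cls))
        i, cls = prev_i, prev_cls
    return tokens[::-1]


print(segment("fruitflieslikeanapple"))
# → [('fruit', 'Noun'), ('flies', 'Verb'), ('like', 'Conj'), ('an', 'Det'), ('apple', 'Noun')]
```

Keeping one entry per class (not just per position) matters: `flies` as Verb and `flies` as Noun survive side by side until later transitions decide between them. For kana input the same search would run over kana characters with a kana-keyed dictionary.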

--
 CHAOS AND CHANCE!
  Yusuke TABATA (yusuke at cherubim.icw.co.jp)