[Uim] Japanese input
yusuke at cherubim.icw.co.jp
Thu Jul 21 16:27:44 EEST 2005
> I can see how it works for identifying word classes, I can't picture how it
> could do kana->kanji conversion though. The odds of any given kanji compound
> appearing after any other?
Good question!
> Very simply, for English:
>
> Fruit flies like an apple
> noun verb conjunction noun
> noun noun verb noun
Consider a similar problem: decoding the following sequence of letters
`fruitflieslikeanapple'
The answer is obviously `fruit flies like an apple'
However, there is a very small probability that it is
(a) `fru it flies li kean app le',
(b) `f r u i t f l i e s l i k e a n a p p l e'
or something like that.
The probability of (a) is
P(fru,Noun) * P(fru,Noun->it,Noun) * P(it,Noun) * ...
and that of (b) is
P(f,Noun) * P(f,Noun->r,Noun) * P(r,Noun) * ...
Anyway, the following obvious paths will get a higher score:
P(fruit,Noun) * P(fruit,Noun->flies,Verb) ..
or
P(fruit,Noun) * P(fruit,Noun->flies,Noun) ..
This is how anthy uses an HMM to split a kana sequence.
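
For illustration, here is a minimal Python sketch of that kind of search
(Viterbi-style dynamic programming over every possible split). It is not
anthy's code. All probabilities are invented, the tag set and the names
EMIT, TRANS, UNKNOWN, MAX_WORD_LEN and segment are hypothetical, and the
transition term is simplified to tag->tag instead of the word/tag pairs
written above.

```python
import math

# Toy numbers, invented for illustration only (not taken from anthy).
EMIT = {  # P(word, tag)
    ("fruit", "Noun"): 0.01,
    ("flies", "Noun"): 0.004,
    ("flies", "Verb"): 0.006,
    ("like", "Verb"): 0.01,
    ("like", "Prep"): 0.02,
    ("an", "Det"): 0.05,
    ("apple", "Noun"): 0.01,
}
TRANS = {  # P(previous tag -> tag), a simplification of P(w1,T1->w2,T2)
    ("Start", "Noun"): 0.5,
    ("Noun", "Noun"): 0.2,
    ("Noun", "Verb"): 0.4,
    ("Verb", "Prep"): 0.3,
    ("Verb", "Det"): 0.4,
    ("Prep", "Det"): 0.5,
    ("Det", "Noun"): 0.8,
}
UNKNOWN = 1e-8      # tiny probability for anything not in the tables
TAGS = ["Noun", "Verb", "Prep", "Det"]
MAX_WORD_LEN = 6    # assumed upper bound on word length


def segment(text):
    """Return the highest-scoring list of (word, tag) pairs for text."""
    n = len(text)
    # best[i][tag] = (log score, start of last word, previous tag)
    # for the best path that ends at position i with that tag.
    best = [dict() for _ in range(n + 1)]
    best[0]["Start"] = (0.0, None, None)
    for end in range(1, n + 1):
        for start in range(max(0, end - MAX_WORD_LEN), end):
            word = text[start:end]
            for prev_tag, (prev_score, _, _) in best[start].items():
                for tag in TAGS:
                    score = (prev_score
                             + math.log(TRANS.get((prev_tag, tag), UNKNOWN))
                             + math.log(EMIT.get((word, tag), UNKNOWN)))
                    if tag not in best[end] or score > best[end][tag][0]:
                        best[end][tag] = (score, start, prev_tag)
    # Follow the back-pointers from the best final state.
    tag = max(best[n], key=lambda t: best[n][t][0])
    pos, path = n, []
    while pos > 0:
        _, start, prev_tag = best[pos][tag]
        path.append((text[start:pos], tag))
        pos, tag = start, prev_tag
    return path[::-1]


print(segment("fruitflieslikeanapple"))
```

With these toy tables, splits such as (a) and (b) lose because every
unknown piece costs a factor of UNKNOWN, while the known words keep the
score of the obvious path comparatively high.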
--
CHAOS AND CHANCE!
Yusuke TABATA (yusuke at cherubim.icw.co.jp)