[Uim] Japanese input

Paul TBBle Hampson Paul.Hampson at anu.edu.au
Thu Jul 21 12:16:22 EEST 2005


On Tue, Jul 19, 2005 at 10:14:40PM +0900, TOKUNAGA Hiroyuki wrote:

> This conversion algorithm is very complex and incomplete. For example,
> Anthy is using Hidden Marcov Model to determine most probable word
> class. I can't understand what is Hidden marcov Model.;-)

If I recall correctly, a Hidden Markov Model works by classifying things in
some way and storing the probability of any class following any other.

It then runs through the sentence, recording the probability of each
possible transition, and at the end multiplies the probabilities along each
possible path together and takes the most likely path.
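
As a rough sketch of the procedure described above (Python; the classes "A"
and "B", the TRANS table and both function names are invented for
illustration and have nothing to do with Anthy's actual code):

```python
from itertools import product

# Hypothetical probabilities of one class following another (made-up numbers).
TRANS = {
    ("START", "A"): 0.6, ("START", "B"): 0.4,
    ("A", "A"): 0.3, ("A", "B"): 0.7,
    ("B", "A"): 0.5, ("B", "B"): 0.5,
    ("A", "END"): 0.2, ("B", "END"): 0.8,
}

def path_probability(path, trans):
    """Multiply the transition probabilities along START -> path -> END."""
    states = ["START", *path, "END"]
    prob = 1.0
    for prev, cur in zip(states, states[1:]):
        # A transition missing from the table is treated as impossible.
        prob *= trans.get((prev, cur), 0.0)
    return prob

def most_likely_path(candidates, trans):
    """Try every combination of candidate classes and keep the most probable."""
    return max(product(*candidates), key=lambda p: path_probability(p, trans))

# Each position lists the classes that the item there could belong to.
candidates = [["A"], ["A", "B"], ["A", "B"]]
best = most_likely_path(candidates, TRANS)
print(best, path_probability(best, TRANS))
```

This only covers the class-to-class transitions. As far as I know, a full
Hidden Markov Model also weighs how likely each class is to produce the
observed word, and real implementations normally use the Viterbi algorithm
rather than trying every path.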

I can see how it works for identifying word classes, but I can't picture how
it could do kana->kanji conversion. Would it be the odds of any given kanji
compound appearing after any other?

I had a look at Wikipedia, but its article wasn't very clear to me, so I'll
try an example.

Very simply, for English:

Fruit  flies  like         an apple
noun   verb   conjunction  noun
noun   noun   verb         noun

So that's p(start with n) * p(n->v) * p(v->c) * p(c->n) * p(end with n)
vs.       p(start with n) * p(n->n) * p(n->v) * p(v->n) * p(end with n)
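
To make the comparison concrete, here is a small Python sketch that scores
both readings. The table P and the score function are hypothetical, and every
number in P is invented, so whichever reading wins here says nothing about
real English:

```python
# Made-up transition probabilities, one per term in the two products above.
# n = noun, v = verb, c = conjunction.
P = {
    ("start", "n"): 0.5,
    ("n", "v"): 0.4,
    ("n", "n"): 0.2,
    ("v", "c"): 0.2,
    ("v", "n"): 0.3,
    ("c", "n"): 0.6,
    ("n", "end"): 0.3,
}

def score(tags):
    """Multiply p(start -> first tag), each tag-to-tag step, and p(last tag -> end)."""
    chain = ["start", *tags, "end"]
    result = 1.0
    for a, b in zip(chain, chain[1:]):
        result *= P[(a, b)]
    return result

reading_1 = score(["n", "v", "c", "n"])  # "flies" as the verb
reading_2 = score(["n", "n", "v", "n"])  # "like" as the verb
print(reading_1, reading_2)
```

Both products share p(start with n), p(n->v) and p(end with n), so the
comparison really comes down to p(v->c) * p(c->n) against p(n->n) * p(v->n).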

I can't actually remember offhand which one turns out to be higher. ^_^

I dunno if that makes sense; it's been a while since I looked at
these, ever so briefly.

-- 
-----------------------------------------------------------
Paul "TBBle" Hampson, MCSE
8th year CompSci/Asian Studies student, ANU
The Boss, Bubblesworth Pty Ltd (ABN: 51 095 284 361)
Paul.Hampson at Anu.edu.au

"No survivors? Then where do the stories come from I wonder?"
-- Capt. Jack Sparrow, "Pirates of the Caribbean"

License: http://creativecommons.org/licenses/by/2.1/au/
-----------------------------------------------------------