[utf-8] Aspell and UTF-8 Update

Noah Levitt nlevitt@columbia.edu
Sun, 22 Feb 2004 19:49:39 -0500


On Sun, Feb 22, 2004 at  5:19:15 -0500, Kevin Atkinson wrote:
> 
> Which languages, with a phonetic like alphabet, do you think that Aspell 
> 0.51 will NOT be able to handle due to one of the following?
> 
> 1) More than 220 distinct characters?

Not sure what a “phonetic like alphabet” is exactly, or why
aspell should be limited to such scripts, but I assume you
mean to exclude Han and Korean. In that case, it appears
that the only languages you won’t be able to support are
those that use Ethiopic script (going by the counts below,
Amharic and Tigrinya exceed 220).

Noah

P.S. Out of curiosity, I did some counting on
     fontconfig/fc-lang/*.orth. These numbers are decent
     estimates of the number of word characters (i.e.
     excluding punctuation, digits, &c.) used in each
     language.

aa.orth: 60
ab.orth: 82
af.orth: 67
am.orth: 244
ar.orth: 124
ast.orth: 70
ava.orth: 67
ay.orth: 58
az_ir.orth: 121
az.orth: 145
bam.orth: 58
ba.orth: 81
be.orth: 67
bg.orth: 56
bho.orth: 65
bh.orth: 65
bin.orth: 76
bi.orth: 56
bn.orth: 77
bo.orth: 88
br.orth: 62
bs.orth: 60
bua.orth: 70
ca.orth: 72
ce.orth: 67
chm.orth: 76
ch.orth: 56
chr.orth: 84
co.orth: 82
cs.orth: 80
cu.orth: 94
cv.orth: 74
cy.orth: 76
da.orth: 68
de.orth: 57
dz.orth: 88
el.orth: 66
en.orth: 68
eo.orth: 56
es.orth: 64
et.orth: 62
eu.orth: 54
fa.orth: 120
fi.orth: 60
fj.orth: 50
fo.orth: 66
fr.orth: 82
ful.orth: 59
fur.orth: 62
fy.orth: 73
ga.orth: 78
gd.orth: 68
gez.orth: 194
gl.orth: 64
gn.orth: 68
gu.orth: 67
gv.orth: 52
ha.orth: 57
haw.orth: 56
he.orth: 26
hi.orth: 65
ho.orth: 50
hr.orth: 55
hu.orth: 66
hy.orth: 75
ia.orth: 50
ibo.orth: 56
id.orth: 52
ie.orth: 50
ik.orth: 68
io.orth: 50
is.orth: 68
it.orth: 66
iu.orth: 131
ja.orth: 6538
kaa.orth: 78
ka.orth: 40
ki.orth: 54
kk.orth: 70
kl.orth: 77
km.orth: 69
kn.orth: 68
kok.orth: 65
ko.orth: 16153
ks.orth: 65
ku_ir.orth: 26
kum.orth: 66
ku.orth: 60
kv.orth: 70
kw.orth: 56
ky.orth: 70
la.orth: 61
lb.orth: 73
lez.orth: 67
lo.orth: 53
lt.orth: 59
lv.orth: 63
mg.orth: 54
mh.orth: 60
mi.orth: 56
mk.orth: 38
ml.orth: 68
mn.orth: 125
mo.orth: 123
mr.orth: 65
mt.orth: 66
my.orth: 44
nb.orth: 68
ne.orth: 65
nl.orth: 80
nn.orth: 68
no.orth: 68
ny.orth: 51
oc.orth: 68
om.orth: 50
or.orth: 65
os.orth: 66
pl.orth: 60
ps_af.orth: 44
ps_pk.orth: 44
pt.orth: 80
rm.orth: 64
ro.orth: 58
ru.orth: 67
sah.orth: 76
sa.orth: 65
sco.orth: 53
sel.orth: 66
se.orth: 58
sh.orth: 75
si.orth: 70
sk.orth: 75
sl.orth: 60
sma.orth: 58
smj.orth: 58
smn.orth: 61
sm.orth: 51
sms.orth: 69
so.orth: 50
sq.orth: 54
sr.orth: 75
sv.orth: 66
sw.orth: 50
syr.orth: 43
ta.orth: 36
te.orth: 68
tg.orth: 78
th.orth: 85
ti_er.orth: 240
ti_et.orth: 262
tig.orth: 200
tk.orth: 74
tl.orth: 15
tn.orth: 54
to.orth: 51
tr.orth: 68
ts.orth: 50
tt.orth: 76
tw.orth: 71
tyv.orth: 70
ug.orth: 124
uk.orth: 71
ur.orth: 133
uz.orth: 68
ven.orth: 55
vi.orth: 174
vo.orth: 48
vot.orth: 58
wa.orth: 68
wen.orth: 63
wo.orth: 63
xh.orth: 50
yap.orth: 56
yi.orth: 26
yo.orth: 103
zh_cn.orth: 6765
zh_hk.orth: 2213
zh_mo.orth: 13063
zh_sg.orth: 6765
zh_tw.orth: 13063
zu.orth: 50
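
For anyone who wants to reproduce numbers like the ones above, here is a
rough sketch in Python. It is not the script used for this post. It
assumes each data line of an .orth file is a single hex code point or a
range written as XXXX-YYYY (or XXXX..YYYY), with "#" starting a comment.
It does not follow "include" lines, so files that rely on them will come
out low. The names count_codepoints, over_limit and count_directory are
made up for this example.

```python
import re
from pathlib import Path

LIMIT = 220  # the character limit from the quoted question

# Assumed .orth syntax: one hex code point or a hex range per line,
# optionally prefixed with "0x"; "#" starts a comment.
_ENTRY = re.compile(
    r"^(?:0[xX])?([0-9A-Fa-f]+)"
    r"(?:\s*(?:-|\.\.)\s*(?:0[xX])?([0-9A-Fa-f]+))?$"
)


def count_codepoints(text):
    """Count the distinct code points listed in the text of one .orth file."""
    seen = set()
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        match = _ENTRY.match(line)
        if not match:
            continue  # blank line, "include" directive, or unrecognized syntax
        first = int(match.group(1), 16)
        last = int(match.group(2), 16) if match.group(2) else first
        seen.update(range(first, last + 1))
    return len(seen)


def over_limit(counts, limit=LIMIT):
    """Return the names whose count exceeds the limit, sorted by name."""
    return sorted(name for name, n in counts.items() if n > limit)


def count_directory(directory):
    """Map each *.orth file name in a directory to its code point count."""
    return {
        path.name: count_codepoints(
            path.read_text(encoding="utf-8", errors="replace")
        )
        for path in sorted(Path(directory).glob("*.orth"))
    }
```

Running over_limit on the counts listed above picks out am, ti_er and
ti_et, plus the ja, ko and zh_* files that the question already sets
aside. gez (194) and tig (200) stay under 220.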