[HarfBuzz] ANN: Wikipedia test data for testing HarfBuzz
Anish Patil
apatil at redhat.com
Tue Jul 3 23:26:31 PDT 2012
Hi Behdad,
>>As promised, here is the word-list data extracted from various language
>>Wikipedias, ready for public consumption.
Congratulations!
>>There are 63 languages included. Chinese and Japanese (zh and ja) are
>>intentionally left out as they were too big / not so interesting. Other than
>>that, English is particularly large, as expected, and the rest vary in size,
>>from a few thousand to tens of millions of unique words.
For some of the Indian languages, the Wikipedia word lists contain spelling mistakes; I hope that will not affect your work.
The Marathi word list contains words like "अॅक्सेसदिनांक" and "अॅरिझोना", which are incorrect.
Cheers,
Anish P.