[HarfBuzz] Harfbuzz Indic Testing Team Project -- Coordination and Goals

Ed Trager ed.trager at gmail.com
Thu Sep 10 13:13:24 PDT 2009


Hi, Everyone,

*INDIC TEAM COORDINATION AND PROJECT GOALS
*---------------------------------------------------

Unless I hear objections, I volunteer to serve as the Harfbuzz
Indic Team Testing Coordinator: "Ed Trager" <ed.trager at gmail.com>

The goal of the project (and my role as coordinator as I see it) is to
facilitate the organization and compilation of comprehensive and accurate
test data for vetting Indic shaping engines and fonts.

The resulting test data will be made available as a public web resource for
developer, peer, and public review.  In addition to providing the data in a
web-friendly format useful for human review and vetting (*see Devanagari
draft sample currently at http://eyegene.ophthy.med.umich.edu/indic/*), the
data will also be downloadable in scripting-friendly formats (XML, JSON)
useful for automated testing.

To facilitate these goals, I have already written draft testing framework
software (called "Indie") suitable for preparing data reports in required
and otherwise useful formats (XHTML, XML, JSON, *inter alia*).  The software
facilitates the organization of test cases into logical groups (*reph,
rakaar, nukta, pres, abvs*, *etc.*), and will document test conditions and
inputs (*shaping engine, script, font, test case, description, unicode
string values, UTF8 string*) and outputs (*PNG image of the shaped text,
glyph ids, x offsets, y offsets, and other glyph geometry data as
requested*).  The software will be released under an Open Source license.  *The
draft software doesn't yet quite do all those things but I am working on it
...*
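
As a rough illustration of the kind of record described above, here is a
minimal Python sketch of one test case serialized to JSON.  This is not the
Indie framework's actual format, which is still in draft: the class names
and every field name below are assumptions made up for this example.  The
glyph list is left empty because the glyph ids and offsets would come from
the shaping engine under test.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class Glyph:
    # Hypothetical output record for one shaped glyph.
    glyph_id: int
    x_offset: int = 0
    y_offset: int = 0


@dataclass
class ShapingTestCase:
    # Hypothetical field names; the real Indie schema is not published yet.
    group: str            # logical group, e.g. "reph" or "nukta"
    script: str
    shaping_engine: str
    font: str
    description: str
    text: str             # input string to be shaped
    glyphs: List[Glyph] = field(default_factory=list)  # filled in by the engine

    def to_record(self) -> dict:
        """Return a JSON-ready dict with derived code point and UTF-8 fields."""
        record = asdict(self)
        record["codepoints"] = ["U+%04X" % ord(ch) for ch in self.text]
        record["utf8"] = self.text.encode("utf-8").hex()
        return record


case = ShapingTestCase(
    group="reph",
    script="Devanagari",
    shaping_engine="harfbuzz",
    font="Chandas",
    description="RA + virama + KA forms a reph above KA",
    text="\u0930\u094D\u0915",
)
print(json.dumps(case.to_record(), ensure_ascii=False, indent=2))
```

Keeping the raw string, the code point list, and the UTF-8 bytes side by
side makes a record easy to check by eye and easy to feed to a script.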

*NEAR-TERM GOALS
*-----------------------

The near-term goal of the project will be to support the development and
testing of Indic script layout in the Harfbuzz OpenType layout engine for
the Free Desktop. The near-term focus will be to compile test data for 10
major Indic scripts: *Gurmukhi, Devanagari, Gujarati, Bengali, Oriya,
Telugu, Kannada, Tamil, Malayalam, and Sinhala*.

To an extent, a portion of these data already exist in various forms, but
the existing data are scattered among different projects, incomplete,
difficult to find, and lack new test cases reflecting the most recent
changes in OpenType technology for Indic rendering (per Unicode Consortium
Public Review Issue #37).  The goal is therefore to compile all the data in
one place where anyone who wants it for any purpose can obtain it easily.

This will be "Phase One" of the project.

*LONG-TERM GOALS
*-----------------------

One primary long-term goal, after completion of "Phase One", will be to
include additional Indic and Indic-derived scripts already encoded in
Unicode, or soon-to-be-encoded in Unicode.  This will include scripts such
as *Lepcha, Balinese, Tai Le, and New Tai Lue, inter alia*.  To the best of
my knowledge, many of these additional Indic- and Indic-derived scripts
currently remain unsupported by any of the major operating systems
(Windows, OSX, Linux).  We want to facilitate changing that status quo.

There is of course no restriction on the use of the test data sets. One
"natural" goal is to facilitate support for additional Open Source fonts for
existing and new Unicode scripts.  Another natural long-term goal is to
support continued development of other layout engines, such as the Open
Source Graphite engine.  Note that the Indie testing framework is not
restricted to Indic and Indic-derived scripts: it can be just as easily used
to test scripts such as extended Arabic used for Uyghur (*inter alia*).

Using the data sets to test proprietary engines such as Uniscribe and OSX's
ATSUI/AAT is also possible of course.  For example, it appears that Windows
7 will ship with fonts for Tai Le and New Tai Lue (*this may be due to
requirements imposed by Mainland China for the sale of computer systems in
that country*).  It will be interesting to test OpenType support for scripts
such as New Tai Lue on any platform that supports it.

*CURRENT TEAM COMPOSITION
*--------------------------------------

Following up on Pravin and Harshula's lead, we still unfortunately have
empty slots to fill for team leads and members for most of the scripts :-(.
*If you have good knowledge of one or more of these scripts, please identify
yourselves :-).*

The job of individual script team leads and members will be to help find,
compile, and vet test data.  Once I get the Indie testing framework software
into SVN (*in a week or up to a month's time, depending on how busy I am ...
*), people will be able to add and edit test cases directly.  In the
meantime, you can coordinate with me and either send me the data or point
me to electronic sources for it (*for example, Pravin provided
a draft PDF document for Devanagari*).

Below is what I have so far for each script:

*NORTHERN SCRIPTS
*-------------------------

*Gurmukhi:*
TEAM: "Gurmukhi (Punjabi) Team Lead - A S Alam" <apreet.alam at gmail.com>
SUGGESTED WIN7 FONT: raavi.ttf
SUGGESTED OPEN SOURCE FONT: saab font from http://guca.sourceforge.net/
TEST DATA: ?

*Devanagari:*
TEAM:
   "Devanagari Team Lead - प्रविण सातपुते" <pravin.d.s at gmail.com>
   "Devanagari Team - G Karunakar" <indlinux at gmail.com>
SUGGESTED WIN7 FONT: mangal.ttf
SUGGESTED OPEN SOURCE FONT: Chandas (v. 1.3+, GPL)
TEST DATA:
  * Some draft test cases put online at
http://eyegene.ophthy.med.umich.edu/indic/ as an example
  * List of practical two-consonant conjuncts
http://indlinux.sourceforge.net/tdata/dev/conjuncts-hi.txt
with some example words
http://www.indlinux.org/wiki/index.php/Hindi_Conjuncts
  * Sample text containing most Devanagari characters (95% of the current
Unicode Devanagari range)
http://indlinux.sourceforge.net/tdata/dev/mahabharat.txt
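
A coverage figure like the 95% above can be re-checked with a few lines of
Python.  This is only a sketch: the function name and the short sample
string are made up for illustration, and the set of "assigned" code points
comes from whatever Unicode version your Python ships with, so the
percentage will not match a figure computed against an older Unicode
release.

```python
import unicodedata

# Devanagari block, U+0900..U+097F.
DEVANAGARI_BLOCK = range(0x0900, 0x0980)


def block_coverage(text, block=DEVANAGARI_BLOCK):
    """Return (covered, assigned) sets of code points for the given block."""
    # "Cn" marks code points that are unassigned in this Unicode version.
    assigned = {cp for cp in block if unicodedata.category(chr(cp)) != "Cn"}
    covered = {ord(ch) for ch in text} & assigned
    return covered, assigned


# Tiny stand-in sample; a real check would read the full sample text file.
sample = "\u0915\u094D\u0937 \u0928\u092E\u0938\u094D\u0924\u0947"
covered, assigned = block_coverage(sample)
percent = 100.0 * len(covered) / len(assigned)
print("%d of %d assigned code points (%.1f%%)" % (len(covered), len(assigned), percent))

# Code points still missing from the sample, useful for writing new cases.
missing = sorted(assigned - covered)
```

To run it on a downloaded copy of the sample text, read the file into a
string first (assuming it is UTF-8 encoded) and pass that string in place of
the stand-in sample.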


*Gujarati:*
TEAM: ?
SUGGESTED WIN7 FONT: shruti.ttf
SUGGESTED OPEN SOURCE FONT: rekha.ttf from Utkarsh.org
TEST DATA: ?

*Bengali:*
TEAM: ?
SUGGESTED WIN7 FONT: Shonar.ttf
SUGGESTED OPEN SOURCE FONT: ekushey.org's SolaimanLipi font
TEST DATA: ?

*Oriya:*
TEAM: ?
SUGGESTED WIN7 FONT: kalinga.ttf
SUGGESTED OPEN SOURCE FONT: oriya.sarovar.org's utkal.ttf
TEST DATA: ?

*SOUTHERN SCRIPTS
*-------------------------

*Telugu:*
TEAM: ?
SUGGESTED WIN7 FONT: gautami.ttf
SUGGESTED OPEN SOURCE FONT: Pothana2000 font from
http://www.kavya-nandanam.com/dload.htm
TEST DATA: ?

*Kannada:*
TEAM: ?
SUGGESTED WIN7 FONT: tunga.ttf
SUGGESTED OPEN SOURCE FONT: Mallige or Kedage font from
brahmi.sourceforge.net
TEST DATA: ?

*Tamil:*
TEAM: ?
SUGGESTED WIN7 FONT: latha.ttf
SUGGESTED OPEN SOURCE FONT: ??? -- Need to check tamilnation.org's list (but
a lot of those are incomplete or old fonts)
TEST DATA: ?

*Malayalam:*
TEAM: "Malayalam Team Lead - Hiran V." <hiran.v at gmail.com>
SUGGESTED WIN7 FONT: kartika.ttf
SUGGESTED OPEN SOURCE FONT: SMC's Meera_04.ttf  http://smc.sarovar.org/
http://mirror.its.uidaho.edu/pub/savannah/smc/fonts/
TEST DATA:
  * Some at http://www.indlinux.org/wiki/index.php/Test_Data/Malayalam

*Sinhala:*
TEAM: "Sinhala Team Lead - Harshula" <harshula at gmail.com>
SUGGESTED WIN7 FONT: iskpota.ttf
SUGGESTED OPEN SOURCE FONT: Ask Harshula.
http://www.nongnu.org/sinhala/doc/howto/sinhala-howto.html
TEST DATA: ?