[HarfBuzz] Indic Test Suite :: "Indie"

Adam Twardoch list.adam at twardoch.com
Wed Aug 26 11:31:25 PDT 2009


Ed Trager wrote:
> (Recall that Pango will use Uniscribe
> on Windows, ATSUI/AAT on Mac, so all bases will be covered).

But isn't the problem that ATSUI/AAT has no OpenType Layout support
for Indic? The old ATSUI framework does not support OpenType Layout
shaping for any complex scripts. The new CoreText framework introduced
in Mac OS X 10.5 has some rudimentary Arabic support based on OpenType
Layout, but both frameworks do Indic shaping solely through AAT, and as
we all know, there are perhaps only a handful of Indic AAT fonts.

It would be tremendously helpful if an Indic test suite (or even more
generally, an OpenType Layout test suite if you chose to build one)
allowed the user to choose between the system native layout engine, the
HarfBuzz/Pango layout engine and the ICU Layout engine.

There is an interesting OpenType Layout testing app written by Tal
Leming called FeatureProof:

http://code.typesupply.com/wiki/FeatureProof

The app is written in Python for Mac OS X (using the PyObjC bridge) and
uses an OpenType Layout engine written purely by Tal called Compositor:

http://code.typesupply.com/wiki/Compositor

Unfortunately, Compositor has many limitations. The major one is that it
does not support any complex script shaping at all (just the basic
GSUB/GPOS calls). However, I think it has some interesting UI concepts
and an interesting way to implement test cases.

I'm cc'ing Tal in case he's interested.

Best,
Adam


Ed Trager wrote:
> Hi, Everyone,
> 
> I've started to put together a program (written in C++ and called
> "Indie" as in "Indie band", "Indie film", etc.) that can be easily
> customized to generate Indic test suite data.  I would like to get
> everyone's feedback to see if my approach is on track and to solicit
> additional ideas and help.  So please provide feedback:
> 
> (1) The heart of the program is a "Markup Language Reporter" (MLR)
> base class.  Derived non-virtual report classes include TEXTR, XMLR,
> XHTMLR, and JSONR.  This means that the same test data "report" can be
> produced in text, XML, XHTML, or JSON formats.  This is exceedingly
> convenient for both automated and human-based processing. :-)
> 
> (2) Secondly, the program takes advantage of the fact that the
> ordering of vowels and consonants across the Unicode blocks for the
> major Indic scripts is consistent.  The program has (or when finished
> will have) header files containing meta data for each script, and an
> important item of meta data in the header files is the "offset" value.
>  Script offsets are relative to Devanagari: the offset for Bengali,
> for example, is 0x0080.  So, for example, once you have:
> 
>      <hex>0x0915, 0x093f</hex>
>      <utf8>कि</utf8>
> 
> ... as a test case for testing Devanagari KA + I, you can just add the
> offset for Bengali (0x0080) to produce the equivalent test case for
> Bengali:
> 
>     <hex>0x0995, 0x09bf</hex>
>     <utf8>কি</utf8>
> 
> ... and obviously you can continue in this manner to cover all of the
> major Indic scripts.
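
The offset arithmetic described above can be sketched in a few lines of
Python. This is only an illustration, not Ed's C++ code: the names
SCRIPT_OFFSETS, transpose, as_hex and as_text are hypothetical, and the
only values taken from the message are the Devanagari code points and the
Bengali offset 0x0080. The sketch also assumes that only code points
inside the Devanagari block (U+0900..U+097F) should be shifted, so that
characters such as ZWJ (U+200D) pass through unchanged.

```python
# Hypothetical sketch of the "script offset" idea; not taken from Indie itself.

# Offsets relative to Devanagari. The Bengali value comes from the message;
# other scripts would be added the same way.
SCRIPT_OFFSETS = {
    "Devanagari": 0x0000,
    "Bengali": 0x0080,
}

DEVANAGARI_FIRST = 0x0900
DEVANAGARI_LAST = 0x097F


def transpose(codepoints, script):
    """Shift Devanagari code points into the equivalent slots of `script`.

    Assumption: code points outside the Devanagari block (for example
    ZWJ, U+200D) are left as they are.
    """
    offset = SCRIPT_OFFSETS[script]
    return [
        cp + offset if DEVANAGARI_FIRST <= cp <= DEVANAGARI_LAST else cp
        for cp in codepoints
    ]


def as_hex(codepoints):
    """Format code points the way the <hex> element in the report does."""
    return ", ".join(f"0x{cp:04x}" for cp in codepoints)


def as_text(codepoints):
    """Turn code points into the string shown in the <utf8> element."""
    return "".join(chr(cp) for cp in codepoints)


ka_i = [0x0915, 0x093F]                 # Devanagari KA + I
bengali_ka_i = transpose(ka_i, "Bengali")
print(as_hex(bengali_ka_i), as_text(bengali_ka_i))
```

Keeping the base test cases as lists of integers, rather than as strings,
means the same list can feed both the hex and the UTF-8 fields of a report.
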
> 
> There are of course differences among the Indic scripts -- some of you
> on this list hopefully know a lot more about this than I do!
> Therefore, I'm sure that for some specific scripts there will need to
> be specific tests that don't generalize across all other Indic
> scripts.  On the other hand, there also exist classes of test cases
> that *do* generalize across all the Indic scripts -- for example tests
> of dependent vowels, tests of the new ZWJ+HALANT behaviors, etc.
> 
> Below I provide an example of what the XML output for dependent vowels
> currently looks like.  NOTE that I don't yet have Pango or Cairo stuff
> in the program, so the "<glyphIds>" and "renderedImage" tags are
> empty.  As I develop the program further, I can either (i) add the
> necessary Pango-Cairo code directly in the program or (ii) have the
> program call Behdad's "PangoView" to get the glyphIDs and render PNG
> images.  (I'm leaning toward adding the Pango-Cairo calls directly
> into my program because I don't think it will be too much more work,
> but we'll see how things go).  (Recall that Pango will use Uniscribe
> on Windows, ATSUI/AAT on Mac, so all bases will be covered).
> 
> Anyway, here's what the XML currently looks like for the dependent
> vowel tests for Devanagari and Bengali.  In the future, the
> "renderedImage" tag would contain a PNG file name based on the test
> case ID, i.e., "case_11.png", "case_12.png" ... etc. :
> 
> ===============
> 
> <?xml version="1.0" encoding="UTF-8" ?>
> <report>
> <scripts>
>  <script>
>   <commonName>Devanagari</commonName>
>   <nativeName>देवनागरी</nativeName>
>   <dependentVowels>
>    <testCase>
>     <id>1</id>
>     <hex>0x0915</hex>
>     <utf8>क</utf8>
>     <glyphIds>...</glyphIds>
>     <renderedImage>...</renderedImage>
>    </testCase>
>    <testCase>
>     <id>2</id>
>     <hex>0x0915, 0x093e</hex>
>     <utf8>का</utf8>
>     <glyphIds>...</glyphIds>
>     <renderedImage>...</renderedImage>
>    </testCase>
>    <testCase>
>     <id>3</id>
>     <hex>0x0915, 0x093f</hex>
>     <utf8>कि</utf8>
>     <glyphIds>...</glyphIds>
>     <renderedImage>...</renderedImage>
>    </testCase>
>    <testCase>
>     <id>4</id>
>     <hex>0x0915, 0x0940</hex>
>     <utf8>की</utf8>
>     <glyphIds>...</glyphIds>
>     <renderedImage>...</renderedImage>
>    </testCase>
>    <testCase>
>     <id>5</id>
>     <hex>0x0915, 0x0941</hex>
>     <utf8>कु</utf8>
>     <glyphIds>...</glyphIds>
>     <renderedImage>...</renderedImage>
>    </testCase>
>    <testCase>
>     <id>6</id>
>     <hex>0x0915, 0x0942</hex>
>     <utf8>कू</utf8>
>     <glyphIds>...</glyphIds>
>     <renderedImage>...</renderedImage>
>    </testCase>
>    <testCase>
>     <id>7</id>
>     <hex>0x0915, 0x0943</hex>
>     <utf8>कृ</utf8>
>     <glyphIds>...</glyphIds>
>     <renderedImage>...</renderedImage>
>    </testCase>
>    <testCase>
>     <id>8</id>
>     <hex>0x0915, 0x0944</hex>
>     <utf8>कॄ</utf8>
>     <glyphIds>...</glyphIds>
>     <renderedImage>...</renderedImage>
>    </testCase>
>    <testCase>
>     <id>9</id>
>     <hex>0x0915, 0x0962</hex>
>     <utf8>कॢ</utf8>
>     <glyphIds>...</glyphIds>
>     <renderedImage>...</renderedImage>
>    </testCase>
>    <testCase>
>     <id>10</id>
>     <hex>0x0915, 0x0963</hex>
>     <utf8>कॣ</utf8>
>     <glyphIds>...</glyphIds>
>     <renderedImage>...</renderedImage>
>    </testCase>
>    <testCase>
>     <id>11</id>
>     <hex>0x0915, 0x0947</hex>
>     <utf8>के</utf8>
>     <glyphIds>...</glyphIds>
>     <renderedImage>...</renderedImage>
>    </testCase>
>    <testCase>
>     <id>12</id>
>     <hex>0x0915, 0x0948</hex>
>     <utf8>कै</utf8>
>     <glyphIds>...</glyphIds>
>     <renderedImage>...</renderedImage>
>    </testCase>
>    <testCase>
>     <id>13</id>
>     <hex>0x0915, 0x094b</hex>
>     <utf8>को</utf8>
>     <glyphIds>...</glyphIds>
>     <renderedImage>...</renderedImage>
>    </testCase>
>    <testCase>
>     <id>14</id>
>     <hex>0x0915, 0x094c</hex>
>     <utf8>कौ</utf8>
>     <glyphIds>...</glyphIds>
>     <renderedImage>...</renderedImage>
>    </testCase>
>   </dependentVowels>
>  </script>
>  <script>
>   <commonName>Bengali</commonName>
>   <nativeName>বাংলা</nativeName>
>   <dependentVowels>
>    <testCase>
>     <id>15</id>
>     <hex>0x0995</hex>
>     <utf8>ক</utf8>
>     <glyphIds>...</glyphIds>
>     <renderedImage>...</renderedImage>
>    </testCase>
>    <testCase>
>     <id>16</id>
>     <hex>0x0995, 0x09be</hex>
>     <utf8>কা</utf8>
>     <glyphIds>...</glyphIds>
>     <renderedImage>...</renderedImage>
>    </testCase>
>    <testCase>
>     <id>17</id>
>     <hex>0x0995, 0x09bf</hex>
>     <utf8>কি</utf8>
>     <glyphIds>...</glyphIds>
>     <renderedImage>...</renderedImage>
>    </testCase>
>    <testCase>
>     <id>18</id>
>     <hex>0x0995, 0x09c0</hex>
>     <utf8>কী</utf8>
>     <glyphIds>...</glyphIds>
>     <renderedImage>...</renderedImage>
>    </testCase>
>    <testCase>
>     <id>19</id>
>     <hex>0x0995, 0x09c1</hex>
>     <utf8>কু</utf8>
>     <glyphIds>...</glyphIds>
>     <renderedImage>...</renderedImage>
>    </testCase>
>    <testCase>
>     <id>20</id>
>     <hex>0x0995, 0x09c2</hex>
>     <utf8>কূ</utf8>
>     <glyphIds>...</glyphIds>
>     <renderedImage>...</renderedImage>
>    </testCase>
>    <testCase>
>     <id>21</id>
>     <hex>0x0995, 0x09c3</hex>
>     <utf8>কৃ</utf8>
>     <glyphIds>...</glyphIds>
>     <renderedImage>...</renderedImage>
>    </testCase>
>    <testCase>
>     <id>22</id>
>     <hex>0x0995, 0x09c4</hex>
>     <utf8>কৄ</utf8>
>     <glyphIds>...</glyphIds>
>     <renderedImage>...</renderedImage>
>    </testCase>
>    <testCase>
>     <id>23</id>
>     <hex>0x0995, 0x09e2</hex>
>     <utf8>কৢ</utf8>
>     <glyphIds>...</glyphIds>
>     <renderedImage>...</renderedImage>
>    </testCase>
>    <testCase>
>     <id>24</id>
>     <hex>0x0995, 0x09e3</hex>
>     <utf8>কৣ</utf8>
>     <glyphIds>...</glyphIds>
>     <renderedImage>...</renderedImage>
>    </testCase>
>    <testCase>
>     <id>25</id>
>     <hex>0x0995, 0x09c7</hex>
>     <utf8>কে</utf8>
>     <glyphIds>...</glyphIds>
>     <renderedImage>...</renderedImage>
>    </testCase>
>    <testCase>
>     <id>26</id>
>     <hex>0x0995, 0x09c8</hex>
>     <utf8>কৈ</utf8>
>     <glyphIds>...</glyphIds>
>     <renderedImage>...</renderedImage>
>    </testCase>
>    <testCase>
>     <id>27</id>
>     <hex>0x0995, 0x09cb</hex>
>     <utf8>কো</utf8>
>     <glyphIds>...</glyphIds>
>     <renderedImage>...</renderedImage>
>    </testCase>
>    <testCase>
>     <id>28</id>
>     <hex>0x0995, 0x09cc</hex>
>     <utf8>কৌ</utf8>
>     <glyphIds>...</glyphIds>
>     <renderedImage>...</renderedImage>
>    </testCase>
>   </dependentVowels>
>  </script>
> </scripts>
> </report>
> 
> =================
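
For readers who want to experiment with the "one report, several output
formats" design from item (1), here is a rough Python sketch. It is an
assumption-laden illustration, not the MLR/TEXTR/XMLR/JSONR classes
themselves: the class names Reporter, TextReporter, JsonReporter and
XmlReporter and the helper make_case are hypothetical, and only the
element names (commonName, dependentVowels, testCase, id, hex, utf8) are
copied from the sample XML above. The glyphIds and renderedImage fields
are left out because the message says they are still empty.

```python
import json
from xml.sax.saxutils import escape


def make_case(case_id, codepoints):
    """Build one test case record (hypothetical helper)."""
    return {
        "id": case_id,
        "hex": ", ".join(f"0x{cp:04x}" for cp in codepoints),
        "utf8": "".join(chr(cp) for cp in codepoints),
    }


class Reporter:
    """Base class: each subclass decides how the same cases are serialized."""

    def render(self, script_name, cases):
        raise NotImplementedError


class TextReporter(Reporter):
    def render(self, script_name, cases):
        lines = [script_name]
        for case in cases:
            lines.append(f"{case['id']}\t{case['hex']}\t{case['utf8']}")
        return "\n".join(lines)


class JsonReporter(Reporter):
    def render(self, script_name, cases):
        # ensure_ascii=False keeps the Indic text readable in the output.
        return json.dumps(
            {"commonName": script_name, "dependentVowels": cases},
            ensure_ascii=False,
        )


class XmlReporter(Reporter):
    def render(self, script_name, cases):
        out = [
            "<script>",
            f" <commonName>{escape(script_name)}</commonName>",
            " <dependentVowels>",
        ]
        for case in cases:
            out.append("  <testCase>")
            for key in ("id", "hex", "utf8"):
                out.append(f"   <{key}>{escape(str(case[key]))}</{key}>")
            out.append("  </testCase>")
        out += [" </dependentVowels>", "</script>"]
        return "\n".join(out)


# The first two Devanagari dependent-vowel cases from the sample report.
cases = [make_case(1, [0x0915]), make_case(2, [0x0915, 0x093E])]
for reporter in (TextReporter(), JsonReporter(), XmlReporter()):
    print(reporter.render("Devanagari", cases))
```

Because every reporter receives the same list of records, adding another
output format only requires one more subclass.
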
> 
> Finally, it should be obvious that this approach, once the Pango-Cairo
> stuff is incorporated one way or the other, should make it trivially
> easy to test and compare different fonts ("Mangal.ttf" on Windows
> Vista / 7, "lohit" for Linux, etc.)
> 
>  -- Ed
> _______________________________________________
> HarfBuzz mailing list
> HarfBuzz at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/harfbuzz
> 
> 


-- 

Adam Twardoch
| Language Typography Unicode Fonts OpenType
| twardoch.com | silesian.com | fontlab.net

The illegal we do immediately.
The unconstitutional takes a little longer.
(Henry Kissinger)


