[HarfBuzz] harfbuzz: Branch 'master' - 4 commits
Behdad Esfahbod
behdad at kemper.freedesktop.org
Wed Sep 5 14:54:17 PDT 2012
src/hb-ot-shape-complex-indic-machine.rl | 8 ++++----
test/shaping/hb_test_tools.py | 8 +++++---
2 files changed, 9 insertions(+), 7 deletions(-)
New commits:
commit f0b8ed1b6dd9f1d2b9084c101a6fc5dee0cc22a8
Author: Behdad Esfahbod <behdad at behdad.org>
Date: Wed Sep 5 17:32:57 2012 -0400
[Indic] Allow "H,ZWJ,M"
Uniscribe accepts a Halant,ZWJ before matras. Allow that.
BENGALI down from 295 to 291
DEVANAGARI down from 69 to 57
GUJARATI down from 19 to 17
KANNADA down from 871 to 867
MALAYALAM down from 340 to 337
TELUGU down from 20 to 16
Currently at:
BENGALI: 353897 out of 354188 tests passed. 291 failed (0.0821598%)
DEVANAGARI: 707337 out of 707394 tests passed. 57 failed (0.00805774%)
GUJARATI: 366440 out of 366457 tests passed. 17 failed (0.00463902%)
GURMUKHI: 60704 out of 60747 tests passed. 43 failed (0.0707854%)
KANNADA: 951046 out of 951913 tests passed. 867 failed (0.0910798%)
KHMER: 299077 out of 299124 tests passed. 47 failed (0.0157125%)
LAO: 53611 out of 53644 tests passed. 33 failed (0.0615167%)
MALAYALAM: 1047997 out of 1048334 tests passed. 337 failed (0.0321462%)
ORIYA: 42320 out of 42329 tests passed. 9 failed (0.021262%)
SINHALA: 271666 out of 271847 tests passed. 181 failed (0.0665816%)
TAMIL: 1091754 out of 1091754 tests passed. 0 failed (0%)
TELUGU: 970557 out of 970573 tests passed. 16 failed (0.00164851%)
TIBETAN: 208469 out of 208469 tests passed. 0 failed (0%)
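Each percentage in these summaries is failed / total * 100. A minimal sketch that reproduces the line format, assuming the figure is printed with six significant digits (`%g`). `format_stats` is a hypothetical helper, not a function from hb_test_tools.py:

```python
def format_stats(script, total, failed):
    """Return one summary line in the style used in these commit messages."""
    passed = total - failed
    percent = 100.0 * failed / total if total else 0.0
    # '%g' keeps six significant digits and drops trailing zeros
    # (an assumed format, chosen because it matches the figures above).
    return "%s: %d out of %d tests passed. %d failed (%g%%)" % (
        script, passed, total, failed, percent)


print(format_stats("BENGALI", 354188, 291))
print(format_stats("TAMIL", 1091754, 0))
```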
diff --git a/src/hb-ot-shape-complex-indic-machine.rl b/src/hb-ot-shape-complex-indic-machine.rl
index bcc942c..03e3910 100644
--- a/src/hb-ot-shape-complex-indic-machine.rl
+++ b/src/hb-ot-shape-complex-indic-machine.rl
@@ -69,7 +69,7 @@ syllable_tail = (Coeng (cn|V))? (SM.ZWNJ?)? (VD VD?)?;
place_holder = NBSP | DOTTEDCIRCLE;
halant_group = (z?.h.(ZWJ.N?)?);
final_halant_group = halant_group | h.ZWNJ;
-halant_or_matra_group = (final_halant_group | matra_group{0,4});
+halant_or_matra_group = (final_halant_group | (h.ZWJ)? matra_group{0,4});
consonant_syllable = Repha? (cn.halant_group){0,4} cn A? halant_or_matra_group? syllable_tail;
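The effect of the new `(h.ZWJ)?` prefix can be checked with a rough model of the rules in Python's `re` module. This is only a sketch: the one-letter encoding, `consonant_syllable` and `accepts` are illustrative assumptions, and Repha, A, forced_rakar and syllable_tail are left out. The real machine is generated by Ragel.

```python
import re

# One character per class: c=consonant, h=halant, J=ZWJ, j=ZWNJ,
# m=matra, n=nukta. The encoding is for illustration only.
CODES = {"C": "c", "H": "h", "ZWJ": "J", "ZWNJ": "j", "M": "m", "N": "n"}

CN = "(?:cJ?n?)"
HALANT_GROUP = "(?:[Jj]?h(?:Jn?)?)"
FINAL_HALANT_GROUP = "(?:%s|hj)" % HALANT_GROUP
MATRA_GROUP = "(?:[Jj]{0,3}mn?h?)"  # forced_rakar left out for brevity

OLD_GROUP = "(?:%s|%s{0,4})" % (FINAL_HALANT_GROUP, MATRA_GROUP)
NEW_GROUP = "(?:%s|(?:hJ)?%s{0,4})" % (FINAL_HALANT_GROUP, MATRA_GROUP)


def consonant_syllable(group):
    # Simplified: (cn.halant_group){0,4} cn halant_or_matra_group?
    return re.compile("(?:%s%s){0,4}%s%s?" % (CN, HALANT_GROUP, CN, group))


def accepts(pattern, sequence):
    text = "".join(CODES[name] for name in sequence.split(","))
    return pattern.fullmatch(text) is not None


old = consonant_syllable(OLD_GROUP)
new = consonant_syllable(NEW_GROUP)
for seq in ("C,M", "C,H,ZWJ", "C,H,ZWJ,M"):
    print(seq, accepts(old, seq), accepts(new, seq))
```

In this model only "C,H,ZWJ,M" changes status: the old rule rejects it as a single syllable and the new rule accepts it.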
commit 4ed717ef61813fa16cf74f2874848e9feb81568f
Author: Behdad Esfahbod <behdad at behdad.org>
Date: Wed Sep 5 17:21:17 2012 -0400
[Indic] Relax grammar
Now that we insert dotted-circle, tests break more easily when our Indic
machine breaks.
In particular, a few Devanagari tests contained sequences like
"C,H,ZWJ,N", and because of the ZWJ the Nukta does NOT get reordered to
before the Halant as the grammar used to expect... Fixup.
Another case is as simple as "C,ZWJ,SM".
Fixes 10 out of 79 failures:
DEVANAGARI: 707325 out of 707394 tests passed. 69 failed (0.00975411%)
diff --git a/src/hb-ot-shape-complex-indic-machine.rl b/src/hb-ot-shape-complex-indic-machine.rl
index 283a246..bcc942c 100644
--- a/src/hb-ot-shape-complex-indic-machine.rl
+++ b/src/hb-ot-shape-complex-indic-machine.rl
@@ -62,12 +62,12 @@ z = ZWJ|ZWNJ; # is_joiner
h = H | Coeng; # is_halant_or_coeng
reph = (Ra H | Repha); # possible reph
-cn = c.n?;
+cn = c.ZWJ?.n?;
forced_rakar = ZWJ H ZWJ Ra;
matra_group = z{0,3}.M.N?.(H | forced_rakar)?;
syllable_tail = (Coeng (cn|V))? (SM.ZWNJ?)? (VD VD?)?;
place_holder = NBSP | DOTTEDCIRCLE;
-halant_group = (z?.h.ZWJ?);
+halant_group = (z?.h.(ZWJ.N?)?);
final_halant_group = halant_group | h.ZWNJ;
halant_or_matra_group = (final_halant_group | matra_group{0,4});
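The two sequences named in the commit message can be tried against a simplified regular-expression model of the rules before and after this change. This is a sketch under stated assumptions: the one-letter encoding and the helper names are made up for illustration, and Repha, A, forced_rakar and most of syllable_tail are omitted.

```python
import re

# c=consonant, h=halant, J=ZWJ, j=ZWNJ, m=matra, n=nukta, s=SM
# (an illustrative encoding, not HarfBuzz's own).
CODES = {"C": "c", "H": "h", "ZWJ": "J", "ZWNJ": "j",
         "M": "m", "N": "n", "SM": "s"}


def consonant_syllable(cn, halant_group):
    final_halant_group = "(?:%s|hj)" % halant_group
    matra_group = "(?:[Jj]{0,3}mn?h?)"  # forced_rakar left out
    halant_or_matra_group = "(?:%s|%s{0,4})" % (final_halant_group, matra_group)
    syllable_tail = "(?:sj?)?"  # only the (SM.ZWNJ?)? part is modelled
    return re.compile("(?:%s%s){0,4}%s%s?%s" % (
        cn, halant_group, cn, halant_or_matra_group, syllable_tail))


def accepts(pattern, sequence):
    text = "".join(CODES[name] for name in sequence.split(","))
    return pattern.fullmatch(text) is not None


# cn = c.n?       and halant_group = z?.h.ZWJ?
old = consonant_syllable("(?:cn?)", "(?:[Jj]?hJ?)")
# cn = c.ZWJ?.n?  and halant_group = z?.h.(ZWJ.N?)?
new = consonant_syllable("(?:cJ?n?)", "(?:[Jj]?h(?:Jn?)?)")

for seq in ("C,H,C", "C,H,ZWJ,N", "C,ZWJ,SM"):
    print(seq, accepts(old, seq), accepts(new, seq))
```

In this model "C,H,C" is accepted by both versions, while "C,H,ZWJ,N" and "C,ZWJ,SM" are accepted only by the relaxed rules.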
commit aa7141efe49991a1160489106984e95163fe2ab8
Author: Behdad Esfahbod <behdad at behdad.org>
Date: Wed Sep 5 15:54:21 2012 -0400
[Indic] Fix Khmer syllable-final coeng-consonant
Brings down Khmer failures from 162 to 47.
KHMER: 299077 out of 299124 tests passed. 47 failed (0.0157125%)
Also rebaselined some of the test files that had only-inherited lines.
Removing those, the stats are:
BENGALI: 353893 out of 354188 tests passed. 295 failed (0.0832891%)
DEVANAGARI: 707315 out of 707394 tests passed. 79 failed (0.0111678%)
GUJARATI: 366438 out of 366457 tests passed. 19 failed (0.00518478%)
GURMUKHI: 60704 out of 60747 tests passed. 43 failed (0.0707854%)
KANNADA: 951042 out of 951913 tests passed. 871 failed (0.0915%)
KHMER: 299077 out of 299124 tests passed. 47 failed (0.0157125%)
LAO: 53611 out of 53644 tests passed. 33 failed (0.0615167%)
MALAYALAM: 1047994 out of 1048334 tests passed. 340 failed (0.0324324%)
ORIYA: 42320 out of 42329 tests passed. 9 failed (0.021262%)
SINHALA: 271666 out of 271847 tests passed. 181 failed (0.0665816%)
TAMIL: 1091754 out of 1091754 tests passed. 0 failed (0%)
TELUGU: 970553 out of 970573 tests passed. 20 failed (0.00206064%)
TIBETAN: 208469 out of 208469 tests passed. 0 failed (0%)
Still some regressions, but some of the more egregious cases are
addressed.
diff --git a/src/hb-ot-shape-complex-indic-machine.rl b/src/hb-ot-shape-complex-indic-machine.rl
index c9309e9..283a246 100644
--- a/src/hb-ot-shape-complex-indic-machine.rl
+++ b/src/hb-ot-shape-complex-indic-machine.rl
@@ -65,7 +65,7 @@ reph = (Ra H | Repha); # possible reph
cn = c.n?;
forced_rakar = ZWJ H ZWJ Ra;
matra_group = z{0,3}.M.N?.(H | forced_rakar)?;
-syllable_tail = (SM.ZWNJ?)? (Coeng (cn|V))? (VD VD?)?;
+syllable_tail = (Coeng (cn|V))? (SM.ZWNJ?)? (VD VD?)?;
place_holder = NBSP | DOTTEDCIRCLE;
halant_group = (z?.h.ZWJ?);
final_halant_group = halant_group | h.ZWNJ;
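The change only swaps the order of the first two optional parts of syllable_tail, so a Coeng-consonant pair is now expected before the SM rather than after it. A simplified sketch of that ordering, assuming a one-letter encoding and a reduced syllable (one consonant plus bare matras); the names below are illustrative, not HarfBuzz's:

```python
import re

# c=consonant, k=Coeng, m=matra, s=SM, j=ZWNJ, v=V, d=VD
# (an illustrative encoding).
CODES = {"C": "c", "Coeng": "k", "M": "m", "SM": "s",
         "ZWNJ": "j", "V": "v", "VD": "d"}

SM_PART = "(?:sj?)?"         # (SM.ZWNJ?)?
COENG_PART = "(?:k[cv])?"    # (Coeng (cn|V))? with cn reduced to c
VD_PART = "(?:dd?)?"         # (VD VD?)?

OLD_TAIL = SM_PART + COENG_PART + VD_PART
NEW_TAIL = COENG_PART + SM_PART + VD_PART


def base_plus_tail(tail):
    # Simplified syllable: one consonant, up to four matras, then the tail.
    return re.compile("cm{0,4}" + tail)


def accepts(pattern, sequence):
    text = "".join(CODES[name] for name in sequence.split(","))
    return pattern.fullmatch(text) is not None


old, new = base_plus_tail(OLD_TAIL), base_plus_tail(NEW_TAIL)
for seq in ("C,M,SM", "C,M,Coeng,C,SM", "C,M,SM,Coeng,C"):
    print(seq, accepts(old, seq), accepts(new, seq))
```

Under these assumptions "C,M,Coeng,C,SM" is rejected by the old ordering and accepted by the new one, and the reverse holds for "C,M,SM,Coeng,C".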
commit efb8d3eb713bca7cbfca41380a012bdb4d380e5c
Author: Behdad Esfahbod <behdad at behdad.org>
Date: Wed Sep 5 15:50:47 2012 -0400
Fixup test failure reporting
After we implemented dotted-circle, we were still ignoring any test
that had dottedcircle in it for any of the shapers. That meant that if
we wrongly output dottedcircle, the test was being ignored. Ouch!
Fixing that shows regressions across the board. Most are Uniscribe
bugs: NOT inserting dotted-circle when it should. Some are our
machine bugs. This is in fact a nice way to catch Indic-machine
deficiencies and when I fix the regressions, our clusters should be
much closer to Uniscribe. For now, we regressed from:
BENGALI: 353997 out of 354285 tests passed. 288 failed (0.0812905%)
DEVANAGARI: 707339 out of 707394 tests passed. 55 failed (0.00777502%)
GUJARATI: 366489 out of 366506 tests passed. 17 failed (0.0046384%)
GURMUKHI: 60769 out of 60809 tests passed. 40 failed (0.0657797%)
KANNADA: 951086 out of 951913 tests passed. 827 failed (0.0868777%)
KHMER: 299106 out of 299124 tests passed. 18 failed (0.00601757%)
LAO: 53611 out of 53644 tests passed. 33 failed (0.0615167%)
MALAYALAM: 1048104 out of 1048416 tests passed. 312 failed (0.0297592%)
ORIYA: 42320 out of 42329 tests passed. 9 failed (0.021262%)
SINHALA: 271747 out of 271847 tests passed. 100 failed (0.0367854%)
TAMIL: 1091837 out of 1091837 tests passed. 0 failed (0%)
TELUGU: 970558 out of 970573 tests passed. 15 failed (0.00154548%)
TIBETAN: 208469 out of 208469 tests passed. 0 failed (0%)
To:
BENGALI: 353990 out of 354285 tests passed. 295 failed (0.0832663%)
DEVANAGARI: 707315 out of 707394 tests passed. 79 failed (0.0111678%)
GUJARATI: 366447 out of 366506 tests passed. 59 failed (0.016098%)
GURMUKHI: 60707 out of 60809 tests passed. 102 failed (0.167738%)
KANNADA: 951042 out of 951913 tests passed. 871 failed (0.0915%)
KHMER: 298962 out of 299124 tests passed. 162 failed (0.0541581%)
LAO: 53611 out of 53644 tests passed. 33 failed (0.0615167%)
MALAYALAM: 1048074 out of 1048416 tests passed. 342 failed (0.0326206%)
ORIYA: 42320 out of 42329 tests passed. 9 failed (0.021262%)
SINHALA: 271666 out of 271847 tests passed. 181 failed (0.0665816%)
TAMIL: 1091835 out of 1091837 tests passed. 2 failed (0.000183178%)
TELUGU: 970553 out of 970573 tests passed. 20 failed (0.00206064%)
TIBETAN: 208469 out of 208469 tests passed. 0 failed (0%)
Investigating.
diff --git a/test/shaping/hb_test_tools.py b/test/shaping/hb_test_tools.py
index 1d1d62c..0b1ec00 100644
--- a/test/shaping/hb_test_tools.py
+++ b/test/shaping/hb_test_tools.py
@@ -295,9 +295,11 @@ class DiffHelpers:
     def test_passed (lines):
         lines = list (lines)
         # XXX This is a hack, but does the job for now.
-        if any (l.find("space|space") >= 0 for l in lines): return True
-        if any (l.find("uni25CC") >= 0 for l in lines): return True
-        if any (l.find("dottedcircle") >= 0 for l in lines): return True
+        if any (l.find("space|space") >= 0 for l in lines if l[0] == '+'): return True
+        if any (l.find("uni25CC") >= 0 for l in lines if l[0] == '+'): return True
+        if any (l.find("dottedcircle") >= 0 for l in lines if l[0] == '+'): return True
+        if any (l.find("glyph0") >= 0 for l in lines if l[0] == '+'): return True
+        if any (l.find("notdef") >= 0 for l in lines if l[0] == '+'): return True
         return all (l[0] == ' ' for l in lines)
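The patched check can be restated as a standalone function to make the behaviour change easier to see. This is a sketch, not the code in hb_test_tools.py: `diff_test_passed` and `IGNORED_GLYPHS` are hypothetical names, the sample diff lines are invented, and `startswith` is used in place of `l[0]` so that an empty line does not raise IndexError.

```python
IGNORED_GLYPHS = ("space|space", "uni25CC", "dottedcircle", "glyph0", "notdef")


def diff_test_passed(lines):
    """Decide whether one test's diff lines count as a pass."""
    lines = list(lines)
    added = [l for l in lines if l.startswith("+")]
    # A test is still ignored (treated as passing) when one of its
    # '+' lines mentions any of these glyph names.
    if any(name in l for l in added for name in IGNORED_GLYPHS):
        return True
    # Otherwise every line must be unchanged context.
    return all(l.startswith(" ") for l in lines)


# Invented sample diffs, for illustration only.
unchanged = [" ka|virama"]
only_minus = ["-dottedcircle|ka", "+ka"]
on_plus = ["-ka", "+dottedcircle|ka"]

print(diff_test_passed(unchanged))   # context only
print(diff_test_passed(only_minus))  # name appears only on a '-' line
print(diff_test_passed(on_plus))     # name appears on a '+' line
```

Before the patch the second case was also ignored, because the glyph name was searched for on every line rather than only on '+' lines.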