<div dir="ltr">For my own project, I needed to implement a mapping from IETF language tags to OpenType language system tags. I wrote some code to generate the mapping and then compared the results with HarfBuzz. For each discrepancy, I did enough research to convince myself of the right result. The HarfBuzz source refers to a recent Microsoft draft from which some entries have been added; I skipped those entries. I assume they are similar to the ones in Working Draft 5 of the 3rd edition of ISO/IEC 14496-22, which I found here: <a href="http://mpeg.chiariglione.org/standards/mpeg-4/open-font-format/text-wd-isoiec-14496-22-3rd-edition">http://mpeg.chiariglione.org/standards/mpeg-4/open-font-format/text-wd-isoiec-14496-22-3rd-edition</a>.<div>
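<div><br></div><div>The comparison step can be sketched as a merge over two tables sorted by language tag. This is a minimal C sketch under stated assumptions, not HarfBuzz's API and not necessarily how the generator described here works: both tables are assumed to be arrays of (language, OT tag) pairs sorted in strcmp order, OT tags are written as 4-character space-padded strings, and the names lang_map and compare_maps are hypothetical.</div>

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical entry type: an IETF language subtag and a 4-character,
   space-padded OpenType language system tag (e.g. "BBR "). */
typedef struct {
    const char *lang;
    const char *ot;
} lang_map;

/* Merge-compare two tables, each sorted by lang in strcmp order.
   Prints every language that is missing from one side or mapped to a
   different OT tag, and returns the number of discrepancies found. */
static int compare_maps(const lang_map *a, size_t na,
                        const lang_map *b, size_t nb)
{
    size_t i = 0, j = 0;
    int diffs = 0;

    while (i < na || j < nb) {
        int c;
        if (i == na)
            c = 1;                      /* only b has entries left */
        else if (j == nb)
            c = -1;                     /* only a has entries left */
        else
            c = strcmp(a[i].lang, b[j].lang);

        if (c < 0) {                    /* language only in a */
            printf("%s: %s vs (none)\n", a[i].lang, a[i].ot);
            i++;
            diffs++;
        } else if (c > 0) {             /* language only in b */
            printf("%s: (none) vs %s\n", b[j].lang, b[j].ot);
            j++;
            diffs++;
        } else {                        /* in both: compare OT tags */
            if (strcmp(a[i].ot, b[j].ot) != 0) {
                printf("%s: %s vs %s\n", a[i].lang, a[i].ot, b[j].ot);
                diffs++;
            }
            i++;
            j++;
        }
    }
    return diffs;
}
```

<div>For example, comparing a generated table containing {"ber", "BBR "} with one containing {"ber", "BER "} reports "ber: BBR  vs BER " as a discrepancy; each such line is then a candidate for research.</div>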
<br></div><div>I documented the research here: <div><br></div><div><div><a href="https://github.com/jclark/lang-ietf-opentype/blob/master/doc/notes.md">https://github.com/jclark/lang-ietf-opentype/blob/master/doc/notes.md</a></div>
</div><div><br></div><div>As a result, I have quite a few comments on HarfBuzz's implementation.<br></div><div><br></div><div>First, some things that are just typos.<br><div><br></div><div>"ber" should be mapped to BBR, not BER.<div>
<br></div><div>There's a duplicate entry for "hz" that is out of sort order.</div><div><br></div><div>The entries for "sck", "vls" and "wo" are out of sort order.</div><div><br></div><div>
The OT tag for "tmh" is in lower case instead of upper case.</div><div><br></div><div>Some tags are missing a final zero. The ISO WD adds some 4-character tags whose last character is a zero. There are five cases where these have been added but the final zero was incorrectly omitted: kab -> KAB0, ksh -> KSH0, kg -> KON0, pap -> PAP0, sn -> SNA0.</div>
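<div><br></div><div>The duplicate, the ordering problems and the lower-case tag could all be caught mechanically. Below is a hedged C sketch of such a check; it is not part of HarfBuzz, and the lang_entry type and check_table function are hypothetical. It assumes the table is meant to be sorted by language subtag. Requiring strictly increasing keys catches both duplicates and misordering, which matters for any binary search over the table. A missing final zero cannot be detected this way, since "KAB " is itself a well-formed tag; that needs a comparison against the spec.</div>

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical table row: IETF language subtag -> 4-character OT tag. */
typedef struct {
    const char *lang;
    const char *ot;
} lang_entry;

/* Returns the number of problems found in a table of n entries:
   keys that are not strictly increasing (duplicates or misordering),
   OT tags that are not exactly 4 characters, and OT tags containing
   a lower-case letter. Each problem is also printed. */
static int check_table(const lang_entry *t, size_t n)
{
    int problems = 0;

    for (size_t k = 0; k < n; k++) {
        if (k > 0 && strcmp(t[k - 1].lang, t[k].lang) >= 0) {
            printf("%s: duplicate or out of order after %s\n",
                   t[k].lang, t[k - 1].lang);
            problems++;
        }
        if (strlen(t[k].ot) != 4) {
            printf("%s: OT tag '%s' is not 4 characters\n",
                   t[k].lang, t[k].ot);
            problems++;
        }
        for (size_t m = 0; t[k].ot[m] != '\0'; m++) {
            if (islower((unsigned char)t[k].ot[m])) {
                printf("%s: OT tag '%s' is not upper case\n",
                       t[k].lang, t[k].ot);
                problems++;
                break;          /* report each tag at most once */
            }
        }
    }
    return problems;
}
```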
<div><br></div><div>The following entries appear in the spec but are missing from HarfBuzz; they seem uncontroversial to me.</div><div><br></div><div>wlc CMR Mwali Comorian</div><div>wni CMR Ndzwani Comorian</div><div>
zdj CMR Ngazidja Comorian</div><div>caf CRR Southern Carrier</div><div>co COS Corsican</div><div><br></div><div>The last one is probably missing because it was omitted from the ISO WD, which I suspect is a bug in the WD.</div>
<div><br></div><div>HarfBuzz (and the OT spec) are inconsistent in their handling of macrolanguages. When an IETF macrolanguage is mapped to an OT language tag, they sometimes also map the individual languages encompassed by the macrolanguage to that OT tag, and sometimes they don't. I would suggest that the consistent and reasonable policy is to always map the individual languages to the same OT tag as the macrolanguage, unless an individual language is separately mapped to a more specific OT tag. I created a file with the additional entries that would be needed to implement this policy in HarfBuzz:</div>
<div><br></div><div><a href="https://github.com/jclark/lang-ietf-opentype/blob/master/gen/hb-macrolang-expand.txt">https://github.com/jclark/lang-ietf-opentype/blob/master/gen/hb-macrolang-expand.txt</a></div><div><br></div>
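<div>The policy above can be sketched as a two-step lookup: try a specific mapping for the language first, and only if there is none, fall back to the OT tag of the macrolanguage that encompasses it. The tables in this C sketch are a small excerpt assumed for illustration: cr (Cree) maps to CRE, crk and csw carry the more specific tags YCR and NCR suggested further down, and the membership rows list only a few of the languages ISO 639-3 places under Cree. The type and function names and the linear search are hypothetical simplifications, not HarfBuzz code.</div>

```c
#include <stddef.h>
#include <string.h>

typedef struct { const char *lang; const char *ot; } lang_map;
typedef struct { const char *individual; const char *macro; } macro_map;

/* Specific mappings (illustrative excerpt, not a complete table). */
static const lang_map specific[] = {
    {"cr",  "CRE "},   /* Cree macrolanguage */
    {"crk", "YCR "},   /* Plains Cree: more specific tag */
    {"csw", "NCR "},   /* Swampy Cree: more specific tag */
};

/* Individual language -> encompassing macrolanguage (excerpt). */
static const macro_map encompassed[] = {
    {"crk", "cr"},
    {"csw", "cr"},
    {"cwd", "cr"},     /* Woods Cree: no specific OT tag */
};

/* Returns the OT tag specifically mapped for lang, or NULL. */
static const char *find_ot(const char *lang)
{
    for (size_t i = 0; i < sizeof specific / sizeof specific[0]; i++)
        if (strcmp(specific[i].lang, lang) == 0)
            return specific[i].ot;
    return NULL;
}

/* A specific mapping wins; otherwise an individual language inherits
   the OT tag of its macrolanguage. Returns NULL if neither applies. */
static const char *ot_for_language(const char *lang)
{
    const char *ot = find_ot(lang);
    if (ot != NULL)
        return ot;
    for (size_t i = 0; i < sizeof encompassed / sizeof encompassed[0]; i++)
        if (strcmp(encompassed[i].individual, lang) == 0)
            return find_ot(encompassed[i].macro);
    return NULL;
}
```

<div>With these rows, cwd has no specific entry and so gets CRE from its macrolanguage, while crk keeps its more specific YCR. Generating the expanded entries ahead of time, as in the file above, gives the same results with a single table lookup.</div>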
<div>The rest of my comments are not self-evident; you will need to refer to the notes I linked to above for my reasoning.</div><div><br></div><div>My first set of removals and additions is in accordance with the ISO 639 codes in the spec. I suggest removing these mappings:</div>
<div><div><br></div><div>eot BTI Beti (Côte d'Ivoire)</div><div>kvd KUI Kui (Indonesia)</div><div>mdc MLE Male (Papua New Guinea)</div><div>mlq MNK Western Maninkakan</div><div>nco SIB Sibe</div><div>ril RIA Riang (India)</div>
<div>xom KMO Komo (Sudan)</div><div>yso NIS Nisi (China)</div><div><br></div><div>and adding these:</div><div><br></div><div>sjo SIB Xibe</div><div>pro PRO Old Provencal</div><div>rmz ARK Marma</div><div><br></div><div>The next set is not in the spec. Remove:</div>
<div><br></div><div>xst SIG (not an IETF tag, was Silt'e in ISO 639-2 before it was retired)</div><div><br></div><div>and add:</div><div><br></div><div>njz NIS Nyishi</div><div>tgj NIS Tagin</div><div>beb BTI Bebele</div>
<div>bum BTI Bulu (Cameroon)</div><div>bxp BTI Bebil</div><div>eto BTI Eton (Cameroon)</div><div>ewo BTI Ewondo</div><div>fan BTI Fang (Equatorial Guinea)</div><div>mct BTI Mengisa</div></div><div><br></div><div>Finally, I have suggestions for the commented-out entries in the source:</div>
</div><div><br></div><div><div>/*{"ahg/awn/xan?",<span class="" style="white-space:pre"> </span>HB_TAG('A','G','W',' ')},*/<span class="" style="white-space:pre"> </span>/* Agaw */</div>
<div><br></div><div>"ahg", "awn"</div><div><br></div><div>/*{"gsw?/gsw-FR?",<span class="" style="white-space:pre"> </span>HB_TAG('A','L','S',' ')},*/<span class="" style="white-space:pre"> </span>/* Alsatian */</div>
<div><br></div><div>"gsw"</div><div><br></div><div>/*{"krc",<span class="" style="white-space:pre"> </span>HB_TAG('B','A','L',' ')},*/<span class="" style="white-space:pre"> </span>/* Balkar */</div>
<div><br></div><div>Leave unmapped</div><div><br></div><div>/*{"??",<span class="" style="white-space:pre"> </span>HB_TAG('B','C','R',' ')},*/<span class="" style="white-space:pre"> </span>/* Bible Cree */</div>
<div><br></div><div>Leave unmapped</div><div><br></div><div>/*{"zh?",<span class="" style="white-space:pre"> </span>HB_TAG('C','H','N',' ')},*/<span class="" style="white-space:pre"> </span>/* Chinese (seen in Microsoft fonts) */</div>
<div><br></div><div>???</div><div><br></div><div>/*{"acf/gcf?",<span class="" style="white-space:pre"> </span>HB_TAG('F','A','N',' ')},*/<span class="" style="white-space:pre"> </span>/* French Antillean */</div>
<div><br></div><div>"acf", "gcf"</div><div><br></div><div>/*{"enf?/yrk?",<span class="" style="white-space:pre"> </span>HB_TAG('F','N','E',' ')},*/<span class="" style="white-space:pre"> </span>/* Forest Nenets */</div>
<div><br></div><div>Leave unmapped</div><div><br></div><div>/*{"fuf?",<span class="" style="white-space:pre"> </span>HB_TAG('F','T','A',' ')},*/<span class="" style="white-space:pre"> </span>/* Futa */</div>
<div><br></div><div>"fuf"</div><div><br></div><div>/*{"ar-Syrc?",<span class="" style="white-space:pre"> </span>HB_TAG('G','A','R',' ')},*/<span class="" style="white-space:pre"> </span>/* Garshuni */</div>
<div><br></div><div>"ar-Syrc"</div><div><br></div><div>/*{"cfm/rnl?",<span class="" style="white-space:pre"> </span>HB_TAG('H','A','L',' ')},*/<span class="" style="white-space:pre"> </span>/* Halam */</div>
<div><br></div><div>"cfm"</div><div><br></div><div>/*{"fonipa",<span class="" style="white-space:pre"> </span>HB_TAG('I','P','P','H')},*/<span class="" style="white-space:pre"> </span>/* Phonetic transcription—IPA conventions */</div>
<div><br></div><div>"und-fonipa", or, better, map any tag that has a "fonipa" variant subtag</div><div><br></div><div>/*{"ga-Latg?/Latg?",<span class="" style="white-space:pre"> </span>HB_TAG('I','R','T',' ')},*/<span class="" style="white-space:pre"> </span>/* Irish Traditional */</div>
<div><br></div><div>"ga-Latg"</div><div><br></div><div>/*{"krc",<span class="" style="white-space:pre"> </span>HB_TAG('K','A','R',' ')},*/<span class="" style="white-space:pre"> </span>/* Karachay */</div>
<div><br></div><div>"krc"</div><div><br></div><div>/*{"alw?/ktb?",<span class="" style="white-space:pre"> </span>HB_TAG('K','E','B',' ')},*/<span class="" style="white-space:pre"> </span>/* Kebena */</div>
<div><br></div><div>"alw"</div><div><br></div><div>/*{"Geok",<span class="" style="white-space:pre"> </span>HB_TAG('K','G','E',' ')},*/<span class="" style="white-space:pre"> </span>/* Khutsuri Georgian */</div>
<div><br></div><div>"ka-Geok" (Georgian written with the Khutsuri script)</div><div><br></div><div>/*{"kca",<span class="" style="white-space:pre"> </span>HB_TAG('K','H','K',' ')},*/<span class="" style="white-space:pre"> </span>/* Khanty-Kazim */</div>
<div><br></div><div>"kca"</div><div><br></div><div>/*{"kca",<span class="" style="white-space:pre"> </span>HB_TAG('K','H','S',' ')},*/<span class="" style="white-space:pre"> </span>/* Khanty-Shurishkar */</div>
<div><br></div><div>Leave unmapped</div><div><br></div><div>/*{"kca",<span class="" style="white-space:pre"> </span>HB_TAG('K','H','V',' ')},*/<span class="" style="white-space:pre"> </span>/* Khanty-Vakhi */</div>
<div><br></div><div>Leave unmapped</div><div><br></div><div>/*{"guz?/kqs?/kss?",<span class="" style="white-space:pre"> </span>HB_TAG('K','I','S',' ')},*/<span class="" style="white-space:pre"> </span>/* Kisii */</div>
<div><br></div><div>"guz"</div><div><br></div><div>/*{"kfa/kfi?/kpb?/xua?/xuj?",<span class="" style="white-space:pre"> </span>HB_TAG('K','O','D',' ')},*/<span class="" style="white-space:pre"> </span>/* Kodagu */</div>
<div><br></div><div>"kfa"</div><div><br></div><div>/*{"okm?/oko?",<span class="" style="white-space:pre"> </span>HB_TAG('K','O','H',' ')},*/<span class="" style="white-space:pre"> </span>/* Korean Old Hangul */</div>
<div><br></div><div>"okm"</div><div><br></div><div>/*{"kon?/ktu?/...",<span class="" style="white-space:pre"> </span>HB_TAG('K','O','N',' ')},*/<span class="" style="white-space:pre"> </span>/* Kikongo */</div>
<div><br></div><div>"ktu"</div><div><br></div><div>/*{"kfx?",<span class="" style="white-space:pre"> </span>HB_TAG('K','U','L',' ')},*/<span class="" style="white-space:pre"> </span>/* Kulvi */</div>
<div><br></div><div>"kfx"</div><div><br></div><div>/*{"??",<span class="" style="white-space:pre"> </span>HB_TAG('L','A','H',' ')},*/<span class="" style="white-space:pre"> </span>/* Lahuli */</div>
<div><br></div><div>"lbf", "lae", "bfu"</div><div><br></div><div>/*{"??",<span class="" style="white-space:pre"> </span>HB_TAG('L','C','R',' ')},*/<span class="" style="white-space:pre"> </span>/* L-Cree */</div>
<div><br></div><div>Leave unmapped</div><div><br></div><div>/*{"??",<span class="" style="white-space:pre"> </span>HB_TAG('M','A','L',' ')},*/<span class="" style="white-space:pre"> </span>/* Malayalam Traditional */</div>
<div><br></div><div>Leave unmapped</div><div><br></div><div>/*{"mnk?/mlq?/...",<span class="" style="white-space:pre"> </span>HB_TAG('M','L','N',' ')},*/<span class="" style="white-space:pre"> </span>/* Malinke */</div>
<div><br></div><div>"mlq"</div><div><br></div><div>/*{"??",<span class="" style="white-space:pre"> </span>HB_TAG('N','C','R',' ')},*/<span class="" style="white-space:pre"> </span>/* N-Cree */</div>
<div><br></div><div>"csw"</div><div><br></div><div>/*{"??",<span class="" style="white-space:pre"> </span>HB_TAG('N','H','C',' ')},*/<span class="" style="white-space:pre"> </span>/* Norway House Cree */</div>
<div><br></div><div>Leave unmapped</div><div><br></div><div>/*{"jpa?/sam?",<span class="" style="white-space:pre"> </span>HB_TAG('P','A','A',' ')},*/<span class="" style="white-space:pre"> </span>/* Palestinian Aramaic */</div>
<div><br></div><div>"jpa", "sam"</div><div><br></div><div>/*{"polyton",<span class="" style="white-space:pre"> </span>HB_TAG('P','G','R',' ')},*/<span class="" style="white-space:pre"> </span>/* Polytonic Greek */</div>
<div><br></div><div>"el-polyton"</div><div><br></div><div>/*{"??",<span class="" style="white-space:pre"> </span>HB_TAG('Q','I','N',' ')},*/<span class="" style="white-space:pre"> </span>/* Asho Chin */</div>
<div><br></div><div>"tbq"</div><div><br></div><div>(The spec says Chin, not Asho Chin.)</div><div><br></div><div>/*{"??",<span class="" style="white-space:pre"> </span>HB_TAG('R','C','R',' ')},*/<span class="" style="white-space:pre"> </span>/* R-Cree */</div>
<div><br></div><div>"atj"</div><div><br></div><div>/*{"chp?",<span class="" style="white-space:pre"> </span>HB_TAG('S','A','Y',' ')},*/<span class="" style="white-space:pre"> </span>/* Sayisi */</div>
<div><br></div><div>Leave unmapped</div><div><br></div><div>/*{"xan?",<span class="" style="white-space:pre"> </span>HB_TAG('S','E','K',' ')},*/<span class="" style="white-space:pre"> </span>/* Sekota */</div>
<div><br></div><div>"xan"</div><div><br></div><div>/*{"ngo?",<span class="" style="white-space:pre"> </span>HB_TAG('S','X','T',' ')},*/<span class="" style="white-space:pre"> </span>/* Sutu */</div>
<div><br></div><div>Leave unmapped</div><div><br></div><div>/*{"??",<span class="" style="white-space:pre"> </span>HB_TAG('T','C','R',' ')},*/<span class="" style="white-space:pre"> </span>/* TH-Cree */</div>
<div><br></div><div>Leave unmapped</div><div><br></div><div>/*{"tnz?/tog?/toi?",<span class="" style="white-space:pre"> </span>HB_TAG('T','N','G',' ')},*/<span class="" style="white-space:pre"> </span>/* Tonga */</div>
<div><br></div><div>"toi"</div><div><br></div><div>/*{"enh?/yrk?",<span class="" style="white-space:pre"> </span>HB_TAG('T','N','E',' ')},*/<span class="" style="white-space:pre"> </span>/* Tundra Nenets */</div>
<div><br></div><div>"yrk"</div><div><br></div><div>/*{"??",<span class="" style="white-space:pre"> </span>HB_TAG('W','C','R',' ')},*/<span class="" style="white-space:pre"> </span>/* West-Cree */</div>
<div><br></div><div>Leave unmapped</div><div><br></div><div>/*{"cre?",<span class="" style="white-space:pre"> </span>HB_TAG('Y','C','R',' ')},*/<span class="" style="white-space:pre"> </span>/* Y-Cree */</div>
<div><br></div><div>"crk"</div><div><br></div><div>/*{"??",<span class="" style="white-space:pre"> </span>HB_TAG('Y','I','C',' ')},*/<span class="" style="white-space:pre"> </span>/* Yi Classic */</div>
<div><br></div><div>Leave unmapped</div><div><br></div><div>/*{"ii?/Yiii?",<span class="" style="white-space:pre"> </span>HB_TAG('Y','I','M',' ')},*/<span class="" style="white-space:pre"> </span>/* Yi Modern */</div>
<div><br></div><div>"ii"</div><div><br></div><div>It would also be desirable to map otherwise unmapped languages in the Yi script (i.e. with a script code of Yiii) to YIM.</div><div><br></div><div>
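<div>The two subtag-based suggestions (any tag with a "fonipa" variant to IPPH, and otherwise unmapped Yi-script tags to YIM) could be sketched as a scan over the subtags after the primary language subtag. Assumptions in this C sketch: the caller has already done the ordinary language lookup and passes its result, or NULL, as lang_ot; a 4-letter subtag is treated as a script, which follows BCP 47 syntax (variants of length 4 must start with a digit); and scanning stops at the first singleton, so extension and private-use subtags are ignored. The function names are hypothetical.</div>

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive test: is the n-character subtag s equal to word? */
static int subtag_is(const char *s, size_t n, const char *word)
{
    if (strlen(word) != n)
        return 0;
    for (size_t i = 0; i < n; i++)
        if (tolower((unsigned char)s[i]) != tolower((unsigned char)word[i]))
            return 0;
    return 1;
}

/* bcp47: the full language tag; lang_ot: result of the ordinary
   language lookup, or NULL. A "fonipa" variant overrides everything;
   the Yi script is only used when the language itself is unmapped. */
static const char *ot_with_subtags(const char *bcp47, const char *lang_ot)
{
    int ipa = 0, yi = 0;
    const char *p = strchr(bcp47, '-');   /* skip primary language subtag */

    while (p != NULL) {
        const char *s = p + 1;
        const char *end = strchr(s, '-');
        size_t n = end ? (size_t)(end - s) : strlen(s);

        if (n == 1)
            break;                        /* singleton: extensions follow */
        if (subtag_is(s, n, "fonipa"))
            ipa = 1;
        else if (subtag_is(s, n, "yiii"))
            yi = 1;
        p = end;
    }
    if (ipa)
        return "IPPH";
    if (lang_ot != NULL)
        return lang_ot;
    if (yi)
        return "YIM ";
    return NULL;
}
```

<div>The same scan could be extended to the other script- and variant-based suggestions above, such as "ar-Syrc", "ka-Geok" or "el-polyton".</div><div><br></div>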
/*{"??",<span class="" style="white-space:pre"> </span>HB_TAG('Z','H','P',' ')},*/<span class="" style="white-space:pre"> </span>/* Chinese Phonetic */</div>
<div><br></div><div>"zh-Latn"</div></div><div><br></div><div>I'll have some more general comments later.</div><div><br></div><div>James</div><div><br></div></div></div></div>