[Fontconfig] Supporting Unicode variation selectors

Wed Jun 17 20:20:15 PDT 2015

On 15-06-17 08:15 PM, suzuki toshiya wrote:
> Hi,
> 
> I guess the current discussion is focused on the variation
> selectors themselves, and you are not attempting to make
> fontconfig handle IVS-specific info. For example,
> the question "does this font support the IVSs defined for
> Adobe-Japan1 or not?" might be a future task, separate
> from the current discussion.
> Am I understanding correctly?

Well, I didn't think of that.  But if there's a BCP 47 tag for that, we can
add an orth file for it, sure.

> Regards,
> mpsuzuki
> 
> Behdad Esfahbod wrote:
>> Hi everyone,
>>
>> Currently fontconfig does not support Unicode variation selectors.  Let's fix that.
>>
>> Unicode defines 256 generic variation selectors, in the following ranges:
>>
>> U+FE00 VARIATION SELECTOR-1..U+FE0F VARIATION SELECTOR-16
>> U+E0100 VARIATION SELECTOR-17..U+E01EF VARIATION SELECTOR-256
>>
>> OpenType encodes those in cmap subtable format 14.  Fonts can thus map
>> pairs of characters in the cmap instead of single ones.  The second of the
>> pair is supposed to be a variation selector, though nothing in the table
>> format enforces that.
>>
>> To support these in fontconfig, two changes are needed:
>>
>>   - Extend FcCharSet to be able to carry variation sequences as well,
>>
>>   - Add an FcFreeTypeCharIndex variant that takes a variation selector (à la
>> FT_Face_GetCharVariantIndex).
>>
>> Adding the latter is rather trivial.  For the former, it would be easiest if
>> we encode the variation selector and the base Unicode character in one 32-bit
>> integer and make sure FcCharSet handles that efficiently (it probably does
>> not currently).
>>
>> Ideally, we'd want to encode the sequence U followed by VSx (where VSx is
>> VARIATION SELECTOR-x) as (U + (x << 24)).  This uses the high byte of the
>> 32-bit unsigned integer for the variation selector number.  The only problem
>> is that there are 256 variation selectors, not 255: x runs from 1 to 256, and
>> a byte only holds values up to 255.  I submitted a proposal to Unicode to
>> commit to not using the last one, but it was not accepted.  Currently up to
>> ~240 are used.
>>
>> Failing that most beautiful scheme, we can use a different shift.  21 would be
>> the next most natural, given that Unicode code points fit in 21 bits.  It would
>> just be much harder to read the hex form of a 32-bit number in the FcCharSet
>> verbose output and tell what it means.
>>
>> So, what do people think?  Let's make this happen.
>>
>> Thanks,
> 

-- 
behdad
http://behdad.org/