[Fontconfig] Matching font by unicode coverage

Behdad Esfahbod behdad at behdad.org
Mon Jan 19 17:35:20 PST 2015


On 15-01-19 05:21 PM, Sam Varshavchik wrote:
> Behdad Esfahbod writes:
> 
>> Thanks David,
>> 
>> Comments below.
>> 
>> On 15-01-19 12:36 PM, David Lattimore wrote:
>>> Matching with an FC_CHARSET works for me. No idea if what I'm doing
>>> is right, but roughly what I do is:
>>> - Identify script for each character (using
>>>   http://unicode.org/Public/UNIDATA/Scripts.txt)
>>> - Split runs based on script (so that no run has two different
>>>   scripts)
>>> - Build an FC_CHARSET for the run.
>>> - Match and use the resulting font.
>> 
>> While this works, it will be extremely slow, and it results in
>> ransom-note effects, where e.g. adding a diacritic mark to a run
>> changes the font for the entire run.
>> 
>> What most clients of Fontconfig do instead is call FcFontSort() with
>> all desired properties but NOT FC_CHARSET, then walk down the
>> returned list of fonts and pick, for each desired character, the
>> first font that supports it.
>> 
>> Does that make sense?
> 
> Does matching by FC_CHARSET really add that much overhead?

It does, but that's not the point.  The difference is between calling
FcFontSort() once and reusing its results, and calling FcFontMatch() for,
in the worst case, every character.
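
As a rough illustration of the sort-once approach, here is a sketch in
Python rather than real Fontconfig code. The sorted fallback list is
modeled as a plain list of (name, coverage) pairs standing in for what a
single FcFontSort() call returns, and the set membership test stands in
for FcCharSetHasChar(). The font names and coverage ranges are made up
for the example.

```python
from itertools import groupby

# Hypothetical stand-in for the result of ONE FcFontSort() call:
# fonts in preference order, each with the set of code points it covers.
SORTED_FONTS = [
    ("Latin Sans", set(range(0x20, 0x7F))),
    ("Greek Sans", set(range(0x20, 0x7F)) | set(range(0x370, 0x400))),
    ("Hebrew Sans", set(range(0x590, 0x600))),
]


def pick_font(ch, fonts):
    """Return the name of the first font in the sorted list that covers ch."""
    for name, coverage in fonts:
        if ord(ch) in coverage:
            return name
    return None  # no font in the list covers this character


def itemize(text, fonts):
    """Split text into (font, substring) runs using the one sorted list."""
    runs = []
    # Consecutive characters that resolve to the same font form one run.
    for font, chars in groupby(text, key=lambda ch: pick_font(ch, fonts)):
        runs.append((font, "".join(chars)))
    return runs
```

The list is built once; after that, each character costs only a walk
down the list, with no further matching calls.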

YMMV, but don't come back saying that I didn't warn you :).

behdad


> What about the following approach:
> 
> 1. First, search for the font you want using the usual properties – font
> name, weight, etc…
> 
> 2. Once you have your font, get its FcCharSet, then start looking up
> the Unicode characters of your text in the charset.
> 
> 3. When you find one that's not there, look up its Unicode script
> property, collect all following, consecutive characters with the same
> property into an FcCharSet, search for a replacement font, and continue
> with the Unicode lookup using the replacement font. Some obvious edge
> conditions here, but that's the basic idea.
> 
> 4. After the last character of the script run that used the
> replacement font, you could reset, go back to the original font, and
> try it again for the remaining characters.
> 
> I would think this would be optimized for a typical scenario – the
> original font will be used for most of the text, with the same
> replacement font for the exceptions.
> 
> 

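For comparison, the quoted steps 1 to 4 could be sketched roughly as
follows. This is a Python model, not Fontconfig code: `find_replacement`
and `script_of` are hypothetical callbacks supplied by the caller (with
Fontconfig, the former would be a charset-based match, the latter a
lookup of the Unicode script property), and fonts are modeled as
(name, coverage set) pairs.

```python
def itemize_with_fallback(text, primary, find_replacement, script_of):
    """Use the primary font until a character is missing, then fall back.

    primary          -- (name, coverage set) of the font found in step 1
    find_replacement -- hypothetical callback: set of code points -> (name, coverage)
    script_of        -- hypothetical callback: character -> script name
    """
    name, coverage = primary
    runs = []
    i = 0
    while i < len(text):
        if ord(text[i]) in coverage:
            # Step 2: the primary font covers this character.
            font, j = name, i + 1
        else:
            # Step 3: collect the following characters of the same script
            # and ask for one replacement font covering all of them.
            script = script_of(text[i])
            j = i + 1
            while j < len(text) and script_of(text[j]) == script:
                j += 1
            needed = {ord(ch) for ch in text[i:j]}
            font, _ = find_replacement(needed)
            # Step 4: the loop resumes with the primary font after this run.
        if runs and runs[-1][0] == font:
            # Merge with the previous run when the font did not change.
            runs[-1] = (font, runs[-1][1] + text[i:j])
        else:
            runs.append((font, text[i:j]))
        i = j
    return runs
```

In this model every uncovered run triggers one more call to
`find_replacement`, which is where the repeated matching cost comes from.
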
-- 
behdad
http://behdad.org/

