[Fontconfig] Can we use base 16, and not 85, for ASCII charset representations?
W. Trevor King
wking at tremily.us
Sat Sep 21 17:26:13 PDT 2013
I worked up the last two patches [1,2] on the road toward
understanding fontconfig's view of charsets, with the goal being:
Which installed fonts contain code point 0xXXXX?
Now I understand the (base-code-point, bitmap) structure (as
documented in [2]), and I can use this:
$ fc-list -v 'URW Chancery L:style=Medium Italic'
…
charset:
0000: 00000000 ffffffff ffffffff 7fffffff 00000000 ffffffff ffffffff ffffffff
0001: ffffffff ffffffff fffff3ff ffffffff 00040000 00000000 00000000 00000000
0002: 03000000 00000000 00000000 00000000 00000000 00000000 3f0002c0 00000000
0003: 00000000 00000000 00000000 00000000 00100000 10000000 00000000 00000000
0004: ffffffff ffffffff ffffffff 00000000 00000000 0c00c000 faff0007 033ffffc
0020: 77180000 06010047 00000010 00000000 00000000 00001000 00000000 00000000
0021: 00400000 00000004 00000000 00000000 00000000 00000000 00000000 00000000
0022: 46260044 00000000 00000000 00000031 00000000 00000000 00000000 00000000
0025: 00000000 00000000 00000000 00000000 00000000 00000000 00000400 00000000
00f6: 00000000 00000000 00000000 00000000 00000000 00000000 000001f8 00000000
00fb: 00000006 00000000 00000000 00000000 00000000 00000000 00000000 00000000
(s)
However, I'm still stuck on the base-85 formatting for the user-facing
charsets (and I'm not alone: [3]):
$ fc-list 'URW Chancery L:style=Medium Italic' charset
:charset= |>^1!|>^1!P0oWQ |>^1!|>^1!|>^1!!!!%#|>^1!|>^1!|>]fs|>^1!!!K?& !!!)$!{{B% 9;*l$ !!!.% !#f05(1+e5 !!!1&|>^1!|>^1!|>^1! %rw)IzbyU$#%lqi!!#0GM>RAd#y#fx!!!!5 !!!W5 !!#3H!)pSj!!!!& !!#6I<UG/) !!!!X !!#AL !!!1& !!+fv !!!(y !!+u{!!!!)
Is code point 0x2202 in the first listing? Yes:
* 0x2202 / 0x100 = 0x22, so it's in the "0022:" row, with a remainder
of 0x2202 & 0xff = 0x02
* 0x02 / 32 = 0, so it's in the first block (map[0] = 0x46260044),
with a remainder of 0x02 % 32 = 2
* 2 / 4 = 0 (each hex digit covers four bits), so it's in the least
significant digit of the block (map[0] & 0xf = 4), with a remainder
of 2 % 4 = 2
* The remainder-2 entry is the third bit (2+1) in the digit, because
the remainder-0 entry gets the first bit. The third bit is in the
4s column, and that's set in the digit 4 ;).
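
The lookup above can be sketched in Python. This is only an illustration of the arithmetic, not fontconfig code: the ROWS table and the has_code_point name are made up for the example, and the table holds just the "0022:" row copied from the fc-list -v output.

```python
# Row "0022:" from the verbose fc-list output: eight 32-bit words
# covering code points 0x2200 through 0x22ff.  (Hypothetical table,
# only this one row is filled in.)
ROWS = {
    0x22: [0x46260044, 0x00000000, 0x00000000, 0x00000031,
           0x00000000, 0x00000000, 0x00000000, 0x00000000],
}


def has_code_point(rows, code_point):
    """Return True if code_point is set in the (row, bitmap) table."""
    row = rows.get(code_point >> 8)    # high bits select the row
    if row is None:
        return False
    offset = code_point & 0xff         # position within the 256-entry row
    word = row[offset // 32]           # 32 code points per word
    # Bit 0 of the word is the lowest code point in that word.
    return bool((word >> (offset % 32)) & 1)


print(has_code_point(ROWS, 0x2202))    # True
print(has_code_point(ROWS, 0x2200))    # False
```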
To do the same with the second format, I had to fiddle with the
valueToChar table and Python to determine that the row's base code
point, 0x2200, is 0:0:0x1:0x11:0x22 in base 85, which should be
represented by '!!#6I'. The next five characters are '<UG/)', which
decode to 0x16:0x2e:0x20:0xa:0x6 in base 85, which is indeed
0x46260044.
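
The base-85 arithmetic can be checked with a few lines of Python. This is a sketch of the digit conversion only: the function names are made up for the example, and the mapping between digit values and characters (fontconfig's valueToChar table) is deliberately not reproduced here.

```python
def to_base85_digits(value, width=5):
    """Split value into `width` base-85 digits, most significant first."""
    digits = []
    for _ in range(width):
        value, digit = divmod(value, 85)
        digits.append(digit)
    return digits[::-1]


def from_base85_digits(digits):
    """Combine base-85 digits (most significant first) into an integer."""
    value = 0
    for digit in digits:
        value = value * 85 + digit
    return value


print([hex(d) for d in to_base85_digits(0x2200)])
# ['0x0', '0x0', '0x1', '0x11', '0x22']
print(hex(from_base85_digits([0x16, 0x2e, 0x20, 0xa, 0x6])))
# 0x46260044
```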
I don't think saving three characters per 32-bit word (five base-85
digits instead of eight hex digits, 37.5%) is worth the hassle of
learning a fontconfig-specific set of digits for base 85. If I
convert the parse/unparse code in fccharset.c to use hex, would that
be mergeable? The only problem I can see would be for folks scripting
fc-list who had already written parsers for the current format (a
null set?).
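
The size comparison can be verified with a small sketch, assuming one fixed-width field per 32-bit word (digits_needed is a name made up for the example):

```python
def digits_needed(base, bits=32):
    """Smallest digit count whose range covers every `bits`-bit value."""
    count = 0
    limit = 1
    while limit < 2 ** bits:
        limit *= base
        count += 1
    return count


hex_len = digits_needed(16)    # 8 hex digits per word
b85_len = digits_needed(85)    # 5 base-85 digits per word
saving = (hex_len - b85_len) / hex_len
print(hex_len, b85_len, saving)    # 8 5 0.375
```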
Alternatively, perhaps there is another way to look up fonts containing
a character, and I've just missed it. In that case I don't care how
ugly the charset serialization is :p.
Cheers,
Trevor
[1]: http://thread.gmane.org/gmane.comp.fonts.fontconfig/4914
[2]: http://thread.gmane.org/gmane.comp.fonts.fontconfig/4915
[3]: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=498039#5