[Fontconfig] MD5 checksum? Why? & mmap alignment on same machine in 32.v.64 mode

L. A. Walsh fonts at tlinx.org
Tue Jun 24 07:28:12 PDT 2014


Raimund Steger wrote:
>
>  I think you'd have a hard time even measuring the impact of MD5 in
>  fontconfig.  I just tried with my profiler and it only shows up as 0,
>  where the whole caching process is usually well in the seconds.
---
Caching for 32+64 bit on my machine measures 540-860 seconds x 2 ~= 1400
seconds, or easily over 100 times longer.  At ~6000-9000 fonts, doing it
once is painful.  Doing it twice seems a bit masochistic or sadistic,
depending on which end you're on... ;-/  If I get caught doing it on
Windows, it takes about 25-30% longer (cygwin on Windows; not a recent
timing, as I uninstalled the problematic SW).

I am trying to keep 32-bit GUI software off of my machine because of
this, and it's still unacceptable.  I try not to use cygwin's native-X
clients because of it.  My machines are a few years old (3.4GHz x 6 on
Win, 2.8GHz x 12 on Linux).

X11 doesn't take long, and neither does Windows (nor does it need two
separate versions of its font dir on a bi-arch machine).  Granted, the
fonts on linux have better coverage, but not by 100x.  My linux and X11
GUIs mostly use the same font directories (or copies of them).  I mostly
try to keep arch-specific fonts (like the 75/100dpi fonts) to a minimum
and stick to TTF/OTF/TTC fonts on both, but that 6000-9000 figure comes
from my active font list on Windows.  Under X, I prune out the
alt-lang/charset aliases for the fonts, usually accepting only utf-8
compat (10646-1), plus iso-8859-1/15 for non-unicode font use.

Simply including the code is wasteful, and makes it less likely that
code stays in the memory cache when executing.


>  it's fed are a few bytes of pathnames, that's not surprising. To create
>  a bunch of not-too-long filenames from pathnames, that have a good chance
>  of not hitting an existing one, I imagine that many other systems (e. g.,
>  HTTP caches...) use something very similar. 
---
Not even close: squid's cache system levels are configurable, and it
also has a few plugin-cache systems, but on my system:

2 levels of dirs numbered in hex:

    /var/cache/squid/{00..3F}/{00..3F}/

64 dirs at each level, giving 4096 cache dirs that take only 4096
bytes/dir (only 1 read/dir), using an 8-digit hex string for a 32-bit
filename space + 12 bits for the dir(s): about a 16-Tera-name file space
(it's currently using about 445K files taking a total of 87G of space).


(with 444093 files spread over the 4096 directories -- not optimal for
font usage, but good for squid's random access needs.)
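
To make the arithmetic above concrete, here is a minimal C sketch of
that kind of two-level layout.  It is not squid's actual code:
cache_path() and the other names are hypothetical, and it assumes the
two directory indices are chosen independently of the 8-digit hex name,
which is how the counts above are tallied.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Layout described in the text: 64 first-level x 64 second-level dirs. */
enum { L1_DIRS = 64, L2_DIRS = 64 };

/* Number of leaf cache directories: 64 * 64 = 4096 (12 bits). */
static unsigned leaf_dir_count(void)
{
    return L1_DIRS * L2_DIRS;
}

/* 32-bit file names in each leaf dir: 2^32 * 2^12 = 2^44 names. */
static uint64_t name_space(void)
{
    return (UINT64_C(1) << 32) * leaf_dir_count();
}

/*
 * Hypothetical helper: writes "root/XX/YY/NNNNNNNN" into buf.
 * Returns 0 on success, -1 if an index is out of range or buf is
 * too small to hold the whole path.
 */
static int cache_path(char *buf, size_t len, const char *root,
                      unsigned d1, unsigned d2, uint32_t name)
{
    assert(buf != NULL && root != NULL);
    if (d1 >= L1_DIRS || d2 >= L2_DIRS)
        return -1;
    /* sizeof counts the 15 suffix characters plus the trailing NUL. */
    if (strlen(root) + sizeof "/XX/YY/NNNNNNNN" > len)
        return -1;
    snprintf(buf, len, "%s/%02X/%02X/%08" PRIX32, root, d1, d2, name);
    return 0;
}
```

Calling cache_path(buf, sizeof buf, "/var/cache/squid", 0x3F, 0x0A,
0x0006C6BD) with a 64-byte buffer fills it with
/var/cache/squid/3F/0A/0006C6BD.  Nothing here needs a hash: the name
is just a number printed in hex.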

Compared to my linux fonts dir of 7.9G with two largest dirs being:

2.0G    /usr/share/fonts/OTF
5.2G    /usr/share/fonts/TTF
----    -----
7.9G    TOTAL

squid starts up in a few seconds normally, but has radically different
needs than the font lib, so its startup time isn't comparable, BUT the
point was that it does manage A LOT of unique filenames w/o a need for
anything complicated.




>
>  About the 32 vs. 64 bit issue, and leaving API considerations aside,
>  doesn't fontconfig's serialization format use intptr_t sized offsets? If
>  yes, I think it's not smart to cast these to non-native sizes.


I'd agree, and on further examination, I see it wouldn't be
an identical format, but would be either adaptable in 64-bit mode
(not ideal), or easily convertible.
Either way, that holds only if the data is limited to <4G, which seems
uncertain or unlikely given that I have something like 8G of fonts NOW,
and in 5-10 years that size DB MIGHT be considered small.  If 32-bit
needs data sizes over 4G, that's already a problem (not that the
in-memory sizes need to reflect exactly what is on disk, but if they do
and what is on disk is > 4G, some measures could conceivably get
tight during the lifetime of this format).
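
As a rough illustration of the "easily convertible" case and where the
4G limit bites, here is a minimal C sketch of narrowing a 64-bit offset
to a 32-bit one.  narrow_offset() is a hypothetical helper, not anything
in fontconfig, and treating the offsets as unsigned is an assumption
made to match the 4G figure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: converts a 64-bit offset to a 32-bit one.
 * Returns 0 and stores the result if the offset fits in 32 bits,
 * or -1 (leaving *narrow untouched) if it points past the 4G mark.
 */
static int narrow_offset(uint64_t wide, uint32_t *narrow)
{
    assert(narrow != NULL);
    if (wide > UINT32_MAX)
        return -1;              /* not representable in 32 bits */
    *narrow = (uint32_t)wide;
    return 0;
}
```

Any offset into data larger than 4G is rejected by the range check, so
a 32-bit reader could not reach that part of the data.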



I thought the first and primary data object was representative of the
included data.  It shows the way (from
http://freedesktop.org/software/fontconfig/fontconfig-devel/x31.html):


An FcValue object holds a single value with one of a number of different 
types. The 'type' tag indicates which member is valid.

        typedef struct _FcValue {
                FcType type;
                union {
                        const FcChar8 *s;
                        int i;
                        FcBool b;
                        double d;
                        const FcMatrix *m;
                        const FcCharSet *c;
                        void *f;
                        const FcLangSet *l;
                } u;
        } FcValue;


The union is the key -- since it's the size of the longest value, it's
8 bytes on both 32- and 64-bit archs.

If the pointers in the other structs were in unions w/a double
(or an 8-byte character string), those would all be compatible as well.
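
A minimal C sketch of that idea, for anyone who wants to check it with
-m32 and -m64.  The types are stand-ins, not the real fontconfig
headers; PaddedPtr, DemoRecord and the helper functions are hypothetical
names, and a uint64_t is used as the 8-byte filler instead of a double.
It only looks at sizes and field offsets, and assumes an 8-byte double
and pointers no wider than 8 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in mirror of the FcValue layout quoted above. */
typedef struct {
    int type;                       /* stand-in for FcType */
    union {
        const unsigned char *s;
        int i;
        int b;
        double d;                   /* the 8-byte member */
        const void *m;
        const void *c;
        void *f;
        const void *l;
    } u;
} DemoValue;

/* Hypothetical pointer slot padded out to 8 bytes on every arch. */
typedef union {
    const void *p;
    uint64_t pad;
} PaddedPtr;

/* Hypothetical record with two padded pointer fields. */
typedef struct {
    PaddedPtr name;
    PaddedPtr next;
} DemoRecord;

static size_t value_union_size(void)    { return sizeof(((DemoValue *)0)->u); }
static size_t padded_ptr_size(void)     { return sizeof(PaddedPtr); }
static size_t second_field_offset(void) { return offsetof(DemoRecord, next); }

/* Store a pointer, zeroing the slot first so any bytes the pointer
 * does not cover (on 32-bit) have a known value. */
static void padded_ptr_set(PaddedPtr *slot, const void *p)
{
    assert(slot != NULL);
    slot->pad = 0;
    slot->p = p;
}
```

The size of the union and the offsets inside DemoRecord are what should
come out the same in both modes.  The total size and padding of a
struct that mixes a 4-byte tag with the union (like DemoValue) can still
depend on the ABI's alignment rules, so that is worth printing
separately on each arch.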

But getting both archs for the price of 1 is only a small part of the
problem with a ~10+ minute rebuild time.


   

