[Fontconfig] [PATCH] Cache most recent regcomp() call in _FcStrRegexCmp().
nick.alcock at oracle.com
Thu May 2 03:12:28 PDT 2013
On 1 May 2013, Behdad Esfahbod verbalised:
> On 13-05-01 05:29 PM, Nick Alcock wrote:
>> Under the circumstances, it is unfortunate that _FcStrRegexCmp() is called so
>> very often by fontconfig. On my system (with ~5000 fonts, and a stripped-down
>> fonts.conf), mapping a new Emacs frame makes twelve calls to FcFontMatch()...
>> and those twelve calls proceed to call _FcStrRegexCmp() 175171 times, with a
>> call to regcomp() every time! This then proceeds to invoke malloc() on the
>> order of three million times. It is fairly easy to turn this into a
>> pathological slowdown at runtime. My Emacsen are not small (arena sizes of a
>> gigabyte-plus are common, with a highly fragmented heap), and in that situation
>> those three million malloc() and free() calls can easily consume in excess of
>> fifty seconds (though on a fresh startup it takes 'only' a quarter of a second
>> or so).
> It's really strange that that function is called at all. Can you figure out
> why that is? Does your configuration have anything involving the "file" element?
My config is almost identical to upstream, but with only these
conf.d files listed:
20-unhint-small-vera.conf 25-unhint-nonlatin.conf 30-metric-aliases.conf
30-urw-aliases.conf 40-nonlatin.conf 45-latin.conf 49-sansserif.conf
50-user.conf 51-local.conf 57-dejavu.conf 58-dejavu-lgc.conf
60-latin.conf 61-dejavu-experimental.conf 65-fonts-persian.conf
65-nonlatin.conf 69-unifont.conf 70-yes-bitmaps.conf 80-delicious.conf
The call stack of all these regex calls is the same:
I'm sure some FC_DEBUG value can give useful output to say what value
list this is coming from, but the only one I've found that gives
anything (FC_DEBUG=2) just gives huge sprays of stuff that's useless for
this purpose :/
> Akira, regardless, I think we should remove the Regex and replace it with Glob
> matching that is already in fccfg.c.
> I know you want to extend regex to other elements, but for files I think globs
> are just fine.
Quite. Regexes are more powerful in general, but the prevalence of dots
in filenames suggests that they're the wrong tool here (certainly if you
use a filename as a regex without regex-escaping it first).
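
To illustrate the point about dots, here is a minimal sketch in C. It
assumes POSIX regcomp()/regexec(), and it uses fnmatch() purely as a
stand-in for glob matching: it is not the glob matcher in fccfg.c. The
helper names regex_matches() and glob_matches() are hypothetical.

```c
#include <assert.h>
#include <fnmatch.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if `s` matches `pattern` treated as an
 * unanchored POSIX extended regex, 0 if it does not, and -1 if the
 * pattern fails to compile. */
static int regex_matches(const char *pattern, const char *s)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && s != NULL);

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, s, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/* Hypothetical helper: returns 1 if `s` matches `pattern` treated as a
 * shell-style glob. fnmatch() is only a stand-in here. */
static int glob_matches(const char *pattern, const char *s)
{
    assert(pattern != NULL && s != NULL);
    return fnmatch(pattern, s, 0) == 0;
}
```

With these helpers, regex_matches("DejaVuSans.ttf", "DejaVuSans-ttf")
reports a match, because the unescaped dot matches any character, while
glob_matches() with the same arguments does not.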
Even for other things, I think you need *some* sort of compiled-regex
cache (a small LRU cache, or something?) to try to prevent insane sprays
of regcomp() calls causing massive performance degradation. Perhaps a
simple static variable as here is not ideal, since it appears as a false
leak in valgrind, but... *something*. glibc regcomp() is much more
expensive now than it used to be before regex understood UTF-8, so the
old tradeoffs don't quite apply. (Sizing the cache might be interesting.
I suspect a one-item cache will only work in cases that shouldn't be
using regex at all, like this one. Perhaps tracking the percentage of
cache misses and increasing the cache size as long as misses continue,
up to some sanity bound? Your regexes should either be coming from the
font info, which is effectively fixed, or the config, which is
effectively fixed, so in the end all your regexes should be regexes
you've used before as long as the cache is big enough. Recompiling them
repeatedly is just a waste of time in that situation, unless the
regexec() is *very* infrequent.)
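
For concreteness, here is a rough sketch of the sort of small LRU cache
I mean. This is not the patch itself and not fontconfig code: the names
(re_cache_get(), RE_CACHE_SIZE, re_cache_misses) are made up for
illustration, the size of 8 is arbitrary, and it is not thread-safe.
It only counts misses; growing the cache in response to them is left
out.

```c
#define _POSIX_C_SOURCE 200809L   /* for strdup() under strict -std= modes */
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

#define RE_CACHE_SIZE 8           /* arbitrary; sizing is an open question */

struct re_cache_entry {
    char *pattern;                /* NULL while the slot is unused */
    int cflags;
    regex_t re;
    unsigned long last_used;      /* logical clock value of the last hit */
};

static struct re_cache_entry re_cache[RE_CACHE_SIZE];
static unsigned long re_cache_clock;
static unsigned long re_cache_misses;   /* could drive adaptive sizing */

/* Return a compiled regex for (pattern, cflags), calling regcomp() only
 * on a cache miss. The result is owned by the cache and stays valid
 * only until its slot is evicted. Returns NULL on failure. */
static const regex_t *re_cache_get(const char *pattern, int cflags)
{
    struct re_cache_entry *victim = &re_cache[0];
    size_t i;

    assert(pattern != NULL);

    for (i = 0; i < RE_CACHE_SIZE; i++) {
        struct re_cache_entry *e = &re_cache[i];

        if (e->pattern && e->cflags == cflags &&
            strcmp(e->pattern, pattern) == 0) {
            e->last_used = ++re_cache_clock;    /* hit: refresh recency */
            return &e->re;
        }
        /* Track the eviction candidate: the first empty slot if there
         * is one, otherwise the least recently used entry. */
        if (victim->pattern &&
            (!e->pattern || e->last_used < victim->last_used))
            victim = e;
    }

    re_cache_misses++;

    if (victim->pattern) {                      /* evict the old entry */
        regfree(&victim->re);
        free(victim->pattern);
        victim->pattern = NULL;
    }
    if (regcomp(&victim->re, pattern, cflags) != 0)
        return NULL;                            /* slot stays empty */
    victim->pattern = strdup(pattern);
    if (!victim->pattern) {
        regfree(&victim->re);
        return NULL;
    }
    victim->cflags = cflags;
    victim->last_used = ++re_cache_clock;
    return &victim->re;
}
```

A caller would then do regexec(re_cache_get(pat, flags), ...) after a
NULL check instead of a regcomp()/regexec()/regfree() triple. Like the
static variable, whatever is still in the cache at exit will show up in
valgrind unless something frees it.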