[Fontconfig] Caching strategy improvements

Behdad Esfahbod behdad at behdad.org
Tue Feb 27 23:43:29 UTC 2018


On Tue, Feb 27, 2018 at 9:23 AM, Kurt Kartaltepe <kkartaltepe at gmail.com>
wrote:

> I'd be more than happy to volunteer my time, as I think this is a
> worthwhile endeavor. However the fontconfig code base is quite
> sophisticated from my digging, so I'm not sure I would be able to make
> any changes without some serious study time. I'll probably be working
> on this in my spare time for my own usage and hopefully a PoC will
> make its way back to the list if I am successful.
>

I wouldn't say it's sophisticated per se, more that it's showing its age
(more below).
But yeah, hard to contribute without studying it seriously first.



> Any information/documentation that might ease deconstructing the
> fontconfig code base would of course be appreciated. Notably any
> documentation on the structures and terminology used throughout the
> code base and cache. Or any thoughts you may have had on this subject
> to guide the implementation.
>

Where do I start...

Initially, fontconfig stored cached patterns in text files and read them
back. In 2006, Patrick Lam, then a PhD student at MIT, worked under a grant
from Novell to change fontconfig to use mmap'ed binary caches, to save on
load time and memory.  There was a significant hurdle, however: some of the
structs were exposed in the public API, which made it hard to change them
structurally. Instead, a hack was used: offsets are stored in pointer struct
members by setting the lowest bit of the pointer member to 1, relying on the
fact that the structs those members point to have an alignment > 1.  That
complicates the code base.
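
To illustrate the low-bit trick, here is a minimal sketch in C. The names
(Node, is_offset, encode_offset, resolve) are made up for this example and
are not fontconfig's actual types or macros. It only assumes that targets
are at least 2-byte aligned, so a real pointer always has its lowest bit
clear, and that base and target live in the same mapped block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical struct for illustration; not a fontconfig type. */
typedef struct Node {
    int value;
    struct Node *next;  /* either a real pointer or a tagged offset */
} Node;

/* A field holds an offset (not a pointer) when its lowest bit is set. */
static int is_offset(const void *field)
{
    return ((intptr_t)field & 1) != 0;
}

/* Store the byte distance from base to target, tagged with the low bit. */
static Node *encode_offset(const void *base, const Node *target)
{
    intptr_t off = (intptr_t)((const char *)target - (const char *)base);
    assert((off & 1) == 0);  /* holds because alignment is > 1 */
    return (Node *)(off | 1);
}

/* Turn a field back into a usable pointer, whichever form it is in. */
static Node *resolve(const void *base, Node *field)
{
    if (!is_offset(field))
        return field;                       /* already a real pointer */
    intptr_t off = (intptr_t)field & ~(intptr_t)1;
    return (Node *)((char *)base + off);    /* base + untagged offset */
}
```

Every read of such a member has to go through something like resolve(),
which is where the extra complexity comes from: the same public struct
layout serves both heap objects and objects living inside the mmap'ed file.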

Another problem with the caching subsystem is that, indeed, on some
filesystems (FAT), the timestamp is very coarse (seconds). That introduces
race conditions that are hard to work around. So, locking was introduced. On
other filesystems (NFS), any filesystem syscall is very expensive. So, for
example, stat'ing every font is prohibitively expensive. These things all
constrain the designs that can be used.  For example, if stat was not
expensive (as on a regular Linux system with a local filesystem), we could
just ftw() all font dirs and, for each font file, check in an on-disk
hashmap whether it's already in the cache and add it if it's not.  But we
can't do that...
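
To make the coarse-timestamp race concrete, here is a minimal sketch in C
of a per-file freshness check. The CacheEntry record and entry_is_fresh()
are hypothetical and do not describe fontconfig's cache format or its
current logic. The idea is that if a file was scanned in the same
timestamp tick as its last modification, a second write in that tick would
not change the mtime, so the entry cannot be trusted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical cache record for one font file. */
typedef struct {
    time_t file_mtime;  /* mtime of the file when it was scanned */
    time_t scanned_at;  /* wall-clock time at which the scan ran */
} CacheEntry;

/* Return true only if the cached data can safely be reused. */
static bool entry_is_fresh(const CacheEntry *e, time_t current_mtime)
{
    assert(e != NULL);
    if (current_mtime != e->file_mtime)
        return false;   /* file visibly changed since the scan */
    if (e->scanned_at <= e->file_mtime)
        return false;   /* scanned within the same tick: possibly racy */
    return true;
}
```

Note that this still needs one stat() per font file to obtain
current_mtime, which is exactly the cost that is prohibitive on NFS.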

More recently, work was done to be able to reuse the cache even when a font
directory is mounted at a different position in the filesystem, to support
flatpak. This, again, given that we had to support the existing API,
complicated the codebase even more.

Back to what can be done....  If we could break API or if it wasn't exposed
so much, I would have suggested building on top of GVariant [0] or Protocol
Buffers [1]. The cache would then simply be a GVariant hash on the disk, the
same way that dconf [2] works. But we cannot simply use that as-is, so the
next step is to study it and reimplement it to accommodate the existing API.

I'll try to write more later.


[0] https://macsphere.mcmaster.ca/handle/11375/21321
[1] https://github.com/google/protobuf
[2] https://wiki.gnome.org/action/show/Projects/dconf



> --Kurt Kartaltepe
>
> On Mon, Feb 26, 2018 at 8:57 PM, Behdad Esfahbod <behdad at behdad.org>
> wrote:
> > Thanks for clarification. No worries.
> >
> > Rewriting the cache is an interesting challenge, but so far we don't have
> > any volunteers.
> >
> > On Mon, Feb 26, 2018 at 6:28 PM, Kurt Kartaltepe <kkartaltepe at gmail.com>
> > wrote:
> >>
> >> I have rebuilt 2.12.93 tonight and it appears I was mistaken. I had
> >> attempted to replace 2.12.6 in my build chain but that must have been
> >> reverted as 2.12.93 indeed provides ~100x improvement and builds the
> >> cache on my system in 600ms.
> >>
> >> I still hold this is not a "blanket you should improve it" post.
> >> Indeed a hashmap (or any mapping between patterns and files that
> >> allows rapid validation of non-dirty files) on disk that reuses the
> >> patterns for files that didn't change is indeed what I was suggesting
> >> from the start. I don't see why this needs to be lock-free as the entire
> >> structure can be atomically updated using the same mechanisms already
> >> in use for the cache. I defer to your experience if this cache is
> >> contended enough to warrant such a structure.
> >>
> >> I understand this would be a significant project, which is why I
> >> brought it to the mailing list, and now that font cache build times are
> >> in seconds for large font libraries it is indeed harder to justify.
> >> Thank you very much for your time and sorry this all started due to a
> >> mistake on my own part. (I hope I have not been using terms
> >> inappropriately, but I have been using font/file interchangeably and
> >> from your replies it appears this may have been a mistake).
> >>
> >> --Kurt Kartaltepe
> >>
> >> On Mon, Feb 26, 2018 at 5:26 PM, Behdad Esfahbod <behdad at behdad.org>
> >> wrote:
> >> > On my laptop, warm fc-cache -f of over 2000 fonts takes 3.5s. So maybe
> >> > worth checking what's taking so much time on your setup.
> >> >
> >> > Believe me, if we knew how to make it faster easily, we would have done.
> >> > So, any blanket "you should improve it" has no information content
> >> > whatsoever.
> >> >
> >> > Caching per font is not realistic.
> >> >
> >> > The best I can think of requires a complete rewrite of the caching, and
> >> > would use a single cache file that implements a lock-free hashmap on the
> >> > disk and reuses patterns for files that didn't change. But that's a very
> >> > significant project to undertake.
> >> >
> >> > On Mon, Feb 26, 2018 at 3:38 AM, Kurt Kartaltepe <kkartaltepe at gmail.com>
> >> > wrote:
> >> >>
> >> >> Sorry It appears I have not been replying to the list.
> >> >>
> >> >> I would like to add that testing on 2.12.6, before the much improved
> >> >> performance changes, gave ~40s cache build times with significant disk
> >> >> I/O. So the newly improved scanning is much appreciated but doesn't
> >> >> solve all the issues with cache build times.
> >> >>
> >> >> On Mon, Feb 26, 2018 at 5:29 AM, Kurt Kartaltepe
> >> >> <kkartaltepe at gmail.com>
> >> >> wrote:
> >> >> > 2.12.93 as released on
> >> >> > https://www.freedesktop.org/software/fontconfig/release/
> >> >> >
> >> >> > On Mon, Feb 26, 2018 at 5:27 AM, Behdad Esfahbod <behdad at behdad.org>
> >> >> > wrote:
> >> >> >> Just to make sure we are on the same page, which fontconfig version
> >> >> >> are you testing with?
> >> >> >>
> >> >> >> On Mon, Feb 26, 2018 at 3:21 AM, Kurt Kartaltepe
> >> >> >> <kkartaltepe at gmail.com>
> >> >> >> wrote:
> >> >> >>>
> >> >> >>> For clarification, I have tested with ONLY the ttf fonts on my
> >> >> >>> system. In this case the normal 18s cache build step takes 15s.
> >> >> >>> This suggests to me there is no significant difference between FON
> >> >> >>> and TTF, as they made up ~21% of my fonts and removing them
> >> >> >>> resulted in a proportionate savings. Sorry if my OP was
> >> >> >>> misleadingly suggesting that FON files were exceptionally slow, I
> >> >> >>> only meant that they may not have received the same improvement as
> >> >> >>> TTF files which may just be my misunderstanding of the changes you
> >> >> >>> made and lack of testing.
> >> >> >>>
> >> >> >>> I am concerned with why it seems acceptable to rebuild the entire
> >> >> >>> cache when only a tiny portion of it has actually changed. Users
> >> >> >>> for which rebuilding the cache is a significant event are those
> >> >> >>> with large font libraries. These users are by their very nature
> >> >> >>> more likely to add or remove fonts from their library. It seems
> >> >> >>> that this is the worst possible case for the current caching
> >> >> >>> strategy, and *this* seems like an issue worth fixing.
> >> >> >>>
> >> >> >>> In this case if checksumming files is slower than scanning them the
> >> >> >>> issue still stands. Why checksum files that haven't changed? Does
> >> >> >>> fontconfig not trust filesystem metadata? It would appear directory
> >> >> >>> change times are used in detecting when to rescan so why can this
> >> >> >>> not be extended to files instead of the expensive checksum?
> >> >> >>>
> >> >> >>> FWIW an md5sum of my entire font library takes ~1s with hot caches
> >> >> >>> which I still find unacceptable as my library is possibly
> >> >> >>> significantly smaller and my system significantly more powerful
> >> >> >>> than a potential user's.
> >> >> >>>
> >> >> >>> --Kurt Kartaltepe
> >> >> >>>
> >> >> >>> On Sun, Feb 25, 2018 at 9:10 PM, Behdad Esfahbod
> >> >> >>> <behdad at behdad.org>
> >> >> >>> wrote:
> >> >> >>> > What's with fon files being slow? Please report *that* and let's
> >> >> >>> > fix it.
> >> >> >>> >
> >> >> >>> > We've made scanning, like, 100x faster already. 2007 stats are
> >> >> >>> > irrelevant.
> >> >> >>> > Checksumming files is slower than scanning them now.
> >> >> >>> >
> >> >> >>> > On Sun, Feb 25, 2018 at 8:08 AM, Kurt Kartaltepe
> >> >> >>> > <kkartaltepe at gmail.com>
> >> >> >>> > wrote:
> >> >> >>> >>
> >> >> >>> >> While trying to move a project to the pango stack I noticed the
> >> >> >>> >> native font selection backends were bad/useless on some
> >> >> >>> >> platforms (like windows see [1]). So I opted to try and use
> >> >> >>> >> fontconfig on all platforms as it performs outstandingly and has
> >> >> >>> >> wonderful defaults for all platforms.
> >> >> >>> >>
> >> >> >>> >> However during this transition I noticed that there are some
> >> >> >>> >> major issues with cache build speed and during investigation I
> >> >> >>> >> see that there has recently been effort to improve the
> >> >> >>> >> situation [2]. From what I can tell the fontconfig team has
> >> >> >>> >> maintained that these cache issues were irrelevant for the
> >> >> >>> >> primary fontconfig platform (linux) [3]. On linux of course the
> >> >> >>> >> cache is global and maintained usually by font packages ensuring
> >> >> >>> >> it's up-to-date. However it was precisely these slow cache build
> >> >> >>> >> times that led to package managers being required to build in
> >> >> >>> >> additional tooling to support not rebuilding the cache for every
> >> >> >>> >> font installed [4].
> >> >> >>> >>
> >> >> >>> >> Anyway I hope that is enough reason to persuade you that there
> >> >> >>> >> are substantial improvements to make to the caching strategy and
> >> >> >>> >> they are beneficial not only for the odd platforms (osx,
> >> >> >>> >> windows) but also for Linux.
> >> >> >>> >>
> >> >> >>> >> My question is if fontconfig would be receptive to
> >> >> >>> >> building/accepting a patch modifying the caching strategy to
> >> >> >>> >> include checksums per file instead of/in addition to per
> >> >> >>> >> directory. Currently any change to a directory (such as adding a
> >> >> >>> >> new font) invalidates all fonts within that directory. This
> >> >> >>> >> means for directories like the system directory it results in
> >> >> >>> >> rescans of hundreds or more fonts. Thankfully this is faster on
> >> >> >>> >> platforms like linux where all fonts are on freetype. However
> >> >> >>> >> this improvement in scanning did not carry over to windows with
> >> >> >>> >> its many FNT (150 on the average install) and even on my very
> >> >> >>> >> robust development machine building a cache for a mere 650 files
> >> >> >>> >> takes half a minute. This might be acceptable on install of the
> >> >> >>> >> application where we can take our time building the cache, but
> >> >> >>> >> what happens when a user installs 1 more font? A change to cache
> >> >> >>> >> individual file checksums would provide fontconfig a way to only
> >> >> >>> >> require the expensive coverage check of a single font instead of
> >> >> >>> >> the entirety of a user's. I dare say with this exact change the
> >> >> >>> >> need to use a faster less robust coverage check that made
> >> >> >>> >> scanning freetype fonts faster may be unneeded as the number of
> >> >> >>> >> scans required to rebuild a cache would be reduced 100x on the
> >> >> >>> >> average system or more.
> >> >> >>> >>
> >> >> >>> >> I'm certain such a change would be highly appreciated by all
> >> >> >>> >> fontconfig consumers who are hoping to use its powerful feature
> >> >> >>> >> set in a multiplatform context.
> >> >> >>> >>
> >> >> >>> >> --Kurt Kartaltepe
> >> >> >>> >>
> >> >> >>> >> [1] https://bugzilla.gnome.org/show_bug.cgi?id=162681
> >> >> >>> >> [2] https://lists.freedesktop.org/archives/fontconfig/2017-August/005986.html
> >> >> >>> >> [3] https://bugs.freedesktop.org/show_bug.cgi?id=64766
> >> >> >>> >> [4] https://lists.freedesktop.org/archives/fontconfig/2007-October/002728.html
> >> >> >>> >> _______________________________________________
> >> >> >>> >> Fontconfig mailing list
> >> >> >>> >> Fontconfig at lists.freedesktop.org
> >> >> >>> >> https://lists.freedesktop.org/mailman/listinfo/fontconfig
> >> >> >>> >
> >> >> >>> >
> >> >> >>> >
> >> >> >>> >
> >> >> >>> > --
> >> >> >>> > behdad
> >> >> >>> > http://behdad.org/
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> --
> >> >> >> behdad
> >> >> >> http://behdad.org/
> >> >
> >> >
> >> >
> >> >
> >> > --
> >> > behdad
> >> > http://behdad.org/
> >
> >
> >
> >
> > --
> > behdad
> > http://behdad.org/
>



-- 
behdad
http://behdad.org/