[Fontconfig] fc-2_4_branch: mmapping data structures

Patrick Lam plam at MIT.EDU
Fri Jul 8 22:32:27 EST 2005


I've created the fc-2_4_branch, which enhances fontconfig with mmappable
cache files.  The initial commit to the branch modified fontconfig data
structures to be offset-based rather than being index-based and
introduced the FcCacheSerialize and FcCachePrepareSerialize calls:
FcCachePrepareSerialize counts how many of each kind of data structure
exist in the program, while FcCacheSerialize converts dynamic data
structures into static data structures.  In this version of fontconfig,
most pointers in memory are replaced by fat data structures (essentially,
s/*/Ptr/, so a pointer type becomes a Ptr type) which can contain either
indices or pointers (depending on the FcStorage structure member).
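
To make the fat-pointer idea concrete, here is a minimal sketch of a
structure whose tag selects between an index and a real pointer.  Every
name in it (Storage, IntPtr, IntPtrStatic, IntPtrDynamic, IntPtrU) is a
hypothetical stand-in chosen for illustration; the actual declarations
on the branch may look quite different.

```c
#include <assert.h>
#include <stddef.h>

/* All names here are hypothetical; the branch's real declarations may differ. */
typedef enum { STORAGE_STATIC, STORAGE_DYNAMIC } Storage;

typedef struct {
    Storage storage;      /* says which union member is valid */
    union {
        size_t  index;    /* STORAGE_STATIC: index into a static block */
        int    *dyn;      /* STORAGE_DYNAMIC: ordinary pointer */
    } u;
} IntPtr;

/* Build a fat pointer that refers to element `index` of a static block. */
IntPtr
IntPtrStatic (size_t index)
{
    IntPtr p;
    p.storage = STORAGE_STATIC;
    p.u.index = index;
    return p;
}

/* Build a fat pointer that wraps an ordinary pointer. */
IntPtr
IntPtrDynamic (int *dyn)
{
    IntPtr p;
    p.storage = STORAGE_DYNAMIC;
    p.u.dyn = dyn;
    return p;
}

/* Resolve a fat pointer to a plain pointer, given the static block's base. */
int *
IntPtrU (IntPtr p, int *static_base)
{
    if (p.storage == STORAGE_STATIC)
        return static_base + p.u.index;
    assert (p.storage == STORAGE_DYNAMIC);
    return p.u.dyn;
}
```

The point of the index form is that it stays valid wherever the static
block ends up in memory, which is what an mmapped file needs; the cost
is the extra tag carried in every reference.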

I've just committed a second change, which converts FcObjectPtr
(strings) from fat data structures into simple IDs.  The ID is a subscript
into the objectptr_indices array, which maps IDs to indices.  An index
is coded as follows: positive integers denote offsets into the static
objectcontent_static_buf buffer, which contains a sorted list of
strings; negative integers denote indices into the objectcontent_dynamic
array of pointers to strings.  Perhaps the lookup function will make the
encoding clear:

const char *
FcObjectPtrU (FcObjectPtr si)
{
    if (objectptr_indices[si] > 0)
       return &objectcontent_static_buf[objectptr_indices[si]];
    else
       return objectcontent_dynamic[-objectptr_indices[si]];
}

Since static strings are sorted, they can be quickly compared:

int
FcObjectPtrCompare (const FcObjectPtr a, const FcObjectPtr b)
{
    /* count on sortedness for fast objectptrs. */
    if ((a == b) || (objectptr_indices[a] > 0 && objectptr_indices[b] > 0))
       return objectptr_indices[a] - objectptr_indices[b];

    // (... conversion, see below ...)
    return strcmp (FcObjectPtrU(a), FcObjectPtrU(b));
}
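
The encoding and the sorted-buffer shortcut can be tried out in
miniature.  The sketch below is not the branch's code: the table
contents, the short names (static_buf, dynamic_strs, indices, lookup,
compare) and the padding byte at offset 0 (added so that every static
offset is strictly positive) are all assumptions made for the example.
It also assumes each distinct string appears only once in the static
buffer, so equal strings have equal offsets.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical miniature tables.  Static strings are sorted and
 * NUL-separated; byte 0 is padding, so "family" starts at offset 1,
 * "size" at offset 8 and "style" at offset 13. */
static const char  static_buf[]   = "\0family\0size\0style";
static const char *dynamic_strs[] = { "weight", "slant" };

/* ID -> encoded index.  > 0: offset into static_buf.
 * Otherwise: negated slot in dynamic_strs (0 -> slot 0, -1 -> slot 1). */
static const int indices[] = { 1, 8, 13, 0, -1 };

/* Same decoding rule as the lookup function quoted above. */
const char *
lookup (int id)
{
    assert (id >= 0 && id < (int) (sizeof indices / sizeof indices[0]));
    if (indices[id] > 0)
        return &static_buf[indices[id]];
    return dynamic_strs[-indices[id]];
}

/* Two static strings compare by offset alone, because the buffer is
 * sorted; anything involving a dynamic string falls back to strcmp. */
int
compare (int a, int b)
{
    if (a == b || (indices[a] > 0 && indices[b] > 0))
        return indices[a] - indices[b];
    return strcmp (lookup (a), lookup (b));
}
```

With these tables, compare (0, 1) never touches the string bytes: it
subtracts the offsets 1 and 8.  compare (3, 4) has to call strcmp
because both strings are dynamic.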

Of course, performance might suffer if too many strings are dynamic.
Strings that we mmap into memory are always static, but we may still end
up with a lot of dynamic strings, for instance if we can't successfully
load the cache file.  For that case, I've written a function which
converts all dynamic strings into static strings every 100000 dynamic
compares.  Note that this can be a big loss in terms of memory
consumption if the number of dynamic strings is small compared to the
number of mmapped static strings, because we'd move from having an
mmapped buffer to having all strings locally allocated.  It might not
even be necessary to convert dynamic strings to static strings if most
of the compares happen between strings coming from fontconfig (some are
filenames, for instance, or font names).  Profiling is needed here.
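
One way the periodic conversion could work is sketched below.  This is
an illustration under stated assumptions, not the function on the
branch: the names (add_dynamic, convert_all_to_static, is_static and so
on), the fixed-size tables and the use of qsort are all made up for the
example.  Only the threshold of 100000 dynamic compares comes from the
description above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_IDS       16       /* hypothetical fixed capacity */
#define CONVERT_EVERY 100000L  /* threshold quoted in the text */

static char       *static_buf;             /* sorted strings; byte 0 is padding */
static const char *dynamic_strs[MAX_IDS];  /* strings not yet in static_buf */
static int         indices[MAX_IDS];       /* > 0 static offset, else -slot */
static int         n_ids, n_dynamic;
static long        dynamic_compares;

int add_dynamic (const char *s)            /* register a string, return its ID */
{
    assert (n_ids < MAX_IDS);
    dynamic_strs[n_dynamic] = s;
    indices[n_ids] = -n_dynamic++;
    return n_ids++;
}

int is_static (int id) { return indices[id] > 0; }

const char *lookup (int id)
{
    return is_static (id) ? &static_buf[indices[id]]
                          : dynamic_strs[-indices[id]];
}

static int by_string (const void *a, const void *b)
{
    return strcmp (lookup (*(const int *) a), lookup (*(const int *) b));
}

/* Copy every string, in sorted order, into one freshly allocated buffer
 * and point every ID at its new offset.  Equal strings share an offset. */
void convert_all_to_static (void)
{
    int order[MAX_IDS], new_idx[MAX_IDS], i, prev = 0;
    size_t len = 1, pos = 1;               /* 1 accounts for the padding byte */
    char *buf;

    for (i = 0; i < n_ids; i++) {
        order[i] = i;
        len += strlen (lookup (i)) + 1;
    }
    qsort (order, (size_t) n_ids, sizeof order[0], by_string);

    buf = malloc (len);
    assert (buf != NULL);
    buf[0] = '\0';
    for (i = 0; i < n_ids; i++) {
        const char *s = lookup (order[i]);
        if (i == 0 || strcmp (s, buf + prev) != 0) {
            size_t n = strlen (s) + 1;
            memcpy (buf + pos, s, n);
            prev = (int) pos;
            pos += n;
        }
        new_idx[order[i]] = prev;
    }
    memcpy (indices, new_idx, (size_t) n_ids * sizeof indices[0]);
    free (static_buf);                     /* never an mmapped region in this sketch */
    static_buf = buf;
    n_dynamic = 0;
    dynamic_compares = 0;
}

int compare (int a, int b)
{
    if (a == b || (is_static (a) && is_static (b)))
        return indices[a] - indices[b];
    if (++dynamic_compares >= CONVERT_EVERY)
        convert_all_to_static ();          /* both IDs are static afterwards */
    return strcmp (lookup (a), lookup (b));
}
```

The sketch makes the memory trade-off visible: after a conversion every
string lives in the malloc'd buffer, including any that previously sat
in a shared mmapped region.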

The dynamic-to-static conversion function is also useful when
serializing the strings, of course.  When we serialize the strings, we
can renumber the strings at the same time, since we're guaranteed to
enumerate all strings that are reachable from the root and hence that
get loaded in later.  (This is mark-and-sweep garbage collection for
strings.)  FcObjectPtrSerialize converts an old string id into a new
string id.
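
The renumbering step can be pictured as follows.  This is a minimal
sketch, assuming new IDs are handed out in first-visit order during the
walk; the names renumber_begin, serialize_id and live_count are
hypothetical, with serialize_id standing in for the role described for
FcObjectPtrSerialize.

```c
#include <assert.h>

#define MAX_IDS 16      /* hypothetical fixed capacity */
#define UNSEEN  (-1)

static int old_to_new[MAX_IDS];  /* old ID -> new ID, or UNSEEN */
static int n_new;                /* number of new IDs handed out so far */

/* Forget any earlier numbering before a serialization pass. */
void
renumber_begin (void)
{
    int i;
    for (i = 0; i < MAX_IDS; i++)
        old_to_new[i] = UNSEEN;
    n_new = 0;
}

/* Map an old string ID to its new ID, allocating the next dense ID the
 * first time the string is reached. */
int
serialize_id (int old_id)
{
    assert (old_id >= 0 && old_id < MAX_IDS);
    if (old_to_new[old_id] == UNSEEN)
        old_to_new[old_id] = n_new++;
    return old_to_new[old_id];
}

/* IDs never passed to serialize_id are unreachable and get no new ID. */
int
live_count (void)
{
    return n_new;
}
```

Strings whose old IDs are never visited simply drop out of the new
numbering, which is the garbage-collection effect mentioned above.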

My next step is to implement the functions which read and write the
cache files.  This will involve hacking the fc-cache executable.  The
next commit will write a monolithic cache file which contains all active
fonts (which is not the end of the story, but is a good intermediate step).

Please test out the branch and let me know what experiences you have
with it!

pat


