[Fontconfig] MD5 checksum? Why? & mmap alignment on same machine in 32.v.64 mode

L. A. Walsh fonts at tlinx.org
Mon Jun 23 06:07:10 PDT 2014

Sorry if this is a dup; I haven't seen it on the list yet.

Patrick Lam wrote:
> (Pedantic: fontconfig isn't object oriented.)
You mean it is not written in C++?

That doesn't mean it can't be object-oriented. I.e., would the fontconfig
library more likely be described as "procedural" (a number of specific
tasks one can do with the library and its data), "functional" (described
mostly in terms of what it can do), or "object-oriented" (mostly in terms
of its data structures and the operations one can do on them)?

Since we were talking about a library designed to manipulate a data format
(font structures) and to provide various operations and functions on those
structures (search, set construction from various attributes and attribute
compositions, etc.), it seemed most useful to talk about it in terms of its
data members and the operations on them, which I'd call a type of
object-oriented system.

Is that an incorrect description of the library routines?

What would you use instead?

> For instance: 
> http://freedesktop.org/software/fontconfig/fontconfig-devel/fcdircacheload.html
Thank you for the URL. 

> As Akira pointed out, MD5 is only used for directory names.
I guess I don't see why it is used even for that. It is **faster** to do a
plain compare than to calculate an MD5 sum. The test below is on *long files*
(you'd have to have a lot of fonts to get 2TB... ;-)).
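To make "only used for directory names" concrete, here is a minimal Python sketch, assuming the hash input is the directory path string (not the font files) and that the digest just gives each directory's cache file a fixed-length, filesystem-safe name. The function name `cache_file_name` and the `-cache` suffix are hypothetical, not fontconfig's actual naming scheme.

```python
import hashlib
import os


def cache_file_name(font_dir: str, suffix: str = "-cache") -> str:
    """Hypothetical: name a directory's cache file after an MD5 of its path.

    Only the path string is hashed, never the font files themselves, and the
    result is always 32 hex digits plus the suffix, however long the path is.
    """
    digest = hashlib.md5(os.fsencode(font_dir)).hexdigest()
    return digest + suffix


# Two different directories get two different fixed-length names.
print(cache_file_name("/usr/share/fonts"))
print(cache_file_name("/usr/local/share/fonts/some/very/deep/subdirectory"))
```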

Example: I used long files so the speed comparisons would be meaningful,
and four programs: md5sum, cmp, diff, and a Perl program (listed as dedup
below) that does a file compare as part of its work. I flushed the cache
between runs (this is from disk):

-rwxrwxr-x 2 2.1G Jun 25  2013 foobar
-rwxrwxr-x 2 2.1G Jun 25  2013 test2
md5sum: 16.77sec 8.98usr 5.48sys (86.28% cpu)
  cmp: 14.41sec 4.08usr 6.07sys (70.43% cpu)
 diff: 7.79sec 0.19usr 5.87sys (77.75% cpu)
dedup: 8.99sec 2.19usr 5.33sys (83.70% cpu)

This is on a ramdisk:

-rwxrwxr-x 1 2.1G Jun 25  2013 foobar
-rwxrwxr-x 1 2.1G Jun 22 16:50 test2
md5sum: 8.93sec 8.01usr 0.92sys (100.02% cpu)
  cmp: 4.77sec 3.68usr 1.09sys (99.98% cpu)
 diff: 1.27sec 0.18usr 1.08sys (99.89% cpu)
dedup: 4.20sec 2.15usr 3.49sys (134.34% cpu)

Note that with the files in memory, md5sum takes nearly twice as long as
the slowest of the file-compare programs (8.93s vs. 4.77s for cmp) and
about seven times as long as diff.
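The two approaches being timed can be sketched in Python. This is an illustration, not the programs above: `files_equal_plain` reads both files in chunks and stops at the first differing chunk, roughly as cmp does, while `files_equal_md5` must read and hash every byte of both files before it can compare anything. The function names and the 1 MiB chunk size are arbitrary choices for this sketch.

```python
import hashlib

CHUNK = 1 << 20  # 1 MiB per read; an arbitrary choice for this sketch


def files_equal_plain(path_a: str, path_b: str) -> bool:
    """Compare two files chunk by chunk, stopping at the first mismatch
    (roughly what cmp does)."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(CHUNK)
            b = fb.read(CHUNK)
            if a != b:
                return False  # differing bytes, or one file ended first
            if not a:
                return True   # both files hit EOF together


def md5_of(path: str) -> str:
    """MD5 hex digest of a whole file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(CHUNK), b""):
            h.update(block)
    return h.hexdigest()


def files_equal_md5(path_a: str, path_b: str) -> bool:
    """Compare by digest: both files are read and hashed to the end,
    even when they differ in the very first chunk."""
    return md5_of(path_a) == md5_of(path_b)
```

Besides the per-byte cost of computing MD5, the plain compare can return on the first mismatch; the digest version never can.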

Even from disk, it's still the slowest. (The Perl program uses MD5 hashes
spaced at intervals through the file to hedge against the worst case, but
in practice that more often slows it down a bit than speeds it up.)
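The Perl program itself isn't shown here; the following is only a guess at the sampled-hash idea, assuming a fixed number of evenly spaced sample blocks (8 blocks of 4 KiB, both hypothetical defaults). Differing sample digests prove the files differ; matching ones prove nothing, so a full compare (here the standard library's `filecmp.cmp` with `shallow=False`) is still needed whenever the samples agree, which would explain why the extra hashing can end up costing more than it saves.

```python
import filecmp
import hashlib
import os


def sample_digest(path: str, samples: int = 8, size: int = 4096) -> str:
    """Hypothetical prefilter: MD5 over the file length plus `samples`
    blocks of `size` bytes taken at evenly spaced offsets."""
    length = os.path.getsize(path)
    h = hashlib.md5(str(length).encode())
    with open(path, "rb") as f:
        for i in range(samples):
            # Offsets run from 0 to length - size (clamped at 0 for small files).
            offset = max(0, (length - size) * i // max(1, samples - 1))
            f.seek(offset)
            h.update(f.read(size))
    return h.hexdigest()


def files_equal_sampled(path_a: str, path_b: str) -> bool:
    """Different sample digests prove the files differ; equal ones prove
    nothing, so a full byte-for-byte compare is still required."""
    if sample_digest(path_a) != sample_digest(path_b):
        return False
    return filecmp.cmp(path_a, path_b, shallow=False)
```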

>  Caches are versioned and there have been changes to the cache format. The
>  programmatic fontconfig API to access the cache is public and fixed. 

I don't see where that page specifies that the on-disk format is fixed
(or even public). I can understand that the API may be fixed and public,
but I don't see where the library-internal cache format is also fixed.
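As an illustration of what "versioned" typically means for a cache file, here is a minimal sketch of a header check, assuming a hypothetical 8-byte header holding a magic string and a format version. The magic `FCCH` and version 7 are made up; fontconfig's real header layout isn't described in this thread. A reader that sees any other version would treat the file as stale, so the on-disk format can change between releases while the public API stays the same.

```python
import struct

# Hypothetical cache header: 4-byte magic + little-endian uint32 version.
HEADER = struct.Struct("<4sI")
MAGIC = b"FCCH"        # made up for this sketch, not fontconfig's magic
CURRENT_VERSION = 7    # made up for this sketch


def make_header(version: int = CURRENT_VERSION) -> bytes:
    return HEADER.pack(MAGIC, version)


def header_ok(blob: bytes) -> bool:
    """Accept a cache only if it starts with our magic and exactly our
    version; anything else would be treated as stale and regenerated."""
    if len(blob) < HEADER.size:
        return False
    magic, version = HEADER.unpack_from(blob)
    return magic == MAGIC and version == CURRENT_VERSION


print(header_ok(make_header()), header_ok(make_header(CURRENT_VERSION - 1)))  # prints: True False
```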

I also don't see why the x86 and x86_64 formats would be, or would have
to be, different or incompatible.

In fact, the data types seemed to be defined as xxxxx{8,16,32}, i.e.,
some name with a numeric suffix giving the bit width.

That is what I suggested in my initial post as a way to unify the
caches on those architectures. If they already use fixed-bit-width
fields, could you help me understand what would make them different,
or why they couldn't be the same? Looking at the pages around that URL,
I didn't see any reason they couldn't share one cache, so I figure I
must be missing something. Do you have a URL that explains why they
would be different (or that documents a fixed on-disk format)?
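One way two caches built from identical fixed-width field declarations can still differ is alignment padding, which fits the "mmap alignment" in the subject line. A minimal Python sketch using the standard `struct` module (the uint32-then-uint64 record is hypothetical, not fontconfig's layout): `@` applies this interpreter's native C alignment, like a struct written out or mmap'd as-is, while `<` uses standard sizes with no padding.

```python
import struct

# Hypothetical record: a uint32 followed by a uint64 (fixed-width fields).
NATIVE = "@IQ"    # native sizes and alignment, like a C struct mmap'd as-is
PACKED = "<IQ"    # little-endian, standard sizes, no padding anywhere
PADDED = "<I4xQ"  # explicit 4 pad bytes: the uint64 sits at offset 8 everywhere

# Depends on the interpreter's ABI: 16 on x86_64, where the uint64 is
# 8-byte aligned; 12 in a 32-bit i386 Linux build, where the System V
# i386 ABI aligns 8-byte integers to 4 bytes inside structs.
print("native:", struct.calcsize(NATIVE))
print("packed:", struct.calcsize(PACKED))  # 12 on every platform
print("padded:", struct.calcsize(PADDED))  # 16 on every platform

# A pointer-sized field differs outright between 32- and 64-bit builds (4 vs. 8).
print("pointer:", struct.calcsize("@P"))
```

Run under a 32-bit and a 64-bit Python on the same Linux x86 machine, the native size is expected to come out 12 vs. 16 while the packed and explicitly padded layouts match. Spelling the padding out, as in `PADDED`, is one way a single layout could serve both.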

Thanks much!
