[cairo] Overhead reduction

Jonathan Morton jonathan.morton at movial.com
Wed May 20 03:58:45 PDT 2009


> > And also, of course, how the cairo developers in general feel about
> > moving the thread synchronization primitives to pixman.
> 
> As it happens, I've written a custom framework to do *just* the
> object-pool stuff portably.  It falls back to a trivial implementation
> (just calling malloc and free) if it doesn't have the right atomic
> operations to use, but aims to be fast in the common case where there is
> little or no contention.

> > > Perhaps using a fixed allocation on the stack would be appropriate for
> > > the common case(s) which use very low numbers, and falling back to
> > > malloc for the general case.
> > 
> > Right, this is how we'd usually do it.
> 
> I've already implemented this; patch will come later.

Attached are a pair of independent patches which implement the above two
optimisations.

The region32 patch is platform-independent and fairly trivial.
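
For anyone reading without the attachment, the stack-first idea quoted
above can be sketched roughly as follows.  This is not the code from the
patch: box_t, STACK_BOXES and sum_box_areas() are hypothetical names
chosen for illustration, and the threshold of 16 is an arbitrary
assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical threshold; the value used by the real patch may differ. */
#define STACK_BOXES 16

/* Stand-in for a pixman box; illustration only. */
typedef struct { int x1, y1, x2, y2; } box_t;

/* Copies n boxes into scratch space and sums their areas.
 * Small inputs use a fixed buffer on the stack; larger ones fall
 * back to malloc.  Returns -1 if the heap allocation fails. */
long
sum_box_areas (const box_t *boxes, size_t n)
{
    box_t  stack_buf[STACK_BOXES];
    box_t *scratch = stack_buf;
    long   total = 0;
    size_t i;

    if (n == 0)
        return 0;
    assert (boxes != NULL);

    if (n > STACK_BOXES) {
        /* General case: too big for the stack buffer. */
        if (n > SIZE_MAX / sizeof (box_t))
            return -1;
        scratch = malloc (n * sizeof (box_t));
        if (scratch == NULL)
            return -1;
    }

    memcpy (scratch, boxes, n * sizeof (box_t));
    for (i = 0; i < n; i++)
        total += (long) (scratch[i].x2 - scratch[i].x1)
               * (long) (scratch[i].y2 - scratch[i].y1);

    /* Only release the buffer if it came from the heap. */
    if (scratch != stack_buf)
        free (scratch);

    return total;
}
```

The common case never touches the allocator, and the only extra cost in
the general case is one pointer comparison before free().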

The image patch, which implements an object pool, is probably more
controversial and certainly a lot bigger.  However, in my tests it is
very nearly as fast as the previous unprotected stack, while being
thread-safe (unless there are bugs I haven't spotted).  The object pool
should be reusable for other fixed-size objects without change.

The reason it's fast is that I don't take a full mutex.  Instead, I
perform atomic operations (very approximately a compare-and-swap) on
elements of a void*[], and keep an unprotected index as a hint for where
to look first.  In the "typical" case of low-to-zero contention, there
is only one of these atomic operations per alloc or dealloc, provided
the pool has an object or a space ready.
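
A minimal sketch of that scheme, for illustration only.  It is not the
attached patch: it swaps in C11 <stdatomic.h> for the hand-written
ARMv6 primitives, and pool_t, POOL_SLOTS and the pool_*() functions are
hypothetical names.  The hint is a relaxed atomic here because a plain
unprotected variable would formally be a data race in C11; it remains
purely advisory.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

#define POOL_SLOTS 16   /* hypothetical capacity */

typedef struct {
    _Atomic(void *) slot[POOL_SLOTS];  /* NULL means "empty slot" */
    atomic_uint     hint;              /* advisory: slot likely to be full */
    size_t          object_size;
} pool_t;

void
pool_init (pool_t *p, size_t object_size)
{
    for (unsigned i = 0; i < POOL_SLOTS; i++)
        atomic_init (&p->slot[i], NULL);
    atomic_init (&p->hint, 0u);
    p->object_size = object_size;
}

void *
pool_alloc (pool_t *p)
{
    unsigned start = atomic_load_explicit (&p->hint, memory_order_relaxed);

    for (unsigned n = 0; n < POOL_SLOTS; n++) {
        unsigned i = (start + POOL_SLOTS - n) % POOL_SLOTS;

        /* Cheap read first, so empty slots cost no read-modify-write. */
        if (atomic_load_explicit (&p->slot[i], memory_order_relaxed) == NULL)
            continue;

        /* Claim whatever is in the slot by swapping NULL in. */
        void *obj = atomic_exchange_explicit (&p->slot[i], NULL,
                                              memory_order_acquire);
        if (obj != NULL) {
            atomic_store_explicit (&p->hint, (i + POOL_SLOTS - 1) % POOL_SLOTS,
                                   memory_order_relaxed);
            return obj;
        }
    }
    return malloc (p->object_size);    /* pool empty */
}

void
pool_free (pool_t *p, void *obj)
{
    unsigned start = atomic_load_explicit (&p->hint, memory_order_relaxed);

    assert (obj != NULL);              /* NULL is reserved for "empty" */

    for (unsigned n = 0; n < POOL_SLOTS; n++) {
        unsigned i = (start + 1 + n) % POOL_SLOTS;
        void *expected = NULL;

        /* Compare-and-swap: store obj only if the slot is still empty. */
        if (atomic_compare_exchange_strong_explicit (&p->slot[i], &expected, obj,
                                                     memory_order_release,
                                                     memory_order_relaxed)) {
            atomic_store_explicit (&p->hint, i, memory_order_relaxed);
            return;
        }
    }
    free (obj);                        /* pool full */
}

/* Destructor: hands every pooled object back to free(). */
void
pool_destroy (pool_t *p)
{
    for (unsigned i = 0; i < POOL_SLOTS; i++)
        free (atomic_exchange_explicit (&p->slot[i], NULL, memory_order_acquire));
}
```

With no contention, a free followed by an alloc touches exactly one slot
each: the free lands just above the hint and the alloc finds it at the
hint.  A stale hint only costs extra probing, never correctness.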

However, I've only implemented the atomic primitives for ARMv6 (and
later).  It should be reasonably obvious what to do for, say, x86 and
PowerPC, and I've included pseudocode for both preferred styles in the
comments.  Since I've done this without looking at the Cairo version, it
should be licensing-clean.

(Because I'm nice, I might implement IA32, AMD64, PPC32 and PPC64
versions in my spare time.  Maybe even pre-v6 ARM.)

If the atomic primitives are not available for the platform, it should
revert to a trivial implementation, which just calls malloc() and free()
with no pooling.  This should be the same speed and functionality as the
status quo.
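
Roughly, the fallback amounts to something like the sketch below.  The
HAVE_ATOMIC_PRIMITIVES macro and the pool_*() names are assumptions for
illustration, not the identifiers used in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* HAVE_ATOMIC_PRIMITIVES is a hypothetical configure-time macro. */
#ifndef HAVE_ATOMIC_PRIMITIVES

/* Trivial pool: same interface, no pooling at all. */
typedef struct {
    size_t object_size;
} pool_t;

static inline void
pool_init (pool_t *p, size_t object_size)
{
    assert (object_size > 0);
    p->object_size = object_size;
}

static inline void *
pool_alloc (pool_t *p)
{
    return malloc (p->object_size);
}

static inline void
pool_free (pool_t *p, void *obj)
{
    (void) p;       /* nothing is cached, so the pool is not consulted */
    free (obj);
}

static inline void
pool_destroy (pool_t *p)
{
    (void) p;       /* nothing to release */
}

#endif /* !HAVE_ATOMIC_PRIMITIVES */
```

Callers use the same four functions either way, so the choice between
the pooled and trivial versions stays a build-time detail.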

A destructor is provided, but not called from anywhere.  If there's a
codepath to shut down pixman, the destructor should be called from it to
make leak-checkers happy.  Otherwise, it's not really needed.

-- 
------
From: Jonathan Morton
      jonathan.morton at movial.com

-------------- next part --------------
A non-text attachment was scrubbed...
Name: mallocectomy-image.patch
Type: text/x-patch
Size: 8348 bytes
Desc: not available
Url : http://lists.cairographics.org/archives/cairo/attachments/20090520/44e7c26f/attachment.bin 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mallocectomy-region.patch
Type: text/x-patch
Size: 1371 bytes
Desc: not available
Url : http://lists.cairographics.org/archives/cairo/attachments/20090520/44e7c26f/attachment-0001.bin 

