[Pixman] [cairo] pixman: New ARM NEON optimizations

Jonathan Morton jonathan.morton at movial.com
Tue Feb 16 13:36:53 PST 2010

> The biggest surprise here is the pathologically bad performance of 'memset'
> function in 'image' backend tests, especially for 'evolution' benchmark. My
> only guess is that glibc could have probably messed up with the caches somehow
> (maybe by improperly using nontemporal memory writes or something).

A quick check of glibc sources suggests that MOVNTIQ (the non-temporal
64-bit write) can indeed be used under at least some circumstances.
It's not immediately clear *which* circumstances, since there's a lot
of assembler in that file and I'm not used to x86 assembly.
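
For illustration only, this is roughly what such a store looks like
from C. It is a sketch and not glibc's code: the function names are
made up, and it assumes GCC or Clang on x86-64 with SSE2, where the
_mm_stream_si64() intrinsic maps to a 64-bit MOVNTI. On anything else
it falls back to ordinary stores, so it only shows the shape of the
thing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) && defined(__SSE2__)
#include <emmintrin.h>          /* _mm_stream_si64(), _mm_sfence() */
#define HAVE_NT_STORE 1
#else
#define HAVE_NT_STORE 0         /* assumption: no non-temporal path here */
#endif

/* Hypothetical helper: fill n_words 64-bit words with value, asking the
 * CPU to bypass the cache where the non-temporal store is available. */
static void fill64_nontemporal(unsigned long long *dst,
                               unsigned long long value, size_t n_words)
{
    /* 64-bit stores want an 8-byte aligned destination. */
    assert(((uintptr_t)dst & 7) == 0);

    for (size_t i = 0; i < n_words; i++) {
#if HAVE_NT_STORE
        _mm_stream_si64((long long *)&dst[i], (long long)value);
#else
        dst[i] = value;
#endif
    }
#if HAVE_NT_STORE
    /* Non-temporal stores are weakly ordered; fence before anyone
     * else is expected to see the data. */
    _mm_sfence();
#endif
}

/* The ordinary version: same result, but the buffer ends up in cache. */
static void fill64_cached(unsigned long long *dst,
                          unsigned long long value, size_t n_words)
{
    for (size_t i = 0; i < n_words; i++)
        dst[i] = value;
}
```

Both functions leave identical contents in memory. The only
difference is where the lines sit afterwards, which is exactly what
matters to whoever reads the buffer next.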

It's all very well to avoid "cache pollution", but with a
general-purpose function like memset() it's not at all clear that
keeping a freshly zeroed buffer out of cache is a good idea, since the
caller often touches that buffer again right away.  I actually have an
unrelated project where ensuring the opposite (that the buffer stays
in cache) is desirable, especially with today's enormous L3 caches.
Somebody please take a cluestick to the GNU folks.

The plain-C implementation of memset looks sane though.

 - Jonathan
