[Pixman] [ssse3] Optimization for fetch_scanline_x8r8g8b8

Soeren Sandmann sandmann at daimi.au.dk
Thu Sep 2 15:39:54 PDT 2010

Siarhei Siamashka <siarhei.siamashka at gmail.com> writes:

> Looks like this data has been posted already:
> http://lists.freedesktop.org/archives/pixman/2010-June/000231.html
> Checking a few more things with microbenchmarks shows that the prefetch 
> distance of just 64 bytes ahead is way too small. 

We should probably get these microbenchmarks into the tree.

Microbenchmarks can be a little dangerous since they sometimes amplify
irrelevant details and make them seem important, but they are clearly
useful in many cases.

> It has to be increased up to something like 256-320 to get good
> memory performance. Apparently software prefetch also disables or
> interferes with the hardware prefetcher on Intel Atom, hurting
> performance a lot. More advanced processors can cope with it.
> But increased prefetch distance is less effective (or can even decrease 
> performance) when dealing with small images, so it is not always good.
> Are there any SSE2 capable x86 processors without hardware prefetch capability? 
> Maybe it's really a good idea to remove software prefetch from SSE2 fast path 
> code?

Yeah, it seems so. All data so far suggests that software prefetching
is somewhere between 'slight slowdown' and 'no real effect'. The SSE2
fast paths have very predictable memory access patterns, so a hardware
prefetcher should be able to do a good job.
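
A rough sketch of the kind of microbenchmark kernel that could test
this. It is not pixman code: copy_scanline and its prefetch_bytes
parameter are made-up names, a 64-byte cache line is assumed, and
__builtin_prefetch is the GCC/Clang builtin rather than the macro used
in the SSE2 fast paths. Passing 0 disables software prefetch, so the
same loop can be timed at distances such as 0, 64, 256 and 320.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of pixman): copy n 32-bit pixels with a
 * linear access pattern, optionally issuing a software prefetch
 * prefetch_bytes ahead of the current read position.
 * prefetch_bytes == 0 means "no software prefetch at all". */
void
copy_scanline (uint32_t *dst, const uint32_t *src, size_t n,
               size_t prefetch_bytes)
{
    size_t i;

    assert (dst != NULL && src != NULL);

    for (i = 0; i < n; i++)
    {
        /* Assumption: 64-byte cache lines, so one prefetch per 16 pixels. */
        if (prefetch_bytes != 0 && (i % 16) == 0)
        {
#if defined(__GNUC__) || defined(__clang__)
            /* The address is formed with integer arithmetic because it may
             * point past the end of the buffer; a prefetch of such an
             * address is only a hint and does not fault. */
            uintptr_t ahead = (uintptr_t) (src + i) + prefetch_bytes;

            __builtin_prefetch ((const void *) ahead, 0, 3);
#endif
        }

        dst[i] = src[i];
    }
}
```

Timing this over both large and small buffers would show whether a
longer distance helps big images while hurting small ones.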

(It might be worth investigating whether software prefetch would be
beneficial in the tiled rotation fast paths, since the access patterns
there could be much harder for the hardware to predict.)
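
To illustrate the access pattern in question, here is a naive 90 degree
rotation. It is only a sketch: rotate_90_naive is a made-up name, the
buffers are assumed to be tightly packed, and there is no tiling, unlike
the real fast paths. The destination is written sequentially, but each
source read is a full row away from the previous one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration (not pixman's tiled implementation): rotate a
 * w x h image 90 degrees clockwise. src holds h rows of w pixels, dst
 * holds w rows of h pixels, both tightly packed. */
void
rotate_90_naive (uint32_t *dst, const uint32_t *src, size_t w, size_t h)
{
    size_t x, y;

    assert (dst != NULL && src != NULL);

    for (y = 0; y < w; y++)         /* destination row */
    {
        for (x = 0; x < h; x++)     /* destination column */
        {
            /* Consecutive reads are w pixels apart, so once a source row
             * is 64 bytes or longer every read lands in a different
             * cache line. */
            dst[y * h + x] = src[(h - 1 - x) * w + y];
        }
    }
}
```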

