[Pixman] disable cache prefetch on ATOM can improve the gtkperf performance
M Joonas Pihlaja
jpihlaja at cc.helsinki.fi
Wed Jun 23 00:59:45 PDT 2010
On Wed, 9 Jun 2010, Soeren Sandmann wrote:
> Soeren Sandmann <sandmann at daimi.au.dk> writes:
>
> > prefetching. On an AMD Phenom with 512 KB of L2 and 512 KB of L3,
> > disabling prefetch was a tiny but consistent slow-down.
I ran the traces on an Intel Atom N450 with no L3, 512 KB of L2 and a
whopping 32 KB of L1 (a decent size for Intel compared to P4s) and got
the following perf-diff:
old: atom-with-noprefetch
new: atom-with-prefetch
Slowdowns
=========
image-rgba firefox-world-map-0 40705.97 (40781.80 0.11%) -> 42901.58 (42964.03 0.06%): 1.05x slowdown
image-rgba swfdec-giant-steps-0 7008.48 (7041.00 0.31%) -> 7392.33 (7414.32 0.25%): 1.05x slowdown
image-rgba poppler-0 7514.99 (7585.73 0.42%) -> 7928.72 (8025.66 0.40%): 1.06x slowdown
image-rgba xfce4-terminal-a1-0 9758.85 (9759.44 0.02%) -> 10312.88 (10320.35 0.03%): 1.06x slowdown
image-rgba firefox-woodtv-0 4197.62 (4198.42 0.07%) -> 4490.93 (4497.23 0.08%): 1.07x slowdown
image-rgba gnome-terminal-vim-0 13625.58 (13650.19 0.09%) -> 14588.68 (14600.53 0.04%): 1.07x slowdown
image-rgba ocitysmap-0 5137.56 (5146.48 0.29%) -> 5772.16 (5784.54 0.17%): 1.12x slowdown
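For anyone reading the table: the last column appears to be simply the
ratio of the new time to the old time. A minimal sketch in C, assuming
the ratio is taken from the first figure of each pair rather than the
one in parentheses (slowdown_ratio is a made-up helper name, not part
of cairo-perf-diff):

```c
#include <assert.h>

/* Ratio of new time to old time; values above 1.0 mean "new" is slower.
 * Hypothetical helper for illustration only. */
double slowdown_ratio(double old_time, double new_time)
{
    assert(old_time > 0.0);  /* a zero or negative baseline is meaningless */
    return new_time / old_time;
}
```

Feeding it the firefox-world-map-0 pair (40705.97, 42901.58) gives a
value a little under 1.054, which matches the 1.05x shown above.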
Cooked and raw numbers are available here for the interested:
http://people.freedesktop.org/~joonas/tmp/atom/
With a lower threshold for cairo-perf-diff you'll see that there's a
small slowdown from using prefetching nearly across the board. On the
whole, my experience with prefetching for graphics is that doing it
manually hasn't been very useful, mostly because the streaming access
logic on most memory controllers seems to work pretty well. I've
never seen a difference this big from actively using prefetching,
but then again this is the first time I've run tests on real app code
when looking at it.
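To make it concrete what "manual prefetching" means here, the
following is a rough sketch of a pixel copy loop with an optional
software prefetch hint. It is not pixman's actual code: the function
name, the prefetch distance and the once-per-16-pixels stride are all
assumptions chosen for illustration, and __builtin_prefetch is a
GCC/Clang extension rather than standard C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tuning value: how many pixels ahead of the current
 * position to hint. A real value would have to be measured per CPU. */
#define PREFETCH_AHEAD 64

/* Copy n 32-bit pixels, optionally issuing a read prefetch hint once
 * every 16 pixels (64 bytes, a common cache line size). */
void copy_pixels(uint32_t *dst, const uint32_t *src, size_t n,
                 int use_prefetch)
{
    assert(dst != NULL && src != NULL);

    for (size_t i = 0; i < n; i++) {
        /* Only hint addresses that are still inside the source array. */
        if (use_prefetch && (i % 16) == 0 && i + PREFETCH_AHEAD < n)
            __builtin_prefetch(src + i + PREFETCH_AHEAD, 0, 0);
        dst[i] = src[i];
    }
}
```

Both settings of use_prefetch produce the same output; only the timing
can differ, so the flag is only useful for A/B runs like the ones
above.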
In my experience a far bigger impact can be seen from the kind of
memory move instruction you use to access the data, which caches the
data turns out to be in, and how many streams you're trying to access
concurrently. For instance, using a non-temporal move in the wrong
place can really wreak havoc with performance, so IMO it's a bad idea
to use them in generic code paths.
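As an illustration of what a non-temporal move looks like in code,
here is a small sketch of a copy that uses SSE2 streaming stores when
they are available and the destination is 16-byte aligned, and falls
back to memcpy otherwise. The function name is made up, and this is
not taken from pixman. The stores bypass the cache, so if the
destination is read again soon afterwards it has to come back from
memory, which is the sort of misuse I mean.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: copy with non-temporal (cache-bypassing) stores
 * where possible. Source and destination must not overlap. */
void copy_streaming(void *dst, const void *src, size_t bytes)
{
    assert(dst != NULL && src != NULL);

#if defined(__SSE2__)
    /* _mm_stream_si128 requires a 16-byte aligned destination. */
    if (((uintptr_t)dst & 15) == 0) {
        __m128i *d = (__m128i *)dst;
        const __m128i *s = (const __m128i *)src;
        size_t blocks = bytes / 16;

        for (size_t i = 0; i < blocks; i++)
            _mm_stream_si128(d + i, _mm_loadu_si128(s + i));
        _mm_sfence();  /* order the streaming stores before later stores */

        /* Copy the tail that does not fill a whole 16-byte block. */
        size_t done = blocks * 16;
        memcpy((char *)dst + done, (const char *)src + done, bytes - done);
        return;
    }
#endif
    memcpy(dst, src, bytes);
}
```

Whether this helps or hurts depends entirely on whether the
destination is touched again while it would still have been in cache,
which a generic code path cannot know.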
Cheers,
Joonas