[Intel-gfx] X11 performance regressions

Mon May 9 23:43:41 CEST 2011

As a point of comparison, here are the corresponding results with master of
all the various trees on my 1.6GHz N450 (Atom+PineView) [so not strictly an
apples-to-apples comparison: your CPU is about 4-5x faster, but PNV is
about 3-4x faster than the 915GM (clock-for-clock)]:

On Sun, 08 May 2011 20:22:21 +0200, Knut Petersen <Knut_Petersen at t-online.de> wrote:
> 10000000 reps @   0.0032 msec (309000.0/sec): Dot
> 40000000 reps @   0.0006 msec (1650000.0/sec): Dot
50000000 reps @   0.0005 msec (1830000.0/sec): Dot
*100000000 reps @   0.0003 msec (2900000.0/sec): Dot
 
>   45000 reps @   0.5973 msec (  1670.0/sec): 500x500 rectangle
>  100000 reps @   0.4282 msec (  2340.0/sec): 500x500 rectangle
100000 reps @   0.3210 msec (  3120.0/sec): 500x500 rectangle

> 2000000 reps @   0.0034 msec (296000.0/sec): 1x1 stippled rectangle (8x8 stipple)
> 8000000 reps @   0.0007 msec (1420000.0/sec): 1x1 stippled rectangle (8x8 stipple)
25000000 reps @   0.0011 msec (902000.0/sec): 1x1 stippled rectangle (8x8 stipple)
*30000000 reps @   0.0008 msec (1180000.0/sec): 1x1 stippled rectangle (8x8 stipple)

>    1500 reps @  22.4602 msec (    44.5/sec): 500x500 stippled rectangle (8x8 stipple)
>    3000 reps @   9.2680 msec (   108.0/sec): 500x500 stippled rectangle (8x8 stipple)
4000 reps @   6.8986 msec (   145.0/sec): 500x500 stippled rectangle (8x8 stipple)
*3500 reps @   7.0786 msec (   141.0/sec): 500x500 stippled rectangle (8x8 stipple)
 
>  100000 reps @   0.4043 msec (  2470.0/sec): Fill 10x10 trapezoid
> 1000000 reps @   0.0336 msec ( 29700.0/sec): Fill 10x10 trapezoid
2000000 reps @   0.0152 msec ( 65700.0/sec): Fill 10x10 trapezoid
*4000000 reps @   0.0064 msec (156000.0/sec): Fill 10x10 trapezoid
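
For anyone wanting to compare such runs without eyeballing the columns, here
is a minimal sketch in Python that groups x11perf result lines by test name
and reports the ratio between the first and last rate seen. The names
parse_x11perf and speedup are hypothetical helpers, not part of x11perf, and
the regular expression assumes the usual "N reps @ T msec (R/sec): name"
layout, optionally prefixed by a quote marker (">") or an asterisk.

```python
import re

# Assumed layout of an x11perf result line, e.g.
#   "100000 reps @   0.3210 msec (  3120.0/sec): 500x500 rectangle"
# An optional leading "> " (email quoting) or "*" is tolerated.
LINE_RE = re.compile(
    r"^\s*(?:>\s*)?\*?\s*(\d+)\s+reps\s+@\s+([\d.]+)\s+msec"
    r"\s+\(\s*([\d.]+)/sec\):\s+(.*\S)\s*$"
)


def parse_x11perf(text):
    """Return {test name: [rate per second, ...]} in order of appearance."""
    results = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            name = match.group(4)
            rate = float(match.group(3))
            results.setdefault(name, []).append(rate)
    return results


def speedup(rates):
    """Ratio of the last rate to the first rate for one test."""
    return rates[-1] / rates[0]


# Sample taken from the 500x500 rectangle figures quoted in this mail.
SAMPLE = """\
>   45000 reps @   0.5973 msec (  1670.0/sec): 500x500 rectangle
>  100000 reps @   0.4282 msec (  2340.0/sec): 500x500 rectangle
100000 reps @   0.3210 msec (  3120.0/sec): 500x500 rectangle
"""

if __name__ == "__main__":
    for name, rates in parse_x11perf(SAMPLE).items():
        print(f"{name}: {rates} -> x{speedup(rates):.2f}")
```

The sketch only groups by test name; it knows nothing about which machine or
driver version produced each line, so that bookkeeping is left to the reader.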

Hmm. My suspicion was that these were GEM-related regressions (the overhead
of the dynamic buffer manager and relocations), along with various
optimizations for the common cases affecting the software-fallback-dominated
benchmarks selected above. And whilst there may be some element of
that behind the regression you're observing, I don't think that is the
whole story, and Adam is right to suggest checking that the systems are
indeed configured identically (wrt debug and optimisation options).
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre