[Mesa-dev] [PATCH 00/10] Support Skylake MCS buffers (fast clears)

Wed Oct 14 17:06:39 PDT 2015

On Tue, Oct 13, 2015 at 08:50:17PM -0700, Ben Widawsky wrote:
> This patch series adds support for fast color clears on SKL as it exists on
> previous generations of hardware minus the new hardware restriction on surface
> formats. Additionally, it adds support for utilizing clear values with up to 32b
> per color channel (see note at the bottom). It is based on work originally done
> by Kristian, so thanks to him for that initial work as well as helping me debug
> some of the issues.
> 
> Additionally, thanks to Chad for helping track down the last bug in the rectangle
> scaling code which was (for me) being masked by another bug (#3 below). I
> imagine it would have been several more weeks at least before I uncovered it.
> 
> We knew that SKL added the extra DWORDs to the RENDER_SURFACE_STATE in order to
> support the 32b per channel. As it turned out though, Skylake made other changes
> to support this which caused weird failures which seemed to interfere with
> each other.
> 
> 1. Not all surface formats support lossless compression.
> 2. Clearing multiple color buffer attachments must happen in n passes.
> 3. Change to the scaling factors for the MCS surface - SKL has 2x height (this
> was the bug which Chad helped uncover; I had it correct in my patch from March
> http://lists.freedesktop.org/archives/mesa-dev/2015-March/079084.html, but we
> had other problems which prevented merge, including #1 and #2 above).
> 
> I have no piglit, dEQP or CTS regressions (except for the last patch). I haven't
> yet, but will collect perf data on this ASAP. Historically we've come to expect
> this to provide large gains in tests which are memory bandwidth limited and
> doing many clears.

I left out the note (referenced above) about 32b support having two small
regressions.

I did some very basic performance data collection. As expected, the rep_clears
which Chad had already enabled seem to provide most of the gains. I didn't run
long enough to do much beyond proving to myself that there are no performance
regressions relative to the gen9 rep clears. Here are the results, which
shouldn't be taken too seriously (5 runs only).

Benchmark             % diff (master -> full 32b fast clears)
OglBatch0             1.87   
OglBatch1             0.54   
OglBatch2             -0.44  
OglBatch3             0.11   
OglBatch4             -0.94  
OglBatch5             -2.11  
OglBatch6             1.18   
OglBatch7             7.02   
OglDeferred           3.05   
OglDeferredAA         3.6    
OglFillPixel          0.07   
OglFillTexMulti       -0.01  
OglFillTexSingle      0.03   
OglGeomPoint          0.07   
OglGeomTriList        0.74   
OglGeomTriStrip       -0.13  
OglHdrBloom           -1.93  
OglMultithread        -0.96  
OglPSBump2            0.33   
OglPSBump8            0.31   
OglPSPhong            0.18   
OglPSPom              -0.08  
OglShMapPcf           0.03   
OglShMapVsm           -0.3   
OglTerrainFlyInst     0.46   
OglTerrainPanInst     0.4    
OglTexFilterAniso     -0.08  
OglTexFilterTri       0.13   
OglTexMem128          0.2    
OglTexMem512          -0.03  
OglVSDiffuse1         0.23   
OglVSDiffuse8         -0.23  
OglVSInstancing       -0.15  
OglVSTangent          -0.06  
OglZBuffer            0.07   
fill                  0.17   
filloff               -0.01  
fur                   -0.19  
heaven                0.56   
plot3d                -0.18  
trex                  4.51   
trexoff               3.69   
triangle              0.04   
valley                1.86   
warsow                0.18   
xonotic               0.4    
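
For reference, a minimal sketch of how a "% diff" column like the one above
could be derived from a handful of runs per branch. The scores below are
made-up placeholders, not measured data, and the sign convention (positive
means the patched branch scored higher) is an assumption. The helper names
percent_diff and relative_noise are hypothetical and not part of any
benchmarking tool.

```python
from statistics import mean, stdev


def percent_diff(baseline_runs, patched_runs):
    """Percent change of the patched mean relative to the baseline mean."""
    base = mean(baseline_runs)
    return (mean(patched_runs) - base) / base * 100.0


def relative_noise(runs):
    """Sample standard deviation expressed as a percentage of the mean."""
    return stdev(runs) / mean(runs) * 100.0


# Hypothetical scores (higher is better), five runs per branch.
master = [100.0, 101.0, 99.0, 100.5, 99.5]
patched = [102.0, 101.5, 102.5, 101.0, 103.0]

diff = percent_diff(master, patched)
noise = max(relative_noise(master), relative_noise(patched))
print(f"{diff:+.2f}% diff, ~{noise:.2f}% run-to-run noise")
```

With only 5 runs, any diff that is not clearly larger than the run-to-run
noise is probably best read as "no change".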

BTW: the patches are here as well (with 32b support reverted):
http://cgit.freedesktop.org/~bwidawsk/mesa/log/?h=skl-fast-clear