[Mesa-dev] V3 On disk shader cache for i965 (Now with real world results!)

Timothy Arceri timothy.arceri at collabora.com
Sun Jun 26 04:16:32 UTC 2016


I've spent a bunch of time rebasing this series to remove the excess
code churn and I've just pushed the results to the shader-cache branch
mentioned below. There are no code changes to the end result but I've
managed to get the patch count down to 80 (was 96, I think) and things
should be much easier to review now.

I've also had reports of people testing with additional games such as
Dota 2 and seeing good results.


On Tue, 2016-06-21 at 16:08 +1000, Timothy Arceri wrote:
> Rather than send 90+ patches to the list, please see the repo at the
> bottom of this email.
> 
> The big update is I've added all stages but compute and tested with a
> few games and everything seems to be working well so far. Enabling
> shader cache with the Shadow of Mordor benchmark makes things
> noticeably smoother and helps consistently keep the min FPS at 15 on
> my Skylake, whereas without it the min can be anywhere between 4 and 15.
> 
> The Elemental demo, which Dave pointed out as also doing a bunch of
> compiles during the demo, is also smoother, especially on the second
> run, but it's really slow on my Skylake regardless. Maybe someone with a
> high-end Skylake would like to give it a try.
> 
>  
> V3:
> - add support for geometry and tessellation stages
> - cache clip planes
> - reserve parameter storage before restoring list
> - stop losing buffer blocks on cache fallback
> - lots of little fixes I can't remember
> 
> V2: 
> - rebased on master
> - add support for encoding doubles
> - renamed skip_cache params to is_cache_fallback, and fixed a related
>   bug when disabling shader cache for xfb
> 
> This series is based on the great work done by Carl, Kristian and
> others.
> 
> I've split up Carl's original patches for easier review, and also
> merged a number of fixes and clean-ups into his patches. However, there
> is a little more code churn than is ideal, as the approach taken by the
> original patches needed to be modified quite a lot. I'm hoping it's not
> more than people can live with, as I'd like to keep some of the history
> rather than just squashing everything.
> 
> For now I have left in some printfs, as the feature is still disabled
> by default and they are useful for debugging. I intend to fix this soon
> by hiding them behind an environment variable.
> 
> There are no regressions after two runs of piglit with shader cache
> enabled on my Broadwell machine.
> 
> This series enables the on-disk shader cache for all stages except
> compute programs. For now, transform feedback and SSO programs skip
> using the cache; these will be added as follow-ups.
> 
> My main goal with this series is to land something that passes piglit.
> There are a number of optimisations that can still be done, such as
> skipping more validation and state recreation when falling back to a
> full recompile, but I would rather leave this until we have something
> fully working.
> 
> Here are the shader-db times (from V2):
> 
> Cache disabled:
> 
> Thread 1 took 1360.47 seconds and compiled 13015 shaders (not including SIMD16) with 50 GL context switches
> Thread 3 took 1349.85 seconds and compiled 12848 shaders (not including SIMD16) with 40 GL context switches
> Thread 2 took 1362.94 seconds and compiled 12637 shaders (not including SIMD16) with 36 GL context switches
> Thread 0 took 1352.41 seconds and compiled 12593 shaders (not including SIMD16) with 46 GL context switches
> 
> Cache enabled first run:
> 
> Thread 1 took 1410.30 seconds and compiled 12678 shaders (not including SIMD16) with 34 GL context switches
> Thread 2 took 1421.35 seconds and compiled 12822 shaders (not including SIMD16) with 50 GL context switches
> Thread 0 took 1410.49 seconds and compiled 12999 shaders (not including SIMD16) with 40 GL context switches
> Thread 3 took 1426.67 seconds and compiled 12594 shaders (not including SIMD16) with 48 GL context switches
> 
> Cache enabled second run:
> 
> Thread 0 took 259.84 seconds and compiled 12817 shaders (not including SIMD16) with 40 GL context switches
> Thread 3 took 257.03 seconds and compiled 12533 shaders (not including SIMD16) with 50 GL context switches
> Thread 1 took 256.18 seconds and compiled 12828 shaders (not including SIMD16) with 40 GL context switches
> Thread 2 took 261.31 seconds and compiled 12915 shaders (not including SIMD16) with 39 GL context switches
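
To make the three shader-db runs above easier to compare, here is a small sketch that averages the per-thread times and totals the shader counts. The numbers are copied from the output quoted above; the names `RUNS`, `summarize` and `speedup` are my own and not part of shader-db or Mesa.

```python
from statistics import mean

# (seconds, shaders compiled) per thread, copied from the shader-db output above.
RUNS = {
    "cache disabled": [
        (1360.47, 13015), (1349.85, 12848), (1362.94, 12637), (1352.41, 12593),
    ],
    "cache enabled, first run": [
        (1410.30, 12678), (1421.35, 12822), (1410.49, 12999), (1426.67, 12594),
    ],
    "cache enabled, second run": [
        (259.84, 12817), (257.03, 12533), (256.18, 12828), (261.31, 12915),
    ],
}


def summarize(threads):
    """Return (mean seconds per thread, total shaders compiled) for one run."""
    return mean(t for t, _ in threads), sum(n for _, n in threads)


def speedup(baseline, other):
    """How many times faster `other` is than `baseline`, by mean thread time."""
    return summarize(baseline)[0] / summarize(other)[0]


if __name__ == "__main__":
    baseline = RUNS["cache disabled"]
    for name, threads in RUNS.items():
        avg, shaders = summarize(threads)
        print(f"{name}: {avg:.2f} s mean per thread, {shaders} shaders, "
              f"{speedup(baseline, threads):.2f}x vs. cache disabled")
```

All three runs compile the same 51093 shaders in total. By mean thread time the first cached run is roughly 4.5% slower than running with the cache disabled, and the second cached run is roughly 5.2x faster.
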
> 
> You can find the series in the shader-cache branch of:
> 
> https://github.com/tarceri/Mesa_arrays_of_arrays.git
> 
> MESA_GLSL_CACHE_ENABLE=1 enables the cache.
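
For anyone wanting to script before/after comparisons, a minimal sketch of launching a program with that variable set or cleared might look like the following. Only `MESA_GLSL_CACHE_ENABLE` comes from the email; the helper names `cache_env` and `run_with_cache` are hypothetical, and the child process here is just a stand-in that prints what it sees. The variable only has an effect when the application runs against a Mesa build from the shader-cache branch.

```python
import os
import subprocess
import sys


def cache_env(enabled=True, base=None):
    """Copy an environment and set or clear MESA_GLSL_CACHE_ENABLE in the copy."""
    env = dict(os.environ if base is None else base)
    if enabled:
        env["MESA_GLSL_CACHE_ENABLE"] = "1"
    else:
        env.pop("MESA_GLSL_CACHE_ENABLE", None)
    return env


def run_with_cache(argv, enabled=True):
    """Run argv with the cache variable set or cleared; return its exit code."""
    return subprocess.run(argv, env=cache_env(enabled)).returncode


if __name__ == "__main__":
    # Stand-in for a real GL application: report the variable the child sees.
    child = [sys.executable, "-c",
             "import os; print(os.environ.get('MESA_GLSL_CACHE_ENABLE'))"]
    run_with_cache(child, enabled=True)
    run_with_cache(child, enabled=False)
```

Replacing `child` with the command line of a game or benchmark gives a cached and an uncached run from the same script.
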
> 
> 
> 
> 
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev

