[Mesa-dev] V4 On disk shader cache for i965

Timothy Arceri timothy.arceri at collabora.com
Wed Jul 13 02:46:55 UTC 2016


Big thanks to Grazvydas Ignotas for helping test this version. 

V4:
- lots of reworking of patches to remove code churn; should be much nicer now
- fixed fallback when shader has been detached
- fixed a couple of bugs with UBOs
- no more printfs, debug info is behind an environment var
- various cleanups, tweaks and fixes
 
V3:
- add support for geometry and tessellation stages
- cache clip planes
- reserve parameter storage before restoring list
- stop losing buffer blocks on cache fallback
- lots of little fixes I can't remember

V2: 
- rebased on master
- add support for encoding doubles
- renamed skip_cache params to is_cache_fallback, and fixed a related bug
 when disabling shader cache for xfb.

This series is based on the great work done by Carl, Kristian and
others.

There are no regressions after two runs of piglit with shader cache
enabled on my Broadwell machine.

This series enables the on-disk shader cache for all stages except compute
programs. For now, transform feedback and SSO programs skip using the
cache; support for these will be added as follow-ups.

My main goal with this series is to land something that passes piglit.
There are a number of optimisations that can still be done, such as
skipping more validation and state recreation when falling back to a
full recompile, but I would rather leave these until we have something
fully working.

Games:

Enabling shader cache with the Shadow of Mordor benchmark makes things
noticeably smoother and helps consistently keep the min FPS at 15 on my
Skylake, whereas without it the min can be anywhere between 4 and 15.

The Elemental demo, which Dave pointed out as also doing a bunch of
compiles during the demo, is also smoother, especially on the second run,
but it's really slow on my Skylake regardless. Maybe someone with a
high-end Skylake would like to give it a try.


Here are the shader-db times (from V2):

Cache disabled:

Thread 1 took 1360.47 seconds and compiled 13015 shaders (not including
SIMD16) with 50 GL context switches
Thread 3 took 1349.85 seconds and compiled 12848 shaders (not including
SIMD16) with 40 GL context switches
Thread 2 took 1362.94 seconds and compiled 12637 shaders (not including
SIMD16) with 36 GL context switches
Thread 0 took 1352.41 seconds and compiled 12593 shaders (not including
SIMD16) with 46 GL context switches

Cache enabled first run:

Thread 1 took 1410.30 seconds and compiled 12678 shaders (not including
SIMD16) with 34 GL context switches
Thread 2 took 1421.35 seconds and compiled 12822 shaders (not including
SIMD16) with 50 GL context switches
Thread 0 took 1410.49 seconds and compiled 12999 shaders (not including
SIMD16) with 40 GL context switches
Thread 3 took 1426.67 seconds and compiled 12594 shaders (not including
SIMD16) with 48 GL context switches

Cache enabled second run:

Thread 0 took 259.84 seconds and compiled 12817 shaders (not including
SIMD16) with 40 GL context switches
Thread 3 took 257.03 seconds and compiled 12533 shaders (not including
SIMD16) with 50 GL context switches
Thread 1 took 256.18 seconds and compiled 12828 shaders (not including
SIMD16) with 40 GL context switches
Thread 2 took 261.31 seconds and compiled 12915 shaders (not including
SIMD16) with 39 GL context switches
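
As a rough way to read the three sets of timings above, the per-thread
times can be averaged and compared. The sketch below only does arithmetic
on the figures already listed; using the mean per-thread time as the
comparison metric is an assumption made here for illustration, not
something shader-db reports. On these numbers the first cached run comes
out about 4.5% slower than the uncached run, and the second cached run
roughly 5.2x faster.

```python
# Per-thread times in seconds, copied from the shader-db output above.
cache_disabled = [1360.47, 1349.85, 1362.94, 1352.41]
cache_first_run = [1410.30, 1421.35, 1410.49, 1426.67]
cache_second_run = [259.84, 257.03, 256.18, 261.31]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


baseline = mean(cache_disabled)   # no cache at all
cold = mean(cache_first_run)      # cache enabled but empty (populating it)
warm = mean(cache_second_run)     # cache enabled and already populated

# Extra cost of writing the cache on the first run, relative to no cache.
cold_overhead_pct = (cold - baseline) / baseline * 100.0

# How many times faster the warm-cache run is than the no-cache run.
warm_speedup = baseline / warm

print(f"mean time, cache disabled:    {baseline:.2f} s")
print(f"mean time, cache first run:   {cold:.2f} s ({cold_overhead_pct:+.1f}%)")
print(f"mean time, cache second run:  {warm:.2f} s ({warm_speedup:.2f}x faster)")
```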

You can find the series in the shader-cache branch of:

https://github.com/tarceri/Mesa_arrays_of_arrays.git

MESA_GLSL_CACHE_ENABLE=1 - enables the cache.
MESA_GLSL=cache_info - enables some debug messages
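
For anyone scripting test runs, the two variables above can be set for a
single child process rather than exported globally. This is a minimal
sketch: the helper names are hypothetical, the command passed in is
whatever GL application you want to test, and it assumes the process is
running against a Mesa build from this branch.

```python
import os
import subprocess


def shader_cache_env(base_env=None, debug=True):
    """Return a copy of the environment with the shader cache variables set.

    Hypothetical helper: only the two variable names and values come from
    the series description above.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["MESA_GLSL_CACHE_ENABLE"] = "1"      # enables the cache
    if debug:
        # Overwrites any existing MESA_GLSL value for the child process.
        env["MESA_GLSL"] = "cache_info"      # enables some debug messages
    return env


def run_with_shader_cache(cmd, debug=True):
    """Run cmd (a list of arguments) with the shader cache enabled."""
    return subprocess.run(cmd, env=shader_cache_env(debug=debug), check=False)


# Example (the application name is a placeholder):
#   run_with_shader_cache(["glxgears"])
```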


