[Mesa-dev] [PATCH] i965: Enable disk shader cache by default

Timothy Arceri tarceri at itsqueeze.com
Tue Dec 5 23:19:00 UTC 2017


On 06/12/17 09:49, Mark Janes wrote:
> Jordan Justen <jordan.l.justen at intel.com> writes:
> 
>> On 2017-12-05 09:13:11, Mark Janes wrote:
>>> Jordan Justen <jordan.l.justen at intel.com> writes:
>>>
>>>> On 2017-11-08 17:26:47, Timothy Arceri wrote:
>>>>> Reviewed-by: Timothy Arceri <tarceri at itsqueeze.com>
>>>>>
>>>>> Mark may want to consider adding some of the once a day type CI runs for
>>>>> this. For example running the test suite for two consecutive runs on the
>>>>> same build so that the second run uses the shader cache and also a
>>>>> second run the uses MESA_GLSL=cache_fb to force testing of the cache
>>>>> fallback path.
>>>>
>>>> Yeah. We discussed this previously, but I don't think it's been
>>>> implemented yet. My opinion is that it could perhaps be a weekly test.
>>>
>>> This automation is implemented now. It found the issues reported in
>>> https://bugs.freedesktop.org/show_bug.cgi?id=103988
>>>
>>> My opinion is that this test strategy is inadequate.
>>
>> Meaning you think we cannot enable i965 shader cache for the 18.0
>> release?
> 
> I haven't heard anyone express confidence that this feature is bug-free,
> and I don't know on what basis that claim could be made.  I appreciate
> that a lot of have manual testing has been done.  Do you feel confident
> that the cache can be enabled for all mesa customers without disruption?
> 

I would suggest enabling it now in master it can be switched off again 
before release. For what its worth besides 1 or 2 dev branch related 
issues we still haven't had any shader cache related regressions 
reported for radeonsi (or any of the other gallium drivers with shader 
cache) and its been enabled for a couple of releases now.


>> We are already ~1.5 months away from the next stable branch point. If
>> we want to enable this in 18.0, we should be using this time to see if
>> enabling the cache by default has some large unexpected side effect in
>> our graphics stack, or breaks real-world apps.
> 
> I agree.  We should encourage as many users as possible to enable the
> shader cache in their environments.  At least one stable release should
> be out with a working cache, where the feature can be enabled by those
> who are eager to benefit from it.  I assume there will be a lot of them,
> and they could flush out issues for everyone who has no idea what a
> shader cache is.
> 
>>> Adding a dimension to the test matrix has high cost, especially when
>>> combined with other dimensions of the test matrix (does shader cache
>>> need to be tested for 32/64 bit builds? For hswgt1/hswgt2/hswgt3e?).
>>
>> Are you saying this is too high cost to run per check-in? Do you need
>> to disable it for the health of CI? I think I proposed that daily, or
>> perhaps even weekly would be adequate.
> 
> Certainly, the test time per line of shader cache code is massively
> higher than any other feature, even if you run it just once a month.
> Other features have tests that run in milliseconds, not 30min * 20
> machines.
> 
> Beyond poor return on execution time, there is the simple fact that
> whoever is running the CI needs to manually look at shader cache results
> separately from the rest of the tests.  Unit testing is effective
> because coverage can be added at approximately zero marginal cost.
> 
> 3 years from now, will we still be looking at separate weekly shader
> cache test runs to make sure it's working?
> 
>> These tests are already identifying some corner case issues. I'm not
>> sure these would have impacted real-world apps yet, but I do think it
>> is a good idea to run these tests regularly.
>>
>> You say this test strategy is inadequate. Perhaps. I do think it needs
>> to be part of our test strategy. There is no way we are going to hit
>> all the corners of the API better than running all of our tests with
>> the cache enabled. Do you agree?
> 
> Tests should strive to cover the implementation, not the API.  Shader
> cache is a unique feature, because it affects a large part of the API.
> It doesn't necessarily follow that covering the API will cover the
> feature, or that that is an effective test strategy.
> 
> Knowing nothing of the implementation, I would look to similar projects
> (eg: ccache) to see what test strategies they employ.  There are
> probably issues they've encountered that we have not thought of.
> 
> Another approach would be to write real unit tests for bugs found during
> the course of the shader cache implementation.  Those test cases would
> help developers know that they broke a cache feature without having to
> rely on a weekly CI run.

The problem is these tests would just be.

1. Compile shaders
2. Run program
3. Recompile all the same shaders from Step 1
4. Re-run the program in the same was as Step 2

Shader-cache is not like other features in that you need to exercise 
pretty much ever other feature (hence re-running piglit) in order to be 
confident you have full coverage.

> 
>>> Since 103988 is not hardware specific, and is not a regression, it's not
>>> something that could only have been caught by CI.  I'm surprised to find
>>> it at this point, considering piglit was used to validate the feature.
>>>
>>> I'd be happy if there was at least a smoke test where complex programs
>>> are cached, with the intent of exercising the shader cache mechanism
>>> directly.  It doesn't have to be exhaustive to be effective.  Seems like
>>> a good idea to at least directly test the issue that was fixed in
>>> 103988 tests.
>>
>> It could be interesting to define a MESA extension to control or query
>> the shader cache. Today, the shader cache is a nearly 'invisible'
>> feature. There are a few env-vars, but I wouldn't call a public
>> interface.
>>
>> The downside of defining an extension is that it might constrain
>> changes to the shader cache implementation. Or, if the interface is
>> too complex, then it may be tough for some drivers to implement.
> 
> Why is an extension necessary?  By comparison, GCC has no interface to
> query the statistics for ccache.  A utility that can read the cache
> files and report hits/misses would at least inform us that the cache is
> functioning.  The utility would be useful in writing real unit tests for
> the feature.

Ignoring the extension. Testing hit/miss only tests a small part of the 
cache, and is already covered by the 'make check' tests. The real test 
is to make sure we cached/restored things correctly.

> 
>> But, what if we get around to defining and implementing this extenion,
>> (and a few piglit tests that makes use of it), by mid-January? It'll
>> still be pretty difficult to argue we should enable the shader cache
>> on i965 for 18.0, since we burned all the time to let it be tested on
>> master against real-world apps.
> 
> Personally, I think it is already difficult to argue that the cache
> should be on by default in 18.0.
> 
>> Also, while you say our current test strategy is inadequate, I do
>> think it is still far more comprehensive than anything we will
>> accomplish by defining a new extension and writing a few, or even a
>> few hundred tests against the new extension.
> 
> In the short term, we should do whatever is expedient to try to test the
> feature as much as we can.
> 
> In the longer term, a weekly test is going to be costly and ineffective
> at preventing regressions.  Tests need to be runnable for developers to
> use them as part of their process.  If a test can only be run at great
> expense in a weekly CI, then any fragility in the feature will be
> discovered a week after a bug has been pushed to master.

Unfortunately I still haven't seen any better suggestions :(

> 
>> -Jordan


More information about the mesa-dev mailing list