[Mesa-dev] [PATCH 1/3] cso: don't release sampler states that are bound

Michel Dänzer michel at daenzer.net
Thu Dec 8 01:35:59 UTC 2016


On 07/12/16 07:16 PM, Nicolai Hähnle wrote:
> On 07.12.2016 08:50, Michel Dänzer wrote:
>> On 06/12/16 10:24 PM, Marek Olšák wrote:
>>> On Mon, Dec 5, 2016 at 10:05 AM, Michel Dänzer <michel at daenzer.net>
>>> wrote:
>>>> On 03/12/16 05:38 AM, Marek Olšák wrote:
>>>>> From: Marek Olšák <marek.olsak at amd.com>
>>>>>
>>>>> This fixes random radeonsi GPU hangs in Batman Arkham: Origins
>>>>> (Wine) and
>>>>> probably many other games too.
>>>>>
>>>>> cso_cache deletes sampler states when the cache size is too big and
>>>>> doesn't
>>>>> check which sampler states are bound, causing use-after-free in
>>>>> drivers.
>>>>> Because of that, radeonsi uploaded garbage sampler states and the
>>>>> hardware
>>>>> went bananas. Other drivers may have experienced similar issues.
>>>>>
>>>>> Cc: 13.0 12.0 <mesa-stable at lists.freedesktop.org>
>>>>> ---
>>>>>  src/gallium/auxiliary/cso_cache/cso_cache.c | 4 +++-
>>>>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/src/gallium/auxiliary/cso_cache/cso_cache.c
>>>>> b/src/gallium/auxiliary/cso_cache/cso_cache.c
>>>>> index b240c93..1f3be4b 100644
>>>>> --- a/src/gallium/auxiliary/cso_cache/cso_cache.c
>>>>> +++ b/src/gallium/auxiliary/cso_cache/cso_cache.c
>>>>> @@ -181,21 +181,23 @@ static inline void sanitize_cb(struct
>>>>> cso_hash *hash, enum cso_cache_type type,
>>>>>        --to_remove;
>>>>>     }
>>>>>  }
>>>>>
>>>>>  struct cso_hash_iter
>>>>>  cso_insert_state(struct cso_cache *sc,
>>>>>                   unsigned hash_key, enum cso_cache_type type,
>>>>>                   void *state)
>>>>>  {
>>>>>     struct cso_hash *hash = _cso_hash_for_type(sc, type);
>>>>> -   sanitize_hash(sc, hash, type, sc->max_size);
>>>>> +
>>>>> +   if (type != CSO_SAMPLER)
>>>>> +      sanitize_hash(sc, hash, type, sc->max_size);
>>>>>
>>>>>     return cso_hash_insert(hash, hash_key, state);
>>>>>  }
>>>>>
>>>>>  struct cso_hash_iter
>>>>>  cso_find_state(struct cso_cache *sc,
>>>>>                 unsigned hash_key, enum cso_cache_type type)
>>>>>  {
>>>>>     struct cso_hash *hash = _cso_hash_for_type(sc, type);
>>>>>
>>>>>
>>>>
>>>> If I understand correctly, this means that the maximum cache size
>>>> effectively isn't enforced for sampler states? Wouldn't it be better
>>>> instead to add code which prevents currently bound states from being
>>>> deleted from here?
>>>
>>> Not in this patch. Maybe later and if CPU profiling results show that
>>> it matters.
>>>
>>> Currently, if the cache size is exceeded, the GPU usually hangs. If
>>> pruning the cache was important, most apps would hang.
>>>
>>> Thus, I'd like to push this as-is.
>>
>> Fair enough, but this will require some follow-up work:
>>
>> Why are sampler states singled out? The same problem can at least
>> theoretically happen with other CSOs as well, right?
> 
> The other CSOs only have single bind points. So with other CSOs, what
> happens is that:
> 
> 1. State tracker wants to bind some state, creates the CSO.
> 2. cso_cache gets cleaned out; this might delete the currently bound CSO.
> 3. Currently bound CSO gets immediately overwritten with the new state
> anyway.
> 
> Samplers are different because there are many bind points within a
> single context.
> 
> You're right though that the sounder approach would be to check all CSOs
> against current bind points during cache cleanup, just in case a driver
> does something with the old state on unbind.

That does seem rather unlikely though. Thanks for clarifying this.


>> P.S. The bug has been there forever, so I don't see the justification
>> for bypassing the normal stable branch process.
> 
> And we've had mysterious hang bug reports forever as well. I wouldn't be
> surprised if a large number or even all of them could be traced back to
> this issue.

If I had JPY 100 for every time I had such hope for a fix... :) I really
hope this is the one, but experience suggests it's more likely that
there are lots of different issues causing GPU hangs.


-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer


More information about the mesa-dev mailing list