[Mesa-dev] [PATCH 1/3] cso: don't release sampler states that are bound

Nicolai Hähnle nhaehnle at gmail.com
Wed Dec 7 10:16:59 UTC 2016


On 07.12.2016 08:50, Michel Dänzer wrote:
> On 06/12/16 10:24 PM, Marek Olšák wrote:
>> On Mon, Dec 5, 2016 at 10:05 AM, Michel Dänzer <michel at daenzer.net> wrote:
>>> On 03/12/16 05:38 AM, Marek Olšák wrote:
>>>> From: Marek Olšák <marek.olsak at amd.com>
>>>>
>>>> This fixes random radeonsi GPU hangs in Batman Arkham: Origins (Wine) and
>>>> probably many other games too.
>>>>
>>>> cso_cache deletes sampler states when the cache size is too big and doesn't
>>>> check which sampler states are bound, causing use-after-free in drivers.
>>>> Because of that, radeonsi uploaded garbage sampler states and the hardware
>>>> went bananas. Other drivers may have experienced similar issues.
>>>>
>>>> Cc: 13.0 12.0 <mesa-stable at lists.freedesktop.org>
>>>> ---
>>>>  src/gallium/auxiliary/cso_cache/cso_cache.c | 4 +++-
>>>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/src/gallium/auxiliary/cso_cache/cso_cache.c b/src/gallium/auxiliary/cso_cache/cso_cache.c
>>>> index b240c93..1f3be4b 100644
>>>> --- a/src/gallium/auxiliary/cso_cache/cso_cache.c
>>>> +++ b/src/gallium/auxiliary/cso_cache/cso_cache.c
>>>> @@ -181,21 +181,23 @@ static inline void sanitize_cb(struct cso_hash *hash, enum cso_cache_type type,
>>>>        --to_remove;
>>>>     }
>>>>  }
>>>>
>>>>  struct cso_hash_iter
>>>>  cso_insert_state(struct cso_cache *sc,
>>>>                   unsigned hash_key, enum cso_cache_type type,
>>>>                   void *state)
>>>>  {
>>>>     struct cso_hash *hash = _cso_hash_for_type(sc, type);
>>>> -   sanitize_hash(sc, hash, type, sc->max_size);
>>>> +
>>>> +   if (type != CSO_SAMPLER)
>>>> +      sanitize_hash(sc, hash, type, sc->max_size);
>>>>
>>>>     return cso_hash_insert(hash, hash_key, state);
>>>>  }
>>>>
>>>>  struct cso_hash_iter
>>>>  cso_find_state(struct cso_cache *sc,
>>>>                 unsigned hash_key, enum cso_cache_type type)
>>>>  {
>>>>     struct cso_hash *hash = _cso_hash_for_type(sc, type);
>>>>
>>>>
>>>
>>> If I understand correctly, this means that the maximum cache size
>>> effectively isn't enforced for sampler states? Wouldn't it be better
>>> instead to add code which prevents currently bound states from being
>>> deleted from here?
>>
>> Not in this patch. Maybe later and if CPU profiling results show that
>> it matters.
>>
>> Currently, if the cache size is exceeded, the GPU usually hangs. If
>> pruning the cache was important, most apps would hang.
>>
>> Thus, I'd like to push this as-is.
>
> Fair enough, but this will require some follow-up work:
>
> Why are sampler states singled out? The same problem can at least
> theoretically happen with other CSOs as well, right?

The other CSOs only have single bind points. So with other CSOs, what 
happens is that:

1. State tracker wants to bind some state, creates the CSO.
2. cso_cache gets cleaned out; this might delete the currently bound CSO.
3. Currently bound CSO gets immediately overwritten with the new state 
anyway.

Samplers are different because there are many bind points within a 
single context.

You're right though that the sounder approach would be to check all CSOs 
against current bind points during cache cleanup, just in case a driver 
does something with the old state on unbind.


> The maximum cache size needs to be addressed, either something like
> above or by just removing the maximum cache size.
>
>
> BTW, if varying the LOD bias is common, maybe the LOD bias value should
> be tracked separately from the remaining sampler state?

It's not common at all, actually. Some apps are just weird.


> P.S. The bug has been there forever, so I don't see the justification
> for bypassing the normal stable branch process.

And we've had mysterious hang bug reports forever as well. I wouldn't be 
surprised if a large number or even all of them could be traced back to 
this issue.

Nicolai


More information about the mesa-dev mailing list