[Mesa-dev] [PATCH 1/3] cso: don't release sampler states that are bound
Nicolai Hähnle
nhaehnle at gmail.com
Wed Dec 7 10:16:59 UTC 2016
On 07.12.2016 08:50, Michel Dänzer wrote:
> On 06/12/16 10:24 PM, Marek Olšák wrote:
>> On Mon, Dec 5, 2016 at 10:05 AM, Michel Dänzer <michel at daenzer.net> wrote:
>>> On 03/12/16 05:38 AM, Marek Olšák wrote:
>>>> From: Marek Olšák <marek.olsak at amd.com>
>>>>
>>>> This fixes random radeonsi GPU hangs in Batman Arkham: Origins (Wine) and
>>>> probably many other games too.
>>>>
>>>> cso_cache deletes sampler states when the cache size is too big and doesn't
>>>> check which sampler states are bound, causing use-after-free in drivers.
>>>> Because of that, radeonsi uploaded garbage sampler states and the hardware
>>>> went bananas. Other drivers may have experienced similar issues.
>>>>
>>>> Cc: 13.0 12.0 <mesa-stable at lists.freedesktop.org>
>>>> ---
>>>> src/gallium/auxiliary/cso_cache/cso_cache.c | 4 +++-
>>>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/src/gallium/auxiliary/cso_cache/cso_cache.c b/src/gallium/auxiliary/cso_cache/cso_cache.c
>>>> index b240c93..1f3be4b 100644
>>>> --- a/src/gallium/auxiliary/cso_cache/cso_cache.c
>>>> +++ b/src/gallium/auxiliary/cso_cache/cso_cache.c
>>>> @@ -181,21 +181,23 @@ static inline void sanitize_cb(struct cso_hash *hash, enum cso_cache_type type,
>>>> --to_remove;
>>>> }
>>>> }
>>>>
>>>> struct cso_hash_iter
>>>> cso_insert_state(struct cso_cache *sc,
>>>> unsigned hash_key, enum cso_cache_type type,
>>>> void *state)
>>>> {
>>>> struct cso_hash *hash = _cso_hash_for_type(sc, type);
>>>> - sanitize_hash(sc, hash, type, sc->max_size);
>>>> +
>>>> + if (type != CSO_SAMPLER)
>>>> + sanitize_hash(sc, hash, type, sc->max_size);
>>>>
>>>> return cso_hash_insert(hash, hash_key, state);
>>>> }
>>>>
>>>> struct cso_hash_iter
>>>> cso_find_state(struct cso_cache *sc,
>>>> unsigned hash_key, enum cso_cache_type type)
>>>> {
>>>> struct cso_hash *hash = _cso_hash_for_type(sc, type);
>>>>
>>>>
>>>
>>> If I understand correctly, this means that the maximum cache size
>>> effectively isn't enforced for sampler states? Wouldn't it be better
>>> instead to add code which prevents currently bound states from being
>>> deleted from here?
>>
>> Not in this patch. Maybe later and if CPU profiling results show that
>> it matters.
>>
>> Currently, if the cache size is exceeded, the GPU usually hangs. If
>> pruning the cache was important, most apps would hang.
>>
>> Thus, I'd like to push this as-is.
>
> Fair enough, but this will require some follow-up work:
>
> Why are sampler states singled out? The same problem can at least
> theoretically happen with other CSOs as well, right?
The other CSOs only have single bind points. So with other CSOs, what
happens is that:
1. State tracker wants to bind some state, creates the CSO.
2. cso_cache gets cleaned out; this might delete the currently bound CSO.
3. Currently bound CSO gets immediately overwritten with the new state
anyway.
Samplers are different because there are many bind points within a
single context.
You're right though that the sounder approach would be to check all CSOs
against current bind points during cache cleanup, just in case a driver
does something with the old state on unbind.
> The maximum cache size needs to be addressed, either something like
> above or by just removing the maximum cache size.
>
>
> BTW, if varying the LOD bias is common, maybe the LOD bias value should
> be tracked separately from the remaining sampler state?
It's not common at all, actually. Some apps are just weird.
> P.S. The bug has been there forever, so I don't see the justification
> for bypassing the normal stable branch process.
And we've had mysterious hang bug reports forever as well. I wouldn't be
surprised if a large number or even all of them could be traced back to
this issue.
Nicolai
More information about the mesa-dev
mailing list