[Mesa-dev] [PATCH 1/3] cso: don't release sampler states that are bound

Marek Olšák maraeo at gmail.com
Wed Dec 7 20:46:12 UTC 2016


On Wed, Dec 7, 2016 at 6:00 PM, Roland Scheidegger <sroland at vmware.com> wrote:
> Am 07.12.2016 um 17:26 schrieb Marek Olšák:
>> Optimizing the CSO cache isn't exactly on the top of my list, so I
>> can't really do that right now.
>>
>> I think that varying the LOD bias is starting to be common. It's used
>> for smooth LOD transitions when loading textures during rendering.
>> Games with lots of content typically do that. This particular Batman
>> game uses UE3.
> The question of course is if they do it via sampler state or via shader
> lod bias.

The sampler state. I saw those states. lod_bias was the only changing variable.

> I suppose that when these objects were designed noone thought it would
> be useful to really create sampler state with lots of different bias
> values. d3d10 of course would have the same problem (and it has limits
> on how many such objects you can create, 4096 per context) but the
> problem shifted to the app since it would have to create the objects
> explicitly - I would suspect the app there would either quantize the
> value itself, or use shader lod bias instead.

The shader lod bias isn't a better solution though. Any LOD bias is a
modifier of the varying LOD value. For texture streaming, you want to
clamp the LOD, not shift it, thus min_lod is better. However, min_lod
is integer on ATI DX9 GPUs (not sure about the API), so DX9 games
can't use it for smooth transitions. That may explain why we are
seeing it with Wine. I guess any DX10+ and GL3+ games do use min_lod
instead of lod_bias, which means we probably get a lot of sampler
states there too.

We could reduce the size of pipe_sampler_state a little. AMD GCN can
represent only 14 bits of lod_bias and 12 bits of min_lod and max_lod.

Marek

> In any case in the end you still have to program the hw differently for
> all these different bias values, albeit quantized to whatever the hw
> supports.
>
> Roland
>
>
>
>>
>> I used to play Borderlands 2 (UE3), it had smooth LOD transitions too,
>> but it never hung.
>>
>> Does this patch have your Rb?
>>
>> Thanks,
>> Marek
>>
>> On Wed, Dec 7, 2016 at 11:16 AM, Nicolai Hähnle <nhaehnle at gmail.com> wrote:
>>> On 07.12.2016 08:50, Michel Dänzer wrote:
>>>>
>>>> On 06/12/16 10:24 PM, Marek Olšák wrote:
>>>>>
>>>>> On Mon, Dec 5, 2016 at 10:05 AM, Michel Dänzer <michel at daenzer.net>
>>>>> wrote:
>>>>>>
>>>>>> On 03/12/16 05:38 AM, Marek Olšák wrote:
>>>>>>>
>>>>>>> From: Marek Olšák <marek.olsak at amd.com>
>>>>>>>
>>>>>>> This fixes random radeonsi GPU hangs in Batman Arkham: Origins (Wine)
>>>>>>> and
>>>>>>> probably many other games too.
>>>>>>>
>>>>>>> cso_cache deletes sampler states when the cache size is too big and
>>>>>>> doesn't
>>>>>>> check which sampler states are bound, causing use-after-free in
>>>>>>> drivers.
>>>>>>> Because of that, radeonsi uploaded garbage sampler states and the
>>>>>>> hardware
>>>>>>> went bananas. Other drivers may have experienced similar issues.
>>>>>>>
>>>>>>> Cc: 13.0 12.0 <mesa-stable at lists.freedesktop.org>
>>>>>>> ---
>>>>>>>  src/gallium/auxiliary/cso_cache/cso_cache.c | 4 +++-
>>>>>>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/src/gallium/auxiliary/cso_cache/cso_cache.c
>>>>>>> b/src/gallium/auxiliary/cso_cache/cso_cache.c
>>>>>>> index b240c93..1f3be4b 100644
>>>>>>> --- a/src/gallium/auxiliary/cso_cache/cso_cache.c
>>>>>>> +++ b/src/gallium/auxiliary/cso_cache/cso_cache.c
>>>>>>> @@ -181,21 +181,23 @@ static inline void sanitize_cb(struct cso_hash
>>>>>>> *hash, enum cso_cache_type type,
>>>>>>>        --to_remove;
>>>>>>>     }
>>>>>>>  }
>>>>>>>
>>>>>>>  struct cso_hash_iter
>>>>>>>  cso_insert_state(struct cso_cache *sc,
>>>>>>>                   unsigned hash_key, enum cso_cache_type type,
>>>>>>>                   void *state)
>>>>>>>  {
>>>>>>>     struct cso_hash *hash = _cso_hash_for_type(sc, type);
>>>>>>> -   sanitize_hash(sc, hash, type, sc->max_size);
>>>>>>> +
>>>>>>> +   if (type != CSO_SAMPLER)
>>>>>>> +      sanitize_hash(sc, hash, type, sc->max_size);
>>>>>>>
>>>>>>>     return cso_hash_insert(hash, hash_key, state);
>>>>>>>  }
>>>>>>>
>>>>>>>  struct cso_hash_iter
>>>>>>>  cso_find_state(struct cso_cache *sc,
>>>>>>>                 unsigned hash_key, enum cso_cache_type type)
>>>>>>>  {
>>>>>>>     struct cso_hash *hash = _cso_hash_for_type(sc, type);
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> If I understand correctly, this means that the maximum cache size
>>>>>> effectively isn't enforced for sampler states? Wouldn't it be better
>>>>>> instead to add code which prevents currently bound states from being
>>>>>> deleted from here?
>>>>>
>>>>>
>>>>> Not in this patch. Maybe later and if CPU profiling results show that
>>>>> it matters.
>>>>>
>>>>> Currently, if the cache size is exceeded, the GPU usually hangs. If
>>>>> pruning the cache was important, most apps would hang.
>>>>>
>>>>> Thus, I'd like to push this as-is.
>>>>
>>>>
>>>> Fair enough, but this will require some follow-up work:
>>>>
>>>> Why are sampler states singled out? The same problem can at least
>>>> theoretically happen with other CSOs as well, right?
>>>
>>>
>>> The other CSOs only have single bind points. So with other CSOs, what
>>> happens is that:
>>>
>>> 1. State tracker wants to bind some state, creates the CSO.
>>> 2. cso_cache gets cleaned out; this might delete the currently bound CSO.
>>> 3. Currently bound CSO gets immediately overwritten with the new state
>>> anyway.
>>>
>>> Samplers are different because there are many bind points within a single
>>> context.
>>>
>>> You're right though that the sounder approach would be to check all CSOs
>>> against current bind points during cache cleanup, just in case a driver does
>>> something with the old state on unbind.
>>>
>>>
>>>> The maximum cache size needs to be addressed, either something like
>>>> above or by just removing the maximum cache size.
>>>>
>>>>
>>>> BTW, if varying the LOD bias is common, maybe the LOD bias value should
>>>> be tracked separately from the remaining sampler state?
>>>
>>>
>>> It's not common at all, actually. Some apps are just weird.
>>>
>>>
>>>> P.S. The bug has been there forever, so I don't see the justification
>>>> for bypassing the normal stable branch process.
>>>
>>>
>>> And we've had mysterious hang bug reports forever as well. I wouldn't be
>>> surprised if a large number or even all of them could be traced back to this
>>> issue.
>>>
>>> Nicolai
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev at lists.freedesktop.org
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.freedesktop.org_mailman_listinfo_mesa-2Ddev&d=DgIGaQ&c=uilaK90D4TOVoH58JNXRgQ&r=_QIjpv-UJ77xEQY8fIYoQtr5qv8wKrPJc7v7_-CYAb0&m=J1-2JNe113hsUFqegKx05IaoxAonJSJznguiRxatxzQ&s=b0oZ-g2JwG6fc5W6mFVfw7x8H1GGAwkw_pKI0nQC1CY&e=
>>
>


More information about the mesa-dev mailing list