[Mesa-dev] [PATCH 1/3] cso: don't release sampler states that are bound

Wed Dec 7 21:06:24 UTC 2016

Am 07.12.2016 um 21:46 schrieb Marek Olšák:
> On Wed, Dec 7, 2016 at 6:00 PM, Roland Scheidegger <sroland at vmware.com> wrote:
>> Am 07.12.2016 um 17:26 schrieb Marek Olšák:
>>> Optimizing the CSO cache isn't exactly on the top of my list, so I
>>> can't really do that right now.
>>>
>>> I think that varying the LOD bias is starting to be common. It's used
>>> for smooth LOD transitions when loading textures during rendering.
>>> Games with lots of content typically do that. This particular Batman
>>> game uses UE3.
>> The question of course is if they do it via sampler state or via shader
>> lod bias.
> 
> The sampler state. I saw those states. lod_bias was the only changing variable.
I meant in general, not for this particular app.

> 
>> I suppose that when these objects were designed noone thought it would
>> be useful to really create sampler state with lots of different bias
>> values. d3d10 of course would have the same problem (and it has limits
>> on how many such objects you can create, 4096 per context) but the
>> problem shifted to the app since it would have to create the objects
>> explicitly - I would suspect the app there would either quantize the
>> value itself, or use shader lod bias instead.
> 
> The shader lod bias isn't a better solution though. Any LOD bias is a
> modifier of the varying LOD value. For texture streaming, you want to
> clamp the LOD, not shift it, thus min_lod is better. However, min_lod
> is integer on ATI DX9 GPUs (not sure about the API), so DX9 games
> can't use it for smooth transitions. That may explain why we are
> seeing it with Wine. I guess any DX10+ and GL3+ games do use min_lod
> instead of lod_bias, which means we probably get a lot of sampler
> states there too.
Oh I missed this was dx9. DX9 otherwise only has D3DSAMP_MAXMIPLEVEL
which is integer, equivalent to gl's base level (restricting lod range
directly is impossible, albeit I suppose you might translate that
MAXMIPLEVEL to min_lod without texture views).

> 
> We could reduce the size of pipe_sampler_state a little. AMD GCN can
> represent only 14 bits of lod_bias and 12 bits of min_lod and max_lod.
So min_lod and max_lod only has 6 fractional bits whereas lod_bias has 8?
In any case even 12 bits is quite a lot of objects you could possibly
have...
I'm not sure though if using some fixed-point integer in
pipe_sampler_state really makes a lot of sense.

Roland


> 
> Marek
> 
>> In any case in the end you still have to program the hw differently for
>> all these different bias values, albeit quantized to whatever the hw
>> supports.
>>
>> Roland
>>
>>
>>
>>>
>>> I used to play Borderlands 2 (UE3), it had smooth LOD transitions too,
>>> but it never hung.
>>>
>>> Does this patch have your Rb?
>>>
>>> Thanks,
>>> Marek
>>>
>>> On Wed, Dec 7, 2016 at 11:16 AM, Nicolai Hähnle <nhaehnle at gmail.com> wrote:
>>>> On 07.12.2016 08:50, Michel Dänzer wrote:
>>>>>
>>>>> On 06/12/16 10:24 PM, Marek Olšák wrote:
>>>>>>
>>>>>> On Mon, Dec 5, 2016 at 10:05 AM, Michel Dänzer <michel at daenzer.net>
>>>>>> wrote:
>>>>>>>
>>>>>>> On 03/12/16 05:38 AM, Marek Olšák wrote:
>>>>>>>>
>>>>>>>> From: Marek Olšák <marek.olsak at amd.com>
>>>>>>>>
>>>>>>>> This fixes random radeonsi GPU hangs in Batman Arkham: Origins (Wine)
>>>>>>>> and
>>>>>>>> probably many other games too.
>>>>>>>>
>>>>>>>> cso_cache deletes sampler states when the cache size is too big and
>>>>>>>> doesn't
>>>>>>>> check which sampler states are bound, causing use-after-free in
>>>>>>>> drivers.
>>>>>>>> Because of that, radeonsi uploaded garbage sampler states and the
>>>>>>>> hardware
>>>>>>>> went bananas. Other drivers may have experienced similar issues.
>>>>>>>>
>>>>>>>> Cc: 13.0 12.0 <mesa-stable at lists.freedesktop.org>
>>>>>>>> ---
>>>>>>>>  src/gallium/auxiliary/cso_cache/cso_cache.c | 4 +++-
>>>>>>>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>>>>>>>
>>>>>>>> diff --git a/src/gallium/auxiliary/cso_cache/cso_cache.c
>>>>>>>> b/src/gallium/auxiliary/cso_cache/cso_cache.c
>>>>>>>> index b240c93..1f3be4b 100644
>>>>>>>> --- a/src/gallium/auxiliary/cso_cache/cso_cache.c
>>>>>>>> +++ b/src/gallium/auxiliary/cso_cache/cso_cache.c
>>>>>>>> @@ -181,21 +181,23 @@ static inline void sanitize_cb(struct cso_hash
>>>>>>>> *hash, enum cso_cache_type type,
>>>>>>>>        --to_remove;
>>>>>>>>     }
>>>>>>>>  }
>>>>>>>>
>>>>>>>>  struct cso_hash_iter
>>>>>>>>  cso_insert_state(struct cso_cache *sc,
>>>>>>>>                   unsigned hash_key, enum cso_cache_type type,
>>>>>>>>                   void *state)
>>>>>>>>  {
>>>>>>>>     struct cso_hash *hash = _cso_hash_for_type(sc, type);
>>>>>>>> -   sanitize_hash(sc, hash, type, sc->max_size);
>>>>>>>> +
>>>>>>>> +   if (type != CSO_SAMPLER)
>>>>>>>> +      sanitize_hash(sc, hash, type, sc->max_size);
>>>>>>>>
>>>>>>>>     return cso_hash_insert(hash, hash_key, state);
>>>>>>>>  }
>>>>>>>>
>>>>>>>>  struct cso_hash_iter
>>>>>>>>  cso_find_state(struct cso_cache *sc,
>>>>>>>>                 unsigned hash_key, enum cso_cache_type type)
>>>>>>>>  {
>>>>>>>>     struct cso_hash *hash = _cso_hash_for_type(sc, type);
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> If I understand correctly, this means that the maximum cache size
>>>>>>> effectively isn't enforced for sampler states? Wouldn't it be better
>>>>>>> instead to add code which prevents currently bound states from being
>>>>>>> deleted from here?
>>>>>>
>>>>>>
>>>>>> Not in this patch. Maybe later and if CPU profiling results show that
>>>>>> it matters.
>>>>>>
>>>>>> Currently, if the cache size is exceeded, the GPU usually hangs. If
>>>>>> pruning the cache was important, most apps would hang.
>>>>>>
>>>>>> Thus, I'd like to push this as-is.
>>>>>
>>>>>
>>>>> Fair enough, but this will require some follow-up work:
>>>>>
>>>>> Why are sampler states singled out? The same problem can at least
>>>>> theoretically happen with other CSOs as well, right?
>>>>
>>>>
>>>> The other CSOs only have single bind points. So with other CSOs, what
>>>> happens is that:
>>>>
>>>> 1. State tracker wants to bind some state, creates the CSO.
>>>> 2. cso_cache gets cleaned out; this might delete the currently bound CSO.
>>>> 3. Currently bound CSO gets immediately overwritten with the new state
>>>> anyway.
>>>>
>>>> Samplers are different because there are many bind points within a single
>>>> context.
>>>>
>>>> You're right though that the sounder approach would be to check all CSOs
>>>> against current bind points during cache cleanup, just in case a driver does
>>>> something with the old state on unbind.
>>>>
>>>>
>>>>> The maximum cache size needs to be addressed, either something like
>>>>> above or by just removing the maximum cache size.
>>>>>
>>>>>
>>>>> BTW, if varying the LOD bias is common, maybe the LOD bias value should
>>>>> be tracked separately from the remaining sampler state?
>>>>
>>>>
>>>> It's not common at all, actually. Some apps are just weird.
>>>>
>>>>
>>>>> P.S. The bug has been there forever, so I don't see the justification
>>>>> for bypassing the normal stable branch process.
>>>>
>>>>
>>>> And we've had mysterious hang bug reports forever as well. I wouldn't be
>>>> surprised if a large number or even all of them could be traced back to this
>>>> issue.
>>>>
>>>> Nicolai
>>> _______________________________________________
>>> mesa-dev mailing list
>>> mesa-dev at lists.freedesktop.org
>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.freedesktop.org_mailman_listinfo_mesa-2Ddev&d=DgIGaQ&c=uilaK90D4TOVoH58JNXRgQ&r=_QIjpv-UJ77xEQY8fIYoQtr5qv8wKrPJc7v7_-CYAb0&m=J1-2JNe113hsUFqegKx05IaoxAonJSJznguiRxatxzQ&s=b0oZ-g2JwG6fc5W6mFVfw7x8H1GGAwkw_pKI0nQC1CY&e=
>>>
>>