[Mesa-dev] [PATCH] radv: fix possible GPU hangs by zeroing the descriptor set

Bas Nieuwenhuizen bas at basnieuwenhuizen.nl
Fri Sep 8 14:42:04 UTC 2017


On Fri, Sep 8, 2017 at 4:30 PM, Samuel Pitoiset
<samuel.pitoiset at gmail.com> wrote:
>
>
> On 09/08/2017 04:08 PM, Józef Kucia wrote:
>>
>> On Fri, Sep 8, 2017 at 2:43 PM, Samuel Pitoiset
>> <samuel.pitoiset at gmail.com> wrote:
>>>
>>> when an application doesn't update the contents of a descriptor
>>> set object, it contains random values and this might end up with
>>> GPU VM faults if a shader tries to access that descriptor.
>>>
>>> The Vulkan spec doesn't state what a driver should do when
>>> a descriptor is unset, but zeroing the contents seems to be
>>> a safe solution.
>>
>>
>> FWIW, this is invalid usage. The Vulkan spec states: "Descriptors in
>> each bound descriptor set, specified via vkCmdBindDescriptorSets, must
>> be
>> valid if they are statically used by the currently bound VkPipeline
>> object, specified via
>> vkCmdBindPipeline".
>
>
> Thanks for this, but it appears that some drivers do the same thing (or use
> any other default values). I think the real question here is: does this
> affect performance? If so, we should avoid it.
>
> The application probably needs a fix too, but this has the advantage to
> prevent future similar hangs.
>
> Opinions welcome!

It might be the case that this is due to memory zeroing on
acllocation. e.g. amdgpu-pro and nvidia seem to zero memory on first
allocation (i.e. when we allocate the pool here), which means that the
descriptors get zero'ed by default, (but not necessarily when we free
and reallocate descriptors in the same pool). Note that vulkan does
not require that at all.

I feel this would definitely be useful behind a debug flag for
checking if games do stupid things. I'm still conflicted about
enabling it always, and definitely in favor of pushing the app to fix
this no matter whether this gets included.

- Bas
>
>
>>
>>>
>>> This fixes a GPU hang with Unity because a pixel shader tries
>>> to access a descriptor that is unset by the application.
>>> Though, maybe that descriptor load should have been DCE'd.
>>>
>>> Cc: Pierre-Loup A. Griffais <pgriffais at valvesoftware.com>
>>> Signed-off-by: Samuel Pitoiset <samuel.pitoiset at gmail.com>
>>> ---
>>>   src/amd/vulkan/radv_descriptor_set.c | 6 ++++++
>>>   1 file changed, 6 insertions(+)
>>>
>>> diff --git a/src/amd/vulkan/radv_descriptor_set.c
>>> b/src/amd/vulkan/radv_descriptor_set.c
>>> index 260bfbc12b..6271f9d06e 100644
>>> --- a/src/amd/vulkan/radv_descriptor_set.c
>>> +++ b/src/amd/vulkan/radv_descriptor_set.c
>>> @@ -327,6 +327,12 @@ radv_descriptor_set_create(struct radv_device
>>> *device,
>>>                          list_add(&set->vram_list, prev);
>>>                  } else
>>>                          return
>>> vk_error(VK_ERROR_OUT_OF_POOL_MEMORY_KHR);
>>> +
>>> +               /* Initialize the descriptor set to zero in order to
>>> avoid GPU
>>> +                * VM faults when an application doesn't update the
>>> contents of
>>> +                * a descriptor set object.
>>> +                */
>>> +               memset(set->mapped_ptr, 0, layout->size);
>>>          }
>>>
>>>          for (unsigned i = 0; i < layout->binding_count; ++i) {
>>> --
>>> 2.14.1
>>>
>>> _______________________________________________
>>> mesa-dev mailing list
>>> mesa-dev at lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev


More information about the mesa-dev mailing list