[Mesa-dev] [PATCH] radv: fix possible GPU hangs by zeroing the descriptor set
Nicolai Hähnle
nhaehnle at gmail.com
Wed Sep 13 15:10:55 UTC 2017
On 08.09.2017 16:42, Bas Nieuwenhuizen wrote:
> On Fri, Sep 8, 2017 at 4:30 PM, Samuel Pitoiset
> <samuel.pitoiset at gmail.com> wrote:
>>
>>
>> On 09/08/2017 04:08 PM, Józef Kucia wrote:
>>>
>>> On Fri, Sep 8, 2017 at 2:43 PM, Samuel Pitoiset
>>> <samuel.pitoiset at gmail.com> wrote:
>>>>
>>>> when an application doesn't update the contents of a descriptor
>>>> set object, it contains random values and this might end up with
>>>> GPU VM faults if a shader tries to access that descriptor.
>>>>
>>>> The Vulkan spec doesn't state what a driver should do when
>>>> a descriptor is unset, but zeroing the contents seems to be
>>>> a safe solution.
>>>
>>>
>>> FWIW, this is invalid usage. The Vulkan spec states: "Descriptors in
>>> each bound descriptor set, specified via vkCmdBindDescriptorSets, must
>>> be
>>> valid if they are statically used by the currently bound VkPipeline
>>> object, specified via
>>> vkCmdBindPipeline".
>>
>>
>> Thanks for this, but it appears that some drivers do the same thing (or use
>> any other default values). I think the real question here is: does this
>> affect performance? If so, we should avoid it.
>>
>> The application probably needs a fix too, but this has the advantage to
>> prevent future similar hangs.
>>
>> Opinions welcome!
>
> It might be the case that this is due to memory zeroing on
> acllocation. e.g. amdgpu-pro and nvidia seem to zero memory on first
> allocation (i.e. when we allocate the pool here), which means that the
> descriptors get zero'ed by default, (but not necessarily when we free
> and reallocate descriptors in the same pool). Note that vulkan does
> not require that at all.
>
> I feel this would definitely be useful behind a debug flag for
> checking if games do stupid things. I'm still conflicted about
> enabling it always, and definitely in favor of pushing the app to fix
> this no matter whether this gets included.
Probably push for a validation layer thingie as well....
Cheers,
Nicolai
>
> - Bas
>>
>>
>>>
>>>>
>>>> This fixes a GPU hang with Unity because a pixel shader tries
>>>> to access a descriptor that is unset by the application.
>>>> Though, maybe that descriptor load should have been DCE'd.
>>>>
>>>> Cc: Pierre-Loup A. Griffais <pgriffais at valvesoftware.com>
>>>> Signed-off-by: Samuel Pitoiset <samuel.pitoiset at gmail.com>
>>>> ---
>>>> src/amd/vulkan/radv_descriptor_set.c | 6 ++++++
>>>> 1 file changed, 6 insertions(+)
>>>>
>>>> diff --git a/src/amd/vulkan/radv_descriptor_set.c
>>>> b/src/amd/vulkan/radv_descriptor_set.c
>>>> index 260bfbc12b..6271f9d06e 100644
>>>> --- a/src/amd/vulkan/radv_descriptor_set.c
>>>> +++ b/src/amd/vulkan/radv_descriptor_set.c
>>>> @@ -327,6 +327,12 @@ radv_descriptor_set_create(struct radv_device
>>>> *device,
>>>> list_add(&set->vram_list, prev);
>>>> } else
>>>> return
>>>> vk_error(VK_ERROR_OUT_OF_POOL_MEMORY_KHR);
>>>> +
>>>> + /* Initialize the descriptor set to zero in order to
>>>> avoid GPU
>>>> + * VM faults when an application doesn't update the
>>>> contents of
>>>> + * a descriptor set object.
>>>> + */
>>>> + memset(set->mapped_ptr, 0, layout->size);
>>>> }
>>>>
>>>> for (unsigned i = 0; i < layout->binding_count; ++i) {
>>>> --
>>>> 2.14.1
>>>>
>>>> _______________________________________________
>>>> mesa-dev mailing list
>>>> mesa-dev at lists.freedesktop.org
>>>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>>
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
--
Lerne, wie die Welt wirklich ist,
Aber vergiss niemals, wie sie sein sollte.
More information about the mesa-dev
mailing list