[PATCH v5 1/1] drm/doc: Document DRM device reset expectations
André Almeida
andrealmeid at igalia.com
Thu Jun 29 13:11:06 UTC 2023
On 27/06/2023 18:17, André Almeida wrote:
> On 27/06/2023 14:47, Christian König wrote:
>> On 27.06.23 at 15:23, André Almeida wrote:
>>> Create a section that specifies how to deal with DRM device resets for
>>> kernel and userspace drivers.
>>>
>>> Acked-by: Pekka Paalanen <pekka.paalanen at collabora.com>
>>> Signed-off-by: André Almeida <andrealmeid at igalia.com>
>>> ---
>>>
>>> v4:
>>> https://lore.kernel.org/lkml/20230626183347.55118-1-andrealmeid@igalia.com/
>>>
>>> Changes:
>>> - Grammar fixes (Randy)
>>>
>>> Documentation/gpu/drm-uapi.rst | 68 ++++++++++++++++++++++++++++++++++
>>> 1 file changed, 68 insertions(+)
>>>
>>> diff --git a/Documentation/gpu/drm-uapi.rst
>>> b/Documentation/gpu/drm-uapi.rst
>>> index 65fb3036a580..3cbffa25ed93 100644
>>> --- a/Documentation/gpu/drm-uapi.rst
>>> +++ b/Documentation/gpu/drm-uapi.rst
>>> @@ -285,6 +285,74 @@ for GPU1 and GPU2 from different vendors, and a
>>> third handler for
>>> mmapped regular files. Threads cause additional pain with signal
>>> handling as well.
>>> +Device reset
>>> +============
>>> +
>>> +The GPU stack is really complex and is prone to errors, from
>>> hardware bugs,
>>> +faulty applications and everything in between the many layers. Some
>>> errors
>>> +require resetting the device in order to make the device usable
>>> again. This
>>> +section describes the expectations for DRM and usermode drivers when a
>>> +device resets and how to propagate the reset status.
>>> +
>>> +Kernel Mode Driver
>>> +------------------
>>> +
>>> +The KMD is responsible for checking if the device needs a reset, and
>>> to perform
>>> +it as needed. Usually a hang is detected when a job gets stuck
>>> executing. KMD
>>> +should keep track of resets, because userspace can query any time
>>> about the
>>> +reset stats for a specific context.
>>
>> Maybe drop the part "for a specific context". Essentially the reset
>> query could use global counters instead and we won't need the context
>> any more here.
>>
>
> Right, I wrote like this to reflect how it's currently implemented.
>
> If I follow correctly what you meant, the KMD could always report the
> global count to the UMD, and we would move the responsibility for
> managing the reset counters to the UMD, right? This would also simplify
> my DRM_IOCTL_GET_RESET proposal. I'll apply your suggestion to the next
> doc version.
>
Actually, if we drop the context identifier we would lose the ability to
track which context is the guilty one. The Vulkan API doesn't seem to
care about this, but OpenGL does.
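
To make the trade-off concrete, here is a minimal sketch in C. It is not
the amdgpu implementation nor an existing DRM uAPI: every name in it
(sketch_device, sketch_context, sketch_record_reset and the two query
helpers) is hypothetical. It only assumes that the KMD bumps a global
counter on each reset and, optionally, a per-context counter for the
context it blames. OpenGL robustness needs the second piece of
information, since glGetGraphicsResetStatus() distinguishes
GL_GUILTY_CONTEXT_RESET from GL_INNOCENT_CONTEXT_RESET, while Vulkan only
reports VK_ERROR_DEVICE_LOST.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status values, modelled on the guilty/innocent split
 * that OpenGL robustness exposes. Not an existing DRM interface. */
enum sketch_reset_status {
	SKETCH_RESET_NONE,
	SKETCH_RESET_INNOCENT,
	SKETCH_RESET_GUILTY,
};

/* Hypothetical device state: one counter for all resets. */
struct sketch_device {
	uint64_t global_resets;
};

/* Hypothetical per-context state kept by the KMD. */
struct sketch_context {
	uint64_t guilty_resets; /* resets blamed on this context */
	uint64_t seen_guilty;   /* guilty_resets at the last query */
	uint64_t seen_global;   /* global_resets at the last query */
};

/* KMD side: record one device reset. `guilty` may be NULL when no
 * context can be blamed. */
static void sketch_record_reset(struct sketch_device *dev,
				struct sketch_context *guilty)
{
	assert(dev != NULL);
	dev->global_resets++;
	if (guilty)
		guilty->guilty_resets++;
}

/* Query with a context identifier: can tell guilty from innocent. */
static enum sketch_reset_status
sketch_query_ctx(const struct sketch_device *dev, struct sketch_context *ctx)
{
	enum sketch_reset_status status = SKETCH_RESET_NONE;

	assert(dev != NULL && ctx != NULL);
	if (ctx->guilty_resets != ctx->seen_guilty)
		status = SKETCH_RESET_GUILTY;
	else if (dev->global_resets != ctx->seen_global)
		status = SKETCH_RESET_INNOCENT;

	/* Remember what was reported so the next query starts clean. */
	ctx->seen_guilty = ctx->guilty_resets;
	ctx->seen_global = dev->global_resets;
	return status;
}

/* Query with only the global counter: the caller learns that a reset
 * happened since `*last_seen`, but not which context caused it. */
static int sketch_query_global(const struct sketch_device *dev,
			       uint64_t *last_seen)
{
	int happened;

	assert(dev != NULL && last_seen != NULL);
	happened = dev->global_resets != *last_seen;
	*last_seen = dev->global_resets;
	return happened;
}
```

With only sketch_query_global(), a UMD can implement the Vulkan behaviour,
but it has no way to answer the guilty-versus-innocent question that an
OpenGL driver has to answer.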
>> Apart from that this sounds good to me, feel free to add my rb.
>>
>> Regards,
>> Christian.
>>
>>