[RFC PATCH v3 0/4] drm: Standardize device reset notification

Christian König christian.koenig at amd.com
Wed Jun 21 15:09:14 UTC 2023


Am 21.06.23 um 17:06 schrieb André Almeida:
> Em 21/06/2023 04:42, Christian König escreveu:
>> Am 21.06.23 um 02:57 schrieb André Almeida:
>>> Hi,
>>>
>>> This is a new version of the documentation for DRM device resets. As 
>>> I dived
>>> more in the subject, I started to believe that part of the problem 
>>> was the lack
>>> of a DRM API to get reset information from the driver. With an API, 
>>> we can
>>> better standardize reset queries, increase common code from both DRM 
>>> and Mesa,
>>> and make easier to write end-to-end tests.
>>>
>>> So this patchset, along with the documentation, comes with a new 
>>> IOCTL and two
>>> implementations of it for amdgpu and i915 (although just the former 
>>> was really
>>> tested). This IOCTL uses the "context id" to query reset 
>>> information, but this
>>> might be not generic enough to be included in a DRM API.
>>
>> Well the basic problem with that is that we don't have a standard DRM 
>> context defined.
>>
>> If you want to do this you should probably start there first.
>
> Any idea on how to start this? I tried to find previous work about 
> that, but I didn't find.

I'm not aware of any work in this area, maybe ping on the Mesa list as well.

Could be that someone looked into that but never send anything out.

>
>>
>> Apart from that this looks like a really really good idea to me, 
>> especially that we document the reset expectations.
>
> I think I'll submit just the doc for the next version then, given that 
> the IOCTL will need a lot of rework.

Yeah, agree completely.

Thanks,
Christian.

>
>>
>> Regards,
>> Christian.
>>
>>>    At least for amdgpu,
>>> this information is encapsulated by libdrm so one can't just call 
>>> the ioctl
>>> directly from the UMD as I was planning to, but a small refactor can 
>>> be done to
>>> expose the id. Anyway, I'm sharing it as it is to gather feedback if 
>>> this seems
>>> to work.
>>>
>>> The amdgpu and i915 implementations are provided as a mean of 
>>> testing and as
>>> exemplification, and not as reference code yet, as the goal is more 
>>> about the
>>> interface itself then the driver parts.
>>>
>>> For the documentation itself, after spending some time reading the 
>>> reset path in
>>> the kernel in Mesa, I decide to rewrite it to better reflect how it 
>>> works, from
>>> bottom to top.
>>>
>>> You can check the userspace side of the IOCLT here:
>>>   Mesa: 
>>> https://gitlab.freedesktop.org/andrealmeid/mesa/-/commit/cd687b22fb32c21b23596c607003e2a495f465
>>>   libdrm: 
>>> https://gitlab.freedesktop.org/andrealmeid/libdrm/-/commit/b31e5404893ee9a85d1aa67e81c2f58c1dac3c46
>>>
>>> For testing, I use this vulkan app that has an infinity loop in the 
>>> shader:
>>> https://github.com/andrealmeid/vulkan-triangle-v1
>>>
>>> Feedbacks are welcomed!
>>>
>>> Thanks,
>>>         André
>>>
>>> v2: 
>>> https://lore.kernel.org/all/20230227204000.56787-1-andrealmeid@igalia.com/
>>> v1: 
>>> https://lore.kernel.org/all/20230123202646.356592-1-andrealmeid@igalia.com/
>>>
>>> André Almeida (4):
>>>    drm/doc: Document DRM device reset expectations
>>>    drm: Create DRM_IOCTL_GET_RESET
>>>    drm/amdgpu: Implement DRM_IOCTL_GET_RESET
>>>    drm/i915: Implement DRM_IOCTL_GET_RESET
>>>
>>>   Documentation/gpu/drm-uapi.rst                | 51 ++++++++++++++++
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c        |  4 +-
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c       | 35 +++++++++++
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h       |  5 ++
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |  1 +
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_job.c       | 12 +++-
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_job.h       |  2 +
>>>   drivers/gpu/drm/drm_debugfs.c                 |  2 +
>>>   drivers/gpu/drm/drm_ioctl.c                   | 58 
>>> +++++++++++++++++++
>>>   drivers/gpu/drm/i915/gem/i915_gem_context.c   | 18 ++++++
>>>   drivers/gpu/drm/i915/gem/i915_gem_context.h   |  2 +
>>>   .../gpu/drm/i915/gem/i915_gem_context_types.h |  2 +
>>>   drivers/gpu/drm/i915/i915_driver.c            |  2 +
>>>   include/drm/drm_device.h                      |  3 +
>>>   include/drm/drm_drv.h                         |  3 +
>>>   include/uapi/drm/drm.h                        | 21 +++++++
>>>   include/uapi/drm/drm_mode.h                   | 15 +++++
>>>   17 files changed, 233 insertions(+), 3 deletions(-)
>>>
>>



More information about the amd-gfx mailing list