[PATCH 0/5] Add work pool to reset domain

Christian König ckoenig.leichtzumerken at gmail.com
Sat Aug 12 08:23:11 UTC 2023


On 11.08.23 at 08:02, Lijo Lazar wrote:
> Presently, there are multiple clients of reset like RAS, job timeout, KFD hang
> detection and debug method. Instead of each client maintaining a work item,
> reset work pool is moved to reset domain. When a client makes a recovery request,
> a work item is allocated by the reset domain and queued for execution. For the
> case of job timeout, each ring has its own TDR queue to which tdr work is
> scheduled. From there, it's further queued to a reset domain based on the device
> configuration.
>
> This allows flexibility to have multiple reset domains. For example, when
> there are partitions, each partition can maintain its own reset domain and a job
> timeout on one partition doesn't affect jobs on the other partition (when the
> jobs don't have any interdependency). The reset logic will select the
> appropriate reset domain based on the current device configuration.

Well, a complete NAK to that design.

We intentionally added the workqueue to serialize *all* reset work, and I
absolutely don't see any reason to change that.

Regards,
Christian.

>
> Lijo Lazar (5):
>    drm/amdgpu: Add work pool to reset domain
>    drm/amdgpu: Move to reset_schedule_work
>    drm/amdgpu: Set flags to cancel all pending resets
>    drm/amdgpu: Add API to queue and do reset work
>    drm/amdgpu: Add TDR queue for ring
>
>   drivers/gpu/drm/amd/amdgpu/amdgpu.h        |   2 -
>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c |  32 +++---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h |   1 -
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  24 +---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  |  40 +++----
>   drivers/gpu/drm/amd/amdgpu/amdgpu_job.c    |  16 ++-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c    |  71 ++++++------
>   drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c  | 122 ++++++++++++++++++++-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h  |  32 +++++-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c   |   5 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |   1 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h   |   1 -
>   drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c      |  38 +++----
>   drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c      |  44 ++++----
>   drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c      |  33 +++---
>   15 files changed, 285 insertions(+), 177 deletions(-)
>


