[PATCH 0/5] Add work pool to reset domain
Christian König
ckoenig.leichtzumerken at gmail.com
Sat Aug 12 08:23:11 UTC 2023
Am 11.08.23 um 08:02 schrieb Lijo Lazar:
> Presently, there are multiple clients of reset like RAS, job timeout, KFD hang
> detection and debug method. Instead of each client maintaining a work item,
> reset work pool is moved to reset domain. When a client makes a recovery request,
> a work item is allocated by the reset domain and queued for execution. For the
> case of job timeout, each ring has its own TDR queue to which tdr work is
> scheduled. From there, it's further queued to a reset domain based on the device
> configuration.
>
> This allows flexibility to have multiple reset domains. For example, when
> there are partitions, each partition can maintain its own reset domain and a job
> timeout on one partition doesn't affect jobs on the other partition (when the
> jobs don't have any interdependency). The reset logic will select the
> appropriate reset domain based on the current device configuration.
Well completely NAK to that design.
We intentionally added the workqueue to serialize *all* reset work and I
absolutely don't see any reason to change that.
Regards,
Christian.
>
> Lijo Lazar (5):
> drm/amdgpu: Add work pool to reset domain
> drm/amdgpu: Move to reset_schedule_work
> drm/amdgpu: Set flags to cancel all pending resets
> drm/amdgpu: Add API to queue and do reset work
> drm/amdgpu: Add TDR queue for ring
>
> drivers/gpu/drm/amd/amdgpu/amdgpu.h | 2 -
> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 32 +++---
> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 1 -
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 24 +---
> drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 40 +++----
> drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 16 ++-
> drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 71 ++++++------
> drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c | 122 ++++++++++++++++++++-
> drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h | 32 +++++-
> drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 5 +
> drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 1 +
> drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h | 1 -
> drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c | 38 +++----
> drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c | 44 ++++----
> drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c | 33 +++---
> 15 files changed, 285 insertions(+), 177 deletions(-)
>
More information about the amd-gfx
mailing list