[PATCH] drm/amdkfd: fix the hang caused by the write reorder to fence_addr
Zhao, Victor
Victor.Zhao at amd.com
Fri Oct 18 05:31:06 UTC 2024
[AMD Official Use Only - AMD Internal Distribution Only]
Ping. Please help review.
Thanks,
Victor
-----Original Message-----
From: Victor Zhao <Victor.Zhao at amd.com>
Sent: Thursday, October 17, 2024 4:35 PM
To: amd-gfx at lists.freedesktop.org
Cc: Zhao, Victor <Victor.Zhao at amd.com>
Subject: [PATCH] drm/amdkfd: fix the hang caused by the write reorder to fence_addr
make sure KFD_FENCE_INIT write to fence_addr before pm_send_query_status called, to avoid qcm fence timeout caused by incorrect ordering.
Signed-off-by: Victor Zhao <Victor.Zhao at amd.com>
---
drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 1 + drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h | 2 +-
2 files changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index b2b16a812e73..d9264a353775 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -2254,6 +2254,7 @@ static int unmap_queues_cpsch(struct device_queue_manager *dqm,
goto out;
*dqm->fence_addr = KFD_FENCE_INIT;
+ mb();
pm_send_query_status(&dqm->packet_mgr, dqm->fence_gpu_addr,
KFD_FENCE_COMPLETED);
/* should be timed out */
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
index 09ab36f8e8c6..bddb169bb301 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
@@ -260,7 +260,7 @@ struct device_queue_manager {
uint16_t vmid_pasid[VMID_NUM];
uint64_t pipelines_addr;
uint64_t fence_gpu_addr;
- uint64_t *fence_addr;
+ volatile uint64_t *fence_addr;
struct kfd_mem_obj *fence_mem;
bool active_runlist;
int sched_policy;
--
2.34.1
More information about the amd-gfx
mailing list