[PATCH] drm/amdkfd: fix the hang caused by the write reorder to fence_addr
Felix Kuehling
felix.kuehling at amd.com
Fri Oct 18 18:28:23 UTC 2024
On 2024-10-17 04:34, Victor Zhao wrote:
> make sure KFD_FENCE_INIT write to fence_addr before pm_send_query_status
> called, to avoid qcm fence timeout caused by incorrect ordering.
>
> Signed-off-by: Victor Zhao <Victor.Zhao at amd.com>
> ---
> drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 1 +
> drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h | 2 +-
> 2 files changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> index b2b16a812e73..d9264a353775 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> @@ -2254,6 +2254,7 @@ static int unmap_queues_cpsch(struct device_queue_manager *dqm,
> goto out;
>
> *dqm->fence_addr = KFD_FENCE_INIT;
> + mb();
> pm_send_query_status(&dqm->packet_mgr, dqm->fence_gpu_addr,
> KFD_FENCE_COMPLETED);
> /* should be timed out */
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
> index 09ab36f8e8c6..bddb169bb301 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
> @@ -260,7 +260,7 @@ struct device_queue_manager {
> uint16_t vmid_pasid[VMID_NUM];
> uint64_t pipelines_addr;
> uint64_t fence_gpu_addr;
> - uint64_t *fence_addr;
> + volatile uint64_t *fence_addr;

[+Christian]

Is the volatile keyword really needed here? I just saw other patches
removing volatile in some places because it's not sufficient, and not
needed if you use memory barriers correctly.

Regards,
  Felix

> struct kfd_mem_obj *fence_mem;
> bool active_runlist;
> int sched_policy;
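
To illustrate the point about barriers versus volatile, here is a minimal
user-space sketch of the "initialise fence, barrier, then submit" ordering.
It is not kernel code: atomic_thread_fence() from C11 stands in for mb(),
and struct fake_device, submit_query(), device_process() and the two
constants are hypothetical names invented for this example. Note that the
fence pointer is a plain uint64_t * with no volatile qualifier; the ordering
comes from the fence alone.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver's fence values (illustrative only). */
#define FENCE_INIT      10u
#define FENCE_COMPLETED 100u

/* Hypothetical model of a device: a doorbell plus the fence slot it writes. */
struct fake_device {
	uint64_t *fence;      /* plain pointer, deliberately not volatile */
	_Atomic int doorbell; /* set by the CPU side to submit a query */
};

/* CPU side: initialise the fence, order that store, then submit. */
static void submit_query(struct fake_device *dev)
{
	assert(dev && dev->fence);

	*dev->fence = FENCE_INIT;

	/*
	 * Full fence: the user-space analogue (an assumption of this sketch)
	 * of the mb() added by the patch. The fence store above cannot be
	 * reordered past the doorbell store below.
	 */
	atomic_thread_fence(memory_order_seq_cst);

	atomic_store_explicit(&dev->doorbell, 1, memory_order_relaxed);
}

/* Simulated device side: on a pending doorbell, write the completion value. */
static int device_process(struct fake_device *dev)
{
	if (!atomic_load_explicit(&dev->doorbell, memory_order_relaxed))
		return 0; /* nothing submitted */

	/* Pair with the fence in submit_query() before touching the slot. */
	atomic_thread_fence(memory_order_seq_cst);

	*dev->fence = FENCE_COMPLETED;
	atomic_store_explicit(&dev->doorbell, 0, memory_order_relaxed);
	return 1;
}
```

Without the fence in submit_query(), the doorbell store could become visible
before the FENCE_INIT store, so a late FENCE_INIT could overwrite the
completion value and the waiter would time out. Marking the pointer volatile
would not prevent that reordering on its own.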