<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<p><br>
</p>
<div class="moz-cite-prefix">On 2024-10-21 04:12, Christian König
wrote:<br>
</div>
<blockquote type="cite" cite="mid:efb4b082-d518-46b6-adda-776458772e1c@gmail.com">Am
18.10.24 um 23:59 schrieb Philip Yang:
<br>
<blockquote type="cite">On 2024-10-18 14:28, Felix Kuehling wrote:
<br>
<blockquote type="cite">
<br>
On 2024-10-17 04:34, Victor Zhao wrote:
<br>
<blockquote type="cite">make sure KFD_FENCE_INIT write to
fence_addr before pm_send_query_status
<br>
called, to avoid qcm fence timeout caused by incorrect
ordering.
<br>
<br>
Signed-off-by: Victor Zhao <a class="moz-txt-link-rfc2396E" href="mailto:Victor.Zhao@amd.com"><Victor.Zhao@amd.com></a>
<br>
---
<br>
drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 1
+
<br>
drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h | 2
+-
<br>
2 files changed, 2 insertions(+), 1 deletion(-)
<br>
<br>
diff --git
a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
<br>
index b2b16a812e73..d9264a353775 100644
<br>
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
<br>
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
<br>
@@ -2254,6 +2254,7 @@ static int unmap_queues_cpsch(struct
device_queue_manager *dqm,
<br>
goto out;
<br>
*dqm->fence_addr = KFD_FENCE_INIT;
<br>
+ mb();
<br>
pm_send_query_status(&dqm->packet_mgr,
dqm->fence_gpu_addr,
<br>
KFD_FENCE_COMPLETED);
<br>
/* should be timed out */
<br>
diff --git
a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
<br>
index 09ab36f8e8c6..bddb169bb301 100644
<br>
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
<br>
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
<br>
@@ -260,7 +260,7 @@ struct device_queue_manager {
<br>
uint16_t vmid_pasid[VMID_NUM];
<br>
uint64_t pipelines_addr;
<br>
uint64_t fence_gpu_addr;
<br>
- uint64_t *fence_addr;
<br>
+ volatile uint64_t *fence_addr;
<br>
</blockquote>
<br>
[+Christian]
<br>
<br>
Is the volatile keyword really needed here? I just saw other
patches removing volatile in some places because it's not
sufficient, and not needed if you use memory barriers
correctly.
<br>
</blockquote>
<br>
After reading kernel memory barrier document and below link, I
think we need both volatile type and memory barrier, to
guarantee F/W get the updated fence value. This fixes an CP hang
issue on SRIOV.
<br>
<br>
<a class="moz-txt-link-freetext" href="https://stackoverflow.com/questions/75750110/volatile-vs-memory-barriers#:~:text=volatile%20will%20make%20sure%20that,not%20reorder%20writes%20or%20reads">https://stackoverflow.com/questions/75750110/volatile-vs-memory-barriers#:~:text=volatile%20will%20make%20sure%20that,not%20reorder%20writes%20or%20reads</a>.
<br>
<br>
</blockquote>
<br>
No, that isn't correct. Using volatile is considered harmful and
almost never correct, see here
<a class="moz-txt-link-freetext" href="https://docs.kernel.org/process/volatile-considered-harmful.html">https://docs.kernel.org/process/volatile-considered-harmful.html</a>
<br>
<br>
Placing appropriate memory barriers must be sufficient or
otherwise there is a rather bad platform or compiler bug lurking
around.
<br>
</blockquote>
<p>Yes, Victor confirmed that memory barrier fixes the issue, will
send new patch to remove the volatile type.</p>
<p>Regards,</p>
<p>Philip<br>
</p>
<blockquote type="cite" cite="mid:efb4b082-d518-46b6-adda-776458772e1c@gmail.com">
<br>
Regards,
<br>
Christian.
<br>
<br>
<blockquote type="cite">Regards,
<br>
<br>
Philip
<br>
<br>
<blockquote type="cite">
<br>
Regards,
<br>
Felix
<br>
<br>
<br>
<blockquote type="cite"> struct kfd_mem_obj
*fence_mem;
<br>
bool active_runlist;
<br>
int sched_policy;
<br>
</blockquote>
</blockquote>
</blockquote>
<br>
</blockquote>
</body>
</html>