[PATCH] drm/amdkfd: Fix eviction fence handling

Ba, Gang Gang.Ba at amd.com
Thu Apr 18 17:23:47 UTC 2024



Tested-by: Gang BA <Gang.Ba at amd.com>
Reviewed-by: Gang BA <Gang.Ba at amd.com>
________________________________
From: Kuehling, Felix <Felix.Kuehling at amd.com>
Sent: Wednesday, April 17, 2024 11:14 PM
To: amd-gfx at lists.freedesktop.org <amd-gfx at lists.freedesktop.org>
Cc: Ba, Gang <Gang.Ba at amd.com>; Prosyak, Vitaly <Vitaly.Prosyak at amd.com>
Subject: [PATCH] drm/amdkfd: Fix eviction fence handling

Handle the case where dma_fence_get_rcu_safe() returns NULL.

If the restore work is already scheduled, only update its timer. The same
work item cannot be queued twice, so undo the extra queue eviction in that
case.

Fixes: 9a1c1339abf9 ("drm/amdkfd: Run restore_workers on freezable WQs")
Signed-off-by: Felix Kuehling <felix.kuehling at amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_process.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)
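
Note for readers: the control flow this patch introduces can be modelled in
plain userspace C. The sketch below is an illustration only, not the driver
code. Every model_* name is hypothetical, the eviction counter is an assumed
stand-in for the driver's queue eviction bookkeeping, and the helper
model_mod_delayed_work() assumes that mod_delayed_work() returns true when
the work item was already pending and only its timer was updated.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are not the kernel types. */
struct model_fence {
	bool signaled;
};

struct model_process {
	struct model_fence *ef;  /* may be NULL, like p->ef in the patch */
	bool restore_pending;    /* models an already queued restore work */
	int evicted_count;       /* assumed counter of outstanding evictions */
};

/* Mirrors the added check: fail early when no fence could be obtained. */
static int model_signal_eviction_fence(struct model_process *p)
{
	struct model_fence *ef = p->ef;

	if (!ef)
		return -EINVAL;

	ef->signaled = true;
	return 0;
}

/* Assumed semantics: returns true if the work was already pending. */
static bool model_mod_delayed_work(struct model_process *p)
{
	bool was_pending = p->restore_pending;

	p->restore_pending = true;
	return was_pending;
}

/* Undo one eviction; calling this with nothing evicted is a model bug. */
static void model_restore_queues(struct model_process *p)
{
	assert(p->evicted_count > 0);
	p->evicted_count--;
}

/*
 * Same shape as the patched condition: undo the eviction right away if
 * the fence could not be signaled, or if the restore work was already
 * pending and therefore will only run once.
 */
static void model_evict(struct model_process *p)
{
	p->evicted_count++;

	if (model_signal_eviction_fence(p) || model_mod_delayed_work(p))
		model_restore_queues(p);
}
```

In this model, two back-to-back evictions leave exactly one outstanding
eviction for the single pending restore to undo, and an eviction with a NULL
fence is undone immediately without scheduling any restore.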

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index b79986412cd8..aafdf064651f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -1922,6 +1922,8 @@ static int signal_eviction_fence(struct kfd_process *p)
         rcu_read_lock();
         ef = dma_fence_get_rcu_safe(&p->ef);
         rcu_read_unlock();
+       if (!ef)
+               return -EINVAL;

         ret = dma_fence_signal(ef);
         dma_fence_put(ef);
@@ -1949,10 +1951,9 @@ static void evict_process_worker(struct work_struct *work)
                  * they are responsible stopping the queues and scheduling
                  * the restore work.
                  */
-               if (!signal_eviction_fence(p))
-                       queue_delayed_work(kfd_restore_wq, &p->restore_work,
-                               msecs_to_jiffies(PROCESS_RESTORE_TIME_MS));
-               else
+               if (signal_eviction_fence(p) ||
+                   mod_delayed_work(kfd_restore_wq, &p->restore_work,
+                                    msecs_to_jiffies(PROCESS_RESTORE_TIME_MS)))
                         kfd_process_restore_queues(p);

                 pr_debug("Finished evicting pasid 0x%x\n", p->pasid);
--
2.34.1
