[RESEND PATCH 4/5] Subject: drm/amdgpu: Redo XGMI reset synchronization.
Ma, Le
Le.Ma at amd.com
Thu Dec 12 04:05:01 UTC 2019
[AMD Official Use Only - Internal Distribution Only]
-----Original Message-----
From: Andrey Grodzovsky <andrey.grodzovsky at amd.com>
Sent: Thursday, December 12, 2019 4:39 AM
To: dri-devel at lists.freedesktop.org; amd-gfx at lists.freedesktop.org
Cc: Deucher, Alexander <Alexander.Deucher at amd.com>; Ma, Le <Le.Ma at amd.com>; Zhang, Hawking <Hawking.Zhang at amd.com>; Quan, Evan <Evan.Quan at amd.com>; Grodzovsky, Andrey <Andrey.Grodzovsky at amd.com>
Subject: [RESEND PATCH 4/5] Subject: drm/amdgpu: Redo XGMI reset synchronization.
Use task barrier in XGMI hive to synchronize ASIC resets across devices in XGMI hive.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky at amd.com<mailto:andrey.grodzovsky at amd.com>>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 42 +++++++++++++++++++++++++-----
1 file changed, 36 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 1d19edfa..e4089a0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -67,6 +67,7 @@
#include "amdgpu_tmz.h"
#include <linux/suspend.h>
+#include <drm/task_barrier.h>
MODULE_FIRMWARE("amdgpu/vega10_gpu_info.bin");
MODULE_FIRMWARE("amdgpu/vega12_gpu_info.bin");
@@ -2663,14 +2664,43 @@ static void amdgpu_device_xgmi_reset_func(struct work_struct *__work) {
struct amdgpu_device *adev =
container_of(__work, struct amdgpu_device, xgmi_reset_work);
+ struct amdgpu_hive_info *hive = amdgpu_get_xgmi_hive(adev, 0);
- if (amdgpu_asic_reset_method(adev) == AMD_RESET_METHOD_BACO)
- adev->asic_reset_res = (adev->in_baco == false) ?
- amdgpu_device_baco_enter(adev->ddev) :
- qamdgpu_device_baco_exit(adev->ddev);
- else
- adev->asic_reset_res = amdgpu_asic_reset(adev);
+ /*
+ * Use task barrier to synchronize all xgmi reset works across the
+ * hive.
+ * task_barrier_enter and task_barrier_exit will block untill all the
+ * threads running the xgmi reset works reach those points. I assume
+ * guarantee of progress here for all the threads as the workqueue code
+ * creates new worker threads as needed by amount of work items in queue
+ * (see worker_thread) and also each thread sleeps in the barrir and by
+ * this yielding the CPU for other work threads to make progress.
+ */
[Le]: This comments can be adjusted since we switch to system_unbound_wq in patch #5.
+ if (amdgpu_asic_reset_method(adev) == AMD_RESET_METHOD_BACO) {
+
+ if (hive)
+ task_barrier_enter(&hive->tb);
[Le]: The multiple hive condition can be checked only once and moved to the location right after the assignment.
+
+ adev->asic_reset_res = amdgpu_device_baco_enter(adev->ddev);
+
+ if (adev->asic_reset_res)
+ goto fail;
+
+ if (hive)
+ task_barrier_exit(&hive->tb);
[Le]: Same as above.
+
+ adev->asic_reset_res = amdgpu_device_baco_exit(adev->ddev);
+
+ if (adev->asic_reset_res)
+ goto fail;
+ } else {
+ if (hive)
+ task_barrier_full(&hive->tb);
[Le]: Same as above.
With above addressed, Reviewed-by: Le Ma <Le.Ma at amd.com<mailto:Le.Ma at amd.com>>
Regards,
Ma Le
+
+ adev->asic_reset_res = amdgpu_asic_reset(adev);
+ }
+fail:
if (adev->asic_reset_res)
DRM_WARN("ASIC reset failed with error, %d for drm dev, %s",
adev->asic_reset_res, adev->ddev->unique);
--
2.7.4
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20191212/78916e60/attachment.htm>
More information about the amd-gfx
mailing list