[PATCH 2/2] drm/amdgpu: perform p-state switch after the whole hive initialized
Kim, Jonathan
Jonathan.Kim at amd.com
Wed Nov 6 03:49:06 UTC 2019
No problem. With the incoming patch referenced below, this patch should be ok since it doesn't break the build.
Reviewed-by: Jonathan Kim <Jonathan.Kim at amd.com>
From: Quan, Evan <Evan.Quan at amd.com>
Sent: Tuesday, November 5, 2019 10:11 PM
To: Kim, Jonathan <Jonathan.Kim at amd.com>; Strawbridge, Michael <Michael.Strawbridge at amd.com>; amd-gfx at lists.freedesktop.org
Subject: RE: [PATCH 2/2] drm/amdgpu: perform p-state switch after the whole hive initialized
Thanks Jon.
Per discussed in another mail thread, this should be applied after the patch below
https://lists.freedesktop.org/archives/amd-gfx/2019-November/042160.html
Sorry for missing this important information.
Regards,
Evan
From: Kim, Jonathan <Jonathan.Kim at amd.com<mailto:Jonathan.Kim at amd.com>>
Sent: Wednesday, November 6, 2019 10:18 AM
To: Strawbridge, Michael <Michael.Strawbridge at amd.com<mailto:Michael.Strawbridge at amd.com>>; Quan, Evan <Evan.Quan at amd.com<mailto:Evan.Quan at amd.com>>; amd-gfx at lists.freedesktop.org<mailto:amd-gfx at lists.freedesktop.org>
Subject: RE: [PATCH 2/2] drm/amdgpu: perform p-state switch after the whole hive initialized
Taking a look at this again, it's not an indexing issue, it's a placement problem. I don't think this solution will work if we expect to pstate switch all gpus.
>From amdgpu_kms.c, amdgpu_register_gpu_instance comes after amdgpu_device_init. This means we'll never reach mgpu_info.num_gpu == adev->gmc.xgmi.num_physical_nodes until in this code path.
From: Kim, Jonathan
Sent: Tuesday, November 5, 2019 12:07 PM
To: Strawbridge, Michael <Michael.Strawbridge at amd.com<mailto:Michael.Strawbridge at amd.com>>; Quan, Evan <Evan.Quan at amd.com<mailto:Evan.Quan at amd.com>>; amd-gfx at lists.freedesktop.org<mailto:amd-gfx at lists.freedesktop.org>
Subject: RE: [PATCH 2/2] drm/amdgpu: perform p-state switch after the whole hive initialized
+ for (i = 0; i < mgpu_info.num_gpu; i++) {
Also while num_physical_nodes should be used here instead. It doesn't make sense to have a pre-condition check i.e. (if (mgpu_info.num_dgpu == adev->gmc.xgmi.num_physical_nodes - 1) ) against the total number of nodes while the loop checks for the current num gpus from mgpu_info. This will fail to set the last node to low pstate for the same 0-indexing issue once the pre-condition passes.
+ gpu_instance = &(mgpu_info.gpu_ins[i]);
+ if (gpu_instance->adev->flags & AMD_IS_APU)
+ continue;
+
From: Kim, Jonathan
Sent: Tuesday, November 5, 2019 11:42 AM
To: Strawbridge, Michael <Michael.Strawbridge at amd.com<mailto:Michael.Strawbridge at amd.com>>; Quan, Evan <Evan.Quan at amd.com<mailto:Evan.Quan at amd.com>>; amd-gfx at lists.freedesktop.org<mailto:amd-gfx at lists.freedesktop.org>
Subject: RE: [PATCH 2/2] drm/amdgpu: perform p-state switch after the whole hive initialized
Yes that's correct. This should fix the issue.
Jon
From: Strawbridge, Michael <Michael.Strawbridge at amd.com<mailto:Michael.Strawbridge at amd.com>>
Sent: Tuesday, November 5, 2019 11:40 AM
To: Kim, Jonathan <Jonathan.Kim at amd.com<mailto:Jonathan.Kim at amd.com>>; Quan, Evan <Evan.Quan at amd.com<mailto:Evan.Quan at amd.com>>; amd-gfx at lists.freedesktop.org<mailto:amd-gfx at lists.freedesktop.org>
Subject: Re: [PATCH 2/2] drm/amdgpu: perform p-state switch after the whole hive initialized
Hi Jon,
Does that mean we can simply use this instead?
+ if (mgpu_info.num_dgpu == adev->gmc.xgmi.num_physical_nodes - 1) {
Thanks,
Michael
________________________________
From: Kim, Jonathan <Jonathan.Kim at amd.com<mailto:Jonathan.Kim at amd.com>>
Sent: 05 November 2019 11:32 AM
To: Quan, Evan <Evan.Quan at amd.com<mailto:Evan.Quan at amd.com>>; amd-gfx at lists.freedesktop.org<mailto:amd-gfx at lists.freedesktop.org> <amd-gfx at lists.freedesktop.org<mailto:amd-gfx at lists.freedesktop.org>>
Cc: Strawbridge, Michael <Michael.Strawbridge at amd.com<mailto:Michael.Strawbridge at amd.com>>
Subject: RE: [PATCH 2/2] drm/amdgpu: perform p-state switch after the whole hive initialized
Please see inline.
Jon
-----Original Message-----
From: Quan, Evan <Evan.Quan at amd.com<mailto:Evan.Quan at amd.com>>
Sent: Tuesday, November 5, 2019 5:24 AM
To: amd-gfx at lists.freedesktop.org<mailto:amd-gfx at lists.freedesktop.org>
Cc: Kim, Jonathan <Jonathan.Kim at amd.com<mailto:Jonathan.Kim at amd.com>>; Strawbridge, Michael <Michael.Strawbridge at amd.com<mailto:Michael.Strawbridge at amd.com>>; Quan, Evan <Evan.Quan at amd.com<mailto:Evan.Quan at amd.com>>
Subject: [PATCH 2/2] drm/amdgpu: perform p-state switch after the whole hive initialized
P-state switch should be performed after all devices from the hive get initialized.
Change-Id: Ifc7cac9ef0cf250447d2a412da35d601e2ac79ec
Signed-off-by: Evan Quan <evan.quan at amd.com<mailto:evan.quan at amd.com>>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 47 ++++++++++++++++------
1 file changed, 35 insertions(+), 12 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index e6ce949670e5..2d72d206cead 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2090,6 +2090,7 @@ static int amdgpu_device_enable_mgpu_fan_boost(void)
*/
static int amdgpu_device_ip_late_init(struct amdgpu_device *adev) {
+ struct amdgpu_gpu_instance *gpu_instance;
int i = 0, r;
for (i = 0; i < adev->num_ip_blocks; i++) { @@ -2115,6 +2116,40 @@ static int amdgpu_device_ip_late_init(struct amdgpu_device *adev)
if (r)
DRM_ERROR("enable mgpu fan boost failed (%d).\n", r);
+
+ if (adev->gmc.xgmi.num_physical_nodes > 1) {
+ mutex_lock(&mgpu_info.mutex);
+
+ /*
+ * Reset device p-state to low as this was booted with high.
+ *
+ * This should be performed only after all devices from the same
+ * hive get initialized.
+ *
+ * However, it's unknown how many device in the hive in advance.
+ * As this is counted one by one during devices initializations.
+ *
+ * So, we wait for all XGMI interlinked devices initialized.
+ * This may bring some delays as those devices may come from
+ * different hives. But that should be OK.
+ */
+ if (mgpu_info.num_dgpu == adev->gmc.xgmi.num_physical_nodes) {
[JK] This condition will never succeed. mgpu_info.num_dgpu is 0-indexed while num_physical_nodes give total nodes.
+ for (i = 0; i < mgpu_info.num_gpu; i++) {
+ gpu_instance = &(mgpu_info.gpu_ins[i]);
+ if (gpu_instance->adev->flags & AMD_IS_APU)
+ continue;
+
+ r = amdgpu_xgmi_set_pstate(gpu_instance->adev, 0);
+ if (r) {
+ DRM_ERROR("pstate setting failed (%d).\n", r);
+ break;
+ }
+ }
+ }
+
+ mutex_unlock(&mgpu_info.mutex);
+ }
+
return 0;
}
@@ -2227,18 +2262,6 @@ static void amdgpu_device_delayed_init_work_handler(struct work_struct *work)
r = amdgpu_ib_ring_tests(adev);
if (r)
DRM_ERROR("ib ring test failed (%d).\n", r);
-
- /*
- * set to low pstate by default
- * This should be performed after all devices from
- * XGMI finish their initializations. Thus it's moved
- * to here.
- * The time delay is 2S. TODO: confirm whether that
- * is enough for all possible XGMI setups.
- */
- r = amdgpu_xgmi_set_pstate(adev, 0);
- if (r)
- DRM_ERROR("pstate setting failed (%d).\n", r);
}
static void amdgpu_device_delay_enable_gfx_off(struct work_struct *work)
--
2.23.0
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20191106/5484b9ee/attachment-0001.html>
More information about the amd-gfx
mailing list