[PATCH] drm/amdgpu: Send Message to SMU on aldebaran passthrough for sbr handling

Liu, Shaoyun Shaoyun.Liu at amd.com
Fri Dec 17 17:06:40 UTC 2021


[AMD Official Use Only]

OK, sounds reasonable. I'm fine with the function name change.
Another concern: on the driver side, before it starts IP init, the driver checks the SMU clock to determine whether the ASIC needs a reset from the driver side. In your case, the hypervisor will trigger the SBR on VM on/off and the SMU will handle the reset. Can you check whether the SMU is still alive after this reset? If it is, the driver will trigger the reset again.
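
A minimal sketch of the double-reset concern above, written as a toy model rather than driver code. The function name resets_for_vm_start and both parameters are hypothetical; the only assumption taken from the text is that the driver resets the ASIC at init time whenever it finds the SMU alive.

```c
#include <stdbool.h>

/*
 * Toy model (hypothetical, not amdgpu code): count how many resets the
 * ASIC goes through for one VM start.
 *
 * hypervisor_sbr      - the hypervisor triggered an SBR on VM on/off,
 *                       which the SMU handles itself.
 * smu_alive_after_sbr - the SMU is found running when the driver does
 *                       its pre-IP-init check.
 */
static int resets_for_vm_start(bool hypervisor_sbr, bool smu_alive_after_sbr)
{
	int resets = 0;

	if (hypervisor_sbr)
		resets++;	/* reset handled by the SMU on SBR */

	if (smu_alive_after_sbr)
		resets++;	/* driver-side reset triggered at init */

	return resets;
}
```

In this model, an SBR that leaves the SMU alive yields two resets for a single VM start, which is the case the question above asks to be checked.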

Regards
Shaoyun.liu

-----Original Message-----
From: Saye, Sashank <Sashank.Saye at amd.com> 
Sent: Friday, December 17, 2021 11:53 AM
To: Liu, Shaoyun <Shaoyun.Liu at amd.com>; amd-gfx at lists.freedesktop.org
Subject: RE: [PATCH] drm/amdgpu: Send Message to SMU on aldebaran passthrough for sbr handling

[AMD Official Use Only]

Hi Shaoyun,
Yes. From the SMU FW point of view, there is a difference between the bare-metal and passthrough cases for SBR. On bare metal the FW sees it as a PCI reset, whereas in the passthrough case it sees it as a BIF reset. Within BIF reset, the FW then needs to differentiate between older ASICs (where we do BACO) and newer ones (where we do a mode 1 reset). Hence, in order for the SMU to differentiate these scenarios, we are adding a new message.
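
The distinction described above can be sketched as a small decision function. This is an illustration only: the enums and classify_sbr are hypothetical names, not SMU firmware or driver symbols, and the sketch assumes the new message simply acts as a flag telling the firmware to take the mode 1 path on a BIF reset.

```c
#include <stdbool.h>

/* Hypothetical names for illustration; not taken from SMU firmware. */
enum sbr_source {
	SBR_SOURCE_PCI_RESET,	/* how the FW sees SBR on bare metal */
	SBR_SOURCE_BIF_RESET,	/* how the FW sees SBR in passthrough */
};

enum reset_action {
	RESET_ACTION_PCI_PATH,	/* bare metal: existing PCI reset handling */
	RESET_ACTION_BACO,	/* passthrough, older ASIC */
	RESET_ACTION_MODE1,	/* passthrough, newer ASIC */
};

/*
 * heavy_sbr_requested models "the driver sent the new message".
 * That mapping is an assumption made for this sketch.
 */
static enum reset_action classify_sbr(enum sbr_source source,
				      bool heavy_sbr_requested)
{
	if (source == SBR_SOURCE_PCI_RESET)
		return RESET_ACTION_PCI_PATH;

	return heavy_sbr_requested ? RESET_ACTION_MODE1 : RESET_ACTION_BACO;
}
```

Without the extra flag, both passthrough cases arrive as the same BIF reset and cannot be told apart, which is the motivation given for the new message.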

I think I will rename the function from the current smu_set_light_sbr to smu_handle_passthrough_sbr.
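
For reference, the condition that guards this call in the patch can be modeled on its own, independent of the final function name. The types below are hypothetical stand-ins for the driver structures, reduced to the three fields the check reads; needs_passthrough_sbr_msg is an illustrative name, not a driver function.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; only what the check needs. */
enum chip { CHIP_OTHER, CHIP_ARCTURUS, CHIP_ALDEBARAN };

struct dev_model {
	enum chip asic_type;
	bool passthrough;	/* stands in for amdgpu_passthrough(adev) */
	unsigned int xgmi_nodes;/* stands in for adev->gmc.xgmi.num_physical_nodes */
};

/* True when the SBR handling message would be sent to the SMU. */
static bool needs_passthrough_sbr_msg(const struct dev_model *d)
{
	return (d->asic_type == CHIP_ARCTURUS ||
		d->asic_type == CHIP_ALDEBARAN) &&
	       d->passthrough &&
	       d->xgmi_nodes > 1;
}
```

The message is only relevant for an XGMI hive in passthrough mode, so a single-node device or a bare-metal device never reaches the call.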

Regards
Sashank

-----Original Message-----
From: Liu, Shaoyun <Shaoyun.Liu at amd.com>
Sent: Friday, December 17, 2021 11:45 AM
To: Saye, Sashank <Sashank.Saye at amd.com>; amd-gfx at lists.freedesktop.org
Cc: Saye, Sashank <Sashank.Saye at amd.com>
Subject: RE: [PATCH] drm/amdgpu: Send Message to SMU on aldebaran passthrough for sbr handling

[AMD Official Use Only]

First, the name heavy SBR is confusing when you need to go through the light SBR code path.
Second, we originally introduced light SBR because on older ASICs the FW cannot synchronize the reset across the devices within the hive, so it depends on the driver to sync the reset. From what I have heard, for Arcturus the FW can actually sync the reset itself. I don't see a need to introduce the heavy SBR message; it seems the SMU will do a full reset when it gets an SBR request. Is there a different code path for the SMU to handle the reset for XGMI in passthrough mode?

Regards
Shaoyun.liu

-----Original Message-----
From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of sashank saye
Sent: Friday, December 17, 2021 10:33 AM
To: amd-gfx at lists.freedesktop.org
Cc: Saye, Sashank <Sashank.Saye at amd.com>
Subject: [PATCH] drm/amdgpu: Send Message to SMU on aldebaran passthrough for sbr handling

For the Aldebaran chip passthrough case we need to inform the SMU about special handling for SBR. On older chips we send LightSBR to the SMU; this patch enables the same for Aldebaran. The slight difference compared to previous chips is that on Aldebaran the SMU does a heavy reset on SBR. Hence the word Heavy instead of Light SBR is used for the SMU to differentiate.

Signed-off-by: sashank saye <sashank.saye at amd.com>
Change-Id: I79420e7352bb670d6f9696df97d7546f131b18fc
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c         |  4 ++--
 drivers/gpu/drm/amd/pm/inc/aldebaran_ppsmc.h       |  4 +++-
 drivers/gpu/drm/amd/pm/inc/smu_types.h             |  3 ++-
 drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c | 11 +++++++++++
 4 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index f31caec669e7..06aee23505b2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2618,8 +2618,8 @@ static int amdgpu_device_ip_late_init(struct amdgpu_device *adev)
 	if (r)
 		DRM_ERROR("enable mgpu fan boost failed (%d).\n", r);
 
-	/* For XGMI + passthrough configuration on arcturus, enable light SBR */
-	if (adev->asic_type == CHIP_ARCTURUS &&
+	/* For XGMI + passthrough configuration on arcturus and aldebaran, enable light SBR */
+	if ((adev->asic_type == CHIP_ARCTURUS || adev->asic_type == CHIP_ALDEBARAN ) &&
 	    amdgpu_passthrough(adev) &&
 	    adev->gmc.xgmi.num_physical_nodes > 1)
 		smu_set_light_sbr(&adev->smu, true);
diff --git a/drivers/gpu/drm/amd/pm/inc/aldebaran_ppsmc.h b/drivers/gpu/drm/amd/pm/inc/aldebaran_ppsmc.h
index 35fa0d8e92dd..ab66a4b9e438 100644
--- a/drivers/gpu/drm/amd/pm/inc/aldebaran_ppsmc.h
+++ b/drivers/gpu/drm/amd/pm/inc/aldebaran_ppsmc.h
@@ -102,7 +102,9 @@
 
 #define PPSMC_MSG_GfxDriverResetRecovery	0x42
 #define PPSMC_MSG_BoardPowerCalibration 	0x43
-#define PPSMC_Message_Count			0x44
+#define PPSMC_MSG_HeavySBR                      0x45
+#define PPSMC_Message_Count			0x46
+
 
 //PPSMC Reset Types
 #define PPSMC_RESET_TYPE_WARM_RESET              0x00
diff --git a/drivers/gpu/drm/amd/pm/inc/smu_types.h b/drivers/gpu/drm/amd/pm/inc/smu_types.h
index 18b862a90fbe..ff8a0bcbd290 100644
--- a/drivers/gpu/drm/amd/pm/inc/smu_types.h
+++ b/drivers/gpu/drm/amd/pm/inc/smu_types.h
@@ -229,7 +229,8 @@
 	__SMU_DUMMY_MAP(BoardPowerCalibration),   \
 	__SMU_DUMMY_MAP(RequestGfxclk),           \
 	__SMU_DUMMY_MAP(ForceGfxVid),             \
-	__SMU_DUMMY_MAP(UnforceGfxVid),
+	__SMU_DUMMY_MAP(UnforceGfxVid),           \
+	__SMU_DUMMY_MAP(HeavySBR),
 
 #undef __SMU_DUMMY_MAP
 #define __SMU_DUMMY_MAP(type)	SMU_MSG_##type
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
index 7433a051e795..f442950e9676 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
@@ -141,6 +141,7 @@ static const struct cmn2asic_msg_mapping aldebaran_message_map[SMU_MSG_MAX_COUNT
 	MSG_MAP(SetUclkDpmMode,			     PPSMC_MSG_SetUclkDpmMode,			0),
 	MSG_MAP(GfxDriverResetRecovery,		     PPSMC_MSG_GfxDriverResetRecovery,		0),
 	MSG_MAP(BoardPowerCalibration,		     PPSMC_MSG_BoardPowerCalibration,		0),
+	MSG_MAP(HeavySBR,                            PPSMC_MSG_HeavySBR,                        0),
 };
 
 static const struct cmn2asic_mapping aldebaran_clk_map[SMU_CLK_COUNT] = {
@@ -1912,6 +1913,15 @@ static int aldebaran_mode2_reset(struct smu_context *smu)
 	return ret;
 }
 
+static int aldebaran_set_light_sbr(struct smu_context *smu, bool enable)
+{
+	int ret = 0;
+	//For Aldebaran chip, SMU would do a mode 1 reset as part of SBR hence we call it HeavySBR instead of light
+	ret = smu_cmn_send_smc_msg_with_param(smu, SMU_MSG_HeavySBR, enable ? 1 : 0, NULL);
+
+	return ret;
+}
+
 static bool aldebaran_is_mode1_reset_supported(struct smu_context *smu)
 {
 #if 0
@@ -2021,6 +2031,7 @@ static const struct pptable_funcs aldebaran_ppt_funcs = {
 	.get_gpu_metrics = aldebaran_get_gpu_metrics,
 	.mode1_reset_is_support = aldebaran_is_mode1_reset_supported,
 	.mode2_reset_is_support = aldebaran_is_mode2_reset_supported,
+	.set_light_sbr = aldebaran_set_light_sbr,
 	.mode1_reset = aldebaran_mode1_reset,
 	.set_mp1_state = aldebaran_set_mp1_state,
 	.mode2_reset = aldebaran_mode2_reset,
--
2.25.1

