[PATCH] drm/amdgpu: support new mode-1 reset interface
Lazar, Lijo
lijo.lazar at amd.com
Tue Nov 16 08:56:36 UTC 2021
On 11/16/2021 2:17 PM, Zhou1, Tao wrote:
> [AMD Official Use Only]
>
> Hi Lijo,
>
> Your concern is reasonable, but smu_v13_0_mode1_reset is currently used only by ALDEBARAN. I assume the PMFW of future SMU v13 ASICs will follow this design; otherwise we could move the implementation into xxx_ppt.c.
>
Actually, this function is meant to hold common logic for SMU13-based ASICs. A
version check in a common file is not maintainable. I see there is already a
version check there, and even that is not proper :)
It is better to do it properly now, when support is added, rather than
plan on refactoring when future ASICs arrive.
Thanks,
Lijo
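
As a rough illustration of the split being suggested (common file only dispatches, firmware version thresholds live with the ASIC), here is a minimal standalone sketch. All names prefixed with example_ or EXAMPLE_ are hypothetical and are not the driver's actual structures or message IDs; only the 0x00440700 threshold is taken from the quoted patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical message IDs, for illustration only. */
#define EXAMPLE_MSG_MODE1_RESET  1
#define EXAMPLE_MSG_DRIVER_RESET 2

struct example_smu;

/* Hypothetical per-ASIC callback table. */
struct example_ppt_funcs {
	int (*mode1_reset)(struct example_smu *smu);
};

struct example_smu {
	const struct example_ppt_funcs *funcs;
	uint32_t fw_version;
	int last_msg;	/* records which message would have been sent */
};

/* ASIC-specific file: the PMFW version threshold stays here. */
static int example_aldebaran_mode1_reset(struct example_smu *smu)
{
	if (smu->fw_version < 0x00440700)
		smu->last_msg = EXAMPLE_MSG_MODE1_RESET;
	else
		smu->last_msg = EXAMPLE_MSG_DRIVER_RESET;
	return 0;
}

static const struct example_ppt_funcs example_aldebaran_funcs = {
	.mode1_reset = example_aldebaran_mode1_reset,
};

/* Common file: no version checks, only dispatch to the ASIC callback. */
static int example_mode1_reset(struct example_smu *smu)
{
	if (!smu->funcs || !smu->funcs->mode1_reset)
		return -1;
	return smu->funcs->mode1_reset(smu);
}

/* Usage: older firmware takes the legacy message path. */
static void example_usage(void)
{
	struct example_smu smu = { &example_aldebaran_funcs, 0x00440600, 0 };

	assert(example_mode1_reset(&smu) == 0);
	assert(smu.last_msg == EXAMPLE_MSG_MODE1_RESET);
}
```

With this shape, a future SMU13 ASIC whose PMFW numbers its versions differently would supply its own callback instead of adding another branch to the common function.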
> Regards,
> Tao
>
>> -----Original Message-----
>> From: Lazar, Lijo <Lijo.Lazar at amd.com>
>> Sent: Tuesday, November 16, 2021 3:44 PM
>> To: Zhou1, Tao <Tao.Zhou1 at amd.com>; amd-gfx at lists.freedesktop.org; Zhang,
>> Hawking <Hawking.Zhang at amd.com>; Clements, John
>> <John.Clements at amd.com>; Yang, Stanley <Stanley.Yang at amd.com>; Quan,
>> Evan <Evan.Quan at amd.com>
>> Subject: Re: [PATCH] drm/amdgpu: support new mode-1 reset interface
>>
>>
>>
>> On 11/16/2021 12:53 PM, Tao Zhou wrote:
>>> If a GPU reset is triggered by a RAS fatal error, report that to the
>>> SMU in the mode-1 reset message.
>>>
>>> Signed-off-by: Tao Zhou <tao.zhou1 at amd.com>
>>> ---
>>> .../gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c | 21 ++++++++++++++++---
>>> 1 file changed, 18 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c
>>> index 35145db6eedf..6f3d064a8232 100644
>>> --- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c
>>> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c
>>> @@ -1426,16 +1426,31 @@ int smu_v13_0_set_azalia_d3_pme(struct smu_context *smu)
>>>
>>> int smu_v13_0_mode1_reset(struct smu_context *smu)
>>> {
>>> - u32 smu_version;
>>> + u32 smu_version, fatal_err, param;
>>> int ret = 0;
>>> + struct amdgpu_device *adev = smu->adev;
>>> + struct amdgpu_ras *ras = amdgpu_ras_get_context(adev);
>>> +
>>> + fatal_err = 0;
>>> + param = SMU_RESET_MODE_1;
>>> +
>>> /*
>>> * PM FW support SMU_MSG_GfxDeviceDriverReset from 68.07
>>> */
>>> smu_cmn_get_smc_version(smu, NULL, &smu_version);
>>> if (smu_version < 0x00440700)
>>> ret = smu_cmn_send_smc_msg(smu, SMU_MSG_Mode1Reset, NULL);
>>> - else
>>> - ret = smu_cmn_send_smc_msg_with_param(smu, SMU_MSG_GfxDeviceDriverReset, SMU_RESET_MODE_1, NULL);
>>> + else {
>>> + /* fatal error triggered by ras, PMFW supports the flag
>>> + from 68.44.0 */
>>> + if ((smu_version >= 0x00442c00) && ras &&
>>> + atomic_read(&ras->in_recovery))
>>> + fatal_err = 1;
>>> +
>>
>> Judging by the PMFW version, this looks specific to Aldebaran. Since there is
>> a version check as well, the implementation needs to be moved to aldebaran_ppt.c.
>>
>> Thanks,
>> Lijo
>>
>>> + param |= (fatal_err << 16);
>>> + ret = smu_cmn_send_smc_msg_with_param(smu,
>>> + SMU_MSG_GfxDeviceDriverReset, param, NULL);
>>> + }
>>>
>>> if (!ret)
>>> msleep(SMU13_MODE1_RESET_WAIT_TIME_IN_MS);
>>>
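
For readers following the constants in the quoted patch, the sketch below shows how the version thresholds and the reset parameter appear to be built. The version layout (major in bits 23:16, minor in bits 15:8, patch in bits 7:0) is inferred from the patch comments ("68.07" for 0x00440700, "68.44.0" for 0x00442c00). EXAMPLE_RESET_MODE_1 is a placeholder value, since the real SMU_RESET_MODE_1 is defined in the driver headers and is not shown in this thread. All example_ and EXAMPLE_ names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed PMFW version layout, inferred from the patch comments. */
#define EXAMPLE_PMFW_VERSION(major, minor, patch) \
	(((uint32_t)(major) << 16) | ((uint32_t)(minor) << 8) | (uint32_t)(patch))

/* Placeholder; not the driver's actual SMU_RESET_MODE_1 value. */
#define EXAMPLE_RESET_MODE_1    1u
/* The patch places the fatal-error flag at bit 16 of the parameter. */
#define EXAMPLE_FATAL_ERR_SHIFT 16

/*
 * Build the parameter for the driver-reset message: the reset mode in the
 * low bits, plus the fatal-error flag when the firmware is new enough and
 * RAS recovery is in progress.
 */
static uint32_t example_mode1_param(uint32_t smu_version, bool in_ras_recovery)
{
	uint32_t param = EXAMPLE_RESET_MODE_1;

	if (smu_version >= EXAMPLE_PMFW_VERSION(68, 44, 0) && in_ras_recovery)
		param |= 1u << EXAMPLE_FATAL_ERR_SHIFT;

	return param;
}

/* Usage: the macro reproduces both thresholds that appear in the patch. */
static void example_self_check(void)
{
	assert(EXAMPLE_PMFW_VERSION(68, 7, 0) == 0x00440700);
	assert(EXAMPLE_PMFW_VERSION(68, 44, 0) == 0x00442c00);
}
```

Firmware older than 68.44.0 never sees the flag, so under these assumptions the parameter stays identical to what the pre-patch code sent.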