[PATCH] drm/amdgpu: disable cstate properly for driver reloading scenario
Lazar, Lijo
lijo.lazar at amd.com
Thu Mar 2 09:20:47 UTC 2023
On 3/2/2023 2:43 PM, Quan, Evan wrote:
> [AMD Official Use Only - General]
>
>
>
>> -----Original Message-----
>> From: Lazar, Lijo <Lijo.Lazar at amd.com>
>> Sent: Thursday, March 2, 2023 4:28 PM
>> To: Quan, Evan <Evan.Quan at amd.com>; amd-gfx at lists.freedesktop.org
>> Cc: Deucher, Alexander <Alexander.Deucher at amd.com>
>> Subject: Re: [PATCH] drm/amdgpu: disable cstate properly for driver
>> reloading scenario
>>
>>
>>
>> On 3/2/2023 12:28 PM, Evan Quan wrote:
>>> Gpu reset might be needed during driver reloading. To guard that(gpu
>>> reset) work, df cstate needs to be disabled properly.
>>>
>>> Signed-off-by: Evan Quan <evan.quan at amd.com>
>>> Change-Id: I5c074c265c0b08a67b6934ae1ad9aa3fed245461
>>> ---
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 9 +++++++++
>>> 1 file changed, 9 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> index 51bbeaa1f311..3c854461ef32 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> @@ -2816,6 +2816,15 @@ static int amdgpu_device_ip_fini_early(struct
>> amdgpu_device *adev)
>>> amdgpu_device_set_pg_state(adev, AMD_PG_STATE_UNGATE);
>>> amdgpu_device_set_cg_state(adev, AMD_CG_STATE_UNGATE);
>>>
>>> + /*
>>> + * Get df cstate disabled properly on driver unloading.
>>> + * Since on the succeeding driver reloading, gpu reset might
>>> + * be required. And cstate disabled is a prerequisite for
>>> + * that(gpu reset).
>>> + */
>>> + if (amdgpu_dpm_set_df_cstate(adev, DF_CSTATE_DISALLOW))
>>> + dev_warn(adev->dev, "Failed to disallow df cstate");
>>> +
>>
>> This looks more like a firmware bug. Driver sends the Unload message to FW.
>> In that case FW should disable all features including C-state.
> Driver does not send the Unload message. We want PMFM alive and ready for handling possible gpu reset on reloading.
>
Actually, soc21_need_reset_on_init code itself has a bug. PSP won't get
unloaded by default on ring destruction. Even if PSP stops, it could
just keep the heartbeat value as non-zero (just that it won't increment).
Probably, that needs to be fixed first rather than keeping PMFW alive
for a reset.
Thanks,
Lijo
> BR
> Evan
>>
>> Thanks,
>> Lijo
>>
>>> amdgpu_amdkfd_suspend(adev, false);
>>>
>>> /* Workaroud for ASICs need to disable SMC first */
More information about the amd-gfx
mailing list