[PATCH] drm/amdgpu: Cancel delayed work when GFXOFF is disabled
Lazar, Lijo
lijo.lazar at amd.com
Tue Aug 17 10:37:30 UTC 2021
On 8/17/2021 3:29 PM, Michel Dänzer wrote:
> On 2021-08-17 11:37 a.m., Lazar, Lijo wrote:
>>
>>
>> On 8/17/2021 2:56 PM, Michel Dänzer wrote:
>>> On 2021-08-17 11:12 a.m., Lazar, Lijo wrote:
>>>>
>>>>
>>>> On 8/17/2021 1:53 PM, Michel Dänzer wrote:
>>>>> From: Michel Dänzer <mdaenzer at redhat.com>
>>>>>
>>>>> schedule_delayed_work does not push back the work if it was already
>>>>> scheduled before, so amdgpu_device_delay_enable_gfx_off ran ~100 ms
>>>>> after the first time GFXOFF was disabled and re-enabled, even if GFXOFF
>>>>> was disabled and re-enabled again during those 100 ms.
>>>>>
>>>>> This resulted in frame drops / stutter with the upcoming mutter 41
>>>>> release on Navi 14, due to constantly enabling GFXOFF in the HW and
>>>>> disabling it again (for getting the GPU clock counter).
>>>>>
>>>>> To fix this, call cancel_delayed_work_sync when the disable count
>>>>> transitions from 0 to 1, and only schedule the delayed work on the
>>>>> reverse transition, not if the disable count was already 0. This makes
>>>>> sure the delayed work doesn't run at unexpected times, and allows it to
>>>>> be lock-free.
>>>>>
>>>>> v2:
>>>>> * Use cancel_delayed_work_sync & mutex_trylock instead of
>>>>> mod_delayed_work.
>>>>> v3:
>>>>> * Make amdgpu_device_delay_enable_gfx_off lock-free (Christian König)
>>>>> v4:
>>>>> * Fix race condition between amdgpu_gfx_off_ctrl incrementing
>>>>> adev->gfx.gfx_off_req_count and amdgpu_device_delay_enable_gfx_off
>>>>> checking for it to be 0 (Evan Quan)
>>>>>
>>>>> Cc: stable at vger.kernel.org
>>>>> Reviewed-by: Lijo Lazar <lijo.lazar at amd.com> # v3
>>>>> Acked-by: Christian König <christian.koenig at amd.com> # v3
>>>>> Signed-off-by: Michel Dänzer <mdaenzer at redhat.com>
>>>>> ---
>>>>>
>>>>> Alex, probably best to wait a bit longer before picking this up. :)
>>>>>
>>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 11 +++----
>>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c    | 36 +++++++++++++++-------
>>>>>  2 files changed, 30 insertions(+), 17 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>> index f3fd5ec710b6..f944ed858f3e 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>> @@ -2777,12 +2777,11 @@ static void amdgpu_device_delay_enable_gfx_off(struct work_struct *work)
>>>>>  	struct amdgpu_device *adev =
>>>>>  		container_of(work, struct amdgpu_device, gfx.gfx_off_delay_work.work);
>>>>>  
>>>>> -	mutex_lock(&adev->gfx.gfx_off_mutex);
>>>>> -	if (!adev->gfx.gfx_off_state && !adev->gfx.gfx_off_req_count) {
>>>>> -		if (!amdgpu_dpm_set_powergating_by_smu(adev, AMD_IP_BLOCK_TYPE_GFX, true))
>>>>> -			adev->gfx.gfx_off_state = true;
>>>>> -	}
>>>>> -	mutex_unlock(&adev->gfx.gfx_off_mutex);
>>>>> +	WARN_ON_ONCE(adev->gfx.gfx_off_state);
>>>>> +	WARN_ON_ONCE(adev->gfx.gfx_off_req_count);
>>>>> +
>>>>> +	if (!amdgpu_dpm_set_powergating_by_smu(adev, AMD_IP_BLOCK_TYPE_GFX, true))
>>>>> +		adev->gfx.gfx_off_state = true;
>>>>>  }
>>>>>  
>>>>>  /**
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>>>>> index a0be0772c8b3..b4ced45301be 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>>>>> @@ -563,24 +563,38 @@ void amdgpu_gfx_off_ctrl(struct amdgpu_device *adev, bool enable)
>>>>>  
>>>>>  	mutex_lock(&adev->gfx.gfx_off_mutex);
>>>>>  
>>>>> -	if (!enable)
>>>>> -		adev->gfx.gfx_off_req_count++;
>>>>> -	else if (adev->gfx.gfx_off_req_count > 0)
>>>>> +	if (enable) {
>>>>> +		/* If the count is already 0, it means there's an imbalance bug somewhere.
>>>>> +		 * Note that the bug may be in a different caller than the one which triggers the
>>>>> +		 * WARN_ON_ONCE.
>>>>> +		 */
>>>>> +		if (WARN_ON_ONCE(adev->gfx.gfx_off_req_count == 0))
>>>>> +			goto unlock;
>>>>> +
>>>>>  		adev->gfx.gfx_off_req_count--;
>>>>>  
>>>>> -	if (enable && !adev->gfx.gfx_off_state && !adev->gfx.gfx_off_req_count) {
>>>>> -		schedule_delayed_work(&adev->gfx.gfx_off_delay_work, GFX_OFF_DELAY_ENABLE);
>>>>> -	} else if (!enable && adev->gfx.gfx_off_state) {
>>>>> -		if (!amdgpu_dpm_set_powergating_by_smu(adev, AMD_IP_BLOCK_TYPE_GFX, false)) {
>>>>> -			adev->gfx.gfx_off_state = false;
>>>>> +		if (adev->gfx.gfx_off_req_count == 0 && !adev->gfx.gfx_off_state)
>>>>> +			schedule_delayed_work(&adev->gfx.gfx_off_delay_work, GFX_OFF_DELAY_ENABLE);
>>>>> +	} else {
>>>>> +		if (adev->gfx.gfx_off_req_count == 0) {
>>>>> +			cancel_delayed_work_sync(&adev->gfx.gfx_off_delay_work);
>>>>> +
>>>>> +			if (adev->gfx.gfx_off_state &&
>>>>
>>>> More of a question which I didn't check last time - is this expected to be true when the disable call comes in first?
>>>
>>> My assumption is that cancel_delayed_work_sync guarantees amdgpu_device_delay_enable_gfx_off's assignment is visible here.
>>>
>>
>> To clarify - I mean the case when nothing is scheduled. If enable() is called when the count is 0, it goes to unlock. The expectation then is that disable() is called first.
>
> Yes, the very first amdgpu_gfx_off_ctrl call must pass enable=false, or it's a bug, which
>
> if (WARN_ON_ONCE(adev->gfx.gfx_off_req_count == 0))
>
> will catch.
>
>
>> Let's say disable() is called first; then the variable will be false, right?
>
> Ohh, I see what you mean. The first time amdgpu_gfx_off_ctrl is called with enable=false, adev->gfx.gfx_off_state == false (what it was initialized to), so it doesn't actually disable GFXOFF in HW.
Exactly.
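
To spell the sequence out (rough trace; gfx_off_state starts out false):

	amdgpu_gfx_off_ctrl(adev, false);	/* very first call */
		/* gfx_off_req_count: 0 -> 1 */
		/* gfx_off_state == false, so
		 * amdgpu_dpm_set_powergating_by_smu(adev, AMD_IP_BLOCK_TYPE_GFX, false)
		 * is never reached, i.e. GFXOFF is not actually disabled in HW.
		 */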
>
> Note that this is a separate pre-existing bug, not a regression of my patch.
>
> I wonder what's the best solution for that - move the adev->gfx.gfx_off_state assignments into amdgpu_dpm_set_powergating_by_smu?
Should be an existing one - I never bothered about that condition before.

One hack would be:

	is_pending = cancel_delayed_work_sync(&adev->gfx.gfx_off_delay_work);

	if ((adev->gfx.gfx_off_state || !is_pending) &&

If the work was never scheduled or is no longer pending, is_pending will be false; and if it already got executed, gfx_off_state will be set.
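
In context, the disable path from the patch would then become something like this (rough, untested sketch; is_pending is a new local):

	} else {
		if (adev->gfx.gfx_off_req_count == 0) {
			bool is_pending;

			/* Returns true iff the work was cancelled while still
			 * pending, i.e. before it could enable GFXOFF in HW.
			 */
			is_pending = cancel_delayed_work_sync(&adev->gfx.gfx_off_delay_work);

			/* Also disable GFXOFF in HW when the work was never
			 * scheduled, not only when it already ran.
			 */
			if ((adev->gfx.gfx_off_state || !is_pending) &&
			    !amdgpu_dpm_set_powergating_by_smu(adev, AMD_IP_BLOCK_TYPE_GFX, false))
				adev->gfx.gfx_off_state = false;
		}

		adev->gfx.gfx_off_req_count++;
	}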
Thanks,
Lijo