[PATCH v3] drm/amdgpu: Cancel delayed work when GFXOFF is disabled

Mon Aug 16 12:06:12 UTC 2021

Am 16.08.21 um 13:33 schrieb Lazar, Lijo:
> On 8/16/2021 4:05 PM, Michel Dänzer wrote:
>> From: Michel Dänzer <mdaenzer at redhat.com>
>>
>> schedule_delayed_work does not push back the work if it was already
>> scheduled before, so amdgpu_device_delay_enable_gfx_off ran ~100 ms
>> after the first time GFXOFF was disabled and re-enabled, even if GFXOFF
>> was disabled and re-enabled again during those 100 ms.
>>
>> This resulted in frame drops / stutter with the upcoming mutter 41
>> release on Navi 14, due to constantly enabling GFXOFF in the HW and
>> disabling it again (for getting the GPU clock counter).
>>
>> To fix this, call cancel_delayed_work_sync when the disable count
>> transitions from 0 to 1, and only schedule the delayed work on the
>> reverse transition, not if the disable count was already 0. This makes
>> sure the delayed work doesn't run at unexpected times, and allows it to
>> be lock-free.
>>
>> v2:
>> * Use cancel_delayed_work_sync & mutex_trylock instead of
>>    mod_delayed_work.
>> v3:
>> * Make amdgpu_device_delay_enable_gfx_off lock-free (Christian König)
>>
>> Cc: stable at vger.kernel.org
>> Signed-off-by: Michel Dänzer <mdaenzer at redhat.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 11 +++++------
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c    | 22 +++++++++++++++++-----
>>   2 files changed, 22 insertions(+), 11 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index f3fd5ec710b6..f944ed858f3e 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -2777,12 +2777,11 @@ static void 
>> amdgpu_device_delay_enable_gfx_off(struct work_struct *work)
>>       struct amdgpu_device *adev =
>>           container_of(work, struct amdgpu_device, 
>> gfx.gfx_off_delay_work.work);
>>   -    mutex_lock(&adev->gfx.gfx_off_mutex);
>> -    if (!adev->gfx.gfx_off_state && !adev->gfx.gfx_off_req_count) {
>> -        if (!amdgpu_dpm_set_powergating_by_smu(adev, 
>> AMD_IP_BLOCK_TYPE_GFX, true))
>> -            adev->gfx.gfx_off_state = true;
>> -    }
>> -    mutex_unlock(&adev->gfx.gfx_off_mutex);
>> +    WARN_ON_ONCE(adev->gfx.gfx_off_state);
>
> Don't see any case for this. It's not expected to be scheduled in this 
> case, right?
>
>> + WARN_ON_ONCE(adev->gfx.gfx_off_req_count);
>> +
>
> Thinking about ON_ONCE here - this may happen more than once if it's 
> completed as part of cancel_ call. Is the warning needed?

WARN_ON_ONCE() is usually used to prevent spamming the system log with 
warnings. E.g. the warning is only printed once indicating a driver bug 
and that's it.

>
> Anyway,
>     Reviewed-by: Lijo Lazar <lijo.lazar at amd.com>

Acked-by: Christian König <christian.koenig at amd.com>

Regards,
Christian.

>
>> +    if (!amdgpu_dpm_set_powergating_by_smu(adev, 
>> AMD_IP_BLOCK_TYPE_GFX, true))
>> +        adev->gfx.gfx_off_state = true;
>>   }
>>     /**
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>> index a0be0772c8b3..ca91aafcb32b 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>> @@ -563,15 +563,26 @@ void amdgpu_gfx_off_ctrl(struct amdgpu_device 
>> *adev, bool enable)
>>         mutex_lock(&adev->gfx.gfx_off_mutex);
>>   -    if (!enable)
>> -        adev->gfx.gfx_off_req_count++;
>> -    else if (adev->gfx.gfx_off_req_count > 0)
>> +    if (enable) {
>> +        /* If the count is already 0, it means there's an imbalance 
>> bug somewhere.
>> +         * Note that the bug may be in a different caller than the 
>> one which triggers the
>> +         * WARN_ON_ONCE.
>> +         */
>> +        if (WARN_ON_ONCE(adev->gfx.gfx_off_req_count == 0))
>> +            goto unlock;
>> +
>>           adev->gfx.gfx_off_req_count--;
>> +    } else {
>> +        adev->gfx.gfx_off_req_count++;
>> +    }
>>         if (enable && !adev->gfx.gfx_off_state && 
>> !adev->gfx.gfx_off_req_count) {
>> schedule_delayed_work(&adev->gfx.gfx_off_delay_work, 
>> GFX_OFF_DELAY_ENABLE);
>> -    } else if (!enable && adev->gfx.gfx_off_state) {
>> -        if (!amdgpu_dpm_set_powergating_by_smu(adev, 
>> AMD_IP_BLOCK_TYPE_GFX, false)) {
>> +    } else if (!enable && adev->gfx.gfx_off_req_count == 1) {
>> + cancel_delayed_work_sync(&adev->gfx.gfx_off_delay_work);
>> +
>> +        if (adev->gfx.gfx_off_state &&
>> +            !amdgpu_dpm_set_powergating_by_smu(adev, 
>> AMD_IP_BLOCK_TYPE_GFX, false)) {
>>               adev->gfx.gfx_off_state = false;
>>                 if (adev->gfx.funcs->init_spm_golden) {
>> @@ -581,6 +592,7 @@ void amdgpu_gfx_off_ctrl(struct amdgpu_device 
>> *adev, bool enable)
>>           }
>>       }
>>   +unlock:
>>       mutex_unlock(&adev->gfx.gfx_off_mutex);
>>   }
>>