[PATCH v3] drm/amdgpu: Cancel delayed work when GFXOFF is disabled

Lazar, Lijo lijo.lazar at amd.com
Tue Aug 17 08:17:08 UTC 2021



On 8/17/2021 1:21 PM, Quan, Evan wrote:
> [AMD Official Use Only]
> 
> 
> 
>> -----Original Message-----
>> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of
>> Michel Dänzer
>> Sent: Monday, August 16, 2021 6:35 PM
>> To: Deucher, Alexander <Alexander.Deucher at amd.com>; Koenig, Christian
>> <Christian.Koenig at amd.com>
>> Cc: Liu, Leo <Leo.Liu at amd.com>; Zhu, James <James.Zhu at amd.com>; amd-
>> gfx at lists.freedesktop.org; dri-devel at lists.freedesktop.org
>> Subject: [PATCH v3] drm/amdgpu: Cancel delayed work when GFXOFF is
>> disabled
>>
>> From: Michel Dänzer <mdaenzer at redhat.com>
>>
>> schedule_delayed_work does not push back the work if it was already
>> scheduled before, so amdgpu_device_delay_enable_gfx_off ran ~100 ms
>> after the first time GFXOFF was disabled and re-enabled, even if GFXOFF
>> was disabled and re-enabled again during those 100 ms.
>>
>> This resulted in frame drops / stutter with the upcoming mutter 41
>> release on Navi 14, due to constantly enabling GFXOFF in the HW and
>> disabling it again (for getting the GPU clock counter).
>>
>> To fix this, call cancel_delayed_work_sync when the disable count
>> transitions from 0 to 1, and only schedule the delayed work on the
>> reverse transition, not if the disable count was already 0. This makes
>> sure the delayed work doesn't run at unexpected times, and allows it to
>> be lock-free.
>>
>> v2:
>> * Use cancel_delayed_work_sync & mutex_trylock instead of
>>    mod_delayed_work.
>> v3:
>> * Make amdgpu_device_delay_enable_gfx_off lock-free (Christian König)
>>
>> Cc: stable at vger.kernel.org
>> Signed-off-by: Michel Dänzer <mdaenzer at redhat.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 11 +++++------
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c    | 22 +++++++++++++++++-
>> ----
>>   2 files changed, 22 insertions(+), 11 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index f3fd5ec710b6..f944ed858f3e 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -2777,12 +2777,11 @@ static void
>> amdgpu_device_delay_enable_gfx_off(struct work_struct *work)
>>   	struct amdgpu_device *adev =
>>   		container_of(work, struct amdgpu_device,
>> gfx.gfx_off_delay_work.work);
>>
>> -	mutex_lock(&adev->gfx.gfx_off_mutex);
>> -	if (!adev->gfx.gfx_off_state && !adev->gfx.gfx_off_req_count) {
>> -		if (!amdgpu_dpm_set_powergating_by_smu(adev,
>> AMD_IP_BLOCK_TYPE_GFX, true))
>> -			adev->gfx.gfx_off_state = true;
>> -	}
>> -	mutex_unlock(&adev->gfx.gfx_off_mutex);
>> +	WARN_ON_ONCE(adev->gfx.gfx_off_state);
>> +	WARN_ON_ONCE(adev->gfx.gfx_off_req_count);
>> +
>> +	if (!amdgpu_dpm_set_powergating_by_smu(adev,
>> AMD_IP_BLOCK_TYPE_GFX, true))
>> +		adev->gfx.gfx_off_state = true;
>>   }
>>
>>   /**
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>> index a0be0772c8b3..ca91aafcb32b 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>> @@ -563,15 +563,26 @@ void amdgpu_gfx_off_ctrl(struct amdgpu_device
>> *adev, bool enable)
>>
>>   	mutex_lock(&adev->gfx.gfx_off_mutex);
>>
>> -	if (!enable)
>> -		adev->gfx.gfx_off_req_count++;
>> -	else if (adev->gfx.gfx_off_req_count > 0)
>> +	if (enable) {
>> +		/* If the count is already 0, it means there's an imbalance bug
>> somewhere.
>> +		 * Note that the bug may be in a different caller than the one
>> which triggers the
>> +		 * WARN_ON_ONCE.
>> +		 */
>> +		if (WARN_ON_ONCE(adev->gfx.gfx_off_req_count == 0))
>> +			goto unlock;
>> +
>>   		adev->gfx.gfx_off_req_count--;
>> +	} else {
>> +		adev->gfx.gfx_off_req_count++;
>> +	}
>>
>>   	if (enable && !adev->gfx.gfx_off_state && !adev-
>>> gfx.gfx_off_req_count) {
>>   		schedule_delayed_work(&adev->gfx.gfx_off_delay_work,
>> GFX_OFF_DELAY_ENABLE);
>> -	} else if (!enable && adev->gfx.gfx_off_state) {
>> -		if (!amdgpu_dpm_set_powergating_by_smu(adev,
>> AMD_IP_BLOCK_TYPE_GFX, false)) {
>> +	} else if (!enable && adev->gfx.gfx_off_req_count == 1) {
> [Quan, Evan] It seems here will leave a small time window for race condition. If amdgpu_device_delay_enable_gfx_off() happens to occur here, it will "WARN_ON_ONCE(adev->gfx.gfx_off_req_count);". How about something as below?
> @@ -573,13 +573,11 @@ void amdgpu_gfx_off_ctrl(struct amdgpu_device *adev, bool enable)
>                          goto unlock;
> 
>                  adev->gfx.gfx_off_req_count--;
> -       } else {
> -               adev->gfx.gfx_off_req_count++;
>          }
> 
>          if (enable && !adev->gfx.gfx_off_state && !adev->gfx.gfx_off_req_count) {
>                  schedule_delayed_work(&adev->gfx.gfx_off_delay_work, GFX_OFF_DELAY_ENABLE);
> -       } else if (!enable && adev->gfx.gfx_off_req_count == 1) {
> +       } else if (!enable && adev->gfx.gfx_off_req_count == 0) {
>                  cancel_delayed_work_sync(&adev->gfx.gfx_off_delay_work);
> 
>                  if (adev->gfx.gfx_off_state &&
> @@ -593,6 +591,9 @@ void amdgpu_gfx_off_ctrl(struct amdgpu_device *adev, bool enable)
>                  }
>          }
> 
> +       if (!enable)
> +               adev->gfx.gfx_off_req_count++;
> +
>   unlock:
> 

Hi Evan,

It's not a race per se, it is just an undesirable condition of Enable 
Gfxoff immediately followed by a Disable GfxOff. The purpose of the WARN 
is to intimate the user about it.

There are other cases - for ex: if amdgpu_device_delay_enable_gfx_off() 
called amdgpu_dpm_set_powergating_by_smu() already at the same place you 
pointed out. In this case WARN doesn't get printed, but it's not an 
optimal situation either. Probably it makes sense to move the WARN_ON as 
the last line of amdgpu_device_delay_enable_gfx_off. Either way, I don't 
think it's a race condition.

Thanks,
Lijo


> BR
> Evan
>> +		cancel_delayed_work_sync(&adev-
>>> gfx.gfx_off_delay_work);
>> +
>> +		if (adev->gfx.gfx_off_state &&
>> +		    !amdgpu_dpm_set_powergating_by_smu(adev,
>> AMD_IP_BLOCK_TYPE_GFX, false)) {
>>   			adev->gfx.gfx_off_state = false;
>>
>>   			if (adev->gfx.funcs->init_spm_golden) {
>> @@ -581,6 +592,7 @@ void amdgpu_gfx_off_ctrl(struct amdgpu_device
>> *adev, bool enable)
>>   		}
>>   	}
>>
>> +unlock:
>>   	mutex_unlock(&adev->gfx.gfx_off_mutex);
>>   }
>>
>> --
>> 2.32.0


More information about the dri-devel mailing list