[PATCH] drm/amdgpu: Cancel delayed work when GFXOFF is disabled

Quan, Evan Evan.Quan at amd.com
Mon Aug 16 10:20:19 UTC 2021


[AMD Official Use Only]

Hi Michel,

The patch seems reasonable to me(especially the cancel_delayed_work_sync() part).
However, can you explain more about the code below?
What's the race issue here exactly?

+	/* mutex_lock could deadlock with cancel_delayed_work_sync in amdgpu_gfx_off_ctrl. */
+	if (!mutex_trylock(&adev->gfx.gfx_off_mutex)) {
+		/* If there's a bug which causes amdgpu_gfx_off_ctrl to be called with enable=true
+		 * when adev->gfx.gfx_off_req_count is already 0, we might race with that.
+		 * Re-schedule to make sure gfx off will be re-enabled in the HW eventually.
+		 */
+		schedule_delayed_work(&adev->gfx.gfx_off_delay_work, AMDGPU_GFX_OFF_DELAY_ENABLE);
+		return;
+	}

BR
Evan
> -----Original Message-----
> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of
> Michel Dänzer
> Sent: Friday, August 13, 2021 6:29 PM
> To: Deucher, Alexander <Alexander.Deucher at amd.com>; Koenig, Christian
> <Christian.Koenig at amd.com>
> Cc: Liu, Leo <Leo.Liu at amd.com>; Zhu, James <James.Zhu at amd.com>; amd-
> gfx at lists.freedesktop.org; dri-devel at lists.freedesktop.org
> Subject: [PATCH] drm/amdgpu: Cancel delayed work when GFXOFF is
> disabled
> 
> From: Michel Dänzer <mdaenzer at redhat.com>
> 
> schedule_delayed_work does not push back the work if it was already
> scheduled before, so amdgpu_device_delay_enable_gfx_off ran ~100 ms
> after the first time GFXOFF was disabled and re-enabled, even if GFXOFF
> was disabled and re-enabled again during those 100 ms.
> 
> This resulted in frame drops / stutter with the upcoming mutter 41
> release on Navi 14, due to constantly enabling GFXOFF in the HW and
> disabling it again (for getting the GPU clock counter).
> 
> To fix this, call cancel_delayed_work_sync when GFXOFF transitions from
> enabled to disabled. This makes sure the delayed work will be scheduled
> as intended in the reverse case.
> 
> In order to avoid a deadlock, amdgpu_device_delay_enable_gfx_off needs
> to use mutex_trylock instead of mutex_lock.
> 
> v2:
> * Use cancel_delayed_work_sync & mutex_trylock instead of
>   mod_delayed_work.
> 
> Signed-off-by: Michel Dänzer <mdaenzer at redhat.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 11 ++++++++++-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c    | 13 +++++++------
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h    |  3 +++
>  3 files changed, 20 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index f3fd5ec710b6..8b025f70706c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -2777,7 +2777,16 @@ static void
> amdgpu_device_delay_enable_gfx_off(struct work_struct *work)
>  	struct amdgpu_device *adev =
>  		container_of(work, struct amdgpu_device,
> gfx.gfx_off_delay_work.work);
> 
> -	mutex_lock(&adev->gfx.gfx_off_mutex);
> +	/* mutex_lock could deadlock with cancel_delayed_work_sync in
> amdgpu_gfx_off_ctrl. */
> +	if (!mutex_trylock(&adev->gfx.gfx_off_mutex)) {
> +		/* If there's a bug which causes amdgpu_gfx_off_ctrl to be
> called with enable=true
> +		 * when adev->gfx.gfx_off_req_count is already 0, we might
> race with that.
> +		 * Re-schedule to make sure gfx off will be re-enabled in the
> HW eventually.
> +		 */
> +		schedule_delayed_work(&adev->gfx.gfx_off_delay_work,
> AMDGPU_GFX_OFF_DELAY_ENABLE);
> +		return;
> +	}
> +
>  	if (!adev->gfx.gfx_off_state && !adev->gfx.gfx_off_req_count) {
>  		if (!amdgpu_dpm_set_powergating_by_smu(adev,
> AMD_IP_BLOCK_TYPE_GFX, true))
>  			adev->gfx.gfx_off_state = true;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> index a0be0772c8b3..da4c46db3093 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> @@ -28,9 +28,6 @@
>  #include "amdgpu_rlc.h"
>  #include "amdgpu_ras.h"
> 
> -/* delay 0.1 second to enable gfx off feature */
> -#define GFX_OFF_DELAY_ENABLE         msecs_to_jiffies(100)
> -
>  /*
>   * GPU GFX IP block helpers function.
>   */
> @@ -569,9 +566,13 @@ void amdgpu_gfx_off_ctrl(struct amdgpu_device
> *adev, bool enable)
>  		adev->gfx.gfx_off_req_count--;
> 
>  	if (enable && !adev->gfx.gfx_off_state && !adev-
> >gfx.gfx_off_req_count) {
> -		schedule_delayed_work(&adev->gfx.gfx_off_delay_work,
> GFX_OFF_DELAY_ENABLE);
> -	} else if (!enable && adev->gfx.gfx_off_state) {
> -		if (!amdgpu_dpm_set_powergating_by_smu(adev,
> AMD_IP_BLOCK_TYPE_GFX, false)) {
> +		schedule_delayed_work(&adev->gfx.gfx_off_delay_work,
> AMDGPU_GFX_OFF_DELAY_ENABLE);
> +	} else if (!enable) {
> +		if (adev->gfx.gfx_off_req_count == 1 && !adev-
> >gfx.gfx_off_state)
> +			cancel_delayed_work_sync(&adev-
> >gfx.gfx_off_delay_work);
> +
> +		if (adev->gfx.gfx_off_state &&
> +		    !amdgpu_dpm_set_powergating_by_smu(adev,
> AMD_IP_BLOCK_TYPE_GFX, false)) {
>  			adev->gfx.gfx_off_state = false;
> 
>  			if (adev->gfx.funcs->init_spm_golden) {
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> index d43fe2ed8116..dcdb505bb7f4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> @@ -32,6 +32,9 @@
>  #include "amdgpu_rlc.h"
>  #include "soc15.h"
> 
> +/* delay 0.1 second to enable gfx off feature */
> +#define AMDGPU_GFX_OFF_DELAY_ENABLE msecs_to_jiffies(100)
> +
>  /* GFX current status */
>  #define AMDGPU_GFX_NORMAL_MODE			0x00000000L
>  #define AMDGPU_GFX_SAFE_MODE			0x00000001L
> --
> 2.32.0


More information about the amd-gfx mailing list