[PATCH] drm/amdgpu: Cancel delayed work when GFXOFF is disabled
Lazar, Lijo
lijo.lazar at amd.com
Fri Aug 13 11:50:06 UTC 2021
On 8/13/2021 3:59 PM, Michel Dänzer wrote:
> From: Michel Dänzer <mdaenzer at redhat.com>
>
> schedule_delayed_work does not push back the work if it was already
> scheduled before, so amdgpu_device_delay_enable_gfx_off ran ~100 ms
> after the first time GFXOFF was disabled and re-enabled, even if GFXOFF
> was disabled and re-enabled again during those 100 ms.
>
> This resulted in frame drops / stutter with the upcoming mutter 41
> release on Navi 14, due to constantly enabling GFXOFF in the HW and
> disabling it again (for getting the GPU clock counter).
>
> To fix this, call cancel_delayed_work_sync when GFXOFF transitions from
> enabled to disabled. This makes sure the delayed work will be scheduled
> as intended in the reverse case.
>
> In order to avoid a deadlock, amdgpu_device_delay_enable_gfx_off needs
> to use mutex_trylock instead of mutex_lock.
>
> v2:
> * Use cancel_delayed_work_sync & mutex_trylock instead of
> mod_delayed_work.
>
> Signed-off-by: Michel Dänzer <mdaenzer at redhat.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 11 ++++++++++-
> drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 13 +++++++------
> drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h | 3 +++
> 3 files changed, 20 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index f3fd5ec710b6..8b025f70706c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -2777,7 +2777,16 @@ static void amdgpu_device_delay_enable_gfx_off(struct work_struct *work)
> struct amdgpu_device *adev =
> container_of(work, struct amdgpu_device, gfx.gfx_off_delay_work.work);
>
> - mutex_lock(&adev->gfx.gfx_off_mutex);
> + /* mutex_lock could deadlock with cancel_delayed_work_sync in amdgpu_gfx_off_ctrl. */
> + if (!mutex_trylock(&adev->gfx.gfx_off_mutex)) {
> + /* If there's a bug which causes amdgpu_gfx_off_ctrl to be called with enable=true
> + * when adev->gfx.gfx_off_req_count is already 0, we might race with that.
> + * Re-schedule to make sure gfx off will be re-enabled in the HW eventually.
> + */
> + schedule_delayed_work(&adev->gfx.gfx_off_delay_work, AMDGPU_GFX_OFF_DELAY_ENABLE);
> + return;
This is not needed and is just creating another thread to contend for
mutex. The checks below take care of enabling gfxoff correctly. If it's
already in gfx_off state, it doesn't do anything. So I don't see why
this change is needed.
The other problem is amdgpu_get_gfx_off_status() also uses the same
mutex. So it won't be knowing which thread it would be contending
against and blindly creates more work items.
> + }
> +
> if (!adev->gfx.gfx_off_state && !adev->gfx.gfx_off_req_count) {
> if (!amdgpu_dpm_set_powergating_by_smu(adev, AMD_IP_BLOCK_TYPE_GFX, true))
> adev->gfx.gfx_off_state = true;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> index a0be0772c8b3..da4c46db3093 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> @@ -28,9 +28,6 @@
> #include "amdgpu_rlc.h"
> #include "amdgpu_ras.h"
>
> -/* delay 0.1 second to enable gfx off feature */
> -#define GFX_OFF_DELAY_ENABLE msecs_to_jiffies(100)
> -
> /*
> * GPU GFX IP block helpers function.
> */
> @@ -569,9 +566,13 @@ void amdgpu_gfx_off_ctrl(struct amdgpu_device *adev, bool enable)
> adev->gfx.gfx_off_req_count--;
>
> if (enable && !adev->gfx.gfx_off_state && !adev->gfx.gfx_off_req_count) {
> - schedule_delayed_work(&adev->gfx.gfx_off_delay_work, GFX_OFF_DELAY_ENABLE);
> - } else if (!enable && adev->gfx.gfx_off_state) {
> - if (!amdgpu_dpm_set_powergating_by_smu(adev, AMD_IP_BLOCK_TYPE_GFX, false)) {
> + schedule_delayed_work(&adev->gfx.gfx_off_delay_work, AMDGPU_GFX_OFF_DELAY_ENABLE);
> + } else if (!enable) {
> + if (adev->gfx.gfx_off_req_count == 1 && !adev->gfx.gfx_off_state)
> + cancel_delayed_work_sync(&adev->gfx.gfx_off_delay_work);
This has the deadlock problem as discussed in the other thread.
Thanks,
Lijo
> + if (adev->gfx.gfx_off_state &&
> + !amdgpu_dpm_set_powergating_by_smu(adev, AMD_IP_BLOCK_TYPE_GFX, false)) {
> adev->gfx.gfx_off_state = false;
>
> if (adev->gfx.funcs->init_spm_golden) {
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> index d43fe2ed8116..dcdb505bb7f4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> @@ -32,6 +32,9 @@
> #include "amdgpu_rlc.h"
> #include "soc15.h"
>
> +/* delay 0.1 second to enable gfx off feature */
> +#define AMDGPU_GFX_OFF_DELAY_ENABLE msecs_to_jiffies(100)
> +
> /* GFX current status */
> #define AMDGPU_GFX_NORMAL_MODE 0x00000000L
> #define AMDGPU_GFX_SAFE_MODE 0x00000001L
>
More information about the amd-gfx
mailing list