[PATCH] drm/amdgpu/gfx11: need acquire mutex before access CP_VMID_RESET

Xiao, Jack Jack.Xiao at amd.com
Fri Dec 22 03:04:05 UTC 2023


[AMD Official Use Only - General]

I can see the soft reset can be called from amdgpu_device_gpu_recover and pci error handler, they have called amdgpu_device_lock_reset_domain to acquire a reset lock before soft reset.

Regards,
Jack

-----Original Message-----
From: Christian König <ckoenig.leichtzumerken at gmail.com>
Sent: Wednesday, December 20, 2023 10:06 PM
To: Xiao, Jack <Jack.Xiao at amd.com>; Alex Deucher <alexdeucher at gmail.com>
Cc: Deucher, Alexander <Alexander.Deucher at amd.com>; amd-gfx at lists.freedesktop.org; Zhang, Hawking <Hawking.Zhang at amd.com>
Subject: Re: [PATCH] drm/amdgpu/gfx11: need acquire mutex before access CP_VMID_RESET

Well not the reset lock, but there should only be a single reset queue which this runs on.

Regards,
Christian.

Am 20.12.23 um 10:49 schrieb Xiao, Jack:
> [AMD Official Use Only - General]
>
> It's already protected by the reset lock. In my understanding, soft reset should not run in parallel.
>
> Regards,
> Jack
>
> -----Original Message-----
> From: Alex Deucher <alexdeucher at gmail.com>
> Sent: Wednesday, December 20, 2023 1:04 AM
> To: Xiao, Jack <Jack.Xiao at amd.com>
> Cc: amd-gfx at lists.freedesktop.org; Deucher, Alexander
> <Alexander.Deucher at amd.com>; Zhang, Hawking <Hawking.Zhang at amd.com>
> Subject: Re: [PATCH] drm/amdgpu/gfx11: need acquire mutex before
> access CP_VMID_RESET
>
> On Tue, Dec 19, 2023 at 4:30 AM Jack Xiao <Jack.Xiao at amd.com> wrote:
>> It's required to take the gfx mutex before access to CP_VMID_RESET,
>> for there is a race condition with CP firmware to write the register.
>>
>> Signed-off-by: Jack Xiao <Jack.Xiao at amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 20 ++++++++++++++++++++
>>   1 file changed, 20 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
>> b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
>> index bdcf96df69e6..ae3370d34d11 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
>> @@ -4518,6 +4518,22 @@ static int gfx_v11_0_soft_reset(void *handle)
>>                  }
>>          }
>>
> We probably want a CPU mutex or spinlock to protect this as well unless this is already protected by the reset lock.
>
> Alex
>
>> +       /* Try to require the gfx mutex before access to CP_VMID_RESET */
>> +       for (i = 0; i < adev->usec_timeout; i++) {
>> +               /* Request with MeId=2, PipeId=0 */
>> +               tmp = REG_SET_FIELD(0, CP_GFX_INDEX_MUTEX, REQUEST, 1);
>> +               tmp = REG_SET_FIELD(tmp, CP_GFX_INDEX_MUTEX, CLIENTID, 4);
>> +               WREG32_SOC15(GC, 0, regCP_GFX_INDEX_MUTEX, tmp);
>> +               if (RREG32_SOC15(GC, 0, regCP_GFX_INDEX_MUTEX) == tmp)
>> +                       break;
>> +               udelay(1);
>> +       }
>> +
>> +       if (i >= adev->usec_timeout) {
>> +               printk("Failed to require the gfx mutex during soft reset\n");
>> +               return -EINVAL;
>> +       }
>> +
>>          WREG32_SOC15(GC, 0, regCP_VMID_RESET, 0xfffffffe);
>>
>>          // Read CP_VMID_RESET register three times.
>> @@ -4526,6 +4542,10 @@ static int gfx_v11_0_soft_reset(void *handle)
>>          RREG32_SOC15(GC, 0, regCP_VMID_RESET);
>>          RREG32_SOC15(GC, 0, regCP_VMID_RESET);
>>
>> +       /* release the gfx mutex */
>> +       tmp = REG_SET_FIELD(tmp, CP_GFX_INDEX_MUTEX, REQUEST, 0);
>> +       WREG32_SOC15(GC, 0, regCP_GFX_INDEX_MUTEX, tmp);
>> +
>>          for (i = 0; i < adev->usec_timeout; i++) {
>>                  if (!RREG32_SOC15(GC, 0, regCP_HQD_ACTIVE) &&
>>                      !RREG32_SOC15(GC, 0, regCP_GFX_HQD_ACTIVE))
>> --
>> 2.41.0
>>



More information about the amd-gfx mailing list