[PATCH 1/2] drm/amdgpu: fix reload KMD hang on KIQ

Deng, Emily Emily.Deng at amd.com
Tue Aug 4 08:01:29 UTC 2020


[AMD Official Use Only - Internal Distribution Only]

>-----Original Message-----
>From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of Liu,
>Monk
>Sent: Tuesday, August 4, 2020 2:31 PM
>To: amd-gfx at lists.freedesktop.org
>Subject: RE: [PATCH 1/2] drm/amdgpu: fix reload KMD hang on KIQ
>
>[AMD Official Use Only - Internal Distribution Only]
>
>Ping ... this is a severe bug fix
>
>_____________________________________
>Monk Liu|GPU Virtualization Team |AMD
>
>
>-----Original Message-----
>From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of Liu,
>Monk
>Sent: Monday, August 3, 2020 9:55 AM
>To: Kuehling, Felix <Felix.Kuehling at amd.com>; amd-gfx at lists.freedesktop.org
>Subject: RE: [PATCH 1/2] drm/amdgpu: fix reload KMD hang on KIQ
>
>[AMD Official Use Only - Internal Distribution Only]
>
>>>In gfx_v10_0_sw_fini the KIQ ring gets freed. Wouldn't that be the
>>>right place to stop the KIQ
>
>KIQ (CPC) is never stopped in SW_FINI under SRIOV (the DISABLE on CPC is
>skipped for SRIOV), because SRIOV relies on KIQ to do the world switch.
>
>But this is really a weird bug, because the same approach does not make
>KIQ (CP) hang on GFX9; only GFX10 needs this patch.
>
>So far I do not have a better explanation or a better fix than this one.
>
>_____________________________________
>Monk Liu|GPU Virtualization Team |AMD
>
>
>-----Original Message-----
>From: Kuehling, Felix <Felix.Kuehling at amd.com>
>Sent: Friday, July 31, 2020 9:57 PM
>To: Liu, Monk <Monk.Liu at amd.com>; amd-gfx at lists.freedesktop.org
>Subject: Re: [PATCH 1/2] drm/amdgpu: fix reload KMD hang on KIQ
>
>In gfx_v10_0_sw_fini the KIQ ring gets freed. Wouldn't that be the right place
>to stop the KIQ? Otherwise KIQ will hang as soon as someone allocates the
>memory that was previously used for the KIQ ring buffer and overwrites it with
>something that's not a valid PM4 packet.
>
>Regards,
>  Felix
>
>Am 2020-07-31 um 3:51 a.m. schrieb Monk Liu:
>> KIQ will hang if we try the steps below:
>> modprobe amdgpu
>> rmmod amdgpu
>> modprobe amdgpu sched_hw_submission=4
>>
>> The cause is that KIQ stays alive even after we unload the KMD, so
>> when the KMD is reloaded KIQ crashes as soon as its registers are
>> programmed with values that differ from the previous configuration
>> (settings such as the HQD address and ring size change easily if we
>> alter sched_hw_submission).
>>
>> The fix is to deactivate KIQ first, before touching any of its
>> registers.
>>
>> Signed-off-by: Monk Liu <Monk.Liu at amd.com>
>> ---
>>  drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>> index db9f1e8..f571e25 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>> @@ -6433,6 +6433,9 @@ static int gfx_v10_0_kiq_init_register(struct amdgpu_ring *ring)
>>  struct v10_compute_mqd *mqd = ring->mqd_ptr;
>>  int j;
>>
>> +/* deactivate the queue */
>> +WREG32_SOC15(GC, 0, mmCP_HQD_ACTIVE, 0);
>> +
Could we move the following code to here instead?
if (RREG32_SOC15(GC, 0, mmCP_HQD_ACTIVE) & 1) {
	WREG32_SOC15(GC, 0, mmCP_HQD_DEQUEUE_REQUEST, 1);
	for (j = 0; j < adev->usec_timeout; j++) {
		if (!(RREG32_SOC15(GC, 0, mmCP_HQD_ACTIVE) & 1))
			break;
		udelay(1);
	}
}
>>  /* disable wptr polling */
>>  WREG32_FIELD15(GC, 0, CP_PQ_WPTR_POLL_CNTL, EN, 0);
>>
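As a rough illustration of the dequeue-and-poll sequence suggested above, here is a minimal user-space sketch, not driver code. The struct fake_hqd and both functions are hypothetical stand-ins for the real MMIO accessors (RREG32_SOC15 / WREG32_SOC15), and the "drain after N reads" behaviour is an assumption made only so the timeout path can be exercised.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the HQD state; NOT the real amdgpu register layout. */
struct fake_hqd {
	uint32_t active;          /* bit 0 models CP_HQD_ACTIVE */
	uint32_t dequeue_request; /* models CP_HQD_DEQUEUE_REQUEST */
	unsigned int drain_polls; /* reads needed before the queue goes idle */
};

/*
 * Simulated register read (assumption): once a dequeue has been requested,
 * the queue clears its active bit after drain_polls further reads.
 */
static uint32_t fake_read_active(struct fake_hqd *q)
{
	if (q->dequeue_request && (q->active & 1)) {
		if (q->drain_polls == 0)
			q->active &= ~1u;
		else
			q->drain_polls--;
	}
	return q->active;
}

/*
 * Request a dequeue and poll until the queue is inactive.
 * Returns true if the queue is idle within `timeout` polls.
 */
static bool fake_hqd_dequeue(struct fake_hqd *q, unsigned int timeout)
{
	unsigned int j;

	if (!(fake_read_active(q) & 1))
		return true; /* already inactive, nothing to request */

	q->dequeue_request = 1;
	for (j = 0; j < timeout; j++) {
		if (!(fake_read_active(q) & 1))
			return true;
		/* a driver would call udelay(1) here between polls */
	}
	return false; /* still active after the timeout */
}
```

Unlike the quoted snippet, this sketch returns a status so a caller can tell a timeout apart from a clean dequeue; whether the driver should act on that is a separate question.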
>_______________________________________________
>amd-gfx mailing list
>amd-gfx at lists.freedesktop.org
>https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.fre
>edesktop.org%2Fmailman%2Flistinfo%2Famd-
>gfx&data=02%7C01%7CEmily.Deng%40amd.com%7C1236f42617d246b20
>bc108d8384007e4%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7
>C637321194957236933&sdata=0%2BzHvJ1n4TZOYss4v1pR6i8bxq46JE73
>UIi%2B49x9joU%3D&reserved=0

