[PATCH] drm/amdgpu: Fix mutex lock from atomic context.

Wed Sep 11 14:19:02 UTC 2019

I like this much more. I will relocate the call to
amdgpu_umc_process_ras_data_cb and push.

Andrey

On 9/10/19 11:08 PM, Zhou1, Tao wrote:
> amdgpu_ras_reserve_bad_pages is only used by umc block, so another approach is to move it into amdgpu_umc_process_ras_data_cb.
> Anyway, either way is OK and the patch is:
>
> Reviewed-by: Tao Zhou <tao.zhou1 at amd.com>
>
>> -----Original Message-----
>> From: Andrey Grodzovsky <andrey.grodzovsky at amd.com>
>> Sent: September 11, 2019 3:41
>> To: amd-gfx at lists.freedesktop.org
>> Cc: Chen, Guchun <Guchun.Chen at amd.com>; Zhou1, Tao
>> <Tao.Zhou1 at amd.com>; Deucher, Alexander
>> <Alexander.Deucher at amd.com>; Grodzovsky, Andrey
>> <Andrey.Grodzovsky at amd.com>
>> Subject: [PATCH] drm/amdgpu: Fix mutex lock from atomic context.
>>
>> Problem:
>> amdgpu_ras_reserve_bad_pages was moved to amdgpu_ras_reset_gpu
>> because writing to EEPROM during ASIC reset was unstable.
>> But for ERREVENT_ATHUB_INTERRUPT, amdgpu_ras_reset_gpu is called
>> directly from ISR context, so taking a mutex is not allowed. The call is
>> also irrelevant for this particular interrupt, as this is a generic RAS
>> interrupt and not specific to memory errors.
>>
>> Fix:
>> Avoid calling amdgpu_ras_reserve_bad_pages if not in task context.
>>
>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky at amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 4 +++-
>>   1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
>> index 012034d..dd5da3c 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
>> @@ -504,7 +504,9 @@ static inline int amdgpu_ras_reset_gpu(struct amdgpu_device *adev,
>>   	/* save bad page to eeprom before gpu reset,
>>   	 * i2c may be unstable in gpu reset
>>   	 */
>> -	amdgpu_ras_reserve_bad_pages(adev);
>> +	if (in_task())
>> +		amdgpu_ras_reserve_bad_pages(adev);
>> +
>>   	if (atomic_cmpxchg(&ras->in_recovery, 0, 1) == 0)
>>   		schedule_work(&ras->recovery_work);
>>   	return 0;
>> --
>> 2.7.4
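
For comparison, the two approaches discussed in this thread can be sketched
as a small user-space model. This is only an illustration, not driver code:
sim_in_task stands in for the kernel's in_task(), the counters stand in for
the EEPROM write and schedule_work(), and every function name ending in
_stub, _guarded, _plain or _sketch is hypothetical. It also assumes that the
UMC callback runs in task context, which the review comment implies but this
thread does not state.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's in_task(): true when the caller
 * may sleep, false when it models an ISR. */
static bool sim_in_task = true;

static int pages_reserved;   /* how often the bad-page reservation ran */
static int resets_scheduled; /* how often recovery work was queued */
static int in_recovery;      /* models ras->in_recovery */

/* Stub for amdgpu_ras_reserve_bad_pages(); the real function takes a
 * mutex, so it must never run from atomic context. */
static void reserve_bad_pages_stub(void)
{
	pages_reserved++;
}

/* Approach in the posted patch: the reset helper skips the reservation
 * unless it is running in task context. */
static int ras_reset_gpu_guarded(void)
{
	if (sim_in_task)
		reserve_bad_pages_stub();

	/* Models atomic_cmpxchg(&ras->in_recovery, 0, 1) == 0 */
	if (in_recovery == 0) {
		in_recovery = 1;
		resets_scheduled++;
	}
	return 0;
}

/* Approach suggested in review: the reset helper never reserves pages. */
static int ras_reset_gpu_plain(void)
{
	if (in_recovery == 0) {
		in_recovery = 1;
		resets_scheduled++;
	}
	return 0;
}

/* The UMC callback reserves pages itself before requesting the reset
 * (assumed here to run in task context). */
static int umc_process_ras_data_cb_sketch(void)
{
	reserve_bad_pages_stub();
	return ras_reset_gpu_plain();
}
```

In this model, calling ras_reset_gpu_guarded() with sim_in_task set to false
queues the reset without touching the reservation path, while
umc_process_ras_data_cb_sketch() always reserves first. The second layout
keeps the sleeping call out of the shared reset helper entirely, so the
helper needs no context check.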