[PATCH] drm/amdgpu: Fix mutex lock from atomic context.
Grodzovsky, Andrey
Andrey.Grodzovsky at amd.com
Wed Sep 11 14:41:12 UTC 2019
On second thought, this will break reserving bad pages when resetting
the GPU for non-RAS reasons such as a manual reset, S3 or a ring
timeout (amdgpu_ras_resume->amdgpu_ras_reset_gpu), so I will keep the
code as is.
Another possible issue in the existing code: it looks like no reservation
will take place in those cases even now, since in
amdgpu_ras_reserve_bad_pages data->last_reserved will be equal to
data->count, no? It looks like for this case you need to add a flag to
FORCE reservation of all pages from 0 to data->count.
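
To illustrate the concern, here is a minimal user-space sketch, not the
driver code. It assumes the reservation loop walks from
data->last_reserved to data->count; the struct, the function name
reserve_bad_pages and the force parameter are hypothetical stand-ins
for the flag suggested above.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the driver's bad-page bookkeeping. */
struct bad_page_data {
	int count;         /* number of recorded bad pages */
	int last_reserved; /* index of the first page not yet reserved */
};

/*
 * Model of the reservation loop. Returns how many pages were walked.
 * With force == false the loop starts at last_reserved, so it does
 * nothing once last_reserved == count. With force == true it starts
 * again from page 0.
 */
static int reserve_bad_pages(struct bad_page_data *data, bool force)
{
	int start = force ? 0 : data->last_reserved;
	int reserved = 0;

	for (int i = start; i < data->count; i++)
		reserved++; /* real code would reserve the backing memory here */

	data->last_reserved = data->count;
	return reserved;
}
```

With count == last_reserved == 3, a call without force walks no pages,
while a call with force walks all three again.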
Andrey
On 9/11/19 10:19 AM, Andrey Grodzovsky wrote:
> I like this much more, I will relocate to
> amdgpu_umc_process_ras_data_cb and push.
>
> Andrey
>
> On 9/10/19 11:08 PM, Zhou1, Tao wrote:
>> amdgpu_ras_reserve_bad_pages is only used by umc block, so another
>> approach is to move it into amdgpu_umc_process_ras_data_cb.
>> Anyway, either way is OK and the patch is:
>>
>> Reviewed-by: Tao Zhou <tao.zhou1 at amd.com>
>>
>>> -----Original Message-----
>>> From: Andrey Grodzovsky <andrey.grodzovsky at amd.com>
>>> Sent: September 11, 2019 3:41
>>> To: amd-gfx at lists.freedesktop.org
>>> Cc: Chen, Guchun <Guchun.Chen at amd.com>; Zhou1, Tao
>>> <Tao.Zhou1 at amd.com>; Deucher, Alexander
>>> <Alexander.Deucher at amd.com>; Grodzovsky, Andrey
>>> <Andrey.Grodzovsky at amd.com>
>>> Subject: [PATCH] drm/amdgpu: Fix mutex lock from atomic context.
>>>
>>> Problem:
>>> amdgpu_ras_reserve_bad_pages was moved to amdgpu_ras_reset_gpu
>>> because writing to EEPROM during ASIC reset was unstable.
>>> But for ERREVENT_ATHUB_INTERRUPT amdgpu_ras_reset_gpu is called
>>> directly from ISR context and so locking is not allowed. Also it's
>>> irrelevant for this particular interrupt, as this is a generic RAS
>>> interrupt and not specific to memory errors.
>>>
>>> Fix:
>>> Avoid calling amdgpu_ras_reserve_bad_pages if not in task context.
>>>
>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky at amd.com>
>>> ---
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 4 +++-
>>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
>>> index 012034d..dd5da3c 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
>>> @@ -504,7 +504,9 @@ static inline int amdgpu_ras_reset_gpu(struct
>>> amdgpu_device *adev,
>>> /* save bad page to eeprom before gpu reset,
>>> * i2c may be unstable in gpu reset
>>> */
>>> - amdgpu_ras_reserve_bad_pages(adev);
>>> + if (in_task())
>>> + amdgpu_ras_reserve_bad_pages(adev);
>>> +
>>> if (atomic_cmpxchg(&ras->in_recovery, 0, 1) == 0)
>>> schedule_work(&ras->recovery_work);
>>> return 0;
>>> --
>>> 2.7.4