[PATCH] drm/amdgpu: Move reset domain locking in DPC handler
Andrey Grodzovsky
andrey.grodzovsky at amd.com
Thu Apr 14 14:40:50 UTC 2022
My bad, I see u already fixed this in amd-staging-drm-next. We had an
issue in an internal branch with this and just reinvented the wheel :))
Andrey
On 2022-04-14 10:32, Andrey Grodzovsky wrote:
> Yea, i need to improve it a bit, ignore this one, will be back with V2.
>
> Andrey
>
> On 2022-04-14 03:12, Chen, Guchun wrote:
>> It's in amdgpu_pci_resume.
>>
>> Andrey, shall we modify the code accordingly in amdgpu_pci_resume as
>> well? Otherwise, an unset/unlock leak will happen when
>> pci_channel_state != pci_channel_io_frozen.
>>
>> Regards,
>> Guchun
>>
>> -----Original Message-----
>> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of
>> Christian König
>> Sent: Thursday, April 14, 2022 2:40 PM
>> To: Grodzovsky, Andrey <Andrey.Grodzovsky at amd.com>;
>> amd-gfx at lists.freedesktop.org
>> Cc: Antonovitch, Anatoli <Anatoli.Antonovitch at amd.com>
>> Subject: Re: [PATCH] drm/amdgpu: Move reset domain locking in DPC
>> handler
>>
>>
>>
>> Am 13.04.22 um 21:31 schrieb Andrey Grodzovsky:
>>> Lock reset domain unconditionally because on resume we unlock it
>>> unconditionally.
>>> This solved mutex deadlock when handling both FATAL and non FATAL PCI
>>> errors one after another.
>>>
>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky at amd.com>
>>> ---
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 14 +++++++-------
>>> 1 file changed, 7 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> index 1cc488a767d8..c65f25e3a0fc 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> @@ -5531,18 +5531,18 @@ pci_ers_result_t
>>> amdgpu_pci_error_detected(struct pci_dev *pdev, pci_channel_sta
>>> adev->pci_channel_state = state;
>>> + /*
>>> + * Locking adev->reset_domain->sem will prevent any external
>>> access
>>> + * to GPU during PCI error recovery
>>> + */
>>> + amdgpu_device_lock_reset_domain(adev->reset_domain);
>>> + amdgpu_device_set_mp1_state(adev);
>>> +
>>> switch (state) {
>>> case pci_channel_io_normal:
>>> return PCI_ERS_RESULT_CAN_RECOVER;
>> BTW: Where are we unlocking that again?
>>
>>> /* Fatal error, prepare for slot reset */
>>> case pci_channel_io_frozen:
>>> - /*
>>> - * Locking adev->reset_domain->sem will prevent any
>>> external access
>>> - * to GPU during PCI error recovery
>>> - */
>>> - amdgpu_device_lock_reset_domain(adev->reset_domain);
>>> - amdgpu_device_set_mp1_state(adev);
>>> -
>>> /*
>>> * Block any work scheduling as we do for regular GPU reset
>>> * for the duration of the recovery
More information about the amd-gfx
mailing list