[PATCH] drm/amdgpu: Move reset domain locking in DPC handler

Andrey Grodzovsky andrey.grodzovsky at amd.com
Thu Apr 14 14:32:21 UTC 2022


Yeah, I need to improve it a bit. Please ignore this one; I will be back with a V2.

Andrey

On 2022-04-14 03:12, Chen, Guchun wrote:
> It's in amdgpu_pci_resume.
>
> Andrey, shall we modify the code accordingly in amdgpu_pci_resume as well? Otherwise, an unset/unlock leak will happen when pci_channel_state != pci_channel_io_frozen.
>
> Regards,
> Guchun
>
> -----Original Message-----
> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of Christian König
> Sent: Thursday, April 14, 2022 2:40 PM
> To: Grodzovsky, Andrey <Andrey.Grodzovsky at amd.com>; amd-gfx at lists.freedesktop.org
> Cc: Antonovitch, Anatoli <Anatoli.Antonovitch at amd.com>
> Subject: Re: [PATCH] drm/amdgpu: Move reset domain locking in DPC handler
>
>
>
> On 13.04.22 at 21:31, Andrey Grodzovsky wrote:
>> Lock reset domain unconditionally because on resume we unlock it
>> unconditionally.
>> This solves a mutex deadlock when handling both FATAL and non-FATAL PCI
>> errors one after another.
>>
>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky at amd.com>
>> ---
>>    drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 14 +++++++-------
>>    1 file changed, 7 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index 1cc488a767d8..c65f25e3a0fc 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -5531,18 +5531,18 @@ pci_ers_result_t
>> amdgpu_pci_error_detected(struct pci_dev *pdev, pci_channel_sta
>>    
>>    	adev->pci_channel_state = state;
>>    
>> +	/*
>> +	 * Locking adev->reset_domain->sem will prevent any external access
>> +	 * to GPU during PCI error recovery
>> +	 */
>> +	amdgpu_device_lock_reset_domain(adev->reset_domain);
>> +	amdgpu_device_set_mp1_state(adev);
>> +
>>    	switch (state) {
>>    	case pci_channel_io_normal:
>>    		return PCI_ERS_RESULT_CAN_RECOVER;
> BTW: Where are we unlocking that again?
>
>>    	/* Fatal error, prepare for slot reset */
>>    	case pci_channel_io_frozen:
>> -		/*
>> -		 * Locking adev->reset_domain->sem will prevent any external access
>> -		 * to GPU during PCI error recovery
>> -		 */
>> -		amdgpu_device_lock_reset_domain(adev->reset_domain);
>> -		amdgpu_device_set_mp1_state(adev);
>> -
>>    		/*
>>    		 * Block any work scheduling as we do for regular GPU reset
>>    		 * for the duration of the recovery
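
The lock/unlock pairing discussed in this thread can be illustrated with a small standalone model. This is only a sketch, not amdgpu code: `toy_dev`, `toy_error_detected`, `toy_resume` and `toy_cycle` are hypothetical names, and a plain counter stands in for the reset domain semaphore. It assumes, as raised in the review above, that the resume path may return early when the channel state is not frozen.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; these are not the amdgpu types. */
enum channel_state { CH_NORMAL, CH_FROZEN };

struct toy_dev {
	enum channel_state state;
	int lock_depth; /* +1 per lock, -1 per unlock; 0 means balanced */
};

/* Models an error_detected handler that locks before looking at the state. */
static void toy_error_detected(struct toy_dev *d, enum channel_state s)
{
	d->state = s;
	d->lock_depth++;
}

/*
 * Models a resume handler. With frozen_only set, it returns early for any
 * non-frozen state and therefore never drops the lock taken above.
 */
static void toy_resume(struct toy_dev *d, bool frozen_only)
{
	if (frozen_only && d->state != CH_FROZEN)
		return;
	d->lock_depth--;
}

/* Runs one error/resume cycle and returns the leftover lock depth. */
static int toy_cycle(enum channel_state s, bool frozen_only)
{
	struct toy_dev d = { CH_NORMAL, 0 };

	toy_error_detected(&d, s);
	toy_resume(&d, frozen_only);
	return d.lock_depth;
}
```

In this model, `toy_cycle(CH_NORMAL, true)` leaves the depth at 1, meaning the lock is still held, while both handlers being unconditional (`frozen_only == false`) returns 0 for either state. Whichever side is changed, the point is that the lock and unlock conditions in the two callbacks have to match.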

