[PATCH v4 3/8] drm/amdgpu: Fix SMU error failure

Andrey Grodzovsky Andrey.Grodzovsky at amd.com
Thu Sep 3 15:29:17 UTC 2020


On 9/2/20 6:05 PM, Bjorn Helgaas wrote:
> On Wed, Sep 02, 2020 at 02:42:05PM -0400, Andrey Grodzovsky wrote:
>> Wait for HW/PSP initiated ASIC reset to complete before
>> starting the recovery operations.
>>
>> v2: Remove typo
>>
>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky at amd.com>
>> Reviewed-by: Alex Deucher <alexander.deucher at amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 22 ++++++++++++++++++++--
>>   1 file changed, 20 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index e999f1f..412d07e 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -4838,14 +4838,32 @@ pci_ers_result_t amdgpu_pci_slot_reset(struct pci_dev *pdev)
>>   {
>>   	struct drm_device *dev = pci_get_drvdata(pdev);
>>   	struct amdgpu_device *adev = drm_to_adev(dev);
>> -	int r;
>> +	int r, i;
>>   	bool vram_lost;
>> +	u32 memsize;
>>   
>>   	DRM_INFO("PCI error: slot reset callback!!\n");
>>   
>> +	/* wait for asic to come out of reset */
> I know it's totally OCD, but it is a minor speed bump to read "asic"
> here and "ASIC" in the commit log above and the new comment below.
>
>> +	msleep(500);
>> +
>>   	pci_restore_state(pdev);
>>   
>> -	adev->in_pci_err_recovery = true;
>> +	/* confirm  ASIC came out of reset */
>> +	for (i = 0; i < adev->usec_timeout; i++) {
>> +		memsize = amdgpu_asic_get_config_memsize(adev);
>> +
>> +		if (memsize != 0xffffffff)
> I guess this is a spot where you actually depend on an MMIO read
> returning 0xffffffff because adev->in_pci_err_recovery is false at
> this point, so amdgpu_mm_rreg() will actually *do* the MMIO read
> instead of returning 0.  Right?


Yes, i picked this form another place where they use it to confirm PCI confspace and
the BARs specifically are restored and active and so you can read back sane MMIO 
regs values.

Andrey


>
>> +			break;
>> +		udelay(1);
>> +	}
>> +	if (memsize == 0xffffffff) {
>> +		r = -ETIME;
>> +		goto out;
>> +	}
>> +
>> +	/* TODO Call amdgpu_pre_asic_reset instead */
>> +	adev->in_pci_err_recovery = true;	
>>   	r = amdgpu_device_ip_suspend(adev);
>>   	adev->in_pci_err_recovery = false;
>>   	if (r)
>> -- 
>> 2.7.4
>>


More information about the amd-gfx mailing list