[PATCH v3] drm/radeon: Fix EEH during kexec
Kyle Mahlkuch
kmahlkuc at linux.vnet.ibm.com
Thu Oct 31 15:24:53 UTC 2019
On 10/30/19 5:35 AM, Michael Ellerman wrote:
> Hi Kyle,
>
> KyleMahlkuch <kmahlkuc at linux.vnet.ibm.com> writes:
>> From: Kyle Mahlkuch <kmahlkuc at linux.vnet.ibm.com>
>>
>> During kexec some adapters hit an EEH since they are not properly
>> shut down in the radeon_pci_shutdown() function. Adding
>> radeon_suspend_kms() fixes this issue.
>> Enabled only on PPC because this patch causes issues on some other
>> boards.
> Which adapters hit the issues?
>
> And do we know why they're not shut down correctly in
> radeon_pci_shutdown()? That seems like the root cause no?

Hi Michael,

This is hit by the Caicos (edwards2) adapter that I have on ppc. It is not hit
on the Cedar (FirePro) adapter, though I haven't tested that one recently. I'm
not able to test any other adapters. As for why, I'm unsure. During
initialization after the kexec we hit an EEH. There could be another point in
the shutdown/startup process where something doesn't get reset correctly.
I'm open to other ideas if you have any.

>> diff --git a/drivers/gpu/drm/radeon/radeon_drv.c b/drivers/gpu/drm/radeon/radeon_drv.c
>> index 9e55076..4528f4d 100644
>> --- a/drivers/gpu/drm/radeon/radeon_drv.c
>> +++ b/drivers/gpu/drm/radeon/radeon_drv.c
>> @@ -379,11 +379,25 @@ static int radeon_pci_probe(struct pci_dev *pdev,
>> static void
>> radeon_pci_shutdown(struct pci_dev *pdev)
>> {
>> +#ifdef CONFIG_PPC64
>> + struct drm_device *ddev = pci_get_drvdata(pdev);
>> +#endif
> This local serves no real purpose and could be avoided, which would also
> avoid this ifdef.
>
>> /* if we are running in a VM, make sure the device
>> * torn down properly on reboot/shutdown
>> */
>> if (radeon_device_is_virtual())
>> radeon_pci_remove(pdev);
>> +
>> +#ifdef CONFIG_PPC64
>> + /* Some adapters need to be suspended before a
> AFAIK drm uses normal kernel comment style, so this should be:
>
> /*
> * Some adapters need to be suspended before a
>> + * shutdown occurs in order to prevent an error
>> + * during kexec.
>> + * Make this power specific because it breaks
>> + * some non-power boards.
>> + */
>> + radeon_suspend_kms(ddev, true, true, false);
> ie, instead do:
>
> radeon_suspend_kms(pci_get_drvdata(pdev), true, true, false);
I agree, this is a cleaner way to write this patch. I'll update the comment as
well. Thanks for the help.
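
For reference, here is a minimal userspace sketch of how radeon_pci_shutdown()
might look with both review suggestions applied (no local variable, kernel-style
block comment). It is not the actual v4 patch. The struct definitions, the
stubbed helpers, the fake_is_virtual knob, and the forced CONFIG_PPC64 define
are all assumptions made only so the fragment compiles outside the kernel; the
parameter names of radeon_suspend_kms() are likewise assumed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumption: pretend this is a ppc64 build so the new path is compiled in. */
#define CONFIG_PPC64 1

/* Userspace stand-ins for the kernel types; NOT the real definitions. */
struct drm_device { int suspend_calls; };
struct pci_dev { void *driver_data; int removed; };

/* Stub mirroring the kernel helper: return the driver's private pointer. */
static void *pci_get_drvdata(struct pci_dev *pdev)
{
	return pdev->driver_data;
}

/* Hypothetical test knob standing in for the real VM detection. */
static bool fake_is_virtual;

static bool radeon_device_is_virtual(void)
{
	return fake_is_virtual;
}

/* Stub: only records that remove was requested. */
static void radeon_pci_remove(struct pci_dev *pdev)
{
	pdev->removed = 1;
}

/* Stub: only counts calls; the real function suspends the adapter. */
static int radeon_suspend_kms(struct drm_device *dev, bool suspend,
			      bool fbcon, bool freeze)
{
	(void)suspend;
	(void)fbcon;
	(void)freeze;
	dev->suspend_calls++;
	return 0;
}

static void radeon_pci_shutdown(struct pci_dev *pdev)
{
	/* if we are running in a VM, make sure the device
	 * torn down properly on reboot/shutdown
	 */
	if (radeon_device_is_virtual())
		radeon_pci_remove(pdev);

#ifdef CONFIG_PPC64
	/*
	 * Some adapters need to be suspended before a
	 * shutdown occurs in order to prevent an error
	 * during kexec.
	 * Make this power specific because it breaks
	 * some non-power boards.
	 */
	radeon_suspend_kms(pci_get_drvdata(pdev), true, true, false);
#endif
}
```

Passing pci_get_drvdata(pdev) directly leaves a single #ifdef block in the
function, which is the point of the review comment above.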
>> +#endif
>> }
>>
>> static int radeon_pmops_suspend(struct device *dev)
>> --
>> 1.8.3.1
> cheers
>