[PATCH] drm/amd: Require CONFIG_HOTPLUG_PCI_PCIE for BOCO

Mario Limonciello superm1 at kernel.org
Wed Dec 11 21:43:27 UTC 2024


On 12/11/2024 15:41, Alex Deucher wrote:
> On Wed, Dec 11, 2024 at 3:19 PM Mario Limonciello <superm1 at kernel.org> wrote:
>>
>> On 12/11/2024 14:08, Alex Deucher wrote:
>>> On Wed, Dec 11, 2024 at 10:56 AM Mario Limonciello <superm1 at kernel.org> wrote:
>>>>
>>>> From: Mario Limonciello <mario.limonciello at amd.com>
>>>>
>>>> If the kernel hasn't been compiled with PCIe hotplug support this
>>>> can lead to problems with dGPUs that use BOCO because they effectively
>>>> drop off the bus.
>>>>
>>>> To prevent issues, disable BOCO support when compiled without PCIe hotplug.
>>>>
>>>> Reported-by: Gabriel Marcano <gabemarcano at yahoo.com>
>>>> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/1707#note_2696862
>>>> Signed-off-by: Mario Limonciello <mario.limonciello at amd.com>
>>>
>>> Acked-by: Alex Deucher <alexander.deucher at amd.com>
>>
>> Thx.
>>
>>>
>>> Seems like this should affect any device which supports d3cold.  Maybe
>>> we want something more general as well?
>>
>> Any specific ideas?  One of these two hunks I think make sense, leaning
>> upon the second one more strongly.
> 
> Actually, I wonder if the affected hardware pre-dates d3cold and uses
> the old proprietary AMD ATPX interface to control dGPU power.  In that
> case, the d3cold is managed by the driver rather than the PCI/ACPI
> subsystems.  IIRC, there was a workaround in the PCIe hotplug code to
> avoid calling the pci remove function when the driver powered down the
> GPU via ATPX (or the nvidia equivalent).  If so, this check should go
> in amdgpu_device_supports_px() instead.

Gabriel,

Can you please share a full kernel log so we can clarify which method 
your hardware uses?

Thanks,
> 
> Alex
> 
>>
>>
>>
>>                                     diff --git a/drivers/pci/pci.c
>> b/drivers/pci/pci.c
>> index 0b29ec6e8e5e2..01691f1d26fbe 100644
>> --- a/drivers/pci/pci.c
>> +++ b/drivers/pci/pci.c
>> @@ -2751,6 +2751,10 @@ int pci_prepare_to_sleep(struct pci_dev *dev)
>>           if (target_state == PCI_POWER_ERROR)
>>                   return -EIO;
>>
>> +       if (!IS_ENABLED(CONFIG_HOTPLUG_PCI_PCIE) &&
>> +           target_state == PCI_D3cold)
>> +               return -EBUSY;
>> +
>>           pci_enable_wake(dev, target_state, wakeup);
>>
>>           error = pci_set_power_state(dev, target_state);
>> @@ -2797,6 +2801,10 @@ int pci_finish_runtime_suspend(struct pci_dev *dev)
>>           if (target_state == PCI_POWER_ERROR)
>>                   return -EIO;
>>
>> +       if (!IS_ENABLED(CONFIG_HOTPLUG_PCI_PCIE) &&
>> +            target_state == PCI_D3cold)
>> +               return -EBUSY;
>> +
>>           __pci_enable_wake(dev, target_state, pci_dev_run_wake(dev));
>>
>>           error = pci_set_power_state(dev, target_state);
>>>
>>> Alex
>>>
>>>> ---
>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 +++
>>>>    1 file changed, 3 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>> index 764d41434a82f..7db796ebb967e 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>> @@ -419,6 +419,9 @@ bool amdgpu_device_supports_boco(struct drm_device *dev)
>>>>    {
>>>>           struct amdgpu_device *adev = drm_to_adev(dev);
>>>>
>>>> +       if (!IS_ENABLED(CONFIG_HOTPLUG_PCI_PCIE))
>>>> +               return false;
>>>> +
>>>>           if (adev->has_pr3 ||
>>>>               ((adev->flags & AMD_IS_PX) && amdgpu_is_atpx_hybrid()))
>>>>                   return true;
>>>> --
>>>> 2.43.0
>>>>
>>



More information about the amd-gfx mailing list