[PATCH] drm/amd: Require CONFIG_HOTPLUG_PCI_PCIE for BOCO

Alex Deucher alexdeucher at gmail.com
Wed Dec 11 21:41:33 UTC 2024


On Wed, Dec 11, 2024 at 3:19 PM Mario Limonciello <superm1 at kernel.org> wrote:
>
> On 12/11/2024 14:08, Alex Deucher wrote:
> > On Wed, Dec 11, 2024 at 10:56 AM Mario Limonciello <superm1 at kernel.org> wrote:
> >>
> >> From: Mario Limonciello <mario.limonciello at amd.com>
> >>
> >> If the kernel hasn't been compiled with PCIe hotplug support this
> >> can lead to problems with dGPUs that use BOCO because they effectively
> >> drop off the bus.
> >>
> >> To prevent issues, disable BOCO support when compiled without PCIe hotplug.
> >>
> >> Reported-by: Gabriel Marcano <gabemarcano at yahoo.com>
> >> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/1707#note_2696862
> >> Signed-off-by: Mario Limonciello <mario.limonciello at amd.com>
> >
> > Acked-by: Alex Deucher <alexander.deucher at amd.com>
>
> Thx.
>
> >
> > Seems like this should affect any device which supports d3cold.  Maybe
> > we want something more general as well?
>
> Any specific ideas?  One of these two hunks I think make sense, leaning
> upon the second one more strongly.

Actually, I wonder if the affected hardware pre-dates d3cold and uses
the old proprietary AMD ATPX interface to control dGPU power.  In that
case, the d3cold is managed by the driver rather than the PCI/ACPI
subsystems.  IIRC, there was a workaround in the PCIe hotplug code to
avoid calling the pci remove function when the driver powered down the
GPU via ATPX (or the nvidia equivalent).  If so, this check should go
in amdgpu_device_supports_px() instead.

Alex

>
>
>
>                                    diff --git a/drivers/pci/pci.c
> b/drivers/pci/pci.c
> index 0b29ec6e8e5e2..01691f1d26fbe 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -2751,6 +2751,10 @@ int pci_prepare_to_sleep(struct pci_dev *dev)
>          if (target_state == PCI_POWER_ERROR)
>                  return -EIO;
>
> +       if (!IS_ENABLED(CONFIG_HOTPLUG_PCI_PCIE) &&
> +           target_state == PCI_D3cold)
> +               return -EBUSY;
> +
>          pci_enable_wake(dev, target_state, wakeup);
>
>          error = pci_set_power_state(dev, target_state);
> @@ -2797,6 +2801,10 @@ int pci_finish_runtime_suspend(struct pci_dev *dev)
>          if (target_state == PCI_POWER_ERROR)
>                  return -EIO;
>
> +       if (!IS_ENABLED(CONFIG_HOTPLUG_PCI_PCIE) &&
> +            target_state == PCI_D3cold)
> +               return -EBUSY;
> +
>          __pci_enable_wake(dev, target_state, pci_dev_run_wake(dev));
>
>          error = pci_set_power_state(dev, target_state);
> >
> > Alex
> >
> >> ---
> >>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 +++
> >>   1 file changed, 3 insertions(+)
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >> index 764d41434a82f..7db796ebb967e 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >> @@ -419,6 +419,9 @@ bool amdgpu_device_supports_boco(struct drm_device *dev)
> >>   {
> >>          struct amdgpu_device *adev = drm_to_adev(dev);
> >>
> >> +       if (!IS_ENABLED(CONFIG_HOTPLUG_PCI_PCIE))
> >> +               return false;
> >> +
> >>          if (adev->has_pr3 ||
> >>              ((adev->flags & AMD_IS_PX) && amdgpu_is_atpx_hybrid()))
> >>                  return true;
> >> --
> >> 2.43.0
> >>
>


More information about the amd-gfx mailing list