[PATCH] drm/amd: Require CONFIG_HOTPLUG_PCI_PCIE for BOCO

Mario Limonciello superm1 at kernel.org
Wed Dec 11 22:18:21 UTC 2024


On 12/11/2024 16:16, Gabriel Marcano wrote:
>> On 12/11/2024 15:41, Alex Deucher wrote:
>>> On Wed, Dec 11, 2024 at 3:19 PM Mario Limonciello <superm1 at kernel.org> wrote:
>>>>
>>>> On 12/11/2024 14:08, Alex Deucher wrote:
>>>>> On Wed, Dec 11, 2024 at 10:56 AM Mario Limonciello <superm1 at kernel.org> wrote:
>>>>>>
>>>>>> From: Mario Limonciello <mario.limonciello at amd.com>
>>>>>>
>>>>>> If the kernel hasn't been compiled with PCIe hotplug support this
>>>>>> can lead to problems with dGPUs that use BOCO because they effectively
>>>>>> drop off the bus.
>>>>>>
>>>>>> To prevent issues, disable BOCO support when compiled without PCIe hotplug.
>>>>>>
>>>>>> Reported-by: Gabriel Marcano <gabemarcano at yahoo.com>
>>>>>> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/1707#note_2696862
>>>>>> Signed-off-by: Mario Limonciello <mario.limonciello at amd.com>
>>>>>
>>>>> Acked-by: Alex Deucher <alexander.deucher at amd.com>
>>>>
>>>> Thx.
>>>>
>>>>>
>>>>> Seems like this should affect any device which supports d3cold.  Maybe
>>>>> we want something more general as well?
>>>>
>>>> Any specific ideas?  I think one of these two hunks makes sense, leaning
>>>> toward the second one more strongly.
>>>   
>>> Actually, I wonder if the affected hardware pre-dates d3cold and uses
>>> the old proprietary AMD ATPX interface to control dGPU power.  In that
>>> case, the d3cold is managed by the driver rather than the PCI/ACPI
>>> subsystems.  IIRC, there was a workaround in the PCIe hotplug code to
>>> avoid calling the pci remove function when the driver powered down the
>>> GPU via ATPX (or the nvidia equivalent).  If so, this check should go
>>> in amdgpu_device_supports_px() instead.
>>
>> Gabriel,
>>
>> Can you please share a full kernel log so we can clarify which method
>> your hardware uses?
>>
> 
> Sure thing. I am attaching a kernel output from last night (it actually crashed
> what looks to be the renoir APU as I tried to turn off the computer, which
> shows up in the logs towards the end).
> 
> Some caveats about my system:
>   - I'm using some modified ACPI tables:
>     - I've tweaked some WMI-related WMAX code (read/write GPIO for RGB controller)
>     - I've fixed a missing symbol issue (renamed _EC0 to __EC)
>     - Fixed a bunch of other warnings reported by iasl
>   - I have `#define DEBUG 1` in amdgpu_drv.c
>   - I have a patch from https://bugzilla.kernel.org/show_bug.cgi?id=215884
>     applied
>   - My kernel is using Gentoo patches
> 
> Looking at my dmesg output, it looks like I'm using ATPX:
> [  +0.000022] amdgpu: vga_switcheroo: detected switching method \_SB_.PCI0.GP17.VGA_.ATPX handle
> [  +0.001561] amdgpu: ATPX version 1, functions 0x00000001
> [  +0.000120] amdgpu: ATPX Hybrid Graphics
> 
> Also I see this in my ACPI table disassembly:
>    Scope (\_SB.PCI0.GP17.VGA)
>    {
>        Name (M189, Buffer (0x0100){})
>        Name (M190, Ones)
>        Name (M191, Ones)
>        Method (ATPX, 2, Serialized)
>        {
> 
> 
> If you need me to recompile the kernel and/or disable my changes to my ACPI
> tables, let me know.

Your log also reports this though:

amdgpu 0000:03:00.0: amdgpu: Using BOCO for runtime pm
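
For reference, here is a minimal userspace sketch of the check as the patch quoted below would leave it. It is only a model, not the driver code: `struct sketch_gpu`, `sketch_supports_boco()` and the `hotplug_pcie` parameter are hypothetical names, with the parameter standing in for IS_ENABLED(CONFIG_HOTPLUG_PCI_PCIE). Whether AMD_IS_PX is actually set on this device is an assumption; the log only confirms "ATPX Hybrid Graphics" and "Using BOCO for runtime pm".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, flattened view of the state the real check reads. */
struct sketch_gpu {
	bool has_pr3;      /* models adev->has_pr3 */
	bool is_px;        /* models adev->flags & AMD_IS_PX (assumed) */
	bool atpx_hybrid;  /* models amdgpu_is_atpx_hybrid() */
};

/*
 * Models amdgpu_device_supports_boco() with the proposed hunk applied.
 * hotplug_pcie stands in for IS_ENABLED(CONFIG_HOTPLUG_PCI_PCIE); in the
 * kernel that is a compile-time constant, not a runtime argument.
 */
bool sketch_supports_boco(const struct sketch_gpu *gpu, bool hotplug_pcie)
{
	/* Proposed gate: no PCIe hotplug support means no BOCO. */
	if (!hotplug_pcie)
		return false;

	/* Existing condition from the quoted diff context. */
	if (gpu->has_pr3 || (gpu->is_px && gpu->atpx_hybrid))
		return true;

	return false;
}
```

With is_px and atpx_hybrid both true, the sketch returns true when hotplug_pcie is true and false otherwise, which is the behaviour change the patch is after.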


> 
> Thanks,
> 
> Gabriel
> 
> 
>> Thanks,
>>>   
>>> Alex
>>>   
>>>>
>>>>
>>>>
>>>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>>>> index 0b29ec6e8e5e2..01691f1d26fbe 100644
>>>> --- a/drivers/pci/pci.c
>>>> +++ b/drivers/pci/pci.c
>>>> @@ -2751,6 +2751,10 @@ int pci_prepare_to_sleep(struct pci_dev *dev)
>>>>            if (target_state == PCI_POWER_ERROR)
>>>>                    return -EIO;
>>>>
>>>> +       if (!IS_ENABLED(CONFIG_HOTPLUG_PCI_PCIE) &&
>>>> +           target_state == PCI_D3cold)
>>>> +               return -EBUSY;
>>>> +
>>>>            pci_enable_wake(dev, target_state, wakeup);
>>>>
>>>>            error = pci_set_power_state(dev, target_state);
>>>> @@ -2797,6 +2801,10 @@ int pci_finish_runtime_suspend(struct pci_dev *dev)
>>>>            if (target_state == PCI_POWER_ERROR)
>>>>                    return -EIO;
>>>>
>>>> +       if (!IS_ENABLED(CONFIG_HOTPLUG_PCI_PCIE) &&
>>>> +           target_state == PCI_D3cold)
>>>> +               return -EBUSY;
>>>> +
>>>>            __pci_enable_wake(dev, target_state, pci_dev_run_wake(dev));
>>>>
>>>>            error = pci_set_power_state(dev, target_state);
>>>>>
>>>>> Alex
>>>>>
>>>>>> ---
>>>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 +++
>>>>>>     1 file changed, 3 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>> index 764d41434a82f..7db796ebb967e 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>> @@ -419,6 +419,9 @@ bool amdgpu_device_supports_boco(struct drm_device *dev)
>>>>>>     {
>>>>>>            struct amdgpu_device *adev = drm_to_adev(dev);
>>>>>>
>>>>>> +       if (!IS_ENABLED(CONFIG_HOTPLUG_PCI_PCIE))
>>>>>> +               return false;
>>>>>> +
>>>>>>            if (adev->has_pr3 ||
>>>>>>                ((adev->flags & AMD_IS_PX) && amdgpu_is_atpx_hybrid()))
>>>>>>                    return true;
>>>>>> --
>>>>>> 2.43.0
>>>>>>
>>>>
> 
> 


