[PATCH v5] drm/amd/display: Revert W/A for hard hangs on DCN20/DCN21
Chris Hixon
linux-kernel-bugs at hixontech.com
Fri Jan 14 12:24:30 UTC 2022
On 1/7/22 11:51, Limonciello, Mario wrote:
> [AMD Official Use Only]
>
>
>> I think the revert is fine once we figure out where we're missing calls to:
>>
>> .optimize_pwr_state = dcn21_optimize_pwr_state,
>> .exit_optimized_pwr_state = dcn21_exit_optimized_pwr_state,
>>
>> These are already part of dc_link_detect, so I suspect there's another interface
>> in DC that should be using these.
>>
>> I think the best way to debug this is to revert the patch locally and add a stack
>> dump when DMCUB hangs or times out.
> OK so I did this on top of amd-staging-drm-next with my v5 patch (this revert in place)
>
> diff --git a/drivers/gpu/drm/amd/display/dmub/src/dmub_srv.c b/drivers/gpu/drm/amd/display/dmub/src/dmub_srv.c
> index 9280f2abd973..0bd32f82f3db 100644
> --- a/drivers/gpu/drm/amd/display/dmub/src/dmub_srv.c
> +++ b/drivers/gpu/drm/amd/display/dmub/src/dmub_srv.c
> @@ -789,8 +789,10 @@ enum dmub_status dmub_srv_cmd_with_reply_data(struct dmub_srv *dmub,
> // Execute command
> status = dmub_srv_cmd_execute(dmub);
>
> - if (status != DMUB_STATUS_OK)
> + if (status != DMUB_STATUS_OK) {
> + ASSERT(0);
> return status;
> + }
>
> // Wait for DMUB to process command
> status = dmub_srv_wait_for_idle(dmub, 100000);
>
>> That way you can know where the PHY was trying to be accessed without the
>> refclk being on.
>>
>> We had a similar issue in DCN31 which didn't require a W/A like DCN21.
>>
>> I'd like to hold off on merging this until that hang is verified as gone.
>>
> Then I took a RN laptop running DMUB 0x01010019 and disabled eDP, and confirmed
> no CRTC was configured but plugged in an HDMI cable:
>
> connector[78]: eDP-1
>     crtc=(null)
>     self_refresh_aware=0
> connector[85]: HDMI-A-1
>     crtc=crtc-1
>     self_refresh_aware=0
>
> I triggered 100 hotplugs like this:
>
> #!/bin/bash
> for i in {0..100..1}
> do
> echo 1 | tee /sys/kernel/debug/dri/0/HDMI-A-1/trigger_hotplug
> sleep 3
> done
>
> Unfortunately, no hang or traceback to be seen (and HDMI continues to work).
> I also manually pulled the plug a handful of times. I don't know the specifics of how Lillian hit the
> failure though, so this might not be a good enough check.
>
> I'll try to upgrade DMUB to 0x101001c (the latest version) and double check that as well.
I applied patch v5 and the above ASSERT patch, on top of both Linux
5.16-rc8 and 5.16.
Result: no problems with suspend/resume, 16+ cycles.
As far as the hang goes:
I plugged in an HDMI cable connected to my TV, and configured Gnome to
use the external display only.
Connectors from /sys/kernel/debug/dri/0/state:
connector[78]: eDP-1
    crtc=(null)
    self_refresh_aware=0
connector[85]: HDMI-A-1
    crtc=crtc-1
    self_refresh_aware=0
connector[89]: DP-1
    crtc=(null)
    self_refresh_aware=0
I manually unplugged/plugged the HDMI cable 16+ times, and also ran:
$ sudo sh -c 'for ((i=0;i<100;i++)); do echo 1 | tee /sys/kernel/debug/dri/0/HDMI-A-1/trigger_hotplug; sleep 3; done'
The system did not hang, and I saw no kernel log output from the ASSERT.
I also tried a USB-C dock with an HDMI port, with the same results,
though there are other issues with this (perhaps worthy of other bug
reports).
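For anyone repeating this test, here is a minimal bash sketch of the same
hotplug loop plus a check of the kernel log. The function names
trigger_hotplugs and count_warnings are hypothetical helpers, not part of any
tool. The sketch assumes the ASSERT(0) from the debug patch shows up as a
standard kernel "WARNING: CPU: ..." splat, which may depend on the kernel
configuration.

```shell
#!/bin/bash
# Sketch only: helper names are hypothetical, and the log pattern below is an
# assumption about how the ASSERT(0) from the debug patch appears in dmesg.

# Write 1 to the given trigger_hotplug node COUNT times, sleeping DELAY
# seconds after each write. Stops with a non-zero status if a write fails.
trigger_hotplugs() {
    local node="$1" count="$2" delay="$3" i
    for ((i = 0; i < count; i++)); do
        echo 1 > "$node" || return 1
        sleep "$delay"
    done
}

# Read kernel log text on stdin and print the number of lines that look like
# the header of a WARN backtrace (assumed form: "WARNING: CPU: ...").
count_warnings() {
    grep -c 'WARNING: CPU' || true
}
```

Run as root, something like
`trigger_hotplugs /sys/kernel/debug/dri/0/HDMI-A-1/trigger_hotplug 100 3`
followed by `dmesg | count_warnings` should print 0 if the ASSERT never fired.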
Is there some reason to use amd-staging-drm-next for this test?
I don't use the HDMI connection much and I have never experienced a hang
with HDMI in the first place. Can someone send a link to an
issue/discussion where this hang is being discussed?
HW: HP ENVY x360 Convertible 15-ds1xxx, AMD Ryzen 7 4700U with Radeon
Graphics
OS/Desktop: Arch Linux, Gnome 41.3 (Wayland)
FW: linux-firmware-git 20211229.57d6b95-1, DMUB version=0x0101001C