[PATCH v5] drm/amd/display: Revert W/A for hard hangs on DCN20/DCN21

Limonciello, Mario Mario.Limonciello at amd.com
Fri Jan 14 15:38:21 UTC 2022



> >> I think the revert is fine once we figure out where we're missing calls to:
> >>
> >>          .optimize_pwr_state = dcn21_optimize_pwr_state,
> >>          .exit_optimized_pwr_state = dcn21_exit_optimized_pwr_state,
> >>
> >> These are already part of dc_link_detect, so I suspect there's another interface
> >> in DC that should be using these.
> >>
> >> I think the best way to debug this is to revert the patch locally and add a stack
> >> dump when DMCUB hangs or times out.
> > OK, so I did this on top of amd-staging-drm-next with my v5 patch (this revert
> > in place):
> >
> > diff --git a/drivers/gpu/drm/amd/display/dmub/src/dmub_srv.c b/drivers/gpu/drm/amd/display/dmub/src/dmub_srv.c
> > index 9280f2abd973..0bd32f82f3db 100644
> > --- a/drivers/gpu/drm/amd/display/dmub/src/dmub_srv.c
> > +++ b/drivers/gpu/drm/amd/display/dmub/src/dmub_srv.c
> > @@ -789,8 +789,10 @@ enum dmub_status dmub_srv_cmd_with_reply_data(struct dmub_srv *dmub,
> >          // Execute command
> >          status = dmub_srv_cmd_execute(dmub);
> >
> > -       if (status != DMUB_STATUS_OK)
> > +       if (status != DMUB_STATUS_OK) {
> > +               ASSERT(0);
> >                  return status;
> > +       }
> >
> >          // Wait for DMUB to process command
> >          status = dmub_srv_wait_for_idle(dmub, 100000);
> >
> >> That way you can know where the PHY was trying to be accessed without the
> >> refclk being on.
> >>
> >> We had a similar issue in DCN31 which didn't require a W/A like DCN21.
> >>
> >> I'd like to hold off on merging this until that hang is verified as gone.
> >>
> > Then I took a RN laptop running DMUB 0x01010019, disabled eDP, confirmed
> > that no CRTC was configured for it, and plugged in an HDMI cable:
> >
> > connector[78]: eDP-1
> >          crtc=(null)
> >          self_refresh_aware=0
> > connector[85]: HDMI-A-1
> >          crtc=crtc-1
> >          self_refresh_aware=0
> >
> > I triggered 100 hotplugs like this:
> >
> > #!/bin/bash
> > for i in {0..100..1}
> > do
> >      echo 1 | tee /sys/kernel/debug/dri/0/HDMI-A-1/trigger_hotplug
> >      sleep 3
> > done
> >
> > Unfortunately, no hang or traceback to be seen (and HDMI continues to work).
> > I also manually pulled the plug a handful of times.  I don't know the specifics
> > of how Lillian hit the failure, though, so this might not be a good enough check.
> >
> > I'll try to upgrade DMUB to 0x101001c (the latest version) and double check
> > that as well.
> 
> I applied patch v5 and the above ASSERT patch, on top of both Linux
> 5.16-rc8 and 5.16.
> 
> Result: no problems with suspend/resume, 16+ cycles.
> 
> As far as the hang goes:
> 
> I plugged in an HDMI cable connected to my TV, and configured Gnome to
> use the external display only.
> 
> connectors from /sys/kernel/debug/dri/0/state:
> 
> connector[78]: eDP-1
>      crtc=(null)
>      self_refresh_aware=0
> connector[85]: HDMI-A-1
>      crtc=crtc-1
>      self_refresh_aware=0
> connector[89]: DP-1
>      crtc=(null)
>      self_refresh_aware=0
> 
> I manually unplugged/plugged the HDMI cable 16+ times, and also ran:
> 
> $ sudo sh -c 'for ((i=0;i<100;i++)); do echo 1 | tee /sys/kernel/debug/dri/0/HDMI-A-1/trigger_hotplug; sleep 3; done'
> 
> The system did not hang, and I saw no kernel log output from the ASSERT.
> 
> I also tried a USB-C dock with an HDMI port, with the same results,
> though there are other issues with this (perhaps worthy of other bug
> reports).
> 
> Is there some reason to use amd-staging-drm-next for this test?
> 
> I don't use the HDMI connection much and I have never experienced a hang
> with HDMI in the first place. Can someone send a link to an
> issue/discussion where this hang is being discussed?
> 
> HW: HP ENVY x360 Convertible 15-ds1xxx, AMD Ryzen 7 4700U with Radeon
> Graphics
> OS/Desktop: Arch Linux, Gnome 41.3 (Wayland)
> FW: linux-firmware-git 20211229.57d6b95-1, DMUB version=0x0101001C
> 

Nicholas,

We've got a handful of people now (myself included) who have done a bunch of
physical and software-triggered hotplugs on a variety of ports, on top of both
amd-staging-drm-next and 5.16, without seeing any hangs.  Given this is lingering
on 5.16, are you amenable to merging the revert and letting Lillian dig further,
after she returns, into the specific case she had problems with to see if we're
missing anything else?
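
For anyone who wants to repeat the check, here is a minimal sketch that strings
together the steps quoted above: dump which CRTC each connector is bound to,
fire software hotplugs, then count kernel warnings.  The function names, the
DEBUGFS variable and the "WARNING:" match are illustrative assumptions and not
anything from the driver; the debugfs paths are the ones used in this thread.

```shell
#!/bin/bash
# Sketch only; helper names and the DEBUGFS override are hypothetical.
DEBUGFS=${DEBUGFS:-/sys/kernel/debug/dri/0}

# Read a DRM state dump on stdin and print "<connector> <crtc>" per connector.
# Only the first crtc= line after each connector[...] header is used, so the
# crtc= lines that belong to planes are ignored.
list_connector_crtcs() {
    awk '
        /^connector\[[0-9]+\]:/ { name = $2 }
        /^[ \t]*crtc=/ && name != "" {
            sub(/^[ \t]*crtc=/, "")
            print name, $0
            name = ""
        }
    '
}

# trigger_hotplugs <connector> <count> [delay_seconds]
# Writes 1 to the connector's trigger_hotplug node <count> times (needs root
# on a real system).  The subshell body keeps the variables local.
trigger_hotplugs() (
    node="$DEBUGFS/$1/trigger_hotplug"
    delay=${3:-3}
    i=0
    while [ "$i" -lt "$2" ]; do
        echo 1 > "$node" || exit 1
        sleep "$delay"
        i=$((i + 1))
    done
)

# Count kernel log lines on stdin that contain "WARNING:".  Assumption: the
# added ASSERT(0) shows up as a kernel warning with a stack trace.
count_kernel_warnings() {
    grep -c 'WARNING:' || true
}

# Only runs when a connector name is given, e.g.: sudo ./hotplug.sh HDMI-A-1 100
if [ "$#" -gt 0 ]; then
    list_connector_crtcs < "$DEBUGFS/state"
    trigger_hotplugs "$1" "${2:-100}"
    dmesg | count_kernel_warnings
fi
```

A warning count of 0 after the loop only says the ASSERT path was not reached
on that run; it does not rule out whatever sequence Lillian was hitting.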

Thanks,

