[PATCH v5] drm/amd/display: Revert W/A for hard hangs on DCN20/DCN21

Limonciello, Mario Mario.Limonciello at amd.com
Fri Jan 14 18:47:08 UTC 2022


[Public]

> > > >
> > > >
> > > >> I think the revert is fine once we figure out where we're missing calls to:
> > > >>
> > > >>          .optimize_pwr_state = dcn21_optimize_pwr_state,
> > > >>          .exit_optimized_pwr_state = dcn21_exit_optimized_pwr_state,
> > > >>
> > > >> These are already part of dc_link_detect, so I suspect there's another
> > > >> interface in DC that should be using these.
> > > >>
> > > >> I think the best way to debug this is to revert the patch locally and add a
> > > >> stack dump when DMCUB hangs or times out.
> > > > OK so I did this on top of amd-staging-drm-next with my v5 patch (this
> > > > revert in place)
> > > >
> > > > diff --git a/drivers/gpu/drm/amd/display/dmub/src/dmub_srv.c b/drivers/gpu/drm/amd/display/dmub/src/dmub_srv.c
> > > > index 9280f2abd973..0bd32f82f3db 100644
> > > > --- a/drivers/gpu/drm/amd/display/dmub/src/dmub_srv.c
> > > > +++ b/drivers/gpu/drm/amd/display/dmub/src/dmub_srv.c
> > > > @@ -789,8 +789,10 @@ enum dmub_status dmub_srv_cmd_with_reply_data(struct dmub_srv *dmub,
> > > >          // Execute command
> > > >          status = dmub_srv_cmd_execute(dmub);
> > > >
> > > > -       if (status != DMUB_STATUS_OK)
> > > > +       if (status != DMUB_STATUS_OK) {
> > > > +               ASSERT(0);
> > > >                  return status;
> > > > +       }
> > > >
> > > >          // Wait for DMUB to process command
> > > >          status = dmub_srv_wait_for_idle(dmub, 100000);
> > > >
> > > >> That way you can know where the PHY was trying to be accessed without the
> > > >> refclk being on.
> > > >>
> > > >> We had a similar issue in DCN31 which didn't require a W/A like DCN21.
> > > >>
> > > >> I'd like to hold off on merging this until that hang is verified as gone.
> > > >>
> > > > Then I took a RN laptop running DMUB 0x01010019, disabled eDP, confirmed
> > > > that no CRTC was configured for it, and plugged in an HDMI cable:
> > > >
> > > > connector[78]: eDP-1
> > > >          crtc=(null)
> > > >          self_refresh_aware=0
> > > > connector[85]: HDMI-A-1
> > > >          crtc=crtc-1
> > > >          self_refresh_aware=0
> > > >
> > > > I triggered 100 hotplugs like this:
> > > >
> > > > #!/bin/bash
> > > > for i in {0..100..1}
> > > > do
> > > >      echo 1 | tee /sys/kernel/debug/dri/0/HDMI-A-1/trigger_hotplug
> > > >      sleep 3
> > > > done
> > > >
> > > > Unfortunately, no hang or traceback to be seen (and HDMI continues to work).
> > > > I also manually pulled the plug a handful of times. I don't know the
> > > > specifics of how Lillian hit the failure though, so this might not be a
> > > > good enough check.
> > > >
> > > > I'll try to upgrade DMUB to 0x101001c (the latest version) and double
> > > > check that as well.
> > >
> > > I applied patch v5 and the above ASSERT patch, on top of both Linux
> > > 5.16-rc8 and 5.16.
> > >
> > > Result: no problems with suspend/resume, 16+ cycles.
> > >
> > > As far as the hang goes:
> > >
> > > I plugged in an HDMI cable connected to my TV, and configured Gnome to
> > > use the external display only.
> > >
> > > connectors from /sys/kernel/debug/dri/0/state:
> > >
> > > connector[78]: eDP-1
> > >      crtc=(null)
> > >      self_refresh_aware=0
> > > connector[85]: HDMI-A-1
> > >      crtc=crtc-1
> > >      self_refresh_aware=0
> > > connector[89]: DP-1
> > >      crtc=(null)
> > >      self_refresh_aware=0
> > >
> > > I manually unplugged/plugged the HDMI cable 16+ times, and also ran:
> > >
> > > $ sudo sh -c 'for ((i=0;i<100;i++)); do echo 1 | tee /sys/kernel/debug/dri/0/HDMI-A-1/trigger_hotplug; sleep 3; done'
> > >
> > > The system did not hang, and I saw no kernel log output from the ASSERT.
> > >
> > > I also tried a USB-C dock with an HDMI port, with the same results,
> > > though there are other issues with this (perhaps worthy of other bug
> > > reports).
> > >
> > > Is there some reason to use amd-staging-drm-next for this test?
> > >
> > > I don't use the HDMI connection much and I have never experienced a hang
> > > with HDMI in the first place. Can someone send a link to an
> > > issue/discussion where this hang is being discussed?
> > >
> > > HW: HP ENVY x360 Convertible 15-ds1xxx, AMD Ryzen 7 4700U with Radeon Graphics
> > > OS/Desktop: Arch Linux, Gnome 41.3 (Wayland)
> > > FW: linux-firmware-git 20211229.57d6b95-1, DMUB version=0x0101001C
> > >
> >
> > Nicholas,
> >
> > We've got a handful of people now (myself included) who have done a bunch of
> > physical and software triggered hotplugs on a variety of ports on top of both
> > amd-staging-drm-next and 5.16 and are not seeing any hangs.  Given this is
> > lingering on 5.16, are you amenable to merging it and letting Lillian dig
> > further, after she returns, into the specific case that she had problems with
> > to see if we're missing anything else?
> >
> > Thanks,
> 
> I think it was observed during HDMI compliance testing or frequent HDCP
> enter/exit on Chrome, I don't remember the details off the top of my head. The
> system would completely lock up under those conditions.
> 
> I'm not familiar with the urgency of the request for your specific issue, but if you
> feel that the tradeoff is worth it then you can go ahead and revert for now.
> 
> Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas at amd.com>
> 
> Regards,
> Nicholas Kazlauskas

Thanks.  Alex, when this pulls in can you add CC for stable so we get it in 5.16.1 too?
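
For anyone repeating the hotplug check described above, here is a minimal sketch
of the same loop with a kernel-log check added. It is only an illustration: the
helper names trigger_hotplugs and count_warnings are hypothetical, and it
assumes the added ASSERT(0) would surface in the kernel log as a "WARNING:"
line, which depends on how ASSERT is configured in the build under test.

```shell
#!/bin/sh
# Sketch only: software-triggered hotplug loop plus a kernel-log check.
# Assumption (not from the thread): a tripped ASSERT appears as a
# "WARNING:" line in dmesg. Helper names are hypothetical.

# Write "1" to the given trigger_hotplug debugfs node COUNT times,
# sleeping DELAY seconds between writes. Runs in a subshell so its
# variables do not leak into the caller.
trigger_hotplugs() (
    node="$1"
    count="${2:-100}"
    delay="${3:-3}"
    i=0
    while [ "$i" -lt "$count" ]; do
        echo 1 > "$node" || exit 1
        sleep "$delay"
        i=$((i + 1))
    done
)

# Count lines on stdin that look like a kernel WARN splat.
# grep -c exits non-zero when nothing matches, so mask that status.
count_warnings() {
    grep -c 'WARNING:' || true
}

# Example run (as root), using the same node as in the thread:
#   trigger_hotplugs /sys/kernel/debug/dri/0/HDMI-A-1/trigger_hotplug 100 3
#   dmesg | count_warnings
```

A count of 0 after the loop only says nothing was logged during these hotplugs;
it does not cover the HDMI compliance or HDCP enter/exit cases Nicholas mentions.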

