[PATCH v5] drm/amd/display: Revert W/A for hard hangs on DCN20/DCN21

Fri Jan 14 16:48:37 UTC 2022

[Public]

> -----Original Message-----
> From: Limonciello, Mario <Mario.Limonciello at amd.com>
> Sent: January 14, 2022 10:38 AM
> To: Chris Hixon <linux-kernel-bugs at hixontech.com>; Kazlauskas, Nicholas
> <Nicholas.Kazlauskas at amd.com>; amd-gfx at lists.freedesktop.org
> Cc: Zhuo, Qingqing (Lillian) <Qingqing.Zhuo at amd.com>; Scott Bruce
> <smbruce at gmail.com>; spasswolf at web.de
> Subject: RE: [PATCH v5] drm/amd/display: Revert W/A for hard hangs on
> DCN20/DCN21
> Importance: High
>
> [AMD Official Use Only]
>
> > >
> > >
> > >> I think the revert is fine once we figure out where we're missing calls to:
> > >>
> > >>          .optimize_pwr_state = dcn21_optimize_pwr_state,
> > >>          .exit_optimized_pwr_state = dcn21_exit_optimized_pwr_state,
> > >>
> > >> These are already part of dc_link_detect, so I suspect there's another
> > >> interface in DC that should be using these.
> > >>
> > >> I think the best way to debug this is to revert the patch locally and add a
> > >> stack dump when DMCUB hangs or times out.
> > > OK so I did this on top of amd-staging-drm-next with my v5 patch (this
> > > revert in place)
> > >
> > > diff --git a/drivers/gpu/drm/amd/display/dmub/src/dmub_srv.c b/drivers/gpu/drm/amd/display/dmub/src/dmub_srv.c
> > > index 9280f2abd973..0bd32f82f3db 100644
> > > --- a/drivers/gpu/drm/amd/display/dmub/src/dmub_srv.c
> > > +++ b/drivers/gpu/drm/amd/display/dmub/src/dmub_srv.c
> > > @@ -789,8 +789,10 @@ enum dmub_status dmub_srv_cmd_with_reply_data(struct dmub_srv *dmub,
> > >          // Execute command
> > >          status = dmub_srv_cmd_execute(dmub);
> > >
> > > -       if (status != DMUB_STATUS_OK)
> > > +       if (status != DMUB_STATUS_OK) {
> > > +               ASSERT(0);
> > >                  return status;
> > > +       }
> > >
> > >          // Wait for DMUB to process command
> > >          status = dmub_srv_wait_for_idle(dmub, 100000);
> > >
> > >> That way you can know where the PHY was trying to be accessed without the
> > >> refclk being on.
> > >>
> > >> We had a similar issue in DCN31 which didn't require a W/A like DCN21.
> > >>
> > >> I'd like to hold off on merging this until that hang is verified as gone.
> > >>
> > > Then I took a RN laptop running DMUB 0x01010019 and disabled eDP, and
> > > confirmed no CRTC was configured but plugged in an HDMI cable:
> > >
> > > connector[78]: eDP-1
> > >          crtc=(null)
> > >          self_refresh_aware=0
> > > connector[85]: HDMI-A-1
> > >          crtc=crtc-1
> > >          self_refresh_aware=0
> > >
> > > I triggered 100 hotplugs like this:
> > >
> > > #!/bin/bash
> > > for i in {1..100}
> > > do
> > >      echo 1 | tee /sys/kernel/debug/dri/0/HDMI-A-1/trigger_hotplug
> > >      sleep 3
> > > done
> > >
> > > Unfortunately, no hang or traceback to be seen (and HDMI continues to work).
> > > I also manually pulled the plug a handful of times. I don't know the specifics
> > > of how Lillian hit the failure, though, so this might not be a good enough check.
> > >
> > > I'll try to upgrade DMUB to 0x101001c (the latest version) and double check
> > > that as well.
> >
> > I applied patch v5 and the above ASSERT patch, on top of both Linux
> > 5.16-rc8 and 5.16.
> >
> > Result: no problems with suspend/resume, 16+ cycles.
> >
> > As far as the hang goes:
> >
> > I plugged in an HDMI cable connected to my TV, and configured Gnome to
> > use the external display only.
> >
> > connectors from /sys/kernel/debug/dri/0/state:
> >
> > connector[78]: eDP-1
> >      crtc=(null)
> >      self_refresh_aware=0
> > connector[85]: HDMI-A-1
> >      crtc=crtc-1
> >      self_refresh_aware=0
> > connector[89]: DP-1
> >      crtc=(null)
> >      self_refresh_aware=0
> >
> > I manually unplugged/plugged the HDMI cable 16+ times, and also ran:
> >
> > $ sudo sh -c 'for ((i=0;i<100;i++)); do echo 1 | tee /sys/kernel/debug/dri/0/HDMI-A-1/trigger_hotplug; sleep 3; done'
> >
> > The system did not hang, and I saw no kernel log output from the ASSERT.
> >
> > I also tried a USB-C dock with an HDMI port, with the same results,
> > though there are other issues with this (perhaps worthy of other bug
> > reports).
> >
> > Is there some reason to use amd-staging-drm-next for this test?
> >
> > I don't use the HDMI connection much and I have never experienced a hang
> > with HDMI in the first place. Can someone send a link to an
> > issue/discussion where this hang is being discussed?
> >
> > HW: HP ENVY x360 Convertible 15-ds1xxx, AMD Ryzen 7 4700U with Radeon Graphics
> > OS/Desktop: Arch Linux, Gnome 41.3 (Wayland)
> > FW: linux-firmware-git 20211229.57d6b95-1, DMUB version=0x0101001C
> >
>
> Nicholas,
>
> We've got a handful of people now (myself included) who have done a bunch of
> physical and software-triggered hotplugs on a variety of ports on top of both
> amd-staging-drm-next and 5.16 and are not seeing any hangs.  Given this is
> lingering on 5.16, are you amenable to merging it and letting Lillian dig
> further, after she returns, into the specific case that she had problems with
> to see if we're missing anything else?
>
> Thanks,

I think it was observed during HDMI compliance testing or frequent HDCP enter/exit on Chrome; I don't remember the details off the top of my head. The system would completely lock up under those conditions.
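
For anyone who wants to keep hammering on the hotplug path in the meantime, here is a minimal sketch of the software-triggered test quoted above, wrapped so the kernel log can be compared before and after the run. The function names are hypothetical, and matching on "WARNING:" assumes the added ASSERT(0) surfaces as a kernel WARN on your build; adjust the pattern if your config reports it differently. It does not exercise the HDCP enter/exit case.

```shell
#!/bin/bash
# Sketch only: helper names are hypothetical, not part of any existing tool.

# Write "1" to a connector's trigger_hotplug debugfs node COUNT times,
# pausing DELAY seconds (default 3) between triggers.
trigger_hotplugs() {
    local node="$1" count="$2" delay="${3:-3}"
    local i
    for ((i = 0; i < count; i++)); do
        echo 1 > "$node" || return 1
        sleep "$delay"
    done
}

# Read kernel log text on stdin and print how many lines contain "WARNING:".
# Assumption: the ASSERT(0) from the debug patch shows up as a kernel WARN.
count_warnings() {
    grep -c 'WARNING:' || true
}
```

Run as root, something like `before=$(dmesg | count_warnings)`, then `trigger_hotplugs /sys/kernel/debug/dri/0/HDMI-A-1/trigger_hotplug 100`, then `after=$(dmesg | count_warnings)`; a larger "after" value means something warned during the run and the traceback is worth a look.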

I'm not familiar with the urgency of the request for your specific issue, but if you feel that the tradeoff is worth it then you can go ahead and revert for now.

Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas at amd.com>

Regards,
Nicholas Kazlauskas