[PATCH 05/18] drm/amd/display: Use mdelay to avoid crashes

Alex Deucher alexdeucher at gmail.com
Wed Dec 14 23:06:09 UTC 2022


On Wed, Dec 14, 2022 at 5:56 PM Alex Hung <alex.hung at amd.com> wrote:
>
>
>
> On 2022-12-14 15:35, Alex Deucher wrote:
> > On Wed, Dec 14, 2022 at 5:25 PM Alex Hung <alex.hung at amd.com> wrote:
> >>
> >>
> >>
> >> On 2022-12-14 14:54, Alex Deucher wrote:
> >>> On Wed, Dec 14, 2022 at 4:50 PM Alex Hung <alex.hung at amd.com> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 2022-12-14 13:48, Alex Deucher wrote:
> >>>>> On Wed, Dec 14, 2022 at 3:22 PM Aurabindo Pillai
> >>>>> <aurabindo.pillai at amd.com> wrote:
> >>>>>>
> >>>>>> From: Alex Hung <alex.hung at amd.com>
> >>>>>>
> >>>>>> [Why]
> >>>>>> When running IGT kms_bw test with DP monitor, some systems crash from
> >>>>>> msleep no matter how long or short the time is.
> >>>>>>
> >>>>>> [How]
> >>>>>> To replace msleep with mdelay.
> >>>>>
> >>>>> Can you provide a bit more info about the crash?  A lot of platforms
> >>>>> don't support delay larger than 2-4ms so this change will generate
> >>>>> errors on ARM and possibly other platforms.
> >>>>>
> >>>>> Alex
> >>>>
> >>>> The msleep was introduced in eec3303de3378 for non-compliant display
> >>>> port monitors but IGT's kms_bw test can cause a recent h/w to hang at
> >>>> msleep(60) when calling "igt_remove_fb" in IGT
> >>>> (https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.freedesktop.org%2Fdrm%2Figt-gpu-tools%2F-%2Fblob%2Fmaster%2Ftests%2Fkms_bw.c%23L197&data=05%7C01%7Calex.hung%40amd.com%7Cef40490c3d0f4a0448a808dade239493%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C638066541657914573%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=kx7mnbuN%2BhmIPEgj4l1cuek0nvqK16Ig3fWAHopA8CI%3D&reserved=0)
> >>>>
> >>>> It is possible to workaround this by reversing order of
> >>>> igt_remove_fb(&buffer[i]), as the following example:
> >>>>
> >>>>      igt_create_color_fb with the order buffer[0], buffer[1], buffer[2]
> >>>>
> >>>> Hangs:
> >>>>      igt_remove_fb with the order buffer[0], buffer[1], buffer[2]
> >>>>
> >>>> No hangs:
> >>>>      igt_remove_fb with the reversed order buffer[2], buffer[1], buffer[0]
> >>>>
> >>>> However, IGT simply exposes the problem and it makes more sense to stop
> >>>> the hang from occurring.
> >>>>
> >>>> I also tried to remove the msleep completely and it also work, but I
> >>>> didn't want to break the fix for the original problematic hardware
> >>>> configuration.
> >>>
> >>> Why does sleep vs delay make a difference?  Is there some race that we
> >>> are not locking against?
> >>>
> >>> Alex
> >>>
> >>
> >> That was my original thought but I did not find any previously. I will
> >> investigate it again.
> >>
> >> If mdelay(>4) isn't usable on other platforms, is it an option to use
> >> mdelay on x86_64 only and keep msleep on other platforms or just remove
> >> the msleep for other platforms, something like
> >>
> >> -                       msleep(60);
> >> +#ifdef CONFIG_X86_64
> >> +                       mdelay(60);
> >> +#endif
> >
> > That's pretty ugly.  I'd rather try and resolve the root cause.  How
> > important is the IGT test?  What does it do?  Is the test itself
> > correct?
> >
>
> Agreed, and I didn't want to add conditions around the mdelay for the
> same reason. I will assume this is not an option now.
>
> As in the previous comment, IGT can be modified to avoid the crash by
> reversing the order fb is removed - though I suspect I will receive
> questions why this is not fixed in kernel.
>
> I wanted to fix this in kernel because nothing stops other user-space
> applications to use the same way to crash kernel, so fixing IGT is the
> second option.
>
> Apparently causing problems on other platforms isn't an option at all so
> I will try to figure out an non-mdelay solution, and then maybe an IGT
> solution instead.
>

What hangs?  The test or the kernel or the hardware?

Alex


> > Alex
> >
> >
> >>
> >>
> >>>>
> >>>>>
> >>>>>>
> >>>>>> Acked-by: Aurabindo Pillai <aurabindo.pillai at amd.com>
> >>>>>> Signed-off-by: Alex Hung <alex.hung at amd.com>
> >>>>>> Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira at amd.com>
> >>>>>> ---
> >>>>>>     drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c | 2 +-
> >>>>>>     1 file changed, 1 insertion(+), 1 deletion(-)
> >>>>>>
> >>>>>> diff --git a/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c b/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
> >>>>>> index 913a1fe6b3da..e6251ccadb70 100644
> >>>>>> --- a/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
> >>>>>> +++ b/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
> >>>>>> @@ -1215,7 +1215,7 @@ void dce110_blank_stream(struct pipe_ctx *pipe_ctx)
> >>>>>>                             * After output is idle pattern some sinks need time to recognize the stream
> >>>>>>                             * has changed or they enter protection state and hang.
> >>>>>>                             */
> >>>>>> -                       msleep(60);
> >>>>>> +                       mdelay(60);
> >>>>>>                    } else if (pipe_ctx->stream->signal == SIGNAL_TYPE_EDP) {
> >>>>>>                            if (!link->dc->config.edp_no_power_sequencing) {
> >>>>>>                                    /*
> >>>>>> --
> >>>>>> 2.39.0
> >>>>>>


More information about the amd-gfx mailing list