[PATCH 05/18] drm/amd/display: Use mdelay to avoid crashes

Christian König christian.koenig at amd.com
Thu Dec 15 08:09:22 UTC 2022


Am 15.12.22 um 00:33 schrieb Alex Hung:
>
>
> On 2022-12-14 16:06, Alex Deucher wrote:
>> On Wed, Dec 14, 2022 at 5:56 PM Alex Hung <alex.hung at amd.com> wrote:
>>>
>>>
>>>
>>> On 2022-12-14 15:35, Alex Deucher wrote:
>>>> On Wed, Dec 14, 2022 at 5:25 PM Alex Hung <alex.hung at amd.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 2022-12-14 14:54, Alex Deucher wrote:
>>>>>> On Wed, Dec 14, 2022 at 4:50 PM Alex Hung <alex.hung at amd.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 2022-12-14 13:48, Alex Deucher wrote:
>>>>>>>> On Wed, Dec 14, 2022 at 3:22 PM Aurabindo Pillai
>>>>>>>> <aurabindo.pillai at amd.com> wrote:
>>>>>>>>>
>>>>>>>>> From: Alex Hung <alex.hung at amd.com>
>>>>>>>>>
>>>>>>>>> [Why]
>>>>>>>>> When running IGT kms_bw test with DP monitor, some systems 
>>>>>>>>> crash from
>>>>>>>>> msleep no matter how long or short the time is.
>>>>>>>>>
>>>>>>>>> [How]
>>>>>>>>> To replace msleep with mdelay.
>>>>>>>>
>>>>>>>> Can you provide a bit more info about the crash?  A lot of 
>>>>>>>> platforms
>>>>>>>> don't support delay larger than 2-4ms so this change will generate
>>>>>>>> errors on ARM and possibly other platforms.
>>>>>>>>
>>>>>>>> Alex
>>>>>>>
>>>>>>> The msleep was introduced in eec3303de3378 for non-compliant 
>>>>>>> display
>>>>>>> port monitors but IGT's kms_bw test can cause a recent h/w to 
>>>>>>> hang at
>>>>>>> msleep(60) when calling "igt_remove_fb" in IGT
>>>>>>> (https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.freedesktop.org%2Fdrm%2Figt-gpu-tools%2F-%2Fblob%2Fmaster%2Ftests%2Fkms_bw.c%23L197&data=05%7C01%7Calex.hung%40amd.com%7C81664450189542a7bbc408dade27d0e9%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C638066559844526853%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=M%2BF4H2qddXItPoUZRVyhlc5N8UF1%2FHIh8PpfT%2BCmdZ4%3D&reserved=0) 
>>>>>>>
>>>>>>>
>>>>>>> It is possible to workaround this by reversing order of
>>>>>>> igt_remove_fb(&buffer[i]), as the following example:
>>>>>>>
>>>>>>>       igt_create_color_fb with the order buffer[0], buffer[1], 
>>>>>>> buffer[2]
>>>>>>>
>>>>>>> Hangs:
>>>>>>>       igt_remove_fb with the order buffer[0], buffer[1], buffer[2]
>>>>>>>
>>>>>>> No hangs:
>>>>>>>       igt_remove_fb with the reversed order buffer[2], 
>>>>>>> buffer[1], buffer[0]
>>>>>>>
>>>>>>> However, IGT simply exposes the problem and it makes more sense 
>>>>>>> to stop
>>>>>>> the hang from occurring.
>>>>>>>
>>>>>>> I also tried to remove the msleep completely and it also work, 
>>>>>>> but I
>>>>>>> didn't want to break the fix for the original problematic hardware
>>>>>>> configuration.
>>>>>>
>>>>>> Why does sleep vs delay make a difference?  Is there some race 
>>>>>> that we
>>>>>> are not locking against?
>>>>>>
>>>>>> Alex
>>>>>>
>>>>>
>>>>> That was my original thought but I did not find any previously. I 
>>>>> will
>>>>> investigate it again.
>>>>>
>>>>> If mdelay(>4) isn't usable on other platforms, is it an option to use
>>>>> mdelay on x86_64 only and keep msleep on other platforms or just 
>>>>> remove
>>>>> the msleep for other platforms, something like
>>>>>
>>>>> -                       msleep(60);
>>>>> +#ifdef CONFIG_X86_64
>>>>> +                       mdelay(60);
>>>>> +#endif
>>>>
>>>> That's pretty ugly.  I'd rather try and resolve the root cause.  How
>>>> important is the IGT test?  What does it do?  Is the test itself
>>>> correct?
>>>>
>>>
>>> Agreed, and I didn't want to add conditions around the mdelay for the
>>> same reason. I will assume this is not an option now.
>>>
>>> As in the previous comment, IGT can be modified to avoid the crash by
>>> reversing the order fb is removed - though I suspect I will receive
>>> questions why this is not fixed in kernel.
>>>
>>> I wanted to fix this in kernel because nothing stops other user-space
>>> applications to use the same way to crash kernel, so fixing IGT is the
>>> second option.
>>>
>>> Apparently causing problems on other platforms isn't an option at 
>>> all so
>>> I will try to figure out an non-mdelay solution, and then maybe an IGT
>>> solution instead.
>>>
>>
>> What hangs?  The test or the kernel or the hardware?
>
> The system becomes completely unresponsive - no keyboard, mouse nor 
> remote accesses.

I agree with Alex that changing this is extremely questionable and not 
justified at all.

My educated guess is that by using mdelay() instead of msleep() we keep 
the CPU core busy and preventing something from happening at the same 
time as something else.

This clearly points to missing locking or similar to protect concurrent 
execution of things.

Christian.

>
>>
>> Alex
>>
>>
>>>> Alex
>>>>
>>>>
>>>>>
>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Acked-by: Aurabindo Pillai <aurabindo.pillai at amd.com>
>>>>>>>>> Signed-off-by: Alex Hung <alex.hung at amd.com>
>>>>>>>>> Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira at amd.com>
>>>>>>>>> ---
>>>>>>>>> drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c | 
>>>>>>>>> 2 +-
>>>>>>>>>      1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>>>>
>>>>>>>>> diff --git 
>>>>>>>>> a/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c 
>>>>>>>>> b/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
>>>>>>>>> index 913a1fe6b3da..e6251ccadb70 100644
>>>>>>>>> --- a/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
>>>>>>>>> +++ b/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
>>>>>>>>> @@ -1215,7 +1215,7 @@ void dce110_blank_stream(struct pipe_ctx 
>>>>>>>>> *pipe_ctx)
>>>>>>>>>                              * After output is idle pattern 
>>>>>>>>> some sinks need time to recognize the stream
>>>>>>>>>                              * has changed or they enter 
>>>>>>>>> protection state and hang.
>>>>>>>>>                              */
>>>>>>>>> -                       msleep(60);
>>>>>>>>> +                       mdelay(60);
>>>>>>>>>                     } else if (pipe_ctx->stream->signal == 
>>>>>>>>> SIGNAL_TYPE_EDP) {
>>>>>>>>>                             if 
>>>>>>>>> (!link->dc->config.edp_no_power_sequencing) {
>>>>>>>>>                                     /*
>>>>>>>>> -- 
>>>>>>>>> 2.39.0
>>>>>>>>>



More information about the amd-gfx mailing list