[PATCH 0/2] drm/amd: fix VRR race condition during IRQ handling
Tobias Jakobi
tjakobi at math.uni-bielefeld.de
Mon Sep 9 19:36:44 UTC 2024
On 9/9/24 19:18, Harry Wentland wrote:
>
> On 2024-09-09 13:11, Alex Deucher wrote:
>> On Sun, Sep 8, 2024 at 7:23 AM Tobias Jakobi
>> <tjakobi at math.uni-bielefeld.de> wrote:
>>> On 9/8/24 09:35, Christopher Snowhill wrote:
>>>
>>>> On Mon Sep 2, 2024 at 2:40 AM PDT, tjakobi wrote:
>>>>> From: Tobias Jakobi <tjakobi at math.uni-bielefeld.de>
>>>>>
>>>>> Hello,
>>>>>
>>>>> this fixes a nasty race condition in the set_drr() callbacks for DCN10
>>>>> and DCN35 that has existed now since quite some time, see this GitLab
>>>>> issue for reference.
>>>>>
>>>>> https://gitlab.freedesktop.org/drm/amd/-/issues/3142
>>>>>
>>>>> The report just focuses von DCN10, but the same problem also exists in
>>>>> the DCN35 code.
>>>> Does the problem not exist in the following references to funcs->set_drr?
>>>>
>>>> drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c: if (pipe_ctx->stream_res.tg->funcs->set_drr)
>>>> drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c: pipe_ctx->stream_res.tg->funcs->set_drr(
>>>> drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c: pipe_ctx[i]->stream_res.tg->funcs->set_drr(
>>>> drivers/gpu/drm/amd/display/dc/hwss/dcn20/dcn20_hwseq.c: if (pipe_ctx->stream_res.tg->funcs->set_drr)
>>>> drivers/gpu/drm/amd/display/dc/hwss/dcn20/dcn20_hwseq.c: pipe_ctx->stream_res.tg->funcs->set_drr(
>>>> drivers/gpu/drm/amd/display/dc/hwss/dcn20/dcn20_hwseq.c: if (pipe_ctx->stream_res.tg->funcs->set_drr)
>>>> drivers/gpu/drm/amd/display/dc/hwss/dcn20/dcn20_hwseq.c: pipe_ctx->stream_res.tg->funcs->set_drr(
>>>> drivers/gpu/drm/amd/display/dc/hwss/dcn31/dcn31_hwseq.c: if (pipe_ctx->stream_res.tg->funcs->set_drr)
>>>> drivers/gpu/drm/amd/display/dc/hwss/dcn31/dcn31_hwseq.c: pipe_ctx->stream_res.tg->funcs->set_drr(
>>>> drivers/gpu/drm/amd/display/dc/hwss/dcn401/dcn401_hwseq.c: if (pipe_ctx->stream_res.tg->funcs->set_drr)
>>>> drivers/gpu/drm/amd/display/dc/hwss/dcn401/dcn401_hwseq.c: pipe_ctx->stream_res.tg->funcs->set_drr(
>>> Maybe. But the big difference I see here, is that in this code there
>>> isn't even any kind of NULL check applied to tg. Or most of the members
>>> of *pipe_ctx. If there really is the same kind of problem here, then one
>>> would need to rewrite a bit more code to fix stuff.
>>>
>>> E.g. in the case of dcn31_hwseq.c, the questionable code is in
>>> dcn31_reset_back_end_for_pipe(), which is static and only called from
>>> dcn31_reset_hw_ctx_wrap(). Which is assigned to the .reset_hw_ctx_wrap
>>> callback. And this specific callback, from what I can see, is only
>>> called from dce110_reset_hw_ctx_wrap(). Which is then assigned to the
>>> .apply_ctx_to_hw callback. The callback is only called from
>>> dc_commit_state_no_check(). That one is static again, and called from
>>> dc_commit_streams().
>>>
>>> I could trace this even further. My point is: I don't think this is
>>> called from any IRQ handler code. And given the depth and complexity of
>>> the callgraph, I have to admit, that, at least at this point, this is a
>>> bit over my head.
>>>
>>> Sure, I could now sprinkle a bunch of x != NULL in the code, but that
>>> would be merely voodoo. And I usually try to have a theoretical basis
>>> when I apply changes to code.
>>>
>>> Maybe if someone from the AMD display team could give some insight if
>>> there still is potentially vulnerable code in some of the instances that
>>> Christopher has posted, then I would gladly take a look.
>> @Wentland, Harry can you confirm this?
>>
> As Tobias said, without extensive analysis and trace of the code in all
> possible use-case it's hard to say there's no possible way the other
> set_drr calls could potentially have a similar issue.
>
> I think Tobias' analysis is sound and this fixes a number of issues, hence
> my RB.
In fact one user pointed out another potentially vulnerable callback:
https://gitlab.freedesktop.org/drm/amd/-/issues/3142#note_2560109
Which is set_drr() in dce110_hwseq.c -- from which we know that it's
called from IRQ handler code. Also the backtrace that he posted confirms
this. That code seems to be a bit older than the DCN10/DCN25 code, as it
lacks any kind of NULL-check. I have posted a patch that more or less
copies over the DCN10/35 code. Still waiting for conclusive feedback if
the patch does something.
If it does, I'm going to post it to amd-gfx as well.
With best wishes,
Tobias
>
> Harry
>
>> Alex
>>
>>> With best wishes,
>>> Tobias
>>>
>>>>> With best wishes,
>>>>> Tobias
>>>>>
>>>>> Tobias Jakobi (2):
>>>>> drm/amd/display: Avoid race between dcn10_set_drr() and
>>>>> dc_state_destruct()
>>>>> drm/amd/display: Avoid race between dcn35_set_drr() and
>>>>> dc_state_destruct()
>>>>>
>>>>> .../amd/display/dc/hwss/dcn10/dcn10_hwseq.c | 20 +++++++++++--------
>>>>> .../amd/display/dc/hwss/dcn35/dcn35_hwseq.c | 20 +++++++++++--------
>>>>> 2 files changed, 24 insertions(+), 16 deletions(-)
More information about the amd-gfx
mailing list