[PATCH] amdgpu_dm: fix nonblocking atomic commit use-after-free

Kazlauskas, Nicholas nicholas.kazlauskas at amd.com
Tue Jul 28 17:07:13 UTC 2020


On 2020-07-28 5:22 a.m., Paul Menzel wrote:
> Dear Linux folks,
> 
> 
> Am 25.07.20 um 07:20 schrieb Mazin Rezk:
>> On Saturday, July 25, 2020 12:59 AM, Duncan wrote:
>>
>>> On Sat, 25 Jul 2020 03:03:52 +0000 Mazin Rezk wrote:
>>>
>>>>> Am 24.07.20 um 19:33 schrieb Kees Cook:
>>>>>
>>>>>> There was a fix to disable the async path for this driver that
>>>>>> worked around the bug too, yes? That seems like a safer and more
>>>>>> focused change that doesn't revert the SLUB defense for all
>>>>>> users, and would actually provide a complete, I think, workaround
>>>>
>>>> That said, I haven't seen the async disabling patch. If you could
>>>> link to it, I'd be glad to test it out and perhaps we can use that
>>>> instead.
>>>
>>> I'm confused. Not to put words in Kees' mouth; /I/ am confused (which
>>> admittedly could well be just because I make no claims to be a
>>> coder and am simply reading the bug and thread, but I'd appreciate some
>>> "unconfusing" anyway).
>>>
>>> My interpretation of the "async disabling" reference was that it was to
>>> comment #30 on the bug:
>>>
>>> https://bugzilla.kernel.org/show_bug.cgi?id=207383#c30 
>>>
>>>
>>> ... which (if I'm not confused on this point too) appears to be yours.
>>> There it was stated...
>>>
>>> I've also found that this bug exclusively occurs when commit_work is on
>>> the workqueue. After forcing drm_atomic_helper_commit to run all of the
>>> commits without adding to the workqueue and running the OS, the issue
>>> seems to have disappeared.
>>> <<<<
>>>
>>> Would not forcing all commits to run directly, without placing them on
>>> the workqueue, be "async disabling"? That's what I /thought/ he was
>>> referencing.
>>
>> Oh, I thought he was referring to a different patch. Kees, could I get
>> your confirmation on this?
>>
>> The change I made actually affected all of the DRM code, although this 
>> could
>> easily be changed to be specific to amdgpu. (By forcing blocking on
>> amdgpu_dm's non-blocking commit code)
>>
>> That said, I'd still need to test further because I only did test it 
>> for a
>> couple of hours then. Although it should work in theory.
>>
>>> OTOH your base/context swap idea sounds like a possibly "less
>>> disturbance" workaround, if it works, and given the point in the
>>> commit cycle... (But if it's out Sunday it's likely too late to test
>>> and get it in now anyway; if it's another week, tho...)
>>
>> The base/context swap idea should make the use-after-free behave how it
>> did in 5.6. Since the bug doesn't cause an issue in 5.6, it's less of a
>> "less disturbance" workaround and more of a "no disturbance" workaround.
> 
> Sorry for bothering, but is there now a solution, besides reverting the 
> commits, to avoid freezes/crashes *without* performance regressions?
> 
> 
> Kind regards,
> 
> Paul

Mazin's "drm/amd/display: Clear dm_state for fast updates" change 
accomplishes this, at least as a temporary hack.

I've started work on a more large scale fix that we could get in in after.

Regards,
Nicholas Kazlauskas


More information about the amd-gfx mailing list