[PATCH] amdgpu_dm: fix nonblocking atomic commit use-after-free

Paul Menzel pmenzel at molgen.mpg.de
Tue Jul 28 09:22:12 UTC 2020


Dear Linux folks,


Am 25.07.20 um 07:20 schrieb Mazin Rezk:
> On Saturday, July 25, 2020 12:59 AM, Duncan wrote:
> 
>> On Sat, 25 Jul 2020 03:03:52 +0000 Mazin Rezk wrote:
>>
>>>> Am 24.07.20 um 19:33 schrieb Kees Cook:
>>>>
>>>>> There was a fix to disable the async path for this driver that
>>>>> worked around the bug too, yes? That seems like a safer and more
>>>>> focused change that doesn't revert the SLUB defense for all
>>>>> users, and would actually provide a complete, I think, workaround
>>>
>>> That said, I haven't seen the async disabling patch. If you could
>>> link to it, I'd be glad to test it out and perhaps we can use that
>>> instead.
>>
>> I'm confused. Not to put words in Kees' mouth; /I/ am confused (which
>> admittedly could well be just because I make no claims to be a
>> coder and am simply reading the bug and thread, but I'd appreciate some
>> "unconfusing" anyway).
>>
>> My interpretation of the "async disabling" reference was that it was to
>> comment #30 on the bug:
>>
>> https://bugzilla.kernel.org/show_bug.cgi?id=207383#c30
>>
>> ... which (if I'm not confused on this point too) appears to be yours.
>> There it was stated...
>>
>> I've also found that this bug exclusively occurs when commit_work is on
>> the workqueue. After forcing drm_atomic_helper_commit to run all of the
>> commits without adding to the workqueue and running the OS, the issue
>> seems to have disappeared.
>> <<<<
>>
>> Would not forcing all commits to run directly, without placing them on
>> the workqueue, be "async disabling"? That's what I /thought/ he was
>> referencing.
> 
> Oh, I thought he was referring to a different patch. Kees, could I get
> your confirmation on this?
> 
> The change I made actually affected all of the DRM code, although this could
> easily be changed to be specific to amdgpu. (By forcing blocking on
> amdgpu_dm's non-blocking commit code)
> 
> That said, I'd still need to test further because I only did test it for a
> couple of hours then. Although it should work in theory.
> 
>> OTOH your base/context swap idea sounds like a possibly "less
>> disturbance" workaround, if it works, and given the point in the
>> commit cycle... (But if it's out Sunday it's likely too late to test
>> and get it in now anyway; if it's another week, tho...)
> 
> The base/context swap idea should make the use-after-free behave how it
> did in 5.6. Since the bug doesn't cause an issue in 5.6, it's less of a
> "less disturbance" workaround and more of a "no disturbance" workaround.

Sorry for bothering, but is there now a solution, besides reverting the 
commits, to avoid freezes/crashes *without* performance regressions?


Kind regards,

Paul


More information about the amd-gfx mailing list