[PATCH] amdgpu_dm: fix nonblocking atomic commit use-after-free
Christian König
christian.koenig at amd.com
Fri Jul 24 07:26:32 UTC 2020
On 24.07.20 at 00:58, Mazin Rezk wrote:
> On Thursday, July 23, 2020 6:32 PM, Kees Cook <keescook at chromium.org> wrote:
>
>> On Thu, Jul 23, 2020 at 09:10:15PM +0000, Mazin Rezk wrote:
>>
>>> When amdgpu_dm_atomic_commit_tail is running in the workqueue,
>>> drm_atomic_state_put will get called while amdgpu_dm_atomic_commit_tail is
>>> running, causing a race condition where state (and then dm_state) is
>>> sometimes freed while amdgpu_dm_atomic_commit_tail is running. This bug has
>>> occurred since 5.7-rc1 and is well documented among polaris11 users [1].
>>> Prior to 5.7, this was not a noticeable issue since the freelist pointer
>>> was stored at the beginning of dm_state (base), which was unused. After
>>> changing the freelist pointer to be stored in the middle of the struct, the
>>> freelist pointer overwrote the context, causing dc_state to become garbage
>>> data and making the call to dm_enable_per_frame_crtc_master_sync dereference
>>> a freelist pointer.
>>> This patch fixes the aforementioned issue by calling drm_atomic_state_get
>>> in amdgpu_dm_atomic_commit before drm_atomic_helper_commit is called and
>>> drm_atomic_state_put after amdgpu_dm_atomic_commit_tail is complete.
>>> According to my testing on 5.8.0-rc6, this should fix bug 207383 on
>>> Bugzilla [1].
>>> [1] https://bugzilla.kernel.org/show_bug.cgi?id=207383
>> Nice work tracking this down!
>>
>>> Fixes: 3202fa62f ("slub: relocate freelist pointer to middle of object")
>> I do, however, object to this Fixes tag. :) The flaw appears to have
>> been with amdgpu_dm's reference tracking of "state" in the nonblocking
>> case. (How this reference counting is supposed to work correctly, though,
>> I'm not sure.) If I look at where the drm helper was split from being
>> the default callback, it looks like this was what introduced the bug:
>>
>> da5c47f682ab ("drm/amd/display: Remove acrtc->stream")
>>
>> ? 3202fa62f certainly exposed it much more quickly, but there was a race
>> even without 3202fa62f where something could have realloced the memory
>> and written over it.
>>
>> --
>>
>> Kees Cook
>
> Thanks, I'll be sure to avoid using 3202fa62f as the cause next time.
> I just thought to do that because it was what made the use-after-free cause
> a noticeable bug.
>
> Also, by the way, I just realised the patch didn't completely solve the bug.
> Sorry about that, making an LKML thread on this was hasty on my part. Should
> I get further confirmation from the Bugzilla thread before submitting a patch
> for this bug in the future?
Submitting patches as early as possible is generally a good idea. Only if
the code is utterly broken or completely unreadable should you expect a
harsh response :)
If you are not 100% sure that a patch really fixes the bug, consider
asking for more testing in the commit message.
Regards,
Christian.
>
> Thanks,
> Mazin Rezk
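
For readers following the thread, the get/put pairing described in the quoted patch can be sketched in plain userspace C. This is only an illustration of the general pattern (take an extra reference before handing the state to deferred work, drop it when that work finishes). It is not the actual patch, which the author notes above did not completely solve the bug. Every name here (demo_state, demo_state_get, demo_state_put, demo_commit_tail, demo_nonblocking_commit) is hypothetical, and the plain int counter is an assumption that only holds for this single-threaded demo; the kernel uses struct kref for this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for an atomic state object; not the kernel type. */
struct demo_state {
    int refcount; /* plain int: valid for this single-threaded demo only */
    int context;  /* stand-in for the data the commit tail reads */
};

static int demo_states_freed; /* counts frees so the lifetime is observable */

static struct demo_state *demo_state_alloc(int context)
{
    struct demo_state *s = malloc(sizeof(*s));

    if (!s)
        return NULL;
    s->refcount = 1; /* the creator holds the first reference */
    s->context = context;
    return s;
}

static void demo_state_get(struct demo_state *s)
{
    s->refcount++;
}

/* Drops one reference; returns true if this call freed the object. */
static bool demo_state_put(struct demo_state *s)
{
    assert(s->refcount > 0);
    if (--s->refcount > 0)
        return false;
    free(s);
    demo_states_freed++;
    return true;
}

/* Models the deferred commit tail: reads the state, then drops its reference. */
static int demo_commit_tail(struct demo_state *s)
{
    int seen = s->context; /* safe: the tail still owns a reference */

    demo_state_put(s);
    return seen;
}

/* Models a nonblocking commit where the caller's put precedes the tail. */
static int demo_nonblocking_commit(int context)
{
    struct demo_state *s = demo_state_alloc(context);

    if (!s)
        return -1;
    demo_state_get(s);          /* extra reference held for the deferred tail */
    demo_state_put(s);          /* caller drops its reference early */
    return demo_commit_tail(s); /* state is still alive at this point */
}
```

Without the demo_state_get() call, the caller's put would free the object before demo_commit_tail() reads it, which is the shape of the use-after-free discussed in this thread.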