[PATCH] amdgpu_dm: fix nonblocking atomic commit use-after-free
Paul Menzel
pmenzel at molgen.mpg.de
Fri Jul 24 07:45:18 UTC 2020
Dear Kees,
On 24.07.20 at 00:32, Kees Cook wrote:
> On Thu, Jul 23, 2020 at 09:10:15PM +0000, Mazin Rezk wrote:
>> When amdgpu_dm_atomic_commit_tail is running in the workqueue,
>> drm_atomic_state_put will get called while amdgpu_dm_atomic_commit_tail is
>> running, causing a race condition where state (and then dm_state) is
>> sometimes freed while amdgpu_dm_atomic_commit_tail is running. This bug has
>> occurred since 5.7-rc1 and is well documented among polaris11 users [1].
>>
>> Prior to 5.7, this was not a noticeable issue since the freelist pointer
>> was stored at the beginning of dm_state (base), which was unused. After
>> changing the freelist pointer to be stored in the middle of the struct, the
>> freelist pointer overwrote the context, causing dc_state to become garbage
>> data and making the call to dm_enable_per_frame_crtc_master_sync
>> dereference a freelist pointer.
>>
>> This patch fixes the aforementioned issue by calling drm_atomic_state_get
>> in amdgpu_dm_atomic_commit before drm_atomic_helper_commit is called and
>> drm_atomic_state_put after amdgpu_dm_atomic_commit_tail is complete.
>>
>> According to my testing on 5.8.0-rc6, this should fix bug 207383 on
>> Bugzilla [1].
>>
>> [1] https://bugzilla.kernel.org/show_bug.cgi?id=207383
>
> Nice work tracking this down!
>
>> Fixes: 3202fa62f ("slub: relocate freelist pointer to middle of object")
>
> I do, however, object to this Fixes tag. :) The flaw appears to have
> been with amdgpu_dm's reference tracking of "state" in the nonblocking
> case. (How this reference counting is supposed to work correctly, though,
> I'm not sure.) If I look at where the drm helper was split from being
> the default callback, it looks like this was what introduced the bug:
>
> da5c47f682ab ("drm/amd/display: Remove acrtc->stream")
>
> ? 3202fa62f certainly exposed it much more quickly, but there was a race
> even without 3202fa62f where something could have realloced the memory
> and written over it.
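
The get/put pairing described in the quoted patch can be illustrated with a
small userspace sketch. This is a toy model under stated assumptions, not the
amdgpu_dm code: struct toy_state and every toy_* function are hypothetical
stand-ins for drm_atomic_state, drm_atomic_state_get and drm_atomic_state_put,
and the "workqueue" is simulated by calling the tail function after the caller
has already dropped its own reference.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted object; NOT the real struct drm_atomic_state. */
struct toy_state {
    int refcount;
    int context;            /* payload the deferred worker reads */
};

static int toy_frees;       /* how many times a state has been freed */

static struct toy_state *toy_state_alloc(int context)
{
    struct toy_state *s = malloc(sizeof(*s));

    if (!s)
        return NULL;
    s->refcount = 1;        /* the creator owns the first reference */
    s->context = context;
    return s;
}

static void toy_state_get(struct toy_state *s)
{
    s->refcount++;
}

static void toy_state_put(struct toy_state *s)
{
    if (--s->refcount == 0) {
        toy_frees++;
        free(s);
    }
}

/* Plays the role of the commit tail that runs later in a worker. */
static int toy_commit_tail(struct toy_state *s)
{
    int seen = s->context;  /* valid only because the worker holds a reference */

    toy_state_put(s);       /* drop the worker's reference when done */
    return seen;
}

/* Returns the context value the worker observed, or -1 on allocation failure. */
int toy_nonblocking_commit(int context)
{
    struct toy_state *s = toy_state_alloc(context);
    struct toy_state *queued;

    if (!s)
        return -1;

    toy_state_get(s);       /* extra reference taken before queuing the work */
    queued = s;             /* stands in for handing the state to a workqueue */

    toy_state_put(s);       /* caller drops its reference before the worker runs */

    return toy_commit_tail(queued);
}
```

Calling toy_nonblocking_commit(42) from a main function returns the value the
worker read. Without the toy_state_get call, the caller's put would free the
object before toy_commit_tail touched it, which is the shape of the
use-after-free discussed above.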
I understand the Fixes tag as mainly a help when backporting commits.
As Linux 5.8-rc7 is going to be released this Sunday, I wonder whether
commit 3202fa62f ("slub: relocate freelist pointer to middle of object")
should be reverted for now to fix the regression for users, in line
with Linux's no-regression policy. Once the AMDGPU/DRM driver issue is
fixed, it can be reapplied. I know it's not optimal, but as the fix
will need some testing, I'd argue it's the best option for
the users.
Kind regards,
Paul