[PATCH] drm: Get ref on CRTC commit object when waiting for flip_done
Leo Li
sunpeng.li at amd.com
Fri Oct 12 21:28:04 UTC 2018
On 2018-10-12 03:34 AM, Daniel Vetter wrote:
> On Thu, Oct 11, 2018 at 08:27:43PM -0400, sunpeng.li at amd.com wrote:
>> From: Leo Li <sunpeng.li at amd.com>
>>
>> This fixes a general protection fault, caused by accessing the contents
>> of a flip_done completion object that has already been freed. It occurs
>> due to the preemption of a non-blocking commit worker thread W by
>> another commit thread X. X continues to clear its atomic state at the
>> end, destroying the CRTC commit object that W still needs. Switching
>> back to W and accessing the commit objects then leads to bad results.
>>
>> Worker W becomes preemptable when waiting for flip_done to complete. At
>> this point, a frequently occurring commit thread X can take over. Here's
>> an example where W is a worker thread that flips on both CRTCs, and X
>> does a legacy cursor update on both CRTCs:
>>
>> ...
>> 1. W does flip work
>> 2. W runs commit_hw_done()
>> 3. W waits for flip_done on CRTC 1
>> 4. > flip_done for CRTC 1 completes
>> 5. W finishes waiting for CRTC 1
>> 6. W waits for flip_done on CRTC 2
>>
>> 7. > Preempted by X
>> 8. > flip_done for CRTC 2 completes
>> 9. X atomic_check: hw_done and flip_done are complete on all CRTCs
>> 10. X updates cursor on both CRTCs
>> 11. X destroys atomic state
>> 12. X done
>>
>> 13. > Switch back to W
>> 14. W waits for flip_done on CRTC 2
>> 15. W raises general protection fault
>>
>> The error looks like so:
>>
>> general protection fault: 0000 [#1] PREEMPT SMP PTI
>> **snip**
>> Call Trace:
>> lock_acquire+0xa2/0x1b0
>> _raw_spin_lock_irq+0x39/0x70
>> wait_for_completion_timeout+0x31/0x130
>> drm_atomic_helper_wait_for_flip_done+0x64/0x90 [drm_kms_helper]
>> amdgpu_dm_atomic_commit_tail+0xcae/0xdd0 [amdgpu]
>> commit_tail+0x3d/0x70 [drm_kms_helper]
>> process_one_work+0x212/0x650
>> worker_thread+0x49/0x420
>> kthread+0xfb/0x130
>> ret_from_fork+0x3a/0x50
>> Modules linked in: x86_pkg_temp_thermal amdgpu(O) chash(O)
>> gpu_sched(O) drm_kms_helper(O) syscopyarea sysfillrect sysimgblt
>> fb_sys_fops ttm(O) drm(O)
>>
>> Note that i915 has this issue masked, since hw_done is signaled after
>> waiting for flip_done. Doing so will block the cursor update from
>> happening until hw_done is signaled, preventing the cursor commit from
>> destroying the state.
>>
>> Signed-off-by: Leo Li <sunpeng.li at amd.com>
>
> Great analysis!
>
> Bugfix itself needs a bit more work though, see below.
>
>> ---
>> drivers/gpu/drm/drm_atomic_helper.c | 30 ++++++++++++++++++++++++++----
>> 1 file changed, 26 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
>> index 80be74d..efdf043 100644
>> --- a/drivers/gpu/drm/drm_atomic_helper.c
>> +++ b/drivers/gpu/drm/drm_atomic_helper.c
>> @@ -1410,20 +1410,42 @@ void drm_atomic_helper_wait_for_flip_done(struct drm_device *dev,
>> {
>> struct drm_crtc_state *new_crtc_state;
>> struct drm_crtc *crtc;
>> + struct drm_crtc_commit **commits;
>> int i;
>> + int num_crtc = dev->mode_config.num_crtc;
>>
>> + commits = kcalloc(num_crtc, sizeof(*commits), GFP_KERNEL);
>> +
>> + /*
>> + * Because we open ourselves to preemption by calling
>> + * wait_for_completion_timeout(), we need to get our own references to
>> + * the commit objects. This is so that a preempting commit won't free
>> + * them.
>> + */
>
> The scheduler can preempt you at any point before here too, if you enable
> CONFIG_PREEMPT. It definitely can preempt/block in the kcalloc above, even
> if you have CONFIG_PREEMPT disabled. The scheduling point you identified
> below is simply the most likely.
>
Ah, didn't think of this.
> The underlying bug is that after drm_atomic_helper_commit_hw_done() we
> unblock the next commit worker (thread X in your example), and cannot
> assume anymore that the new state will stay around (since new state is
> released by the subsequent commit work run in thread X).
>
> Therefore your drm_crtc_commit_get here already can access freed memory.
> The correct fix here is to store the temporary reference before we call
> drm_atomic_helper_commit_hw_done(), and then use that in this function
> here. The problem is that this is somewhat tricky to pull off, since some
> drivers call wait_for_flip_done() before hw_done() (like i915).
>
> Here's what I'd do:
> - Add a new drm_crtc_commit pointer to struct __drm_crtcs_state.
>
> - Fill that out in drm_atomic_helper_setup_commit, and make sure you
> release the reference for it in drm_atomic_state_default_clear().
>
> - Use that pointer instead in drm_atomic_helper_wait_for_flip_done() here.
Thanks for the pointers, patch incoming :)
Leo
>
> Cheers, Daniel
>> for_each_new_crtc_in_state(old_state, crtc, new_crtc_state, i) {
>> - struct drm_crtc_commit *commit = new_crtc_state->commit;
>> + commits[i] = new_crtc_state->commit;
>> + if (commits[i])
>> + drm_crtc_commit_get(commits[i]);
>> + }
>> +
>> + for (i = 0; i < num_crtc; i++) {
>> int ret;
>>
>> - if (!commit)
>> + if (!commits[i])
>> continue;
>>
>> - ret = wait_for_completion_timeout(&commit->flip_done, 10 * HZ);
>> + ret = wait_for_completion_timeout(&commits[i]->flip_done,
>> + 10 * HZ);
>> if (ret == 0)
>> DRM_ERROR("[CRTC:%d:%s] flip_done timed out\n",
>> - crtc->base.id, crtc->name);
>> + commits[i]->crtc->base.id,
>> + commits[i]->crtc->name);
>> }
>> +
>> + for (i = 0; i < num_crtc; i++)
>> + if (commits[i])
>> + drm_crtc_commit_put(commits[i]);
>> + kfree(commits);
>> }
>> EXPORT_SYMBOL(drm_atomic_helper_wait_for_flip_done);
>>
>> --
>> 2.7.4
>>
>> _______________________________________________
>> dri-devel mailing list
>> dri-devel at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>
More information about the dri-devel
mailing list