[PATCH] drm/msm: fix splat when userspace is killed with pending atomic update

Rob Clark robdclark at gmail.com
Wed May 3 14:01:25 UTC 2017


On Tue, May 2, 2017 at 5:01 AM, Daniel Vetter <daniel at ffwll.ch> wrote:
> On Fri, Apr 28, 2017 at 8:05 PM, Rob Clark <robdclark at gmail.com> wrote:
>> The ->preclose() hook is a good place to block for pending atomic
>> updates.  We can't do this in ->postclose(), as it needs to happen
>> before drm_fb_release().  Otherwise, since we have already swapped
>> state (in the case of a non-blocking atomic update), this means that
>> the plane_state->fb will be released and cleared before we wait for
>> fences from the atomic-commit wq.
>>
>> More complex solutions are probably possible.  But an already
>> scheduled atomic update, possibly blocking on already scheduled gpu/etc
>> fences, will complete eventually (assuming nothing catches fire), so
>> the sanest thing seems to be to just block until already scheduled atomic
>> updates complete before tearing things down.
>>
>> Fixes:
>>
>>    WARNING: CPU: 1 PID: 69 at ../drivers/gpu/drm/drm_atomic_helper.c:1061 drm_atomic_helper_wait_for_fences+0xe0/0xf8
>>    Modules linked in:
>>
>>    CPU: 1 PID: 69 Comm: kworker/1:1 Tainted: G        W       4.11.0-rc8+ #1187
>>    Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
>>    Workqueue: events drm_mode_rmfb_work_fn
>>    task: ffffffc036560d00 task.stack: ffffffc036550000
>>    PC is at drm_atomic_helper_wait_for_fences+0xe0/0xf8
>>    LR is at complete_commit.isra.1+0x44/0x1c0
>>    pc : [<ffffff80084f6040>] lr : [<ffffff800854176c>] pstate: 20000145
>>    sp : ffffffc036553b60
>>    x29: ffffffc036553b60 x28: ffffffc0264e6a00
>>    x27: ffffffc035659000 x26: 0000000000000000
>>    x25: ffffffc0240e8000 x24: 0000000000000038
>>    x23: 0000000000000000 x22: ffffff800858f200
>>    x21: ffffffc0240e8000 x20: ffffffc02f56a800
>>    x19: 0000000000000000 x18: 0000000000000000
>>    x17: 0000000000000000 x16: 0000000000000000
>>    x15: 0000000000000000 x14: ffffffc00a192700
>>    x13: 0000000000000004 x12: 0000000000000000
>>    x11: ffffff80089a1690 x10: 00000000000008f0
>>    x9 : ffffffc036553b20 x8 : ffffffc036561650
>>    x7 : ffffffc03fe6cb40 x6 : 0000000000000000
>>    x5 : 0000000000000001 x4 : 0000000000000002
>>    x3 : ffffffc035659000 x2 : ffffffc0240e8c80
>>    x1 : 0000000000000000 x0 : ffffffc02adbe588
>>
>>    ---[ end trace 13aeec77c3fb55e2 ]---
>>    Call trace:
>>    Exception stack(0xffffffc036553990 to 0xffffffc036553ac0)
>>    3980:                                   0000000000000000 0000008000000000
>>    39a0: ffffffc036553b60 ffffff80084f6040 0000000000004ff0 0000000000000038
>>    39c0: ffffffc0365539d0 ffffff800857e098 ffffffc036553a00 ffffff800857e1b0
>>    39e0: ffffffc036553a10 ffffff800857c554 ffffffc0365e8400 ffffffc0365e8400
>>    3a00: ffffffc036553a20 ffffff8008103358 000000000001aad7 ffffff800851b72c
>>    3a20: ffffffc036553a50 ffffff80080e9228 ffffffc02adbe588 0000000000000000
>>    3a40: ffffffc0240e8c80 ffffffc035659000 0000000000000002 0000000000000001
>>    3a60: 0000000000000000 ffffffc03fe6cb40 ffffffc036561650 ffffffc036553b20
>>    3a80: 00000000000008f0 ffffff80089a1690 0000000000000000 0000000000000004
>>    3aa0: ffffffc00a192700 0000000000000000 0000000000000000 0000000000000000
>>    [<ffffff80084f6040>] drm_atomic_helper_wait_for_fences+0xe0/0xf8
>>    [<ffffff800854176c>] complete_commit.isra.1+0x44/0x1c0
>>    [<ffffff8008541c64>] msm_atomic_commit+0x32c/0x350
>>    [<ffffff8008516230>] drm_atomic_commit+0x50/0x60
>>    [<ffffff8008517548>] drm_atomic_remove_fb+0x158/0x250
>>    [<ffffff80085186d0>] drm_framebuffer_remove+0x50/0x158
>>    [<ffffff8008518818>] drm_mode_rmfb_work_fn+0x40/0x58
>>    [<ffffff80080d5668>] process_one_work+0x1d0/0x378
>>    [<ffffff80080d5a54>] worker_thread+0x244/0x488
>>    [<ffffff80080db7fc>] kthread+0xfc/0x128
>>    [<ffffff8008082ec0>] ret_from_fork+0x10/0x50
>>
>> Reported-by: Stanimir Varbanov <stanimir.varbanov at linaro.org>
>> Signed-off-by: Rob Clark <robdclark at gmail.com>
>> ---
>> The hunk that removes the comment about ->preclose() is included in this
>> patch to challenge the assumption that ->preclose() shouldn't exist ;-)
>
> And I'm going to challenge your patch here. Both fences and
> framebuffers and atomic commits are refcounted. If you go boom on them
> when userspace closes the fd, you have a refcount bug. We don't fix
> those by flushing stuff :-)

So, it isn't a refcounting bug, but something much funnier...

It seems that mdp5 had a custom plane state with its own dup_state function,
predating the addition of
__drm_atomic_helper_plane_duplicate_state(), and when the helper was
introduced, mdp5 wasn't retrofitted to use it.  That was all fine until the
fence pointer was added to the base plane_state struct.  It meant that
plane_state->fence was getting copied over into the duplicated
plane_state.
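A minimal userspace sketch of the duplication problem, assuming heavily simplified, hypothetical stand-in structs (fake_fb, fake_fence, fake_plane_state) rather than the real DRM types.  legacy_duplicate_state() mimics an older copy-everything hook that only fixes up the fb reference, so a field later added to the base struct (the fence) rides along into the copy.  helper_style_duplicate() is only an approximation of what __drm_atomic_helper_plane_duplicate_state() is understood to do for the base state, including resetting the fence; the real kernel helper may differ in detail.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for drm_framebuffer, dma_fence and
 * drm_plane_state; these are NOT the real kernel structs. */
struct fake_fb    { int refcount; };
struct fake_fence { int id; };

struct fake_plane_state {
	struct fake_fb *fb;
	struct fake_fence *fence;   /* field added to the base state later */
	int driver_private;         /* stands in for mdp5-specific fields */
};

/* Old-style driver hook: copy everything, then fix up only the fb
 * reference.  Anything added to the base struct afterwards (here: the
 * fence pointer) is silently copied into the new state. */
static struct fake_plane_state *
legacy_duplicate_state(const struct fake_plane_state *cur)
{
	struct fake_plane_state *s = malloc(sizeof(*s));

	if (!s)
		return NULL;
	memcpy(s, cur, sizeof(*s));
	if (s->fb)
		s->fb->refcount++;
	return s;
}

/* Helper-style duplication (approximation): same copy, but per-commit
 * fields that must not carry over, like the fence, are reset. */
static void
helper_style_duplicate(struct fake_plane_state *s,
		       const struct fake_plane_state *cur)
{
	assert(s && cur);
	memcpy(s, cur, sizeof(*s));
	if (s->fb)
		s->fb->refcount++;
	s->fence = NULL;
}

/* Retrofitted driver hook: base-state handling goes through the shared
 * helper; driver-private fields are already covered by its memcpy. */
static struct fake_plane_state *
fixed_duplicate_state(const struct fake_plane_state *cur)
{
	struct fake_plane_state *s = malloc(sizeof(*s));

	if (!s)
		return NULL;
	helper_style_duplicate(s, cur);
	return s;
}
```

Routing base-state duplication through one shared helper means that a field added to the base struct later only has to be handled in one place, instead of in every driver's private copy of the logic.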

So the atomic rmfb code would sometimes end up copying the fence pointer
when another pending update had already swapped state but had not yet
committed.
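A minimal sketch of that sequence, using hypothetical simplified structs (not the kernel types).  A nonblocking update has already swapped its state, fence included, into plane->state; an rmfb-style commit then duplicates that state and clears fb to disable the plane.  fence_without_fb() stands in for what appears to be the WARN_ON(!plane_state->fb) check in drm_atomic_helper_wait_for_fences() that fired in the trace above; treat that mapping as an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins (not the kernel structs). */
struct fb    { int id; };
struct fence { int seqno; };

struct plane_state {
	struct fb *fb;
	struct fence *fence;
};

struct plane {
	struct plane_state *state;  /* current, already-swapped state */
};

/* Mirrors the sanity check we believe wait_for_fences performs: a plane
 * state that carries an in-fence is expected to also have an fb. */
static bool fence_without_fb(const struct plane_state *s)
{
	return s->fence != NULL && s->fb == NULL;
}

/* rmfb-style commit: copy the plane's current state, then disable the
 * plane by dropping its fb.  reset_fence selects helper-like behaviour
 * (fence cleared on duplication) versus the old copy-everything hook. */
static void build_rmfb_state(struct plane_state *out,
			     const struct plane *p, bool reset_fence)
{
	assert(out && p && p->state);
	memcpy(out, p->state, sizeof(*out));
	if (reset_fence)
		out->fence = NULL;
	out->fb = NULL;   /* plane is being turned off */
}
```

With the copy-everything behaviour, the rmfb state inherits the pending update's fence while its fb is NULL, which is exactly the combination the check rejects; resetting the fence on duplication avoids it without any extra flushing.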

BR,
-R

> Please add a pair of get/put() calls at the right place instead.
> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch


More information about the dri-devel mailing list