[PATCH] drm/amd/display: avoid sleeping in atomic context while creating new context or state

Michel Dänzer michel at daenzer.net
Thu Jun 7 14:44:25 UTC 2018


On 2018-05-31 08:05 PM, Christian König wrote:
> Am 30.05.2018 um 18:03 schrieb Harry Wentland:
>> On 2018-05-30 06:17 AM, Shirish S wrote:
>>> This patch fixes the warning messages caused by calling a sleeping
>>> function in atomic context, as shown below:
>>>
>>> BUG: sleeping function called from invalid context at mm/slab.h:419
>>> in_atomic(): 1, irqs_disabled(): 1, pid: 5, name: kworker/u4:0
>>> CPU: 1 PID: 5 Comm: kworker/u4:0 Tainted: G        W       4.14.35 #941
>>> Workqueue: events_unbound commit_work
>>> Call Trace:
>>>   dump_stack+0x4d/0x63
>>>   ___might_sleep+0x11f/0x12e
>>>   kmem_cache_alloc_trace+0x41/0xea
>>>   dc_create_state+0x1f/0x30
>>>   dc_commit_updates_for_stream+0x73/0x4cf
>>>   ? amdgpu_get_crtc_scanoutpos+0x82/0x16b
>>>   amdgpu_dm_do_flip+0x239/0x298
>>>   amdgpu_dm_commit_planes.isra.23+0x379/0x54b
>>>   ? dc_commit_state+0x3da/0x404
>>>   amdgpu_dm_atomic_commit_tail+0x4fc/0x5d2
>>>   ? wait_for_common+0x5b/0x69
>>>   commit_tail+0x42/0x64
>>>   process_one_work+0x1b0/0x314
>>>   worker_thread+0x1cb/0x2c1
>>>   ? create_worker+0x1da/0x1da
>>>   kthread+0x156/0x15e
>>>   ? kthread_flush_work+0xea/0xea
>>>   ret_from_fork+0x22/0x40
>>>
>>> Signed-off-by: Shirish S <shirish.s at amd.com>
>>> ---
>>>   drivers/gpu/drm/amd/display/dc/core/dc.c | 4 ++--
>>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c b/drivers/gpu/drm/amd/display/dc/core/dc.c
>>> index 33149ed..d62206f 100644
>>> --- a/drivers/gpu/drm/amd/display/dc/core/dc.c
>>> +++ b/drivers/gpu/drm/amd/display/dc/core/dc.c
>>> @@ -588,7 +588,7 @@ static void disable_dangling_plane(struct dc *dc, struct dc_state *context)
>>>     struct dc *dc_create(const struct dc_init_data *init_params)
>>>    {
>>> -    struct dc *dc = kzalloc(sizeof(*dc), GFP_KERNEL);
>>> +    struct dc *dc = kzalloc(sizeof(*dc), GFP_ATOMIC);
>> Are you sure this one can be called in atomic context?
>>
>> If so, then everything in construct() would also need GFP_ATOMIC.
> 
> Well, the backtrace is quite obvious, but I agree that the change still
> looks fishy to me as well.
> 
> Using GFP_ATOMIC should only be a last resort when nothing else helps,
> but here it looks more like we misuse a spinlock where a mutex or
> semaphore would be more appropriate.
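
For reference, the pattern under discussion boils down to something like
the following minimal sketch (hand-written purely for illustration; the
lock and function names are made up, not taken from the driver):

#include <linux/slab.h>
#include <linux/spinlock.h>

static DEFINE_SPINLOCK(example_lock);

static void *example_alloc(void)
{
	void *buf;

	spin_lock(&example_lock);	/* spinlock held: atomic context */

	/*
	 * GFP_KERNEL may sleep to reclaim memory, which is exactly what
	 * the "sleeping function called from invalid context" splat is
	 * complaining about.  Switching the flag to GFP_ATOMIC silences
	 * the warning, but GFP_ATOMIC cannot reclaim and draws on
	 * emergency reserves, hence "last resort only".
	 */
	buf = kzalloc(128, GFP_KERNEL);

	spin_unlock(&example_lock);
	return buf;
}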

I want to raise this concern again, for all of these patches.

I'm now seeing similar BUG messages triggered from VCE functions; see an
example below. I've never seen these before today, so I assume they're
caused by the "drm/amdgpu/pp: replace mutex with spin_lock (V3)" change.

Now, if we just mechanically convert the mutex to a spinlock whenever
this happens, it could cascade, and we might end up converting a lot of
mutexes to spinlocks. That would be bad if any of those locks can be
held for a significant amount of time.

Instead, we should look into why we end up in atomic context in the
first place. All of the messages in these patches have been triggered
from either a worker thread or an ioctl, neither of which runs in atomic
context per se. Most likely we end up in atomic context because a
spinlock is held, so maybe this can be solved by converting those
spinlocks to mutexes instead?
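
Purely as an illustration of that direction (the struct and function
names below are made up, not the actual powerplay code), going back to a
mutex would look roughly like this:

#include <linux/mutex.h>
#include <linux/slab.h>

struct example_hwmgr {
	struct mutex lock;	/* was: spinlock_t lock; mutex_init() at init time */
	void *vce_state;
};

static int example_powergate_vce(struct example_hwmgr *hwmgr)
{
	void *new_state;

	/* mutex_lock() may sleep, so callers must not be in atomic context... */
	mutex_lock(&hwmgr->lock);

	/*
	 * ...but inside the critical section we remain in a sleepable
	 * context, so GFP_KERNEL allocations and nested mutexes (like the
	 * one taken in vce_v3_0_stop() in the trace below) are fine again.
	 */
	new_state = kzalloc(64, GFP_KERNEL);
	if (!new_state) {
		mutex_unlock(&hwmgr->lock);
		return -ENOMEM;
	}

	kfree(hwmgr->vce_state);
	hwmgr->vce_state = new_state;

	mutex_unlock(&hwmgr->lock);
	return 0;
}

The caveat is that a mutex is only an option if the lock is never taken
from a context that genuinely cannot sleep (an interrupt handler, or
under yet another spinlock), so each lock needs to be looked at
individually rather than converted mechanically.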


[ 6232.096387] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:747
[ 6232.099544] in_atomic(): 1, irqs_disabled(): 0, pid: 14785, name: kworker/3:14
[ 6232.102183] INFO: lockdep is turned off.
[ 6232.104965] CPU: 3 PID: 14785 Comm: kworker/3:14 Tainted: G    B D W  OE    4.16.0-rc7+ #104
[ 6232.107835] Hardware name: Micro-Star International Co., Ltd. MS-7A34/B350 TOMAHAWK (MS-7A34), BIOS 1.80 09/13/2017
[ 6232.110878] Workqueue: events amdgpu_vce_idle_work_handler [amdgpu]
[ 6232.113457] Call Trace:
[ 6232.116065]  dump_stack+0x85/0xc1
[ 6232.117880]  ___might_sleep+0x28a/0x3c0
[ 6232.119770]  __mutex_lock+0xc4/0x1520
[ 6232.121756]  ? vce_v3_0_stop+0x3a/0x170 [amdgpu]
[ 6232.123603]  ? mutex_lock_io_nested+0x13e0/0x13e0
[ 6232.125447]  ? debug_check_no_locks_freed+0x2c0/0x2c0
[ 6232.127333]  ? amdgpu_dpm_enable_vce+0x21a/0x330 [amdgpu]
[ 6232.129253]  ? __mutex_lock+0xf9/0x1520
[ 6232.131251]  ? amdgpu_dpm_enable_vce+0x21a/0x330 [amdgpu]
[ 6232.133324]  ? pick_next_task_fair+0x550/0x1720
[ 6232.135178]  ? amdgpu_dpm_enable_vce+0x21a/0x330 [amdgpu]
[ 6232.137078]  ? vce_v3_0_stop+0x3a/0x170 [amdgpu]
[ 6232.138948]  vce_v3_0_stop+0x3a/0x170 [amdgpu]
[ 6232.140687]  amdgpu_device_ip_set_powergating_state+0x150/0x2f0 [amdgpu]
[ 6232.142694]  smu7_powergate_vce+0x11d/0x1c0 [amdgpu]
[ 6232.144336]  pp_dpm_powergate_vce+0xf4/0x160 [amdgpu]
[ 6232.146283]  ? pp_set_clockgating_by_smu+0xe0/0xe0 [amdgpu]
[ 6232.148007]  amdgpu_dpm_enable_vce+0x298/0x330 [amdgpu]
[ 6232.149815]  process_one_work+0x715/0x1570
[ 6232.151547]  ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[ 6232.153193]  ? lock_acquire+0x10b/0x350 
[ 6232.155001]  worker_thread+0xd4/0xef0 
[ 6232.156758]  ? process_one_work+0x1570/0x1570
[ 6232.158595]  kthread+0x311/0x3d0
[ 6232.160049]  ? kthread_create_worker_on_cpu+0xc0/0xc0
[ 6232.161677]  ret_from_fork+0x27/0x50



-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer

