[PATCH] drm/amd/display: avoid sleeping in atomic context while creating new context or state
Harry Wentland
harry.wentland at amd.com
Thu Jun 7 14:46:59 UTC 2018
On 2018-06-07 10:44 AM, Michel Dänzer wrote:
> On 2018-05-31 08:05 PM, Christian König wrote:
>> Am 30.05.2018 um 18:03 schrieb Harry Wentland:
>>> On 2018-05-30 06:17 AM, Shirish S wrote:
>>>> This patch fixes the warning messages that are caused by calling a
>>>> sleeping function in atomic context, as below:
>>>>
>>>> BUG: sleeping function called from invalid context at mm/slab.h:419
>>>> in_atomic(): 1, irqs_disabled(): 1, pid: 5, name: kworker/u4:0
>>>> CPU: 1 PID: 5 Comm: kworker/u4:0 Tainted: G W 4.14.35 #941
>>>> Workqueue: events_unbound commit_work
>>>> Call Trace:
>>>> dump_stack+0x4d/0x63
>>>> ___might_sleep+0x11f/0x12e
>>>> kmem_cache_alloc_trace+0x41/0xea
>>>> dc_create_state+0x1f/0x30
>>>> dc_commit_updates_for_stream+0x73/0x4cf
>>>> ? amdgpu_get_crtc_scanoutpos+0x82/0x16b
>>>> amdgpu_dm_do_flip+0x239/0x298
>>>> amdgpu_dm_commit_planes.isra.23+0x379/0x54b
>>>> ? dc_commit_state+0x3da/0x404
>>>> amdgpu_dm_atomic_commit_tail+0x4fc/0x5d2
>>>> ? wait_for_common+0x5b/0x69
>>>> commit_tail+0x42/0x64
>>>> process_one_work+0x1b0/0x314
>>>> worker_thread+0x1cb/0x2c1
>>>> ? create_worker+0x1da/0x1da
>>>> kthread+0x156/0x15e
>>>> ? kthread_flush_work+0xea/0xea
>>>> ret_from_fork+0x22/0x40
>>>>
>>>> Signed-off-by: Shirish S <shirish.s at amd.com>
>>>> ---
>>>> drivers/gpu/drm/amd/display/dc/core/dc.c | 4 ++--
>>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c b/drivers/gpu/drm/amd/display/dc/core/dc.c
>>>> index 33149ed..d62206f 100644
>>>> --- a/drivers/gpu/drm/amd/display/dc/core/dc.c
>>>> +++ b/drivers/gpu/drm/amd/display/dc/core/dc.c
>>>> @@ -588,7 +588,7 @@ static void disable_dangling_plane(struct dc *dc, struct dc_state *context)
>>>>  struct dc *dc_create(const struct dc_init_data *init_params)
>>>>  {
>>>> -        struct dc *dc = kzalloc(sizeof(*dc), GFP_KERNEL);
>>>> +        struct dc *dc = kzalloc(sizeof(*dc), GFP_ATOMIC);
>>> Are you sure this one can be called in atomic context?
>>>
>>> If so then everything in construct() would also need GFP_ATOMIC.
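
To illustrate what I meant there (simplified sketch, not the actual dc.c
code): construct() and the helpers it calls allocate with GFP_KERNEL as
well, e.g.

    static bool construct(struct dc *dc, const struct dc_init_data *init_params)
    {
            /* illustrative only; the real function does many such allocations */
            struct dc_context *dc_ctx = kzalloc(sizeof(*dc_ctx), GFP_KERNEL);

            if (!dc_ctx)
                    return false;

            dc->ctx = dc_ctx;
            return true;
    }

If dc_create() really can run in atomic context, every allocation reached
from it would have to become GFP_ATOMIC, which points at the caller rather
than this function.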
>>
>> Well, the backtrace is quite obvious, but I agree that the change still
>> looks fishy to me as well.
>>
>> Using GFP_ATOMIC should only be a last resort when nothing else helps,
>> but here it looks more like we're misusing a spinlock where a mutex or
>> semaphore would be more appropriate.
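
To spell that out with a generic example (not amdgpu code): holding a
spinlock puts us in atomic context, so anything that may sleep under it,
including kzalloc(GFP_KERNEL), triggers exactly this BUG, while the same
code under a mutex is legal:

    spin_lock(&some_spinlock);              /* enters atomic context           */
    p = kzalloc(sizeof(*p), GFP_KERNEL);    /* may sleep -> might_sleep() BUG  */
    spin_unlock(&some_spinlock);

    mutex_lock(&some_mutex);                /* sleeping is allowed here        */
    p = kzalloc(sizeof(*p), GFP_KERNEL);    /* fine                            */
    mutex_unlock(&some_mutex);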
>
> I want to raise this concern again, for all of these patches.
>
> I'm now seeing similar BUG messages triggered from VCE functions, see an
> example below. I've never seen these before today, so I assume they're
> caused by the "drm/amdgpu/pp: replace mutex with spin_lock (V3)" change.
>
> Now if we just mechanically convert the offending mutex to a spinlock
> whenever this happens, it could snowball, and we might end up converting a
> lot of mutexes to spinlocks. This could be bad if any of those locks can
> be held for significant amounts of time.
>
> Instead, we should look into why we end up in atomic context. All of the
> messages in these patches have been triggered from either a worker
> thread or an ioctl, neither of which runs in atomic context per se. Most
> likely we end up in atomic context because a spinlock is held, so maybe
> it can be solved by converting those spinlocks to mutexes instead?
>
I fully agree with Michel.
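
For the pp case that would mean going back to something along these lines
(untested sketch; the helper and field names here are illustrative, not the
actual powerplay code):

    static int pp_dpm_powergate_vce(void *handle, bool gate)
    {
            struct pp_instance *pp = handle;
            int ret;

            /* mutex instead of spin_lock(): the path below may sleep */
            mutex_lock(&pp->pp_lock);
            /* ends up in vce_v3_0_stop(), which takes a mutex itself */
            ret = pp_powergate_vce_locked(pp, gate);
            mutex_unlock(&pp->pp_lock);

            return ret;
    }

That keeps sleeping calls such as the SMU mutexes and GFP_KERNEL
allocations legal in those paths, instead of pushing GFP_ATOMIC further
down the stack.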
Shirish, please follow up on this.
Harry
>
> [ 6232.096387] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:747
> [ 6232.099544] in_atomic(): 1, irqs_disabled(): 0, pid: 14785, name: kworker/3:14
> [ 6232.102183] INFO: lockdep is turned off.
> [ 6232.104965] CPU: 3 PID: 14785 Comm: kworker/3:14 Tainted: G B D W OE 4.16.0-rc7+ #104
> [ 6232.107835] Hardware name: Micro-Star International Co., Ltd. MS-7A34/B350 TOMAHAWK (MS-7A34), BIOS 1.80 09/13/2017
> [ 6232.110878] Workqueue: events amdgpu_vce_idle_work_handler [amdgpu]
> [ 6232.113457] Call Trace:
> [ 6232.116065] dump_stack+0x85/0xc1
> [ 6232.117880] ___might_sleep+0x28a/0x3c0
> [ 6232.119770] __mutex_lock+0xc4/0x1520
> [ 6232.121756] ? vce_v3_0_stop+0x3a/0x170 [amdgpu]
> [ 6232.123603] ? mutex_lock_io_nested+0x13e0/0x13e0
> [ 6232.125447] ? debug_check_no_locks_freed+0x2c0/0x2c0
> [ 6232.127333] ? amdgpu_dpm_enable_vce+0x21a/0x330 [amdgpu]
> [ 6232.129253] ? __mutex_lock+0xf9/0x1520
> [ 6232.131251] ? amdgpu_dpm_enable_vce+0x21a/0x330 [amdgpu]
> [ 6232.133324] ? pick_next_task_fair+0x550/0x1720
> [ 6232.135178] ? amdgpu_dpm_enable_vce+0x21a/0x330 [amdgpu]
> [ 6232.137078] ? vce_v3_0_stop+0x3a/0x170 [amdgpu]
> [ 6232.138948] vce_v3_0_stop+0x3a/0x170 [amdgpu]
> [ 6232.140687] amdgpu_device_ip_set_powergating_state+0x150/0x2f0 [amdgpu]
> [ 6232.142694] smu7_powergate_vce+0x11d/0x1c0 [amdgpu]
> [ 6232.144336] pp_dpm_powergate_vce+0xf4/0x160 [amdgpu]
> [ 6232.146283] ? pp_set_clockgating_by_smu+0xe0/0xe0 [amdgpu]
> [ 6232.148007] amdgpu_dpm_enable_vce+0x298/0x330 [amdgpu]
> [ 6232.149815] process_one_work+0x715/0x1570
> [ 6232.151547] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
> [ 6232.153193] ? lock_acquire+0x10b/0x350
> [ 6232.155001] worker_thread+0xd4/0xef0
> [ 6232.156758] ? process_one_work+0x1570/0x1570
> [ 6232.158595] kthread+0x311/0x3d0
> [ 6232.160049] ? kthread_create_worker_on_cpu+0xc0/0xc0
> [ 6232.161677] ret_from_fork+0x27/0x50