[Bug 104975] Delay in skl_disable_plane() causes a system freeze

Wed May 2 17:42:42 UTC 2018

https://bugs.freedesktop.org/show_bug.cgi?id=104975

--- Comment #70 from Azhar <azhar.shaikh at intel.com> ---
(In reply to Azhar from comment #69)
> (In reply to Ville Syrjala from comment #68)
> > (In reply to Azhar from comment #67)
> > > (In reply to Azhar from comment #66)
> > > > (In reply to Ville Syrjala from comment #65)
> > > > > Hmm. I wonder if we're still barking up the right tree with the DDB angle.
> > > > > 
> > > > > I pushed the following hack:
> > > > > b48dad3fe322 ("hacks: fixed ddb allocation for each plane")
> > > > > to git://github.com/vsyrjala/linux.git double_buffer_ctl_ddb_wa_hacks
> > > > > 
> > > > > Would be nice to know whether that fully eliminates the hang. If it doesn't
> > > > > then we would seem to be on the wrong track.
> > > > 
> > > > The test has been running for more than 5 hours now, with ONLY the top
> > > > commit "b48dad3fe322 ("hacks: fixed ddb allocation for each plane")" on the
> > > > linux-stable 4.15.18 branch, without IPC support on BXT+.
> > > > 
> > > > I will keep it running overnight.
> > > 
> > > There was no crash during the overnight run with the above configuration.
> > 
> > OK. I guess we're still on the right track then.
> > 
> > Let's try to make this double buffer thing as simple as possible:
> > git://github.com/vsyrjala/linux.git double_buffer_ctl_simple
> > 
> > That approach is no good when multiple pipes are enabled, but for a single
> > pipe case it's about as simple as we can make it.
> 
> In the above branch, commit ("grab double buffer ctl around the entire
> update") calls schedule_timeout() while holding a spinlock, which causes a
> kernel panic due to the following BUG:
> 
> [  204.671649] BUG: scheduling while atomic: kworker/u8:1/78/0x00000002
> [  204.678769] Modules linked in: rfcomm cmac uinput cfg80211 ip6table_filter xt_nat bridge stp llc ipt_MASQUERADE nf_nat_masquerade_ipv4 btusb btrtl lzo btbcm btintel lzo_compress ov13858 bluetooth iptable_nat zram nf_nat_ipv4 nf_nat ov5670 v4l2_fwnode ecdh_generic snd_soc_max98927 dw9714 at24 acpi_als xt_mark fuse snd_seq_dummy snd_seq snd_seq_device iio_trig_sysfs cros_ec_light_prox cros_ec_sensors cros_ec_sensors_core industrialio_triggered_buffer kfifo_buf industrialio r8152 mii joydev
> [  204.727322] CPU: 0 PID: 78 Comm: kworker/u8:1 Tainted: G        W        4.15.18-00010-gc4c90323f9ec #49
> [  204.746677] Workqueue: events_unbound intel_atomic_commit_work
> [  204.753196] Call Trace:
> [  204.755943]  dump_stack+0x4d/0x63
> [  204.759650]  __schedule_bug+0x5d/0x6b
> [  204.763740]  __schedule+0x83/0x7b2
> [  204.767542]  ? __switch_to_asm+0x40/0x70
> [  204.771920]  ? __switch_to_asm+0x34/0x70
> [  204.776297]  ? __switch_to_asm+0x34/0x70
> [  204.780665]  ? __switch_to_asm+0x40/0x70
> [  204.785055]  schedule+0x75/0x86
> [  204.788566]  schedule_timeout+0x2de/0x34b
> [  204.793053]  ? collect_expired_timers+0x10b/0x10b
> [  204.798312]  intel_pipe_update_start+0x1ea/0x2e2
> [  204.803473]  ? add_wait_queue+0x48/0x48
> [  204.807761]  intel_begin_crtc_commit+0x6b/0x20d
> [  204.812836]  drm_atomic_helper_commit_planes_on_crtc+0x4e/0x14f
> [  204.819462]  skl_update_crtcs+0x172/0x1b7
> [  204.823944]  intel_atomic_commit_tail+0x6dd/0x1d15
> [  204.829299]  ? _raw_spin_unlock_irq+0xe/0x21
> [  204.834085]  ? __schedule+0x583/0x7b2
> [  204.838180]  worker_thread+0x42e/0x5da
> [  204.842362]  ? queue_work_on+0x24/0x24
> [  204.846552]  kthread+0x1e6/0x1ee
> [  204.850153]  ? queue_work_on+0x24/0x24
> [  204.854336]  ? kthread_create_worker+0x72/0x72
> [  204.859297]  ret_from_fork+0x35/0x40
> [  204.863441] BUG: sleeping function called from invalid context at /mnt/host/source/src/third_party/kernel/v4.4/kernel/sched/completion.c:102
> [  204.877553] in_atomic(): 1, irqs_disabled(): 0, pid: 78, name: kworker/u8:1
> [  204.885348] CPU: 1 PID: 78 Comm: kworker/u8:1 Tainted: G        W        4.15.18-00010-gc4c90323f9ec #49
> [  204.904727] Workqueue: events_unbound intel_atomic_commit_work
> [  204.911252] Call Trace:
> [  204.913990]  dump_stack+0x4d/0x63
> [  204.917696]  ___might_sleep+0x126/0x135
> [  204.921975]  wait_for_common+0x32/0x69
> [  204.926158]  drm_atomic_helper_wait_for_flip_done+0x50/0x7e
> [  204.932391]  intel_atomic_commit_tail+0x6ea/0x1d15
> [  204.937745]  ? _raw_spin_unlock_irq+0xe/0x21
> [  204.942513]  ? __schedule+0x583/0x7b2
> [  204.946610]  worker_thread+0x42e/0x5da
> [  204.950793]  ? queue_work_on+0x24/0x24
> [  204.954976]  kthread+0x1e6/0x1ee
> [  204.958578]  ? queue_work_on+0x24/0x24
> [  204.962764]  ? kthread_create_worker+0x72/0x72
> [  204.967725]  ret_from_fork+0x35/0x40
> [  204.971759] BUG: workqueue leaked lock or atomic: kworker/u8:1/0x7fffffff/78
> [  204.971759]      last function: intel_atomic_commit_work
> [  204.971775] BUG: scheduling while atomic: kworker/u8:3/713/0x00000002
> 
> For now I just commented it out and ran the test with the same patch, and it
> still crashed.

Ville, with the schedule_timeout() commented out in
git://github.com/vsyrjala/linux.git double_buffer_ctl_simple, the system still
crashes. Is commenting out the timeout the right thing to do here? If not, we
will have to remove the spinlocks. Do you have any new patches?
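To make the constraint behind these traces concrete, here is a minimal userspace C model. It is hypothetical: `struct commit_model`, `model_lock()`, `model_unlock()`, `model_sleep()` and `model_commit()` are invented for illustration and are not i915 or DRM code. It assumes only what the logs show: any sleeping call made while a spinlock is held is reported as a BUG. In the model, commenting out the schedule_timeout() removes one report, but a sleeping call still inside the locked region (like the wait_for_common() under drm_atomic_helper_wait_for_flip_done() in the second trace, which reports in_atomic(): 1) still triggers one. Holding the lock only around the register writes avoids both. Where the real branch actually takes and drops its lock is an assumption here, not something established in this bug.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace model, not i915 code: 'atomic_depth' stands in
 * for the kernel preempt count that spin_lock() raises and
 * spin_unlock() lowers. Sleeping is only legal at depth 0. */
struct commit_model {
    int atomic_depth;
    int sleep_bugs; /* counts "sleeping while atomic" reports */
};

static void model_lock(struct commit_model *m)   { m->atomic_depth++; }
static void model_unlock(struct commit_model *m) { m->atomic_depth--; }

/* Stand-in for any sleeping call, e.g. schedule_timeout() or the
 * wait_for_common() behind drm_atomic_helper_wait_for_flip_done(). */
static void model_sleep(struct commit_model *m, const char *caller)
{
    if (m->atomic_depth > 0) {
        printf("BUG: %s sleeps while atomic (depth %d)\n",
               caller, m->atomic_depth);
        m->sleep_bugs++;
    }
}

/*
 * lock_whole_update:   hold the lock from the CRTC update until the end of
 *                      the commit tail (assumed "entire update" shape);
 *                      otherwise hold it only around the register writes.
 * keep_update_timeout: keep the sleep in the update-start step, or drop it
 *                      (the "comment out schedule_timeout()" experiment).
 */
static struct commit_model model_commit(bool lock_whole_update,
                                        bool keep_update_timeout)
{
    struct commit_model m = {0, 0};

    if (lock_whole_update)
        model_lock(&m);

    if (keep_update_timeout)
        model_sleep(&m, "intel_pipe_update_start");

    if (!lock_whole_update)
        model_lock(&m);   /* short critical section ... */
    /* ... plane/DDB register writes would go here ... */
    if (!lock_whole_update)
        model_unlock(&m); /* ... released before anything sleeps */

    model_sleep(&m, "drm_atomic_helper_wait_for_flip_done");

    if (lock_whole_update)
        model_unlock(&m);

    return m;
}
```

For example, model_commit(true, true) reports two BUGs, model_commit(true, false) still reports one, and model_commit(false, true) reports none.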
