[Intel-gfx] Oops with i915

Mon Jun 18 12:34:38 UTC 2018

On Mon, Jun 18, 2018 at 01:29:02PM +0100, Sudip Mukherjee wrote:
> Hi Ville,
> 
> On Mon, Jun 18, 2018 at 03:09:15PM +0300, Ville Syrjälä wrote:
> > On Thu, Jun 07, 2018 at 11:06:33AM +0100, Sudip Mukherjee wrote:
> > > Hi All,
> > > 
> > > We are running the v4.14.47 kernel, and recently in one of our test cycles
> > > we saw the trace below. I know this is not the usual way to raise a
> > > bug report, but since this was seen only once, in one of the automated
> > > test cycles, I do not have anything else apart from this trace.
> > > Is this a known issue? I would appreciate any help in understanding what
> > > the problem might be.
> > > 
> > > [ 1176.909543] BUG: unable to handle kernel paging request at 8298fb0a
> > > [ 1176.916565] IP: queued_spin_lock_slowpath+0xfc/0x142
> > > [ 1176.922111] *pdpt = 000000003367a001 *pde = 0000000000000000
> > > [ 1176.928534] Oops: 0002 [#1] PREEMPT SMP
> > > [ 1177.002434] CPU: 2 PID: 24688 Comm: kworker/u8:4 Tainted: G     U     O    4.14.47-20180606-a6b8390e8cc1de032b8314d1a5b193fe9e21f325 #1
> > > [ 1177.024120] Workqueue: events_unbound intel_atomic_commit_work
> > > [ 1177.030630] task: ef2ee200 task.stack: efbf4000
> > > [ 1177.035685] EIP: queued_spin_lock_slowpath+0xfc/0x142
> > > [ 1177.041327] EFLAGS: 00010087 CPU: 2
> > > [ 1177.045212] EAX: 8298fb0a EBX: 00003ba0 ECX: ee82489c EDX: f4656fc0
> > > [ 1177.052215] ESI: 000c0000 EDI: 00000001 EBP: efbf5e88 ESP: efbf5e78
> > > [ 1177.059217]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> > > [ 1177.065239] CR0: 80050033 CR2: 8298fb0a CR3: 2e8ed320 CR4: 001006f0
> > > [ 1177.072240] Call Trace:
> > > [ 1177.074973]  _raw_spin_lock_irqsave+0x28/0x2d
> > > [ 1177.079840]  complete_all+0x12/0x36
> > > [ 1177.083737]  drm_atomic_helper_commit_hw_done+0x3c/0x43
> > > [ 1177.089576]  intel_atomic_commit_tail+0xa5f/0xbd9
> > > [ 1177.094832]  ? wait_woken+0x5a/0x5a
> > > [ 1177.098727]  ? wait_woken+0x5a/0x5a
> > > [ 1177.102622]  intel_atomic_commit_work+0xb/0xd
> > > [ 1177.107489]  ? intel_atomic_commit_work+0xb/0xd
> > > [ 1177.112551]  process_one_work+0x109/0x1ee
> > > [ 1177.117029]  worker_thread+0x1a4/0x257
> > > [ 1177.121215]  kthread+0xee/0xf3
> > > [ 1177.124625]  ? rescuer_thread+0x207/0x207
> > > [ 1177.129103]  ? kthread_create_on_node+0x1a/0x1a
> > > [ 1177.134165]  ret_from_fork+0x2e/0x38
> > > [ 1177.138156] Code: 12 09 de 89 f0 89 75 f0 c1 e8 10 66 87 41 02 89 c3 c1 e3 10 74 51 83 e0 03 c1 eb 12 6b c0 0c 05 c0 1f 7e c1 03 04 9d d8 b1 6c c1 <89> 10 8b 42 04 85 c0 75 04 f3 90 eb f5 8b 1a 85 db 74 03 0f 0d
> > > [ 1177.159204] EIP: queued_spin_lock_slowpath+0xfc/0x142 SS:ESP: 0068:efbf5e78
> > > [ 1177.166983] CR2: 000000008298fb0a
> > 
> > Presumably a use after free in atomic. Possibly 21a01abbe32a
> > ("drm/atomic: Fix freeing connector/plane state too early by tracking
> > commits, v3.") But there may have been other similar fixes.
> 
> Thanks for your reply. I also thought so, as the stack trace showed it was
> using invalid memory for the old_state. So I applied:
> 21a01abbe32a ("drm/atomic: Fix freeing connector/plane state too early by tracking commits, v3.")
> on top of v4.14.47. It also needed:
> 1) f46640b931e5 ("drm/atomic: Return commit in drm_crtc_commit_get for better annotation")
> 2) 163bcc2c74a2 ("drm/atomic: Move drm_crtc_commit to drm_crtc_state, v4.")
> 
> to apply cleanly. But after that the occurrence rate increased.
> Did I miss something else?

No idea. I suggest a reverse bisect to find out when it got fixed in
upstream.
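
To spell out what a reverse bisect means: it is an ordinary bisect with
the roles swapped, so the older endpoint is the "broken" one, the newer
endpoint is the "fixed" one, and the search looks for the first commit
where the problem goes away. Below is a minimal sketch of that search in
Python. The commit names and the is_fixed() callback are hypothetical
placeholders, and the sketch assumes a linear history in which the
problem stays fixed once it is fixed. In practice the callback would be
"build, boot and run the test cycle", which is the expensive part for a
crash that reproduces rarely.

```python
from typing import Callable, Sequence


def first_fixed(commits: Sequence[str], is_fixed: Callable[[str], bool]) -> str:
    """Return the first commit (oldest first) for which is_fixed() is true.

    Assumptions (not checked by running the callback on the endpoints):
    commits[0] is known broken, commits[-1] is known fixed, and every
    commit after the first fixed one is also fixed.
    """
    if len(commits) < 2:
        raise ValueError("need at least one broken and one fixed commit")

    lo = 0                 # index of the newest commit known to be broken
    hi = len(commits) - 1  # index of the oldest commit known to be fixed
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_fixed(commits[mid]):
            hi = mid       # fix landed at or before mid
        else:
            lo = mid       # still broken at mid, fix is later
    return commits[hi]


# Hypothetical ten-commit history where the fix landed in "c6".
history = [f"c{i}" for i in range(10)]
fix_commit = first_fixed(history, lambda c: history.index(c) >= 6)
print(fix_commit)
```

With git itself the same role swap can be expressed by renaming the
bisect terms, e.g. "git bisect start --term-old=broken --term-new=fixed",
after which the endpoints are marked with "git bisect broken" and
"git bisect fixed" instead of good and bad.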

-- 
Ville Syrjälä
Intel