[Bug 111699] [CI][BAT][iommu]igt@gem_exec_suspend@basic-s4-devices - fail - DMAR write fault 7 + Failed assertion: !"GPU hung"

Tue Sep 17 10:02:41 UTC 2019

https://bugs.freedesktop.org/show_bug.cgi?id=111699

--- Comment #7 from Chris Wilson <chris at chris-wilson.co.uk> ---
<3> [209.832569] DMAR: DRHD: handling fault status reg 2
<3> [209.832629] DMAR: [DMA Write] Request device [00:02.0] fault addr 41000 [fault reason 07] Next page table ptr is invalid
<7> [212.694638] [drm:edp_panel_vdd_off_sync [i915]] Turning [ENCODER:214:DDI A] VDD off
<7> [212.694912] [drm:edp_panel_vdd_off_sync [i915]] PP_STATUS: 0x80000008 PP_CONTROL: 0x00000067
<7> [212.694994] [drm:intel_power_well_disable [i915]] disabling DC off
<7> [212.695078] [drm:skl_enable_dc6 [i915]] Enabling DC6
<7> [212.695165] [drm:gen9_set_dc_state [i915]] Setting DC state from 00 to 02
<7> [217.942713] hangcheck bcs0
<7> [217.942719] hangcheck      Awake? 2
<7> [217.942723] hangcheck      Hangcheck: 6016 ms ago
<7> [217.942727] hangcheck      Reset count: 0 (global 740)
<7> [217.942730] hangcheck      Requests:
<7> [217.942743] hangcheck              active  617:1e0*-  prio=3 @ 8110ms: gem_exec_suspen[2283]
<7> [217.942747] hangcheck              ring->start:  0x007f6000
<7> [217.942750] hangcheck              ring->head:   0x00002ae0
<7> [217.942754] hangcheck              ring->tail:   0x00002b78
<7> [217.942757] hangcheck              ring->emit:   0x000037c0
<7> [217.942760] hangcheck              ring->space:  0x000032e0
<7> [217.942763] hangcheck              ring->hwsp:   0xffffa180
<7> [217.942767] hangcheck [head 2b10, postfix 2b50, tail 2b80, batch 0x00000000_00040000]:
<7> [217.942789] hangcheck [0000] 13244002 00000204 00000000 00000000 02800000 00000000 10400002 ffffa180
<7> [217.942794] hangcheck [0020] 00000000 000001df 04000001 18800101 00040000 00000000 04000000 00000000
<7> [217.942798] hangcheck [0040] 13004002 ffffa184 00000000 000001e0 01000000 04000001 0e40c002 00000000
<7> [217.942802] hangcheck [0060] ffffd0c8 00000000 02800000 00000000
<7> [217.942812] hangcheck      MMIO base:  0x00022000
<7> [217.942824] hangcheck      RING_START: 0x007f6000
<7> [217.942830] hangcheck      RING_HEAD:  0x00002b48
<7> [217.942837] hangcheck      RING_TAIL:  0x00002b78
<7> [217.942846] hangcheck      RING_CTL:   0x00003001
<7> [217.942856] hangcheck      RING_MODE:  0x00000000
<7> [217.942863] hangcheck      RING_IMR: 00000000
<7> [217.942882] hangcheck      ACTHD:  0x00000000_00202b48
<7> [217.942895] hangcheck      BBADDR: 0x00000000_00040fd4
<7> [217.942908] hangcheck      DMA_FADDR: 0x00000000_007f8b78
<7> [217.942915] hangcheck      IPEIR: 0x00000000
<7> [217.942921] hangcheck      IPEHR: 0x05000000
<7> [217.942932] hangcheck      Execlist status: 0x00001098 60000020, entries 12
<7> [217.942936] hangcheck      Execlist CSB read 3, write 3, tasklet queued? no (enabled)
<7> [217.942943] hangcheck              Active[0: ring:{start:007f6000, hwsp:ffffa180, seqno:000001df}, rq:  617:1e0*-  prio=3 @ 8110ms: gem_exec_suspen[2283]
<7> [217.942950] hangcheck              E  617:1e0*-  prio=3 @ 8110ms: gem_exec_suspen[2283]
<7> [217.942953] hangcheck HWSP:
<7> [217.942958] hangcheck [0000] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7> [217.942962] hangcheck *
<7> [217.942967] hangcheck [0040] 00000018 60000020 00000001 60000000 00000018 60000020 00000001 60000000
<7> [217.942970] hangcheck *
<7> [217.942974] hangcheck [00a0] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000003
<7> [217.942979] hangcheck [00c0] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7> [217.942982] hangcheck *
<7> [217.942989] hangcheck Idle? no

So the fault address of 0x41000 matches the page immediately after the batch
(BBADDR: 0x40fd4). The write is puzzling. BBADDR sits only 0x2c (44) bytes
below the page boundary at 0x41000, close enough for the 128-byte prefetch to
cross into the next page, but a CS parser prefetch should be a read, not a
write. And that next page should happily be a scratch page, or the store
buffer.
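A quick check of the page arithmetic above, as a small Python sketch. It assumes 4 KiB pages and takes the 128-byte prefetch size from the comment; the helper names (`page_base`, `prefetch_crosses_page`) are hypothetical and are not i915 code. It only shows that a prefetch window starting at BBADDR reaches the faulting page; it says nothing about why the access was reported as a write.

```python
# Assumption: 4 KiB pages; the 128-byte prefetch size is taken from the comment above.
PAGE_SIZE = 0x1000
CS_PREFETCH_BYTES = 128


def page_base(addr: int) -> int:
    """Round an address down to the start of its 4 KiB page."""
    return addr & ~(PAGE_SIZE - 1)


def prefetch_crosses_page(addr: int, length: int = CS_PREFETCH_BYTES) -> bool:
    """True if reading `length` bytes starting at `addr` touches the following page."""
    return page_base(addr + length - 1) != page_base(addr)


bbaddr = 0x40FD4      # BBADDR from the hangcheck dump
fault_addr = 0x41000  # fault addr reported by DMAR

next_page = page_base(bbaddr) + PAGE_SIZE
bytes_to_boundary = next_page - bbaddr

print(f"batch page:         {page_base(bbaddr):#x}")
print(f"next page:          {next_page:#x}")
print(f"bytes to boundary:  {bytes_to_boundary}")
print(f"prefetch crosses:   {prefetch_crosses_page(bbaddr)}")
print(f"fault in next page: {page_base(fault_addr) == next_page}")
```

With these inputs the batch page is 0x40000, the boundary is 44 bytes away, and the faulting address 0x41000 is the start of the very next page.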

-- 
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.
You are the QA Contact for the bug.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/intel-gfx-bugs/attachments/20190917/efe57a55/attachment-0001.html>