[Bug 200139] New: amdgpu lockup after resume from sleep

bugzilla-daemon at bugzilla.kernel.org bugzilla-daemon at bugzilla.kernel.org
Tue Jun 19 13:05:24 UTC 2018


https://bugzilla.kernel.org/show_bug.cgi?id=200139

            Bug ID: 200139
           Summary: amdgpu lockup after resume from sleep
           Product: Drivers
           Version: 2.5
    Kernel Version: 4.17.2
          Hardware: Intel
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: Video(DRI - non Intel)
          Assignee: drivers_video-dri at kernel-bugs.osdl.org
          Reporter: j.hoffmann at quapona.com
        Regression: No

Created attachment 276689
  --> https://bugzilla.kernel.org/attachment.cgi?id=276689&action=edit
HWInfo

I have observed a GPU lockup when the system resumes after a sleep. The
duration of the sleep doesn't matter. The problem occurs every time the
system is put to sleep.

I was able to narrow the problem down a little. When I switch to the console and
then put the system to sleep, the system comes up properly (with a
trace in an amdgpu function). If I then switch back to the login manager or to the
desktop, the GPU faults and eventually hangs. See logs below.

I can reproduce the problem with kernel 4.16.13. Furthermore, it doesn't matter
whether amdgpu.dc is enabled or disabled.

System
----------
Linux 4.17.2
Debian Unstable
X.Org 1.20
Mesa 18.1.1
Radeon RX 580 Series (POLARIS10, DRM 3.25.0, 4.17.2, LLVM 6.0.0)
CPU Intel Core i7-8700K
MB Asus Prime Z370-A


Kernel log after the resume from console:
-----------------------------------------
Jun 19 14:24:39 moc kernel: sd 0:0:0:0: [sda] Starting disk
Jun 19 14:24:39 moc kernel: [drm] PCIE GART of 256M enabled (table at
0x000000F400040000).
Jun 19 14:24:39 moc kernel: WARNING: CPU: 7 PID: 28047 at
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:725
amdgpu_dm_display_resume+0x213/0x220 [amdgpu]
Jun 19 14:24:39 moc kernel: Modules linked in: vmnet(OE)
vmw_vsock_vmci_transport(E) vsock(E) vmw_vmci(E) vmmon(OE) fuse(E) joydev(E)
hid_cherry(E) hid_generic(E) usbhid(E) hid(E) intel_rapl(E)
x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) kvm(E) ir
Jun 19 14:24:39 moc kernel:  asus_wmi(E) evdev(E) efi_pstore(E) intel_uncore(E)
sparse_keymap(E) wmi_bmof(E) mxm_wmi(E) i2c_algo_bit(E) rfkill(E) sg(E)
intel_rapl_perf(E) iTCO_wdt(E) efivars(E) snd(E) mei_me(E)
iTCO_vendor_support(E) soundcore(E) mei(E) shpchp(E) wmi(E) v
Jun 19 14:24:39 moc kernel:  btrfs(E) zstd_decompress(E) zstd_compress(E)
xxhash(E) raid10(E) raid456(E) async_raid6_recov(E) async_memcpy(E) async_pq(E)
async_xor(E) async_tx(E) xor(E) raid6_pq(E) libcrc32c(E) crc32c_generic(E)
raid1(E) raid0(E) multipath(E) linear(E) md
Jun 19 14:24:39 moc kernel: CPU: 7 PID: 28047 Comm: kworker/u24:7 Tainted: G   
       OE     4.17.2 #1
Jun 19 14:24:39 moc kernel: Hardware name: System manufacturer System Product
Name/PRIME Z370-A, BIOS 0805 05/18/2018
Jun 19 14:24:39 moc kernel: Workqueue: events_unbound async_run_entry_fn
Jun 19 14:24:39 moc kernel: RIP: 0010:amdgpu_dm_display_resume+0x213/0x220
[amdgpu]
Jun 19 14:24:39 moc kernel: RSP: 0000:ffffaadd4447fd60 EFLAGS: 00010202
Jun 19 14:24:39 moc kernel: RAX: 0000000000000002 RBX: ffff96d7a48b0000 RCX:
0000000000000006
Jun 19 14:24:39 moc kernel: RDX: 0000000000000006 RSI: ffff96d6915a2c80 RDI:
ffff96d7898f7800
Jun 19 14:24:39 moc kernel: RBP: ffff96d79fb9d800 R08: 0000000000000000 R09:
ffffffffc14a7174
Jun 19 14:24:39 moc kernel: R10: ffffe4dea0a9a840 R11: 0000000000000001 R12:
0000000000000000
Jun 19 14:24:39 moc kernel: R13: ffff96d7a5e43800 R14: ffff96d7a9ca8d40 R15:
ffffffffb4695dbb
Jun 19 14:24:39 moc kernel: FS:  0000000000000000(0000)
GS:ffff96d7ae3c0000(0000) knlGS:0000000000000000
Jun 19 14:24:39 moc kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 19 14:24:39 moc kernel: CR2: 0000000000000000 CR3: 00000003aa80a001 CR4:
00000000003606e0
Jun 19 14:24:39 moc kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
Jun 19 14:24:39 moc kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
Jun 19 14:24:39 moc kernel: Call Trace:
Jun 19 14:24:39 moc kernel:  amdgpu_device_ip_resume_phase2+0x45/0xb0 [amdgpu]
Jun 19 14:24:39 moc kernel:  amdgpu_device_resume+0xbf/0x380 [amdgpu]
Jun 19 14:24:39 moc kernel:  ? pci_pm_freeze+0xd0/0xd0
Jun 19 14:24:39 moc kernel:  ? pci_pm_freeze+0xd0/0xd0
Jun 19 14:24:39 moc kernel:  dpm_run_callback+0x4d/0x130
Jun 19 14:24:39 moc kernel:  device_resume+0x97/0x190
Jun 19 14:24:39 moc kernel:  async_resume+0x19/0x40
Jun 19 14:24:39 moc kernel:  async_run_entry_fn+0x39/0x160
Jun 19 14:24:39 moc kernel:  process_one_work+0x17b/0x360
Jun 19 14:24:39 moc kernel:  worker_thread+0x2e/0x390
Jun 19 14:24:39 moc kernel:  ? process_one_work+0x360/0x360
Jun 19 14:24:39 moc kernel:  kthread+0x113/0x130
Jun 19 14:24:39 moc kernel:  ? kthread_create_worker_on_cpu+0x70/0x70
Jun 19 14:24:39 moc kernel:  ret_from_fork+0x35/0x40
Jun 19 14:24:39 moc kernel: Code: 00 7f ac 48 89 ef e8 dd df a5 ff 48 c7 83 90
aa 00 00 00 00 00 00 89 c5 48 89 df e8 c8 17 00 00 89 e8 5b 5d 41 5c 41 5d 41
5e c3 <0f> 0b e9 48 ff ff ff 0f 0b eb a5 66 90 0f 1f 44 00 00 53 48 89 
Jun 19 14:24:39 moc kernel: ---[ end trace c39336409cdb2ae3 ]---
Jun 19 14:24:39 moc kernel: [drm] UVD and UVD ENC initialized successfully.
Jun 19 14:24:39 moc kernel: ixgbe 0000:03:00.0: Multiqueue Enabled: Rx Queue
count = 12, Tx Queue count = 12 XDP Queue count = 0


Log after switching to X11
---------------------------
Jun 19 14:29:13 moc kernel: amdgpu 0000:01:00.0: GPU fault detected: 147
0x0a304401
Jun 19 14:29:13 moc kernel: amdgpu 0000:01:00.0:  
VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x08404D46
Jun 19 14:29:13 moc kernel: amdgpu 0000:01:00.0:  
VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x08044001
Jun 19 14:29:13 moc kernel: amdgpu 0000:01:00.0: VM fault (0x01, vmid 4, pasid
0) at page 138431814, read from 'TC5' (0x54433500) (68)
Jun 19 14:29:13 moc kernel: amdgpu 0000:01:00.0: GPU fault detected: 146
0x0000480c
Jun 19 14:29:13 moc kernel: amdgpu 0000:01:00.0:  
VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x08404D46
Jun 19 14:29:13 moc kernel: amdgpu 0000:01:00.0:  
VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x08044001
Jun 19 14:29:13 moc kernel: amdgpu 0000:01:00.0: VM fault (0x01, vmid 4, pasid
0) at page 138431814, read from 'TC5' (0x54433500) (68)
Jun 19 14:29:14 moc kernel: amdgpu 0000:01:00.0: GPU fault detected: 147
0x0a304401
Jun 19 14:29:14 moc kernel: amdgpu 0000:01:00.0:  
VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x08404D46
Jun 19 14:29:14 moc kernel: amdgpu 0000:01:00.0:  
VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x08044001
Jun 19 14:29:14 moc kernel: amdgpu 0000:01:00.0: VM fault (0x01, vmid 4, pasid
0) at page 138431814, read from 'TC5' (0x54433500) (68)
Jun 19 14:29:14 moc kernel: amdgpu 0000:01:00.0: GPU fault detected: 147
0x0a304401
Jun 19 14:29:14 moc kernel: amdgpu 0000:01:00.0:  
VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x08404D46
Jun 19 14:29:14 moc kernel: amdgpu 0000:01:00.0:  
VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x08044001
Jun 19 14:29:14 moc kernel: amdgpu 0000:01:00.0: VM fault (0x01, vmid 4, pasid
0) at page 138431814, read from 'TC5' (0x54433500) (68)
Jun 19 14:29:14 moc kernel: amdgpu 0000:01:00.0: GPU fault detected: 146
0x0000480c
Jun 19 14:29:14 moc kernel: amdgpu 0000:01:00.0:  
VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x0E40C60C
Jun 19 14:29:14 moc kernel: amdgpu 0000:01:00.0:  
VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x08048001
Jun 19 14:29:14 moc kernel: amdgpu 0000:01:00.0: VM fault (0x01, vmid 4, pasid
0) at page 239126028, read from 'TC4' (0x54433400) (72)
Jun 19 14:29:14 moc kernel: amdgpu 0000:01:00.0: GPU fault detected: 146
0x0000480c
Jun 19 14:29:14 moc kernel: amdgpu 0000:01:00.0:  
VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
Jun 19 14:29:14 moc kernel: amdgpu 0000:01:00.0:  
VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0804800C
Jun 19 14:29:14 moc kernel: amdgpu 0000:01:00.0: VM fault (0x0c, vmid 4, pasid
0) at page 0, read from 'TC4' (0x54433400) (72)
Jun 19 14:29:14 moc kernel: amdgpu 0000:01:00.0: GPU fault detected: 147
0x0a304401
Jun 19 14:29:14 moc kernel: amdgpu 0000:01:00.0:  
VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x08404D46
Jun 19 14:29:14 moc kernel: amdgpu 0000:01:00.0:  
VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x08044001
Jun 19 14:29:14 moc kernel: amdgpu 0000:01:00.0: VM fault (0x01, vmid 4, pasid
0) at page 138431814, read from 'TC5' (0x54433500) (68)
Jun 19 14:29:14 moc kernel: amdgpu 0000:01:00.0: GPU fault detected: 146
0x0000480c
Jun 19 14:29:14 moc kernel: amdgpu 0000:01:00.0:  
VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
Jun 19 14:29:14 moc kernel: amdgpu 0000:01:00.0:  
VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0804800C
Jun 19 14:29:14 moc kernel: amdgpu 0000:01:00.0: VM fault (0x0c, vmid 4, pasid
0) at page 0, read from 'TC4' (0x54433400) (72)
Jun 19 14:29:14 moc kernel: amdgpu 0000:01:00.0: GPU fault detected: 147
0x0a304401
Jun 19 14:29:14 moc kernel: amdgpu 0000:01:00.0:  
VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x08404D46
Jun 19 14:29:14 moc kernel: amdgpu 0000:01:00.0:  
VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x08044001
Jun 19 14:29:14 moc kernel: amdgpu 0000:01:00.0: VM fault (0x01, vmid 4, pasid
0) at page 138431814, read from 'TC5' (0x54433500) (68)
Jun 19 14:29:14 moc kernel: amdgpu 0000:01:00.0: GPU fault detected: 147
0x0a304401
Jun 19 14:29:14 moc kernel: amdgpu 0000:01:00.0:  
VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x08404D46
Jun 19 14:29:14 moc kernel: amdgpu 0000:01:00.0:  
VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x08044001
Jun 19 14:29:14 moc kernel: amdgpu 0000:01:00.0: VM fault (0x01, vmid 4, pasid
0) at page 138431814, read from 'TC5' (0x54433500) (68)
Jun 19 14:29:24 moc kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx
timeout, last signaled seq=384604, last emitted seq=384605
Jun 19 14:29:24 moc kernel: [drm] IP block:gfx_v8_0 is hung!
Jun 19 14:29:24 moc kernel: [drm] GPU recovery disabled.
-- Reboot --
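
The VM fault lines above repeat a small number of distinct faults. As a reading aid, here is a minimal Python sketch that undoes the mail line wrapping, extracts the fields of each "VM fault" line, and tallies them. It relies only on what is visible in the log: the page number is the decimal form of VM_CONTEXT1_PROTECTION_FAULT_ADDR (0x08404D46 = 138431814), and the client tag is ASCII packed into a 32-bit word (0x54433500 decodes to "TC5"). The function names and the SAMPLE excerpt are hypothetical helpers for illustration, not part of any amdgpu tooling, and the field labels are my reading of the message rather than documented names.

```python
import re
from collections import Counter

# Pattern for the driver's "VM fault" summary line after whitespace is
# normalized (the mail archive wraps each line after "pasid").
VM_FAULT_RE = re.compile(
    r"VM fault \((0x[0-9a-fA-F]+), vmid (\d+), pasid (\d+)\) "
    r"at page (\d+), (read|write) from '([^']*)' \((0x[0-9a-fA-F]+)\) \((\d+)\)"
)


def decode_client(tag: int) -> str:
    """Decode a 32-bit client tag such as 0x54433500 into ASCII ("TC5")."""
    return tag.to_bytes(4, "big").rstrip(b"\x00").decode("ascii")


def parse_faults(log_text: str) -> list:
    """Return one dict per VM fault line found in log_text."""
    # Collapse newlines and runs of spaces so wrapped lines match again.
    flat = " ".join(log_text.split())
    faults = []
    for m in VM_FAULT_RE.finditer(flat):
        faults.append({
            "status_low": int(m.group(1), 16),  # assumed: low byte of FAULT_STATUS
            "vmid": int(m.group(2)),
            "pasid": int(m.group(3)),
            "page": int(m.group(4)),            # decimal form of FAULT_ADDR
            "access": m.group(5),
            "client": decode_client(int(m.group(7), 16)),
            "client_id": int(m.group(8)),
        })
    return faults


def summarize(faults: list) -> Counter:
    """Count faults per (client, page in hex) pair."""
    return Counter((f["client"], hex(f["page"])) for f in faults)


# Two fault lines copied from the log above, wrapped as in the mail.
SAMPLE = """\
Jun 19 14:29:13 moc kernel: amdgpu 0000:01:00.0: VM fault (0x01, vmid 4, pasid
0) at page 138431814, read from 'TC5' (0x54433500) (68)
Jun 19 14:29:14 moc kernel: amdgpu 0000:01:00.0: VM fault (0x0c, vmid 4, pasid
0) at page 0, read from 'TC4' (0x54433400) (72)
"""

if __name__ == "__main__":
    for key, count in summarize(parse_faults(SAMPLE)).items():
        print(key, count)
```

Run against the full log, this should show that most faults are reads by 'TC5' at the same page, with a few reads by 'TC4', all on vmid 4.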
