[Bug 107065] "BUG: unable to handle kernel paging request at 0000000000002000" in amdgpu_vm_cpu_set_ptes at amdgpu_vm.c:921

bugzilla-daemon at freedesktop.org
Mon Jul 2 19:48:48 UTC 2018


https://bugs.freedesktop.org/show_bug.cgi?id=107065

--- Comment #12 from dwagner <jb5sgc1n.nya at 20mm.eu> ---
(In reply to Andrey Grodzovsky from comment #10)
> Created attachment 140418
> drm/amdgpu: Verify root PD is mapped into kernel address space.
> 
> dwagner, please try this patch. Fixes the issue for me and I observed no
> suspend/resume issues.

While I can start X11 with this patch applied to current amd-staging-drm-next,
attempts to resume from S3 fail consistently.

The following related output is emitted right before the suspend:

Jul 02 21:31:32 ryzen kernel: Freezing remaining freezable tasks ... (elapsed
0.000 seconds) done.
Jul 02 21:31:32 ryzen kernel: Suspending console(s) (use no_console_suspend to
debug)
Jul 02 21:31:32 ryzen kernel: sd 9:0:0:0: [sda] Synchronizing SCSI cache
Jul 02 21:31:32 ryzen kernel: [TTM] Buffer eviction failed
Jul 02 21:31:32 ryzen kernel: ACPI: Preparing to enter system sleep state S3
Jul 02 21:31:32 ryzen kernel: PM: Saving platform NVS memory
Jul 02 21:31:32 ryzen kernel: Disabling non-boot CPUs ...

(I wonder whether that "[TTM] Buffer eviction failed" is a bad sign, as I have
seen it at other times in conjunction with heavy use of the amdgpu driver.)


Then, upon resume, the following messages are emitted:

Jul 02 21:31:33 ryzen kernel: ACPI: Low-level resume complete
Jul 02 21:31:33 ryzen kernel: [drm] PCIE GART of 256M enabled (table at
0x000000F400300000).
Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay] 
                               failed to send message 146 ret is 0 
Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay] 
                               last message was failed ret is 0
Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay] 
                               failed to send message 148 ret is 0 
Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay] 
                               last message was failed ret is 0
Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay] 
                               failed to send message 145 ret is 0 
Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay] 
                               last message was failed ret is 0
Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay] 
                               failed to send message 146 ret is 0 
Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay] 
                               last message was failed ret is 0
Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay] 
                               failed to send message 189 ret is 0 
Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay] 
                               last message was failed ret is 0
Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay] 
                               failed to send message 306 ret is 0 
Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay] 
                               last message was failed ret is 0
Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay] 
                               failed to send message 5e ret is 0 
Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay] 
                               last message was failed ret is 0
Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay] 
                               failed to send message 18a ret is 0 
Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay] 
                               last message was failed ret is 0
Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay] 
                               failed to send message 145 ret is 0 
Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay] 
                               last message was failed ret is 0
Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay] 
                               failed to send message 146 ret is 0 
Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay] 
                               last message was failed ret is 0
Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay] 
                               failed to send message 148 ret is 0 
Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay] 
                               last message was failed ret is 0
Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay] 
                               failed to send message 145 ret is 0 
Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay] 
                               last message was failed ret is 0
Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay] 
                               failed to send message 146 ret is 0 
Jul 02 21:31:33 ryzen kernel: [drm:gfx_v8_0_ring_test_ring [amdgpu]] *ERROR*
amdgpu: ring 0 test failed (scratch(0xC040)=0xC>
Jul 02 21:31:33 ryzen kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]]
*ERROR* resume of IP block <gfx_v8_0> failed -22
Jul 02 21:31:33 ryzen kernel: [drm:amdgpu_device_resume [amdgpu]] *ERROR*
amdgpu_device_ip_resume failed (-22).
Jul 02 21:31:33 ryzen kernel: dpm_run_callback(): pci_pm_resume+0x0/0xa0
returns -22
Jul 02 21:31:33 ryzen kernel: PM: Device 0000:0a:00.0 failed to resume async:
error -22
Jul 02 21:31:33 ryzen kernel: OOM killer enabled.
Jul 02 21:31:33 ryzen kernel: Restarting tasks ... done.
Jul 02 21:31:33 ryzen kernel: PM: suspend exit
Jul 02 21:31:33 ryzen kernel: BUG: unable to handle kernel paging request at
0000000000001000
Jul 02 21:31:33 ryzen kernel: PGD 0 P4D 0 
Jul 02 21:31:33 ryzen kernel: Oops: 0002 [#1] SMP
Jul 02 21:31:33 ryzen kernel: CPU: 14 PID: 791 Comm: amdgpu_cs:0 Tainted: G    
   W  O      4.18.0-rc1-amd+ #45
Jul 02 21:31:33 ryzen kernel: Hardware name: System manufacturer System Product
Name/PRIME X370-PRO, BIOS 4011 04/19/2018
Jul 02 21:31:33 ryzen kernel: RIP: 0010:gmc_v8_0_set_pte_pde+0x1b/0x30 [amdgpu]
Jul 02 21:31:33 ryzen kernel: Code: 80 d8 00 00 00 e9 25 78 60 e1 0f 1f 44 00
00 0f 1f 44 00 00 48 b8 00 f0 ff ff ff 00 00 0>
Jul 02 21:31:33 ryzen kernel: RSP: 0018:ffffc90003e73898 EFLAGS: 00010202
Jul 02 21:31:33 ryzen kernel: RAX: 000000fffffff000 RBX: 0000000000000001 RCX:
000000000fe004f1
Jul 02 21:31:33 ryzen kernel: RDX: 0000000000001000 RSI: 0000000000001000 RDI:
ffff8807e2f70000
Jul 02 21:31:33 ryzen kernel: RBP: 0000000000001000 R08: 00000000000004f1 R09:
0000000000001000
Jul 02 21:31:33 ryzen kernel: R10: ffffffffa03ac7e0 R11: ffff8807daf78000 R12:
0000000000001000
Jul 02 21:31:33 ryzen kernel: R13: 0000000000000200 R14: ffffc90003e73a18 R15:
000000000fe01000
Jul 02 21:31:33 ryzen kernel: FS:  00007f8b57266700(0000)
GS:ffff88081ef80000(0000) knlGS:0000000000000000
Jul 02 21:31:33 ryzen kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 02 21:31:33 ryzen kernel: CR2: 0000000000001000 CR3: 00000007dbbda000 CR4:
00000000003406e0
Jul 02 21:31:33 ryzen kernel: Call Trace:
Jul 02 21:31:33 ryzen kernel:  amdgpu_vm_cpu_set_ptes+0x76/0xe0 [amdgpu]
Jul 02 21:31:33 ryzen kernel:  amdgpu_vm_update_ptes+0x1d3/0x2e0 [amdgpu]
Jul 02 21:31:33 ryzen kernel:  amdgpu_vm_frag_ptes+0xae/0x130 [amdgpu]
Jul 02 21:31:33 ryzen kernel:  amdgpu_vm_bo_update_mapping+0xed/0x410 [amdgpu]
Jul 02 21:31:33 ryzen kernel:  ? amdgpu_vm_do_copy_ptes+0xa0/0xa0 [amdgpu]
Jul 02 21:31:33 ryzen kernel:  amdgpu_vm_bo_update+0x310/0x680 [amdgpu]
Jul 02 21:31:33 ryzen kernel:  amdgpu_cs_ioctl+0x1092/0x1a50 [amdgpu]
Jul 02 21:31:33 ryzen kernel:  ? amdgpu_cs_find_mapping+0x110/0x110 [amdgpu]
Jul 02 21:31:33 ryzen kernel:  drm_ioctl_kernel+0xa7/0xf0 [drm]
Jul 02 21:31:33 ryzen kernel:  drm_ioctl+0x2f1/0x3c0 [drm]
Jul 02 21:31:33 ryzen kernel:  ? amdgpu_cs_find_mapping+0x110/0x110 [amdgpu]
Jul 02 21:31:33 ryzen kernel:  amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
Jul 02 21:31:33 ryzen kernel:  do_vfs_ioctl+0xa4/0x620
Jul 02 21:31:33 ryzen kernel:  ? __se_sys_futex+0x138/0x180
Jul 02 21:31:33 ryzen kernel:  ksys_ioctl+0x60/0x90
Jul 02 21:31:33 ryzen kernel:  __x64_sys_ioctl+0x16/0x20
Jul 02 21:31:33 ryzen kernel:  do_syscall_64+0x48/0xf0
Jul 02 21:31:33 ryzen kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 02 21:31:33 ryzen kernel: RIP: 0033:0x7f8b66c92667
Jul 02 21:31:33 ryzen kernel: Code: 00 00 90 48 8b 05 e9 67 2c 00 64 c7 00 26
00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 8>
Jul 02 21:31:33 ryzen kernel: RSP: 002b:00007f8b57265a98 EFLAGS: 00000246
ORIG_RAX: 0000000000000010
Jul 02 21:31:33 ryzen kernel: RAX: ffffffffffffffda RBX: 00007f8b57265b88 RCX:
00007f8b66c92667
Jul 02 21:31:33 ryzen kernel: RDX: 00007f8b57265b00 RSI: 00000000c0186444 RDI:
000000000000000b
Jul 02 21:31:33 ryzen kernel: RBP: 00007f8b57265b00 R08: 00007f8b57265bb0 R09:
0000000000000010
Jul 02 21:31:33 ryzen kernel: R10: 00007f8b57265bb0 R11: 0000000000000246 R12:
00000000c0186444
Jul 02 21:31:33 ryzen kernel: R13: 000000000000000b R14: 0000000000000002 R15:
0000000000000000
Jul 02 21:31:33 ryzen kernel: Modules linked in: it87(O) joydev mousedev
hid_generic hidp hid ipt_REJECT nf_reject_ipv4 nf_l>
Jul 02 21:31:33 ryzen kernel:  serio_raw crc32_pclmul atkbd ghash_clmulni_intel
libps2 pcbc ahci libahci xhci_pci libata aes>
Jul 02 21:31:33 ryzen kernel: CR2: 0000000000001000
Jul 02 21:31:33 ryzen kernel: ---[ end trace 517a8a72887251f0 ]---
Jul 02 21:31:33 ryzen kernel: RIP: 0010:gmc_v8_0_set_pte_pde+0x1b/0x30 [amdgpu]
Jul 02 21:31:33 ryzen kernel: Code: 80 d8 00 00 00 e9 25 78 60 e1 0f 1f 44 00
00 0f 1f 44 00 00 48 b8 00 f0 ff ff ff 00 00 0>
Jul 02 21:31:33 ryzen kernel: RSP: 0018:ffffc90003e73898 EFLAGS: 00010202
Jul 02 21:31:33 ryzen kernel: RAX: 000000fffffff000 RBX: 0000000000000001 RCX:
000000000fe004f1
Jul 02 21:31:33 ryzen kernel: RDX: 0000000000001000 RSI: 0000000000001000 RDI:
ffff8807e2f70000
Jul 02 21:31:33 ryzen kernel: RBP: 0000000000001000 R08: 00000000000004f1 R09:
0000000000001000
Jul 02 21:31:33 ryzen kernel: R10: ffffffffa03ac7e0 R11: ffff8807daf78000 R12:
0000000000001000
Jul 02 21:31:33 ryzen kernel: R13: 0000000000000200 R14: ffffc90003e73a18 R15:
000000000fe01000
Jul 02 21:31:33 ryzen kernel: FS:  00007f8b57266700(0000)
GS:ffff88081ef80000(0000) knlGS:0000000000000000
Jul 02 21:31:33 ryzen kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 02 21:31:33 ryzen kernel: CR2: 0000000000001000 CR3: 00000007dbbda000 CR4:
00000000003406e0

(At this point, the machine is dead and no longer responds to anything.)

So something is still wrong at amdgpu_vm_cpu_set_ptes+0x76.
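
As a reading aid for the log above, here is a minimal sketch that decodes the
"Oops: 0002" value and the "-22" return code. It assumes the standard x86
page-fault error-code bit layout and Linux errno numbering; the helper name
decode_oops and the PF_BITS table are hypothetical, written only for this
illustration, and are not part of the kernel or the amdgpu driver.

```python
import errno

# x86 page-fault error code bits, as printed after "Oops:" in the log.
# Each entry: (bit mask, meaning if set, meaning if clear or None).
# Assumption: standard x86 layout; names below are illustrative only.
PF_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set in page table", None),
    (0x10, "instruction fetch", None),
]


def decode_oops(code_hex):
    """Return a list of human-readable flags for a hex Oops error code."""
    code = int(code_hex, 16)
    parts = []
    for mask, if_set, if_clear in PF_BITS:
        text = if_set if code & mask else if_clear
        if text:
            parts.append(text)
    return parts


if __name__ == "__main__":
    # "Oops: 0002" from the trace above.
    print(decode_oops("0002"))
    # "-22" from the failed resume of IP block <gfx_v8_0>.
    print(errno.errorcode[22])
```

Under those assumptions, 0002 reads as a kernel-mode write to a page that is
not present, which fits CR2 = 0000000000001000, and -22 corresponds to EINVAL.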
