<div dir="ltr"><div>Thanks Harry and Alex, that was pretty useful. I've attached the dmesg logs from two separate trials: I first manually loaded amdgpu and triggered the crash in UE4editor, then rebooted and tried again with the radeon module loaded manually. The radeon log seems to give more information, and with radeon the screen flashed a few times before freezing. With amdgpu the X session just hangs and does nothing. I made sure I was running the latest padoka builds (git 17-06-05). I blacklisted both modules and rebuilt the grub config. </div><div>I was able to ssh into the machine for a while after the X session had completely frozen (mouse included), until the ssh session also disconnected after a few minutes. </div><div></div><div> </div><div>This is the part of the radeon dmesg where it appears to go off the rails: radeon 0000:07:00.0: ring 4 stalled for more than 10024msec [ +0.000004] radeon 0000:07:00.0: GPU lockup (current fence id 0x0000000000000fd0 last fence id 0x0000000000000fd2 on ring 4) [ +0.485614] radeon 0000:07:00.0: Saved 724 dwords of commands on ring 0. 
[ +0.000126] radeon 0000:07:00.0: GPU softreset: 0x0000004D [ +0.000001] radeon 0000:07:00.0: GRBM_STATUS = 0xA0403028 [ +0.000001] radeon 0000:07:00.0: GRBM_STATUS_SE0 = 0x08000006 [ +0.000001] radeon 0000:07:00.0: GRBM_STATUS_SE1 = 0x08000006 [ +0.000001] radeon 0000:07:00.0: SRBM_STATUS = 0x200000C0 [ +0.000118] radeon 0000:07:00.0: SRBM_STATUS2 = 0x00000000 [ +0.000002] radeon 0000:07:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000 [ +0.000001] radeon 0000:07:00.0: R_008678_CP_STALLED_STAT2 = 0x00018000 [ +0.000001] radeon 0000:07:00.0: R_00867C_CP_BUSY_STAT = 0x00400006 [ +0.000001] radeon 0000:07:00.0: R_008680_CP_STAT = 0x84038647 [ +0.000001] radeon 0000:07:00.0: R_00D034_DMA_STATUS_REG = 0x44C83106 [ +0.000001] radeon 0000:07:00.0: R_00D834_DMA_STATUS_REG = 0x44C83D57 [ +0.000002] radeon 0000:07:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000 [ +0.000001] radeon 0000:07:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000 [ +0.465637] radeon 0000:07:00.0: GRBM_SOFT_RESET=0x0000DDFF [ +0.000052] radeon 0000:07:00.0: SRBM_SOFT_RESET=0x00100100 [ +0.001146] radeon 0000:07:00.0: GRBM_STATUS = 0x00003028 [ +0.000002] radeon 0000:07:00.0: GRBM_STATUS_SE0 = 0x00000006 [ +0.000001] radeon 0000:07:00.0: GRBM_STATUS_SE1 = 0x00000006 [ +0.000000] radeon 0000:07:00.0: SRBM_STATUS = 0x200000C0 [ +0.000111] radeon 0000:07:00.0: SRBM_STATUS2 = 0x00000000 [ +0.000001] radeon 0000:07:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000 [ +0.000001] radeon 0000:07:00.0: R_008678_CP_STALLED_STAT2 = 0x00000000 [ +0.000001] radeon 0000:07:00.0: R_00867C_CP_BUSY_STAT = 0x00000000 [ +0.000001] radeon 0000:07:00.0: R_008680_CP_STAT = 0x00000000 [ +0.000001] radeon 0000:07:00.0: R_00D034_DMA_STATUS_REG = 0x44C83D57 [ +0.000001] radeon 0000:07:00.0: R_00D834_DMA_STATUS_REG = 0x44C83D57 [ +0.000247] radeon 0000:07:00.0: GPU reset succeeded, trying to resume [ +0.025448] [drm] probing gen 2 caps for device 8086:151 = 261a103/e [ +0.000003] [drm] PCIE gen 3 link speeds already enabled [ 
+0.002586] [drm] PCIE GART of 2048M enabled (table at 0x00000000001D6000). [ +0.000120] radeon 0000:07:00.0: WB enabled [ +0.000002] radeon 0000:07:00.0: fence driver on ring 0 use gpu addr 0x0000000080000c00 and cpu addr 0xffff94260bdd8c00 [ +0.000001] radeon 0000:07:00.0: fence driver on ring 1 use gpu addr 0x0000000080000c04 and cpu addr 0xffff94260bdd8c04 [ +0.000000] radeon 0000:07:00.0: fence driver on ring 2 use gpu addr 0x0000000080000c08 and cpu addr 0xffff94260bdd8c08 [ +0.000001] radeon 0000:07:00.0: fence driver on ring 3 use gpu addr 0x0000000080000c0c and cpu addr 0xffff94260bdd8c0c [ +0.000001] radeon 0000:07:00.0: fence driver on ring 4 use gpu addr 0x0000000080000c10 and cpu addr 0xffff94260bdd8c10 [ +0.000314] radeon 0000:07:00.0: fence driver on ring 5 use gpu addr 0x0000000000075a18 and cpu addr 0xffffaa3d89635a18 [ +0.010136] radeon 0000:07:00.0: failed VCE resume (-22). [ +0.159454] [drm] ring test on 0 succeeded in 4 usecs [ +0.000004] [drm] ring test on 1 succeeded in 1 usecs [ +0.000003] [drm] ring test on 2 succeeded in 1 usecs [ +0.000009] [drm] ring test on 3 succeeded in 6 usecs [ +0.000007] [drm] ring test on 4 succeeded in 5 usecs [ +0.175707] [drm] ring test on 5 succeeded in 2 usecs [ +0.000004] [drm] UVD initialized successfully. [ +1.041140] [drm:r600_ib_test [radeon]] *ERROR* radeon: fence wait timed out. [ +0.000018] [drm:radeon_ib_ring_tests [radeon]] *ERROR* radeon: failed testing IB on GFX ring (-110). 
[ +0.474934] radeon 0000:07:00.0: GPU softreset: 0x00000048 [ +0.000002] radeon 0000:07:00.0: GRBM_STATUS = 0xA0003028 [ +0.000001] radeon 0000:07:00.0: GRBM_STATUS_SE0 = 0x00000006 [ +0.000001] radeon 0000:07:00.0: GRBM_STATUS_SE1 = 0x00000006 [ +0.000001] radeon 0000:07:00.0: SRBM_STATUS = 0x200000C0 [ +0.000118] radeon 0000:07:00.0: SRBM_STATUS2 = 0x00000000 [ +0.000001] radeon 0000:07:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000 [ +0.000001] radeon 0000:07:00.0: R_008678_CP_STALLED_STAT2 = 0x00010000 [ +0.000001] radeon 0000:07:00.0: R_00867C_CP_BUSY_STAT = 0x00400002 [ +0.000002] radeon 0000:07:00.0: R_008680_CP_STAT = 0x84010243 [ +0.000001] radeon 0000:07:00.0: R_00D034_DMA_STATUS_REG = 0x44C83D57 [ +0.000001] radeon 0000:07:00.0: R_00D834_DMA_STATUS_REG = 0x44C83D57 [ +0.000002] radeon 0000:07:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000 [ +0.000001] radeon 0000:07:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000 [ +0.465304] radeon 0000:07:00.0: GRBM_SOFT_RESET=0x0000DDFF [ +0.000056] radeon 0000:07:00.0: SRBM_SOFT_RESET=0x00000100 [ +0.001147] radeon 0000:07:00.0: GRBM_STATUS = 0x00003028 [ +0.000001] radeon 0000:07:00.0: GRBM_STATUS_SE0 = 0x00000006 [ +0.000001] radeon 0000:07:00.0: GRBM_STATUS_SE1 = 0x00000006 [ +0.000001] radeon 0000:07:00.0: SRBM_STATUS = 0x200000C0 [ +0.000110] radeon 0000:07:00.0: SRBM_STATUS2 = 0x00000000 [ +0.000001] radeon 0000:07:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000 [ +0.000001] radeon 0000:07:00.0: R_008678_CP_STALLED_STAT2 = 0x00000000 [ +0.000001] radeon 0000:07:00.0: R_00867C_CP_BUSY_STAT = 0x00000000 [ +0.000001] radeon 0000:07:00.0: R_008680_CP_STAT = 0x00000000 [ +0.000001] radeon 0000:07:00.0: R_00D034_DMA_STATUS_REG = 0x44C83D57 [ +0.000001] radeon 0000:07:00.0: R_00D834_DMA_STATUS_REG = 0x44C83D57 [ +0.000238] radeon 0000:07:00.0: GPU reset succeeded, trying to resume </div><div>Eventually it starts to do this: [ +1.153721] [drm:uvd_v1_0_ib_test [radeon]] *ERROR* radeon: fence wait timed out. 
[ +0.000018] [drm:radeon_ib_ring_tests [radeon]] *ERROR* radeon: failed testing IB on ring 5 (-110). [ +0.000008] radeon 0000:07:00.0: scheduling IB failed (-12). [ +0.000011] [drm:radeon_vce_get_create_msg [radeon]] *ERROR* radeon: failed to schedule ib (-12). [ +0.000018] [drm:radeon_vce_ib_test [radeon]] *ERROR* radeon: failed to get create msg (-12). [ +0.000010] [drm:radeon_ib_ring_tests [radeon]] *ERROR* radeon: failed testing IB on ring 6 (-12). [ +0.000002] radeon 0000:07:00.0: scheduling IB failed (-12). [ +0.000010] [drm:radeon_vce_get_create_msg [radeon]] *ERROR* radeon: failed to schedule ib (-12). [ +0.000009] [drm:radeon_vce_ib_test [radeon]] *ERROR* radeon: failed to get create msg (-12). [ +0.000009] [drm:radeon_ib_ring_tests [radeon]] *ERROR* radeon: failed testing IB on ring 7 (-12). [ +0.001058] radeon 0000:07:00.0: GPU fault detected: 147 0x00044802 [ +0.000003] radeon 0000:07:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x000FF000 [ +0.000001] radeon 0000:07:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x04048002 [ +0.000001] VM fault (0x02, vmid 2) at page 1044480, read from TC (72) [ +10.114896] radeon 0000:07:00.0: ring 0 stalled for more than 10116msec [ +0.000004] radeon 0000:07:00.0: GPU lockup (current fence id 0x0000000000001506 last fence id 0x000000000000151a on ring 0) [ +0.000022] radeon 0000:07:00.0: ring 4 stalled for more than 10112msec [ +0.000003] radeon 0000:07:00.0: GPU lockup (current fence id 0x0000000000000fd3 last fence id 0x0000000000000fd7 on ring 4) [ +0.000029] radeon 0000:07:00.0: ring 3 stalled for more than 10116msec [ +0.000002] radeon 0000:07:00.0: GPU lockup (current fence id 0x00000000000018b7 last fence id 0x000000000000190c on ring 3) [ +0.507937] radeon 0000:07:00.0: ring 3 stalled for more than 10624msec [ +0.000001] radeon 0000:07:00.0: ring 0 stalled for more than 10624msec </div><div>After that, the log repeats the non-UTF-8 character "\00" a few hundred times before it cuts off. 
</div><div> </div><div> This is the part of dmesg running amdgpu where it appears to go off the rails: [Jun 6 11:16] INFO: task RenderThread 3:6190 blocked for more than 120 seconds. [ +0.000006] Tainted: G OE 4.11.0-mytest #2 [ +0.000002] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ +0.000002] RenderThread 3 D 0 6190 1612 0x00000000 [ +0.000003] Call Trace: [ +0.000007] __schedule+0x3c6/0x8c0 [ +0.000004] schedule+0x36/0x80 [ +0.000043] amd_sched_entity_push_job+0xc4/0x110 [amdgpu] [ +0.000003] ? wake_atomic_t_function+0x60/0x60 [ +0.000031] amdgpu_job_submit+0x72/0x90 [amdgpu] [ +0.000027] amdgpu_vm_bo_split_mapping+0x51f/0x7c0 [amdgpu] [ +0.000024] ? amdgpu_vm_do_copy_ptes+0x90/0x90 [amdgpu] [ +0.000024] amdgpu_vm_clear_freed+0x70/0xb0 [amdgpu] [ +0.000024] amdgpu_gem_va_ioctl+0x39a/0x3f0 [amdgpu] [ +0.000014] drm_ioctl+0x218/0x4b0 [drm] [ +0.000010] ? drm_ioctl+0x218/0x4b0 [drm] [ +0.000023] ? amdgpu_gem_metadata_ioctl+0x1d0/0x1d0 [amdgpu] [ +0.000003] ? kmem_cache_free+0x1b6/0x1e0 [ +0.000020] amdgpu_drm_ioctl+0x4f/0x90 [amdgpu] [ +0.000003] do_vfs_ioctl+0xa3/0x600 [ +0.000002] ? ____fput+0xe/0x10 [ +0.000003] ? 
task_work_run+0x85/0xa0 [ +0.000002] SyS_ioctl+0x79/0x90 [ +0.000002] entry_SYSCALL_64_fastpath+0x1e/0xad [ +0.000002] RIP: 0033:0x7fb15e3cb987 [ +0.000001] RSP: 002b:00007faefab6a778 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ +0.000002] RAX: ffffffffffffffda RBX: 00007faf340060e0 RCX: 00007fb15e3cb987 [ +0.000001] RDX: 00007faefab6a7c0 RSI: 00000000c0286448 RDI: 0000000000000009 [ +0.000001] RBP: 00007faefab6a810 R08: 000000011c430000 R09: 000000000000000e [ +0.000001] R10: 0000000000000002 R11: 0000000000000246 R12: 0000000040086409 [ +0.000001] R13: 0000000000000009 R14: 00007faf34ef5700 R15: 00007faf34d61780 [Jun 6 11:17] wlp4s0: disconnect from AP 80:2a:a8:11:23:5e for new auth to 80:2a:a8:11:24:29 [ +0.011326] wlp4s0: authenticate with 80:2a:a8:11:24:29 [ +0.020720] wlp4s0: send auth to 80:2a:a8:11:24:29 (try 1/3) [ +0.002594] wlp4s0: authenticated [ +0.002803] wlp4s0: associate with 80:2a:a8:11:24:29 (try 1/3) [ +0.005088] wlp4s0: RX AssocResp from 80:2a:a8:11:24:29 (capab=0x431 status=0 aid=2) [ +0.000223] wlp4s0: associated </div><div> </div><div>End of xorg.log.0 (for an earlier run) [ 12.056] (II) systemd-logind: got pause for 13:79 [ 32.415] (II) config/udev: removing GPU device /sys/devices/pci0000:00/0000:00:01.0/0000:07:00.0/drm/card0 /dev/dri/card0 [ 32.415] (II) config/udev: Adding drm device (/dev/dri/card0) [ 32.415] (II) xfree86: Adding drm device (/dev/dri/card0) [ 35.984] (II) systemd-logind: got resume for 13:81 [ 35.984] (EE) FBDEV(0): FBIOPUT_VSCREENINFO: No such device [ 35.984] (EE) Fatal server error: [ 35.984] (EE) EnterVT failed for screen 0 [ 35.984] (EE) [ 35.984] (EE) Please consult the The X.Org Foundation support at <a href="http://wiki.x.org">http://wiki.x.org</a> for help. [ 35.984] (EE) Please also check the log file at "/var/log/Xorg.0.log" for additional information. [ 35.984] (EE) [ 35.984] (EE) FBDEV(0): FBIOPUT_VSCREENINFO: No such device [ 36.074] (EE) Server terminated with error (1). Closing log file. 
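</div><div> </div><div>A rough sketch, in Python, of how the interesting entries could be pulled out of a saved dmesg dump like the ones above. The marker strings are simply copied from the excerpts in this mail; the names MARKERS, ENTRY_START, split_entries and flagged are made up for illustration, and the marker list is an assumption about what is worth flagging, not an official set of radeon/amdgpu error messages.</div>
<pre>
```python
import re

# Marker strings copied from the dmesg excerpts in this mail. This is an
# assumed, hand-picked list, not an official set of driver error messages.
MARKERS = [
    "GPU lockup",
    "stalled for more than",
    "GPU fault detected",
    "VM fault",
    "*ERROR*",
    "blocked for more than",
]

# A dmesg entry starts with a delta stamp such as "[ +0.000004]" or a
# wall-clock stamp such as "[Jun 6 11:16]". Tags like "[drm]" do not match.
ENTRY_START = re.compile(
    r"\[\s*(?:\+\d+\.\d+|[A-Z][a-z]{2}\s+\d+\s+\d+:\d+)\s*\]"
)


def split_entries(text):
    """Split a flattened dmesg dump back into one string per log entry."""
    starts = [m.start() for m in ENTRY_START.finditer(text)]
    if not starts:
        return [text.strip()] if text.strip() else []
    entries = []
    # Text before the first timestamp (e.g. an entry whose stamp was cut off).
    head = text[:starts[0]].strip()
    if head:
        entries.append(head)
    for begin, end in zip(starts, starts[1:] + [len(text)]):
        entries.append(text[begin:end].strip())
    return entries


def flagged(entries):
    """Return (marker, entry) pairs for entries containing a known marker."""
    hits = []
    for entry in entries:
        for marker in MARKERS:
            if marker in entry:
                hits.append((marker, entry))
                break  # report each entry once, under its first marker
    return hits
```
</pre>
<div>For a saved log, something like flagged(split_entries(open("dmesg-radeon.txt", errors="replace").read())) would list the lockup and *ERROR* entries in order, where dmesg-radeon.txt is a hypothetical file name. Each entry is reported once, under the first marker it matches, so the output stays in the same order as the log.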
</div> </div><div class="gmail_extra"> <div class="gmail_quote">On 6 June 2017 at 05:45, Alex Deucher <<a href="mailto:alexdeucher@gmail.com" target="_blank">alexdeucher@gmail.com</a>> wrote: <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On Mon, Jun 5, 2017 at 2:36 PM, Harry Wentland <<a href="mailto:harry.wentland@amd.com">harry.wentland@amd.com</a>> wrote: > Hi Luke, > > the first things to check would be the saved kern.log and Xorg.0.log from > before the crash occured. Both should be in /var/log. These logs will keep a > long record but you should be able to find the bad run as kern.log is > timestamped and with Xorg.0.log you should be able to scroll back to the > previous X run. > > That might give some indication, either by logging an error or even showing > a stack trace, but sometimes they don't flush out completely when the system > crashes. > > Some of the other experts on here (Alex, Michel, Christian, etc) might give > you more ideas should these logs not show anything. Basically include your xorg log and dmesg output and a description of the problem you are seeing. If possible, can you get remote access to the system (e.g., ssh)? If you lose your console, you can still get the logs remotely. Another option would be to blacklist the driver (append modprobe.blacklist=radeon,amdgpu to the kernel command line in grub) and then boot to a non-X runlevel. Then manually load the driver (modprobe radeon or modprobe amdgpu) and see if you can get better logs. Alex <div class="HOEnZb"><div class="h5"> > > Harry > > > On 2017-06-02 06:46 PM, Luke Miller wrote: >> >> Greetings, thanks for the awesome drivers. >> >> I have a recurring problem with one 3D program (UE4editor) crashing my >> computer during a particular operation. >> >> I believe the problem is at the DRM layer. >> >> Brief details: >> Radeon 7850HD >> It crashes using radeon and amdgpu (on ubuntu 17.04, linux 4.12, padoka >> ppa). 
>> It does not crash using the fglrx driver (on ubuntu 14.04). >> The entire machine hangs and requires a reset. >> >> My question: >> How can I go about collecting more information to fix/report the problem? >> Is there a tutorial on which logs to look at? I may even be able to fix it >> myself if I could get a stack trace. >> >> Part of the fun is that it is an older card. >> >> _______________________________________________ >> amd-gfx mailing list >> <a href="mailto:amd-gfx@lists.freedesktop.org">amd-gfx@lists.freedesktop.org</a> >> <a href="https://lists.freedesktop.org/mailman/listinfo/amd-gfx" rel="noreferrer" target="_blank">https://lists.freedesktop.org/mailman/listinfo/amd-gfx</a> </div></div></blockquote></div> </div>