[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

bugzilla-daemon at freedesktop.org
Thu Aug 23 19:33:44 UTC 2018


https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #29 from Andrey Grodzovsky <andrey.grodzovsky at amd.com> ---
(In reply to Jan Jurzitza from comment #28)
> (In reply to Andrey Grodzovsky from comment #25)
> 
> Still same issue happening here on both projects built from git. One issue
> here which doesn't seem completely related:
> Aug 23 20:41:20 archlinux kernel: ------------[ cut here ]------------
> Aug 23 20:41:20 archlinux kernel: CPU update of VM recommended only for
> large BAR system
> Aug 23 20:41:20 archlinux kernel: WARNING: CPU: 5 PID: 1092 at
> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:2606 amdgpu_vm_init+0x477/0x490
> [amdgpu]
> Aug 23 20:41:20 archlinux kernel: Modules linked in: bnep nct6775 hwmon_vid
> joydev btusb btrtl btbcm btintel bluetooth snd_usb_audio snd_usbmidi_lib
> snd_rawmidi input_leds snd_seq_device ecdh_generic mousedev nls_iso8859_1
> nls_cp437 vfat fat btrfs zstd_compress libcrc32c zstd_decompress xxhash xor
> arc4 amdkfd amd_iommu_v2 amdgpu iwlmvm mac80211 edac_mce_amd led_class
> kvm_amd iwlwifi snd_hda_codec_realtek chash gpu_sched kvm snd_hda_codec_hdmi
> snd_hda_codec_generic ttm snd_hda_intel drm_kms_helper irqbypass
> snd_hda_codec cfg80211 morus1280_avx2 drm morus1280_sse2 morus1280_glue
> morus640_sse2 morus640_glue snd_hda_core aegis256_aesni aegis128l_aesni
> aegis128_aesni igb snd_hwdep crct10dif_pclmul crc32_pclmul
> ghash_clmulni_intel snd_pcm pcbc snd_timer agpgart evdev ccp sp5100_tco
> aesni_intel snd syscopyarea i2c_algo_bit sysfillrect
> Aug 23 20:41:20 archlinux kernel:  aes_x86_64 wmi_bmof mac_hid crypto_simd
> sysimgblt raid6_pq cryptd glue_helper fb_sys_fops soundcore k10temp
> i2c_piix4 dca rfkill rng_core wmi button acpi_cpufreq sch_fq_codel
> vboxnetflt(O) vboxnetadp(O) pci_stub vboxpci(O) vboxdrv(O) sg crypto_user
> ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 fscrypto sr_mod
> cdrom sd_mod uas usb_storage hid_uclogic hid_generic usbhid hid ahci libahci
> xhci_pci libata crc32c_intel xhci_hcd usbcore scsi_mod usb_common
> Aug 23 20:41:20 archlinux kernel: CPU: 5 PID: 1092 Comm: Xorg.wrap Tainted:
> G           O      4.18.0-rc1-5024f8dfe478 #1
> Aug 23 20:41:20 archlinux kernel: Hardware name: To Be Filled By O.E.M. To
> Be Filled By O.E.M./X370 Gaming-ITX/ac, BIOS P3.40 11/07/2017
> Aug 23 20:41:20 archlinux kernel: RIP: 0010:amdgpu_vm_init+0x477/0x490
> [amdgpu]
> Aug 23 20:41:20 archlinux kernel: Code: b8 08 d8 ff ff e8 79 89 7c e8 e9 ee
> fe ff ff 41 89 ef e9 e6 fe ff ff 48 c7 c7 08 65 f0 c0 c6 05 41 af 2b 00 01
> e8 a3 8f 37 e8 <0f> 0b 0f b6 8b 60 01 00 00 e9 b4 fc ff ff e8 26 8d 37 e8 66
> 0f 1f 
> Aug 23 20:41:20 archlinux kernel: RSP: 0018:ffffacc2c8df7b60 EFLAGS: 00010286
> Aug 23 20:41:20 archlinux kernel: RAX: 0000000000000000 RBX:
> ffff9b10f7bf9000 RCX: 0000000000000006
> Aug 23 20:41:20 archlinux kernel: RDX: 0000000000000007 RSI:
> 0000000000000002 RDI: ffff9b10fe7564d0
> Aug 23 20:41:20 archlinux kernel: RBP: ffff9b10f5640000 R08:
> 0000001856da5330 R09: 0000000000000036
> Aug 23 20:41:20 archlinux kernel: R10: 0000000000000424 R11:
> 000000000006ad48 R12: ffff9b10f7bf90b8
> Aug 23 20:41:20 archlinux kernel: R13: 000000000000000a R14:
> 0000000000000000 R15: 0000000000000000
> Aug 23 20:41:20 archlinux kernel: FS:  00007fcf6cc95500(0000)
> GS:ffff9b10fe740000(0000) knlGS:0000000000000000
> Aug 23 20:41:20 archlinux kernel: CS:  0010 DS: 0000 ES: 0000 CR0:
> 0000000080050033
> Aug 23 20:41:20 archlinux kernel: CR2: 00007fcf6cb1d960 CR3:
> 00000007e1190000 CR4: 00000000003406e0
> Aug 23 20:41:20 archlinux kernel: Call Trace:
> Aug 23 20:41:20 archlinux kernel:  ? ida_simple_get+0x91/0xf0
> Aug 23 20:41:20 archlinux kernel:  amdgpu_driver_open_kms+0x83/0x1d0 [amdgpu]
> Aug 23 20:41:20 archlinux kernel:  drm_open+0x20b/0x440 [drm]
> Aug 23 20:41:20 archlinux kernel:  drm_stub_open+0xaf/0xf0 [drm]
> Aug 23 20:41:20 archlinux kernel:  chrdev_open+0xa3/0x1b0
> Aug 23 20:41:20 archlinux kernel:  ? cdev_put.part.3+0x20/0x20
> Aug 23 20:41:20 archlinux kernel:  do_dentry_open+0x1ab/0x2d0
> Aug 23 20:41:20 archlinux kernel:  path_openat+0x31b/0x1440
> Aug 23 20:41:20 archlinux kernel:  ? alloc_set_pte+0x1fd/0x4e0
> Aug 23 20:41:20 archlinux kernel:  do_filp_open+0x93/0x100
> Aug 23 20:41:20 archlinux kernel:  ? __check_object_size+0x9c/0x171
> Aug 23 20:41:20 archlinux kernel:  do_sys_open+0x186/0x210
> Aug 23 20:41:20 archlinux kernel:  do_syscall_64+0x4e/0x100
> Aug 23 20:41:20 archlinux kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> Aug 23 20:41:20 archlinux kernel: RIP: 0033:0x7fcf6cbbc452
> Aug 23 20:41:20 archlinux kernel: Code: 25 00 00 41 00 3d 00 00 41 00 74 4c
> 48 8d 05 f5 70 0d 00 8b 00 85 c0 75 6d 89 f2 b8 01 01 00 00 48 89 fe bf 9c
> ff ff ff 0f 05 <48> 3d 00 f0 ff ff 0f 87 a2 00 00 00 48 8b 4c 24 28 64 48 33
> 0c 25 
> Aug 23 20:41:20 archlinux kernel: RSP: 002b:00007ffe9a15b0a0 EFLAGS:
> 00000246 ORIG_RAX: 0000000000000101
> Aug 23 20:41:20 archlinux kernel: RAX: ffffffffffffffda RBX:
> 0000000000000000 RCX: 00007fcf6cbbc452
> Aug 23 20:41:20 archlinux kernel: RDX: 0000000000000002 RSI:
> 00007ffe9a15b180 RDI: 00000000ffffff9c
> Aug 23 20:41:20 archlinux kernel: RBP: 00007ffe9a15b130 R08:
> 0000000000000000 R09: 0000000000000000
> Aug 23 20:41:20 archlinux kernel: R10: 0000000000000000 R11:
> 0000000000000246 R12: 00007ffe9a15b180
> Aug 23 20:41:20 archlinux kernel: R13: 0000000000000000 R14:
> 0000000000000000 R15: 0000000000000000
> Aug 23 20:41:20 archlinux kernel: ---[ end trace eb5bc55fd8b7f883 ]---
> 
> 

This is just a warning. It means the CPU is being used to update the GPU
page tables. Is there any reason for that? Try passing the kernel parameter
amdgpu.vm_update_mode=0 instead.
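
To check which mode is actually in effect, something like the following could be used. It is only a minimal sketch: it assumes the loaded amdgpu module exposes the parameter read-only under /sys/module/amdgpu/parameters/, and that the value is a bitmask (bit 0 = CPU updates for graphics VMs, bit 1 = CPU updates for compute VMs, negative = driver default). The helper names are made up for illustration.

```python
from pathlib import Path

# Assumed sysfs location of the parameter while the amdgpu module is loaded.
PARAM_PATH = Path("/sys/module/amdgpu/parameters/vm_update_mode")

# Assumed meaning of the two low bits of the parameter.
MODES = {
    0: "never use the CPU",
    1: "CPU for graphics VMs only",
    2: "CPU for compute VMs only",
    3: "CPU for graphics and compute VMs",
}


def read_vm_update_mode(path=PARAM_PATH):
    """Return the parameter as an int, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def describe_vm_update_mode(value):
    """Turn the raw parameter value into a short description."""
    if value is None:
        return "unknown"
    if value < 0:
        return "driver default"
    return MODES[value & 3]


if __name__ == "__main__":
    mode = read_vm_update_mode()
    print(f"vm_update_mode={mode}: {describe_vm_update_mode(mode)}")
```

The parameter itself can be set either on the kernel command line as amdgpu.vm_update_mode=0 or with a line "options amdgpu vm_update_mode=0" in a file under /etc/modprobe.d/.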

> and then the issue OP posted too:
> 
> 
> Aug 23 19:40:06 archlinux kernel: amdgpu 0000:0d:00.0: GPU fault detected:
> 147 0x00a60401 for process payday2_release pid 6643 thread amdgpu_cs:0 pid
> 6644
> Aug 23 19:40:06 archlinux kernel: amdgpu 0000:0d:00.0:  
> VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x06ABF814
> Aug 23 19:40:06 archlinux kernel: amdgpu 0000:0d:00.0:  
> VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x2B004001
> Aug 23 19:40:06 archlinux kernel: amdgpu 0000:0d:00.0: VM fault (0x01, vmid
> 5, pasid 32776) at page 111933460, write from 'TC1' (0x54433100) (4)
> Aug 23 19:40:06 archlinux kernel: amdgpu 0000:0d:00.0: GPU fault detected:
> 147 0x00a60401 for process payday2_release pid 6643 thread amdgpu_cs:0 pid
> 6644
> Aug 23 19:40:06 archlinux kernel: amdgpu 0000:0d:00.0:  
> VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x06ABF814
> Aug 23 19:40:06 archlinux kernel: amdgpu 0000:0d:00.0:  
> VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x2B004001
> Aug 23 19:40:06 archlinux kernel: amdgpu 0000:0d:00.0: VM fault (0x01, vmid
> 5, pasid 32776) at page 111933460, write from 'TC1' (0x54433100) (4)
> Aug 23 19:40:06 archlinux kernel: amdgpu 0000:0d:00.0: GPU fault detected:
> 147 0x00a60401 for process payday2_release pid 6643 thread amdgpu_cs:0 pid
> 6644
> Aug 23 19:40:06 archlinux kernel: amdgpu 0000:0d:00.0:  
> VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x06ABF814
> Aug 23 19:40:06 archlinux kernel: amdgpu 0000:0d:00.0:  
> VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x23004001
> Aug 23 19:40:06 archlinux kernel: amdgpu 0000:0d:00.0: VM fault (0x01, vmid
> 1, pasid 32776) at page 111933460, write from 'TC1' (0x54433100) (4)
> Aug 23 19:42:06 archlinux kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR*
> ring gfx timeout, signaled seq=519868, emitted seq=519871
> Aug 23 19:42:06 archlinux kernel: [drm] GPU recovery disabled.
> 
> 
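
A note on reading the fault lines quoted above: the page number in the "VM fault" line is the decimal form of the VM_CONTEXT1_PROTECTION_FAULT_ADDR value, and the quoted client tag is the ASCII decoding of the hex word printed next to it. A small sketch that pulls these fields out of one log line, assuming 4 KiB GPU pages (that page size is an assumption, it is not stated in the log); the function names are hypothetical:

```python
import re

# Matches the tail of a "VM fault" line as it appears in the quoted log.
FAULT_RE = re.compile(
    r"at page (\d+), (read|write) from '([^']*)' \((0x[0-9a-fA-F]+)\)"
)

PAGE_SHIFT = 12  # assumption: 4 KiB GPU pages


def decode_client(word):
    """Decode a 32-bit client id into its ASCII tag, dropping NUL padding."""
    return word.to_bytes(4, "big").rstrip(b"\x00").decode("ascii", "replace")


def parse_vm_fault(line):
    """Return the decoded fields of a VM fault line, or None if no match."""
    m = FAULT_RE.search(line)
    if m is None:
        return None
    page = int(m.group(1))
    return {
        "page": page,
        "page_hex": f"0x{page:08X}",
        "byte_address": page << PAGE_SHIFT,
        "access": m.group(2),
        "client": decode_client(int(m.group(4), 16)),
    }


if __name__ == "__main__":
    line = ("VM fault (0x01, vmid 5, pasid 32776) at page 111933460, "
            "write from 'TC1' (0x54433100) (4)")
    print(parse_vm_fault(line))
```
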
> Happens on pretty much any application using Vulkan after some time or Core
> OpenGL applications too. Doesn't happen on normal desktop usage with Chrome.

So is it specific to Vulkan only?
> 
> Happens on 4.18.3 and these traces are from 4.18.0-rc1-5024f8dfe478
> X370 chipset (like OP)
> RX 480 (same as OP)
> Ryzen 7 1700x
> Mesa 18.1.6
> xorg 1.20.1
> i3wm
