[Bug 111021] [amdgpu][CIK] cp queue preemption time out, BUG: kernel NULL pointer dereference, address: 0000000000000038

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Fri Jun 28 22:52:28 UTC 2019


https://bugs.freedesktop.org/show_bug.cgi?id=111021

            Bug ID: 111021
           Summary: [amdgpu][CIK] cp queue preemption time out, BUG:
                    kernel NULL pointer dereference, address:
                    0000000000000038
           Product: DRI
           Version: unspecified
          Hardware: x86-64 (AMD64)
                OS: Linux (All)
            Status: NEW
          Severity: normal
          Priority: medium
         Component: DRM/AMDgpu
          Assignee: dri-devel at lists.freedesktop.org
          Reporter: erhard_f at mailbox.org

Created attachment 144678
  --> https://bugs.freedesktop.org/attachment.cgi?id=144678&action=edit
kernel .dmesg (5.2-rc6)

[...]
[  440.685185] cp queue preemption time out
[  440.685338] Resetting wave fronts (nocpsch) on dev 00000000feee3825
[  440.685426] BUG: kernel NULL pointer dereference, address: 0000000000000038
[  440.685432] #PF: supervisor read access in kernel mode
[  440.685436] #PF: error_code(0x0000) - not-present page
[  440.685440] PGD 0 P4D 0 
[  440.685448] Oops: 0000 [#1] SMP NOPTI
[  440.685455] CPU: 3 PID: 1026 Comm: xmr-stak Not tainted 5.2.0-rc6 #1
[  440.685459] Hardware name: System manufacturer System Product Name/M5A78L-M LX3, BIOS 1401    05/05/2016
[  440.685610] RIP: 0010:amdgpu_ib_schedule+0x4b/0x520 [amdgpu]
[  440.685616] Code: 89 f5 49 89 ff 48 89 54 24 08 0f b6 87 38 04 00 00 48 85 c9 0f 84 5d 03 00 00 48 8b 91 b0 00 00 00 48 89 54 24 10 48 8b 51 10 <48> 8b 52 38 48 89 14 24 84 c0 0f 84 09 e2 17 00 48 83 7c 24 10 00
[  440.685621] RSP: 0018:ffffac368c2a7ad0 EFLAGS: 00010286
[  440.685626] RAX: 0000000000000001 RBX: ffff97d66533dc00 RCX: ffff97d66533dc00
[  440.685630] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff97d685fe7d48
[  440.685634] RBP: 0000000000000001 R08: ffffac368c2a7b48 R09: 0000000000000001
[  440.685638] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000007
[  440.685642] R13: 0000000000ffd000 R14: ffff97d685fe0000 R15: ffff97d685fe7d48
[  440.685647] FS:  00007f2115109700(0000) GS:ffff97d6a6ac0000(0000) knlGS:0000000000000000
[  440.685651] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  440.685655] CR2: 0000000000000038 CR3: 00000003e4236000 CR4: 00000000000406e0
[  440.685659] Call Trace:
[  440.685669]  ? rcu_read_lock_sched_held+0x50/0x60
[  440.685807]  amdgpu_amdkfd_submit_ib+0xb6/0x170 [amdgpu]
[  440.685949]  deallocate_vmid.isra.12+0xe4/0xf0 [amdgpu]
[  440.686091]  destroy_queue_nocpsch_locked+0x176/0x190 [amdgpu]
[  440.686233]  process_termination_nocpsch+0x5e/0x130 [amdgpu]
[  440.686373]  kfd_process_dequeue_from_all_devices+0x36/0x50 [amdgpu]
[  440.686512]  kfd_process_notifier_release+0xf4/0x180 [amdgpu]
[  440.686519]  __mmu_notifier_release+0x65/0x110
[  440.686527]  exit_mmap+0x3b/0x170
[  440.686534]  mmput+0x45/0x110
[  440.686539]  do_exit+0x27d/0xb90
[  440.686546]  ? find_held_lock+0x2d/0x90
[  440.686551]  ? get_signal+0xcc/0xaa0
[  440.686556]  do_group_exit+0x42/0xb0
[  440.686561]  get_signal+0x119/0xaa0
[  440.686568]  do_signal+0x3e/0x620
[  440.686574]  ? find_held_lock+0x2d/0x90
[  440.686580]  exit_to_usermode_loop+0x4b/0xa0
[  440.686585]  do_syscall_64+0x149/0x1a0
[  440.686591]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  440.686596] RIP: 0033:0x7f212b976f6c
[  440.686604] Code: Bad RIP value.
[  440.686608] RSP: 002b:00007f2115108d30 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
[  440.686614] RAX: fffffffffffffe00 RBX: 00007f211d838c48 RCX: 00007f212b976f6c
[  440.686618] RDX: 0000000000000000 RSI: 0000000000000080 RDI: 00007f211d838c70
[  440.686622] RBP: 0000000000000000 R08: 0000000000000000 R09: 00007f2115109700
[  440.686626] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000010
[  440.686630] R13: 00007f211d838c20 R14: 0000000000000000 R15: 00007f211d838c70
[  440.686634] Modules linked in: fuse sha256_ssse3 sha256_generic cfg80211
rfkill dm_crypt nhpoly1305_sse2 nhpoly1305 chacha_x86_64 chacha_generic
adiantum poly1305_generic algif_skcipher af_alg ext4 crc16 mbcache jbd2
input_leds led_class joydev hid_generic usbhid hid crct10dif_pclmul
crc32_generic crc32_pclmul ghash_generic gf128mul gcm xts ctr dm_mod cbc amdgpu
ecb evdev gpu_sched ohci_pci i2c_algo_bit ttm snd_hda_codec_realtek
snd_hda_codec_generic snd_hda_codec_hdmi drm_kms_helper ehci_pci ohci_hcd
cfbfillrect syscopyarea snd_hda_intel cfbimgblt k10temp sysfillrect ehci_hcd
aesni_intel sysimgblt fb_sys_fops snd_hda_codec cfbcopyarea fb snd_hwdep
usbcore aes_x86_64 snd_hda_core fam15h_power hwmon i2c_piix4 usb_common font
glue_helper crypto_simd sr_mod snd_pcm cryptd fbdev cdrom button snd_timer drm
acpi_cpufreq snd alx drm_panel_orientation_quirks soundcore processor backlight
mdio lzo nfsd auth_rpcgss lockd grace zstd sunrpc sg zram zsmalloc
[  440.686714] CR2: 0000000000000038
[  440.686720] ---[ end trace 39cfe5e575b273f7 ]---
[  440.686847] RIP: 0010:amdgpu_ib_schedule+0x4b/0x520 [amdgpu]
[  440.686852] Code: 89 f5 49 89 ff 48 89 54 24 08 0f b6 87 38 04 00 00 48 85 c9 0f 84 5d 03 00 00 48 8b 91 b0 00 00 00 48 89 54 24 10 48 8b 51 10 <48> 8b 52 38 48 89 14 24 84 c0 0f 84 09 e2 17 00 48 83 7c 24 10 00
[  440.686857] RSP: 0018:ffffac368c2a7ad0 EFLAGS: 00010286
[  440.686862] RAX: 0000000000000001 RBX: ffff97d66533dc00 RCX: ffff97d66533dc00
[  440.686866] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff97d685fe7d48
[  440.686869] RBP: 0000000000000001 R08: ffffac368c2a7b48 R09: 0000000000000001
[  440.686873] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000007
[  440.686877] R13: 0000000000ffd000 R14: ffff97d685fe0000 R15: ffff97d685fe7d48
[  440.686882] FS:  00007f2115109700(0000) GS:ffff97d6a6ac0000(0000) knlGS:0000000000000000
[  440.686887] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  440.686890] CR2: 00007f212b976f42 CR3: 00000003e4236000 CR4: 00000000000406e0
[  440.686894] Fixing recursive fault but reboot is needed!

This happens every time xmr-stak 2.10.5 (with ROCm 2.5) tries to compile
shaders for this R9 290X. A ~/.AMD archive is generated, but the compilation
process never finishes. When I close the shell with xmr-stak still running
(Ctrl-C does not stop xmr-stak), I get this kernel BUG. I used a 5.2-rc6 debug
kernel, but it happens on 5.1.15 too.

Card is a Sapphire Radeon R9 290X Tri-X OC (11226-18-20G). Additional info
about the system:

Machine:   Type: Desktop Mobo: ASUSTeK model: M5A78L-M LX3 v: Rev X.0x serial: <root required> BIOS: American Megatrends
           v: 1401 date: 05/05/2016
CPU:       6-Core: AMD FX-6300 type: MCP speed: 3817 MHz min/max: 1400/3800 MHz
Graphics:  Device-1: Advanced Micro Devices [AMD/ATI] Hawaii XT / Grenada XT [Radeon R9 290X/390X] driver: amdgpu v: kernel
           Display: x11 server: X.Org 1.20.4 driver: amdgpu,ati unloaded: modesetting,radeon resolution: 1920x1080~60Hz
           OpenGL: renderer: AMD Radeon R9 200 Series (HAWAII DRM 3.30.0 5.1.15-gentoo LLVM 8.0.0) v: 4.5 Mesa 19.0.8
