[Bug 111881] [kernel 5.4-rc1][amdgpu][CIK]: FW bug: No PASID in KFD interrupt

bugzilla-daemon at freedesktop.org
Wed Oct 2 11:03:26 UTC 2019


https://bugs.freedesktop.org/show_bug.cgi?id=111881

            Bug ID: 111881
           Summary: [kernel 5.4-rc1][amdgpu][CIK]: FW bug: No PASID in KFD
                    interrupt
           Product: DRI
           Version: XOrg git
          Hardware: x86-64 (AMD64)
                OS: All
            Status: NEW
          Severity: not set
          Priority: not set
         Component: DRM/amdkfd
          Assignee: dri-devel at lists.freedesktop.org
          Reporter: erhard_f at mailbox.org

Created attachment 145612
  --> https://bugs.freedesktop.org/attachment.cgi?id=145612&action=edit
dmesg (kernel 5.4-rc1)

The card is a Sapphire Radeon R9 290 Tri-X in a Supermicro H8SGL (Opteron
6380) running Gentoo Linux. The OpenCL driver is ROCm 2.8.0.

clinfo segfaults, and the kernel logs a warning:

[...]
Okt 02 12:47:51 yea kernel: clinfo[1138]: segfault at 1000 ip 00007f78d4f52971
sp 00007ffd81ab7170 error 6 in libhsa-runtime64.so.1.1.9[7f78d4f34000+c7000]
Okt 02 12:47:51 yea kernel: Code: ff ff ff 48 8b 85 58 ff ff ff 48 8b 80 b8 03
00 00 48 8b 95 78 ff ff ff 48 c1 e2 03 48 01 c2 48 8b 85 68 ff ff ff 48 8b 40
18 <48> 89 02 c6 45 b0 01 bb 00 00 00 00 0f b6 45 b0 83 f0 01 84 c0 74
Okt 02 12:47:59 yea kernel: Evicting PASID 32770 queues
Okt 02 12:47:59 yea kernel: ------------[ cut here ]------------
Okt 02 12:47:59 yea kernel: FW bug: No PASID in KFD interrupt
Okt 02 12:47:59 yea kernel: WARNING: CPU: 5 PID: 0 at
drivers/gpu/drm/amd/amdgpu/../amdkfd/cik_event_interrupt.c:70
cik_event_interrupt_isr+0x223/0x230 [amdgpu]
Okt 02 12:47:59 yea kernel: Modules linked in: fuse dm_crypt nhpoly1305_sse2
nhpoly1305 chacha_x86_64 chacha_generic adiantum poly1305_generic
algif_skcipher amd64_edac_mod crct10dif_pclmul crc32_generic crc32_pclmul
dm_mod joydev input_leds ghash_generic gf128mul gcm hid_generic usbhid hid xts
ext4 crc16 mbcache ctr jbd2 ath5k led_class amdgpu cbc mac80211 ath ohci_pci
ecb evdev cfg80211 gpu_sched ehci_pci ohci_hcd snd_oxygen i2c_algo_bit ehci_hcd
fam15h_power snd_oxygen_lib aesni_intel ttm snd_mpu401_uart sr_mod glue_helper
rfkill snd_rawmidi usbcore crypto_simd k10temp libarc4 cdrom cryptd
drm_kms_helper snd_hda_codec_hdmi hwmon snd_seq_device i2c_piix4 usb_common
cfbfillrect syscopyarea cfbimgblt sysfillrect sysimgblt snd_hda_intel
fb_sys_fops cfbcopyarea snd_intel_nhlt fb snd_hda_codec font snd_hwdep fbdev
snd_hda_core drm e1000e snd_pcm snd_timer snd drm_panel_orientation_quirks
backlight soundcore button acpi_cpufreq processor lzo zstd sg zram zsmalloc
Okt 02 12:47:59 yea kernel: CPU: 5 PID: 0 Comm: swapper/5 Not tainted 5.4.0-rc1
#1
Okt 02 12:47:59 yea kernel: Hardware name: Supermicro H8SGL/H8SGL, BIOS 3.5b   
   03/18/2016
Okt 02 12:47:59 yea kernel: RIP: 0010:cik_event_interrupt_isr+0x223/0x230
[amdgpu]
Okt 02 12:47:59 yea kernel: Code: ff 0f b6 05 53 15 49 00 84 c0 74 07 31 c0 e9
b0 fe ff ff 48 c7 c7 c0 b2 88 c1 88 44 24 08 c6 05 36 15 49 00 01 e8 81 0f a5
f8 <0f> 0b 0f b6 44 24 08 e9 8d fe ff ff 90 48 b8 00 00 00 00 00 fc ff
Okt 02 12:47:59 yea kernel: RSP: 0018:ffff8883e7888c08 EFLAGS: 00010086
Okt 02 12:47:59 yea kernel: RAX: 0000000000000000 RBX: ffff8883cc044b48 RCX:
ffffffffba10693f
Okt 02 12:47:59 yea kernel: RDX: 0000000000000003 RSI: dffffc0000000000 RDI:
ffff8883e5704f80
Okt 02 12:47:59 yea kernel: RBP: ffff8883e7888c40 R08: fffffbfff76d3d31 R09:
fffffbfff76d3d31
Okt 02 12:47:59 yea kernel: R10: fffffbfff76d3d30 R11: ffffffffbb69e983 R12:
0000000000000008
Okt 02 12:47:59 yea kernel: R13: 00000000000000b5 R14: 0000000000000023 R15:
0000000000000000
Okt 02 12:47:59 yea kernel: FS:  0000000000000000(0000)
GS:ffff8883e7880000(0000) knlGS:0000000000000000
Okt 02 12:47:59 yea kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Okt 02 12:47:59 yea kernel: CR2: 00007fea9066f000 CR3: 00000007f52c2000 CR4:
00000000000406e0
Okt 02 12:47:59 yea kernel: Call Trace:
Okt 02 12:47:59 yea kernel:  <IRQ>
Okt 02 12:47:59 yea kernel:  kgd2kfd_interrupt+0x151/0x1a0 [amdgpu]
Okt 02 12:47:59 yea kernel:  ? kgd2kfd_resume+0xa0/0xa0 [amdgpu]
Okt 02 12:47:59 yea kernel:  ? check_flags.part.41+0x82/0x210
Okt 02 12:47:59 yea kernel:  ? amdgpu_fence_process+0x95/0x1b0 [amdgpu]
Okt 02 12:47:59 yea kernel:  ? amdgpu_irq_dispatch+0x184/0x390 [amdgpu]
Okt 02 12:47:59 yea kernel:  ? gfx_v7_0_eop_irq+0xba/0x100 [amdgpu]
Okt 02 12:47:59 yea kernel:  amdgpu_irq_dispatch+0x1c6/0x390 [amdgpu]
Okt 02 12:47:59 yea kernel:  ? amdgpu_irq_add_id+0x160/0x160 [amdgpu]
Okt 02 12:47:59 yea kernel:  ? lock_downgrade+0x390/0x390
Okt 02 12:47:59 yea kernel:  amdgpu_ih_process+0xf4/0x1d0 [amdgpu]
Okt 02 12:47:59 yea kernel:  ? amdgpu_irq_disable_all+0x1b0/0x1b0 [amdgpu]
Okt 02 12:47:59 yea kernel:  amdgpu_irq_handler+0x20/0x60 [amdgpu]
Okt 02 12:47:59 yea kernel:  ? amdgpu_irq_disable_all+0x1b0/0x1b0 [amdgpu]
Okt 02 12:47:59 yea kernel:  __handle_irq_event_percpu+0x72/0x390
Okt 02 12:47:59 yea kernel:  handle_irq_event_percpu+0x6a/0xe0
Okt 02 12:47:59 yea kernel:  ? __handle_irq_event_percpu+0x390/0x390
Okt 02 12:47:59 yea kernel:  ? rwlock_bug.part.2+0x50/0x50
Okt 02 12:47:59 yea kernel:  ? do_raw_spin_unlock+0x9d/0x130
Okt 02 12:47:59 yea kernel:  handle_irq_event+0x4f/0x7e
Okt 02 12:47:59 yea kernel:  handle_edge_irq+0x100/0x2d0
Okt 02 12:47:59 yea kernel:  do_IRQ+0x72/0x160
Okt 02 12:47:59 yea kernel:  common_interrupt+0xf/0xf
Okt 02 12:47:59 yea kernel:  </IRQ>
Okt 02 12:47:59 yea kernel: RIP: 0010:cpuidle_enter_state+0xcd/0x640
Okt 02 12:47:59 yea kernel: Code: 00 31 ff e8 a5 86 80 ff 80 7c 24 10 00 74 12
9c 58 f6 c4 02 0f 85 42 05 00 00 31 ff e8 cc 5e 89 ff e8 f7 be 8f ff fb 45 85
e4 <0f> 88 fb 03 00 00 4d 63 ec 4f 8d 74 6d 00 49 c1 e6 05 4a 8d 7c 33
Okt 02 12:47:59 yea kernel: RSP: 0018:ffff8883e571fd98 EFLAGS: 00000202
ORIG_RAX: ffffffffffffffdd
Okt 02 12:47:59 yea kernel: RAX: 0000000000000000 RBX: ffffffffc0316680 RCX:
ffffffffba1067e0
Okt 02 12:47:59 yea kernel: RDX: 0000000000000007 RSI: dffffc0000000000 RDI:
ffff8883e5704fb4
Okt 02 12:47:59 yea kernel: RBP: ffff888812779028 R08: 0000000000000002 R09:
0000000000000000
Okt 02 12:47:59 yea kernel: R10: 0000000000000000 R11: 0000000000000000 R12:
0000000000000002
Okt 02 12:47:59 yea kernel: R13: 0000000000000002 R14: ffffffffc0316740 R15:
ffffffffc0316780
Okt 02 12:47:59 yea kernel:  ? lockdep_hardirqs_on+0x190/0x280
Okt 02 12:47:59 yea kernel:  ? cpuidle_enter_state+0xc9/0x640
Okt 02 12:47:59 yea kernel:  cpuidle_enter+0x37/0x60
Okt 02 12:47:59 yea kernel:  do_idle+0x2e7/0x380
Okt 02 12:47:59 yea kernel:  ? arch_cpu_idle_exit+0x40/0x40
Okt 02 12:47:59 yea kernel:  ? schedule_idle+0x41/0x50
Okt 02 12:47:59 yea kernel:  cpu_startup_entry+0x14/0x20
Okt 02 12:47:59 yea kernel:  start_secondary+0x1fd/0x240
Okt 02 12:47:59 yea kernel:  ? set_cpu_sibling_map+0xbc0/0xbc0
Okt 02 12:47:59 yea kernel:  secondary_startup_64+0xa4/0xb0
Okt 02 12:47:59 yea kernel: irq event stamp: 450550
Okt 02 12:47:59 yea kernel: hardirqs last  enabled at (450547):
[<ffffffffba8c30b9>] cpuidle_enter_state+0xc9/0x640
Okt 02 12:47:59 yea kernel: hardirqs last disabled at (450548):
[<ffffffffba00276a>] trace_hardirqs_off_thunk+0x1a/0x20
Okt 02 12:47:59 yea kernel: softirqs last  enabled at (450550):
[<ffffffffba07b210>] irq_enter+0x70/0x80
Okt 02 12:47:59 yea kernel: softirqs last disabled at (450549):
[<ffffffffba07b1f5>] irq_enter+0x55/0x80
Okt 02 12:47:59 yea kernel: ---[ end trace 5951fa91933dcafd ]---
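
As a reading aid for the clinfo segfault line at the top of the log, here is a
minimal Python sketch that splits the line into its fields. The function name
decode_segfault and the regular expression are illustrative only and do not
belong to any existing tool. It assumes the usual x86 page-fault error-code
bits (bit 0 = protection fault, bit 1 = write, bit 2 = user mode) and treats
the bracketed value as the start of the mapping that contains the faulting
instruction, which is not necessarily the library's load base.

```python
import re

# Segfault line copied from the log above, unwrapped onto one line.
LINE = ("clinfo[1138]: segfault at 1000 ip 00007f78d4f52971 "
        "sp 00007ffd81ab7170 error 6 in "
        "libhsa-runtime64.so.1.1.9[7f78d4f34000+c7000]")

# Hypothetical pattern for the x86 "segfault at ..." message layout.
PATTERN = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) in "
    r"(?P<lib>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Return the decoded fields of one kernel segfault message."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not a segfault line")
    err = int(m["err"], 16)  # error code is printed in hex
    return {
        "library": m["lib"],
        "fault_address": int(m["addr"], 16),
        # Distance of the faulting instruction from the mapping start.
        "offset_in_mapping": int(m["ip"], 16) - int(m["base"], 16),
        "protection_fault": bool(err & 1),  # False means page not present
        "write": bool(err & 2),
        "user_mode": bool(err & 4),
    }


info = decode_segfault(LINE)
print(hex(info["offset_in_mapping"]))  # prints 0x1e971
```

Read this way, the line describes a user-mode write to the unmapped address
0x1000, issued 0x1e971 bytes into the libhsa-runtime64.so.1.1.9 mapping.
Whether that offset can be passed directly to a symbolizer depends on how the
library is mapped, so it is only a starting point.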
