oops in xe_bo_evict on 6.12.8

Saarinen, Jani jani.saarinen at intel.com
Mon Jan 13 07:51:43 UTC 2025


Hi. 
> -----Original Message-----
> From: Intel-xe <intel-xe-bounces at lists.freedesktop.org> On Behalf Of Emil J
> Tywoniak
> Sent: Monday, 13 January 2025 1.53
> To: intel-xe at lists.freedesktop.org
> Subject: oops in xe_bo_evict on 6.12.8
> 
> What's up gamers,
> 
> NOTE: this is my second attempt trying to send this email to this mailing list.
> Previously it never showed up on the archive. I have now subscribed and am
> giving it a second attempt.
> 
> I hope this is the right place to report this oops which possibly is due to
> amdgpu interaction. The community guidelines link for this list
> (https://01.org/linuxgraphics/community) doesn't work. Feel free to redirect
> me if not, even to /dev/null. The Video(DRI - Intel) section on kernel bugzilla
> doesn't seem to get much life.

Could you report the issue to xe by following the instructions at https://drm.pages.freedesktop.org/intel-docs/how-to-file-i915-bugs.html?

> I see there have been recent changes to things around bo eviction on xe and
> today I caught the following oops when spawning a second VS Code window
> in sway with the New Window command (Ctrl+Shift+N). VS Code was not
> running on XWayland. So far I haven't been able to reproduce this. I have
> amdgpu loaded as a fallback for my Ryzen 7900X built-in graphics since I
> installed the funny GPU (Intel Arc B580 / BMG G21). I'm on Mesa 24.3.3.

Br
Jani
> 
> 
> ------------[ cut here ]------------
> 
> workqueue: WQ_MEM_RECLAIM sdma0:drm_sched_run_job_work
> [gpu_sched] is flushing !WQ_MEM_RECLAIM
> events:amdgpu_device_delay_enable_gfx_off [amdgpu]
> 
> WARNING: CPU: 5 PID: 29199 at kernel/workqueue.c:3704
> check_flush_dependency+0x10f/0x130
> 
> Modules linked in: rfcomm snd_seq_dummy snd_hrtimer snd_seq cmac
> algif_hash algif_skcipher af_alg nft_chain_nat xt_MASQUERADE
> nf_conntrack_netlink xfrm_user xfrm_algo xt_addrtype overlay af_packet
> bnep btusb btrtl btintel btbcm btmtk bluetooth mousedev cdc_acm joydev
> nls_iso8859_1 nls_cp437 vfat fat mei_gsc_proxy mei_gsc mei_me mei
> xt_conntrack ip6t_rpfilter mt7921e ipt_rpfilter mt7921_common mt792x_lib
> snd_hda_codec_hdmi mt76_connac_lib edac_mce_amd edac_core mt76
> snd_hda_intel amd_atl intel_rapl_msr snd_intel_dspcfg xt_pkttype
> intel_rapl_common snd_intel_sdw_acpi crct10dif_pclmul xt_LOG mac80211
> snd_usb_audio uvcvideo nf_log_syslog snd_usbmidi_lib crc32_pclmul
> snd_hda_codec xt_tcpudp polyval_clmulni videobuf2_vmalloc xe snd_ump
> polyval_generic uvc snd_hda_core ghash_clmulni_intel cfg80211 nft_compat
> snd_rawmidi sha512_ssse3 videobuf2_memops spd5118 sha256_ssse3
> snd_seq_device videobuf2_v4l2 snd_hwdep r8169 sha1_ssse3 battery
> sp5100_tco videobuf2_common aesni_intel snd_pcm watchdog realtek
> gf128mul
> 
>  crypto_simd mdio_devres videodev snd_timer cryptd libphy rfkill snd
> i2c_piix4 drm_gpuvm wmi_bmof rapl libarc4 led_class mc nf_tables i2c_smbus
> k10temp soundcore sch_fq_codel tpm_crb rtc_cmos evdev mac_hid tpm_tis
> gpio_amdpt tiny_power_button tpm_tis_core gpio_generic button uinput
> hid_xpadneo(O) ff_memless atkbd libps2 serio vivaldi_fmap loop xt_nat
> nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c br_netfilter veth
> tun tap macvlan bridge stp llc kvm_amd ccp kvm fuse efi_pstore configfs
> nfnetlink efivarfs tpm libaescfb ecdh_generic ecc rng_core dmi_sysfs ip_tables
> x_tables autofs4 ext4 crc32c_generic mbcache jbd2 hid_generic usbhid hid
> ahci libahci xhci_pci libata nvme xhci_hcd scsi_mod nvme_core crc32c_intel
> scsi_common nvme_auth dm_mod dax amdgpu video wmi amdxcp
> i2c_algo_bit drm_ttm_helper ttm drm_exec gpu_sched drm_suballoc_helper
> drm_buddy drm_display_helper cec crc16
> 
> CPU: 5 UID: 0 PID: 29199 Comm: kworker/u96:0 Tainted: G W O 6.12.8 #1-
> NixOS
> 
> Tainted: [W]=WARN, [O]=OOT_MODULE
> 
> Hardware name: Micro-Star International Co., Ltd. MS-7D75/MAG B650
> TOMAHAWK WIFI (MS-7D75), BIOS 1.60 05/30/2023
> 
> Workqueue: sdma0 drm_sched_run_job_work [gpu_sched]
> 
> RIP: 0010:check_flush_dependency+0x10f/0x130
> 
> Code: c0 f3 01 01 90 49 8b 45 18 48 8d b2 c0 00 00 00 48 8d 8b c0 00 00 00
> 49 89 e8 48 c7 c7 a0 c7 df b4 48 89 c2 e8 82 7e fd ff 90 <0f> 0b 90 90 e9 0a ff
> ff ff 80 3d 99 c0 f3 01 00 75 8f e9 42 ff ff
> 
> RSP: 0018:ffff95dd9ef97c60 EFLAGS: 00010046
> 
> RAX: 0000000000000000 RBX: ffff9265c01b8e00 RCX: 0000000000000000
> 
> RDX: 0000000000000000 RSI: 0000000000000000 RDI:
> 0000000000000000
> 
> RBP: ffffffffc0438c00 R08: 0000000000000000 R09: 0000000000000000
> 
> R10: 0000000000000000 R11: 0000000000000000 R12: ffff92681a13b200
> 
> R13: ffff9265c94338c0 R14: 0000000000000001 R15: ffff9265c01bce00
> 
> FS: 0000000000000000(0000) GS:ffff926cb7e80000(0000)
> knlGS:0000000000000000
> 
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> 
> CR2: 000000000050d6d0 CR3: 00000002a38d6000 CR4:
> 0000000000f50ef0
> 
> PKRU: 55555554
> 
> Call Trace:
> 
>  <TASK>
> 
>  ? check_flush_dependency+0x10f/0x130
> 
>  ? __warn.cold+0x93/0xf6
> 
>  ? check_flush_dependency+0x10f/0x130
> 
>  ? report_bug+0x10d/0x150
> 
>  ? srso_alias_return_thunk+0x5/0xfbef5
> 
>  ? handle_bug+0x61/0xb0
> 
>  ? exc_invalid_op+0x17/0x80
> 
>  ? asm_exc_invalid_op+0x1a/0x20
> 
>  ? __pfx_amdgpu_device_delay_enable_gfx_off+0x10/0x10 [amdgpu]
> 
>  ? check_flush_dependency+0x10f/0x130
> 
>  __flush_work+0x10c/0x320
> 
>  cancel_delayed_work_sync+0x62/0x80
> 
>  amdgpu_gfx_off_ctrl+0xb7/0x150 [amdgpu]
> 
>  amdgpu_ring_alloc+0x40/0x70 [amdgpu]
> 
>  amdgpu_ib_schedule+0xf0/0x750 [amdgpu]
> 
>  amdgpu_job_run+0x8e/0x200 [amdgpu]
> 
>  drm_sched_run_job_work+0x283/0x420 [gpu_sched]
> 
>  process_one_work+0x18a/0x350
> 
>  worker_thread+0x235/0x370
> 
>  ? __pfx_worker_thread+0x10/0x10
> 
>  ? __pfx_worker_thread+0x10/0x10
> 
>  kthread+0xcd/0x100
> 
>  ? __pfx_kthread+0x10/0x10
> 
>  ret_from_fork+0x31/0x50
> 
>  ? __pfx_kthread+0x10/0x10
> 
>  ret_from_fork_asm+0x1a/0x30
> 
>  </TASK>
> 
> ---[ end trace 0000000000000000 ]---
> 
> 
> I hope this tells you something. I'm willing to switch to some cutting edge
> kernel commit and report back if I get an oops again, so feel free to tell me
> which remote and commit I should fetch, or any other troubleshooting steps I
> could follow.
> 
> 
> Thanks for all your hard work,
> 
> 
> Emil J. Tywoniak (widlarizer)

