oops in xe_bo_evict on 6.12.8

Emil J Tywoniak emil at tywoniak.eu
Mon Jan 13 16:47:33 UTC 2025


Hi Rodrigo,

just a while ago I reported it here: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4055
You're right, I attached the wrong trace. The gitlab issue has the correct one. Feel free to suggest a remote+branch(+commit) that I should try, but since I haven't been able to reproduce this oops, switching kernels won't yield any information immediately.

Cheers
Emil

January 13, 2025 at 5:41 PM, "Rodrigo Vivi" <rodrigo.vivi at intel.com> wrote:



> 
> On Fri, Jan 10, 2025 at 05:19:31PM +0000, Emil J Tywoniak wrote:
> 
> > 
> > What's up gamers,
> > 
> >  
> > 
> >  hope this is the right place to report this oops which possibly is due to amdgpu interaction. The community guidelines link for this list (https://01.org/linuxgraphics/community) doesn't work. Feel free to redirect me if not, even to /dev/null. The Video(DRI - Intel) section on kernel bugzilla doesn't seem to get much life.
> > 
> 
> Hi Emil,
> 
> Thanks for your interest and report on Xe bugs.
> 
> Please follow this link instead: https://drm.pages.freedesktop.org/intel-docs/how-to-file-i915-bugs.html
> 
> The entire dmesg would be needed to help us understand what's going on here. From the pasted
> portion below it doesn't even look like xe is in the picture.
> But only more details could help determine what's going on. Also, keep in mind that Xe,
> especially for BMG, was in very active development and a lot might have changed since 6.12.
> So, it would be great if you could run some experiments with a newer kernel as well.
> 
> Thanks,
> 
> Rodrigo.
> 
> > 
> > I see there have been recent changes to things around bo eviction on xe and today I caught the following oops when spawning a second VS Code window in sway with the New Window command (Ctrl+Shift+N). VS Code was not running on XWayland. So far I haven't been able to reproduce this. I have amdgpu loaded as a fallback for my Ryzen 7900X built-in graphics since I installed the funny GPU (Intel Arc B580 / BMG G21). I'm on Mesa 24.3.3.
> > 
> >  
> > 
> >  ------------[ cut here ]------------
> > 
> >  workqueue: WQ_MEM_RECLAIM sdma0:drm_sched_run_job_work [gpu_sched] is flushing !WQ_MEM_RECLAIM events:amdgpu_device_delay_enable_gfx_off [amdgpu]
> > 
> >  WARNING: CPU: 5 PID: 29199 at kernel/workqueue.c:3704 check_flush_dependency+0x10f/0x130
> > 
> >  Modules linked in: rfcomm snd_seq_dummy snd_hrtimer snd_seq cmac algif_hash algif_skcipher af_alg nft_chain_nat xt_MASQUERADE nf_conntrack_netlink xfrm_user xfrm_algo xt_addrtype overlay af_packet bnep btusb btrtl btintel btbcm btmtk bluetooth mousedev cdc_acm joydev nls_iso8859_1 nls_cp437 vfat fat mei_gsc_proxy mei_gsc mei_me mei xt_conntrack ip6t_rpfilter mt7921e ipt_rpfilter mt7921_common mt792x_lib snd_hda_codec_hdmi mt76_connac_lib edac_mce_amd edac_core mt76 snd_hda_intel amd_atl intel_rapl_msr snd_intel_dspcfg xt_pkttype intel_rapl_common snd_intel_sdw_acpi crct10dif_pclmul xt_LOG mac80211 snd_usb_audio uvcvideo nf_log_syslog snd_usbmidi_lib crc32_pclmul snd_hda_codec xt_tcpudp polyval_clmulni videobuf2_vmalloc xe snd_ump polyval_generic uvc snd_hda_core ghash_clmulni_intel cfg80211 nft_compat snd_rawmidi sha512_ssse3 videobuf2_memops spd5118 sha256_ssse3 snd_seq_device videobuf2_v4l2 snd_hwdep r8169 sha1_ssse3 battery sp5100_tco videobuf2_common aesni_intel snd_pcm watchdog realtek gf128mul
> > 
> >  crypto_simd mdio_devres videodev snd_timer cryptd libphy rfkill snd i2c_piix4 drm_gpuvm wmi_bmof rapl libarc4 led_class mc nf_tables i2c_smbus k10temp soundcore sch_fq_codel tpm_crb rtc_cmos evdev mac_hid tpm_tis gpio_amdpt tiny_power_button tpm_tis_core gpio_generic button uinput hid_xpadneo(O) ff_memless atkbd libps2 serio vivaldi_fmap loop xt_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c br_netfilter veth tun tap macvlan bridge stp llc kvm_amd ccp kvm fuse efi_pstore configfs nfnetlink efivarfs tpm libaescfb ecdh_generic ecc rng_core dmi_sysfs ip_tables x_tables autofs4 ext4 crc32c_generic mbcache jbd2 hid_generic usbhid hid ahci libahci xhci_pci libata nvme xhci_hcd scsi_mod nvme_core crc32c_intel scsi_common nvme_auth dm_mod dax amdgpu video wmi amdxcp i2c_algo_bit drm_ttm_helper ttm drm_exec gpu_sched drm_suballoc_helper drm_buddy drm_display_helper cec crc16
> > 
> >  CPU: 5 UID: 0 PID: 29199 Comm: kworker/u96:0 Tainted: G W O 6.12.8 #1-NixOS
> > 
> >  Tainted: [W]=WARN, [O]=OOT_MODULE
> > 
> >  Hardware name: Micro-Star International Co., Ltd. MS-7D75/MAG B650 TOMAHAWK WIFI (MS-7D75), BIOS 1.60 05/30/2023
> > 
> >  Workqueue: sdma0 drm_sched_run_job_work [gpu_sched]
> > 
> >  RIP: 0010:check_flush_dependency+0x10f/0x130
> > 
> >  Code: c0 f3 01 01 90 49 8b 45 18 48 8d b2 c0 00 00 00 48 8d 8b c0 00 00 00 49 89 e8 48 c7 c7 a0 c7 df b4 48 89 c2 e8 82 7e fd ff 90 <0f> 0b 90 90 e9 0a ff ff ff 80 3d 99 c0 f3 01 00 75 8f e9 42 ff ff
> > 
> >  RSP: 0018:ffff95dd9ef97c60 EFLAGS: 00010046
> > 
> >  RAX: 0000000000000000 RBX: ffff9265c01b8e00 RCX: 0000000000000000
> > 
> >  RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> > 
> >  RBP: ffffffffc0438c00 R08: 0000000000000000 R09: 0000000000000000
> > 
> >  R10: 0000000000000000 R11: 0000000000000000 R12: ffff92681a13b200
> > 
> >  R13: ffff9265c94338c0 R14: 0000000000000001 R15: ffff9265c01bce00
> > 
> >  FS: 0000000000000000(0000) GS:ffff926cb7e80000(0000) knlGS:0000000000000000
> > 
> >  CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > 
> >  CR2: 000000000050d6d0 CR3: 00000002a38d6000 CR4: 0000000000f50ef0
> > 
> >  PKRU: 55555554
> > 
> >  Call Trace:
> > 
> >  <TASK>
> > 
> >  ? check_flush_dependency+0x10f/0x130
> > 
> >  ? __warn.cold+0x93/0xf6
> > 
> >  ? check_flush_dependency+0x10f/0x130
> > 
> >  ? report_bug+0x10d/0x150
> > 
> >  ? srso_alias_return_thunk+0x5/0xfbef5
> > 
> >  ? handle_bug+0x61/0xb0
> > 
> >  ? exc_invalid_op+0x17/0x80
> > 
> >  ? asm_exc_invalid_op+0x1a/0x20
> > 
> >  ? __pfx_amdgpu_device_delay_enable_gfx_off+0x10/0x10 [amdgpu]
> > 
> >  ? check_flush_dependency+0x10f/0x130
> > 
> >  __flush_work+0x10c/0x320
> > 
> >  cancel_delayed_work_sync+0x62/0x80
> > 
> >  amdgpu_gfx_off_ctrl+0xb7/0x150 [amdgpu]
> > 
> >  amdgpu_ring_alloc+0x40/0x70 [amdgpu]
> > 
> >  amdgpu_ib_schedule+0xf0/0x750 [amdgpu]
> > 
> >  amdgpu_job_run+0x8e/0x200 [amdgpu]
> > 
> >  drm_sched_run_job_work+0x283/0x420 [gpu_sched]
> > 
> >  process_one_work+0x18a/0x350
> > 
> >  worker_thread+0x235/0x370
> > 
> >  ? __pfx_worker_thread+0x10/0x10
> > 
> >  ? __pfx_worker_thread+0x10/0x10
> > 
> >  kthread+0xcd/0x100
> > 
> >  ? __pfx_kthread+0x10/0x10
> > 
> >  ret_from_fork+0x31/0x50
> > 
> >  ? __pfx_kthread+0x10/0x10
> > 
> >  ret_from_fork_asm+0x1a/0x30
> > 
> >  </TASK>
> > 
> >  ---[ end trace 0000000000000000 ]---
> > 
> >  
> > 
> >  I hope this tells you something. I'm willing to switch to some cutting-edge kernel commit and report back if I get an oops again, so feel free to tell me which remote and commit I should fetch, or any other troubleshooting steps I could follow.
> > 
> >  
> > 
> >  Thanks for all your hard work,
> > 
> >  
> > 
> >  Emil J. Tywoniak (widlarizer)
> >
>

