oops in xe_bo_evict on 6.12.8
Emil J Tywoniak
emil at tywoniak.eu
Mon Jan 13 16:47:33 UTC 2025
Hi Rodrigo,
just a while ago I reported it here: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4055
You're right, I attached the wrong trace. The gitlab issue has the correct one. Feel free to suggest remote+branch(+commit) that I should try, but since I haven't been able to reproduce this oops, switching kernels won't yield any information immediately.
Cheers
Emil
January 13, 2025 at 5:41 PM, "Rodrigo Vivi" <rodrigo.vivi at intel.com> wrote:
>
> On Fri, Jan 10, 2025 at 05:19:31PM +0000, Emil J Tywoniak wrote:
>
> >
> > What's up gamers,
> >
> >
> >
> > hope this is the right place to report this oops which possibly is due to amdgpu interaction. The community guidelines link for this list (https://01.org/linuxgraphics/community) doesn't work. Feel free to redirect me if not, even to /dev/null. The Video(DRI - Intel) section on kernel bugzilla doesn't seem to get much life.
> >
>
> Hi Emil,
>
> Thanks for your interest and report on Xe bugs.
>
> Please follow this link instead: https://drm.pages.freedesktop.org/intel-docs/how-to-file-i915-bugs.html
>
> The entire dmesg would be needed to help us understand what's going on here. From the pasted portion below it doesn't even look like xe is in the picture.
>
> But only more details could help to determine what's going on. Also, keep in mind that Xe, especially for BMG, was in very active development and a lot might have changed since 6.12.
>
> So, it would be great if you could run some experiments with a newer kernel as well.
>
> Thanks,
>
> Rodrigo.
>
> >
> > I see there have been recent changes to things around bo eviction on xe and today I caught the following oops when spawning a second VS Code window in sway with the New Window command (Ctrl+Shift+N). VS Code was not running on XWayland. So far I haven't been able to reproduce this. I have amdgpu loaded as a fallback for my Ryzen 7900X built-in graphics since I installed the funny GPU (Intel Arc B580 / BMG G21). I'm on Mesa 24.3.3.
> >
> >
> >
> > ------------[ cut here ]------------
> >
> > workqueue: WQ_MEM_RECLAIM sdma0:drm_sched_run_job_work [gpu_sched] is flushing !WQ_MEM_RECLAIM events:amdgpu_device_delay_enable_gfx_off [amdgpu]
> >
> > WARNING: CPU: 5 PID: 29199 at kernel/workqueue.c:3704 check_flush_dependency+0x10f/0x130
> >
> > Modules linked in: rfcomm snd_seq_dummy snd_hrtimer snd_seq cmac algif_hash algif_skcipher af_alg nft_chain_nat xt_MASQUERADE nf_conntrack_netlink xfrm_user xfrm_algo xt_addrtype overlay af_packet bnep btusb btrtl btintel btbcm btmtk bluetooth mousedev cdc_acm joydev nls_iso8859_1 nls_cp437 vfat fat mei_gsc_proxy mei_gsc mei_me mei xt_conntrack ip6t_rpfilter mt7921e ipt_rpfilter mt7921_common mt792x_lib snd_hda_codec_hdmi mt76_connac_lib edac_mce_amd edac_core mt76 snd_hda_intel amd_atl intel_rapl_msr snd_intel_dspcfg xt_pkttype intel_rapl_common snd_intel_sdw_acpi crct10dif_pclmul xt_LOG mac80211 snd_usb_audio uvcvideo nf_log_syslog snd_usbmidi_lib crc32_pclmul snd_hda_codec xt_tcpudp polyval_clmulni videobuf2_vmalloc xe snd_ump polyval_generic uvc snd_hda_core ghash_clmulni_intel cfg80211 nft_compat snd_rawmidi sha512_ssse3 videobuf2_memops spd5118 sha256_ssse3 snd_seq_device videobuf2_v4l2 snd_hwdep r8169 sha1_ssse3 battery sp5100_tco videobuf2_common aesni_intel snd_pcm watchdog realtek gf128mul
> >
> > crypto_simd mdio_devres videodev snd_timer cryptd libphy rfkill snd i2c_piix4 drm_gpuvm wmi_bmof rapl libarc4 led_class mc nf_tables i2c_smbus k10temp soundcore sch_fq_codel tpm_crb rtc_cmos evdev mac_hid tpm_tis gpio_amdpt tiny_power_button tpm_tis_core gpio_generic button uinput hid_xpadneo(O) ff_memless atkbd libps2 serio vivaldi_fmap loop xt_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c br_netfilter veth tun tap macvlan bridge stp llc kvm_amd ccp kvm fuse efi_pstore configfs nfnetlink efivarfs tpm libaescfb ecdh_generic ecc rng_core dmi_sysfs ip_tables x_tables autofs4 ext4 crc32c_generic mbcache jbd2 hid_generic usbhid hid ahci libahci xhci_pci libata nvme xhci_hcd scsi_mod nvme_core crc32c_intel scsi_common nvme_auth dm_mod dax amdgpu video wmi amdxcp i2c_algo_bit drm_ttm_helper ttm drm_exec gpu_sched drm_suballoc_helper drm_buddy drm_display_helper cec crc16
> >
> > CPU: 5 UID: 0 PID: 29199 Comm: kworker/u96:0 Tainted: G W O 6.12.8 #1-NixOS
> >
> > Tainted: [W]=WARN, [O]=OOT_MODULE
> >
> > Hardware name: Micro-Star International Co., Ltd. MS-7D75/MAG B650 TOMAHAWK WIFI (MS-7D75), BIOS 1.60 05/30/2023
> >
> > Workqueue: sdma0 drm_sched_run_job_work [gpu_sched]
> >
> > RIP: 0010:check_flush_dependency+0x10f/0x130
> >
> > Code: c0 f3 01 01 90 49 8b 45 18 48 8d b2 c0 00 00 00 48 8d 8b c0 00 00 00 49 89 e8 48 c7 c7 a0 c7 df b4 48 89 c2 e8 82 7e fd ff 90 <0f> 0b 90 90 e9 0a ff ff ff 80 3d 99 c0 f3 01 00 75 8f e9 42 ff ff
> >
> > RSP: 0018:ffff95dd9ef97c60 EFLAGS: 00010046
> >
> > RAX: 0000000000000000 RBX: ffff9265c01b8e00 RCX: 0000000000000000
> >
> > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> >
> > RBP: ffffffffc0438c00 R08: 0000000000000000 R09: 0000000000000000
> >
> > R10: 0000000000000000 R11: 0000000000000000 R12: ffff92681a13b200
> >
> > R13: ffff9265c94338c0 R14: 0000000000000001 R15: ffff9265c01bce00
> >
> > FS: 0000000000000000(0000) GS:ffff926cb7e80000(0000) knlGS:0000000000000000
> >
> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >
> > CR2: 000000000050d6d0 CR3: 00000002a38d6000 CR4: 0000000000f50ef0
> >
> > PKRU: 55555554
> >
> > Call Trace:
> >
> > <TASK>
> >
> > ? check_flush_dependency+0x10f/0x130
> >
> > ? __warn.cold+0x93/0xf6
> >
> > ? check_flush_dependency+0x10f/0x130
> >
> > ? report_bug+0x10d/0x150
> >
> > ? srso_alias_return_thunk+0x5/0xfbef5
> >
> > ? handle_bug+0x61/0xb0
> >
> > ? exc_invalid_op+0x17/0x80
> >
> > ? asm_exc_invalid_op+0x1a/0x20
> >
> > ? __pfx_amdgpu_device_delay_enable_gfx_off+0x10/0x10 [amdgpu]
> >
> > ? check_flush_dependency+0x10f/0x130
> >
> > __flush_work+0x10c/0x320
> >
> > cancel_delayed_work_sync+0x62/0x80
> >
> > amdgpu_gfx_off_ctrl+0xb7/0x150 [amdgpu]
> >
> > amdgpu_ring_alloc+0x40/0x70 [amdgpu]
> >
> > amdgpu_ib_schedule+0xf0/0x750 [amdgpu]
> >
> > amdgpu_job_run+0x8e/0x200 [amdgpu]
> >
> > drm_sched_run_job_work+0x283/0x420 [gpu_sched]
> >
> > process_one_work+0x18a/0x350
> >
> > worker_thread+0x235/0x370
> >
> > ? __pfx_worker_thread+0x10/0x10
> >
> > ? __pfx_worker_thread+0x10/0x10
> >
> > kthread+0xcd/0x100
> >
> > ? __pfx_kthread+0x10/0x10
> >
> > ret_from_fork+0x31/0x50
> >
> > ? __pfx_kthread+0x10/0x10
> >
> > ret_from_fork_asm+0x1a/0x30
> >
> > </TASK>
> >
> > ---[ end trace 0000000000000000 ]---
> >
> >
> >
> > I hope this tells you something. I'm willing to switch to some cutting edge kernel commit and report back if I get an oops again, so feel free to tell me which remote and commit I should get, or any other troubleshooting steps I could follow.
> >
> >
> >
> > Thanks for all your hard work,
> >
> >
> >
> > Emil J. Tywoniak (widlarizer)
> >
>