[bug report] drm/amdgpu: amdgpu crash on playing videos, linux 6.10-rc

Wang Yunchen mac-wang at sjtu.edu.cn
Mon Jun 17 09:40:20 UTC 2024


On Mon, 2024-06-17 at 06:55 +0800, Winston Ma wrote:
> I only build the kernel once. I could try but I think you couldn't
> expect much from my side.
> 
> BTW I installed 6.10-rc4 this morning from Ubuntu mainline
> (https://kernel.ubuntu.com/mainline/v6.10-rc4/amd64/) and I couldn't
> replicate the video crash problem. Yunchen could you try 6.10-rc4 and
> see if you still have the video crash problem?
> 
> But I still get the green blocky object when I keep toggling full
> screen during youtube watch (Screenshot: https://ibb.co/8Dpdxc3). I
> didn't see the green block in 6.9 so it could be another issue.
> 
> Thanks and Regards,
> Winston
> 
> 
> On Sun, Jun 16, 2024 at 12:10 AM Wang Yunchen <mac-wang at sjtu.edu.cn> wrote:
> > 
> > On Sat, 2024-06-15 at 17:50 +0200, Thorsten Leemhuis wrote:
> > > [reply made easier by moving something in the quote]
> > > 
> > > On 12.06.24 18:55, Wang Yunchen wrote:
> > > > On Wed, 2024-06-12 at 15:14 +0200, Linux regression tracking (Thorsten
> > > > Leemhuis) wrote:
> > > > > On 06.06.24 05:06, Winston Ma wrote:
> > > > > > > Hi I got the same problem on Linux Kernel 6.10-rc2. I got the
> > > > > > > problem by following the procedure below:
> > > > > > > 
> > > > > > >  1. Boot Linux Kernel 6.10-rc2
> > > > > > >  2. Open Firefox (Any browser should work)
> > > > > > >  3. Open a Youtube Video
> > > > > > >  4. On the playing video, toggle fullscreen quickly. Then after
> > > > > > >     10-20 times of fullscreen toggling, the screen would enter
> > > > > > >     freeze mode.
> > > > > > > 
> > > > > > > This is the log that I captured using the above method.
> > > > > 
> > > > > > Hmm, seems nothing happened here for a while. Could you maybe try to
> > > > > > bisect this
> > > > > > (https://docs.kernel.org/admin-guide/verify-bugs-and-bisect-regressions.html)?
> > > > 
> > > > It seems that the issue persists on 6.10 rc3.
> > > 
> > > That's good to know, but...
> > > 
> > > > > @amd-gfx devs: Or is this unneeded, as the cause found or maybe even
> > > > > fixed meanwhile?
> > > 
> > > ...as there was no reply to that inquiry it seems we really need either
> > > you or Winston Ma (or somebody else that is affected we don't yet know
> > > about) to perform a git bisection (see the link quoted above) to find
> > > the exact change that broke things. Without this it might not be getting
> > > fixed.
> > > 
> > > Ciao, Thorsten
> > > 
> > > > > > This is the kernel log
> > > > > > 
> > > > > > 2024-06-06T10:26:40.747576+08:00 kernel:
> > > > > > gmc_v10_0_process_interrupt:
> > > > > > 6 callbacks suppressed
> > > > > > 2024-06-06T10:26:40.747618+08:00 kernel: amdgpu 0000:03:00.0:
> > > > > > amdgpu:
> > > > > > [mmhub] page fault (src_id:0 ring:8 vmid:2
> > > > > > pasid:32789)
> > > > > > 2024-06-06T10:26:40.747623+08:00 kernel: amdgpu 0000:03:00.0:
> > > > > > amdgpu:
> > > > > > in process RDD Process pid 39524 thread
> > > > > > firefox-bi:cs0 pid 40342
> > > > > > 2024-06-06T10:26:40.747625+08:00 kernel: amdgpu 0000:03:00.0:
> > > > > > amdgpu:   in page starting at address
> > > > > > 0x0000800106ffe000 from client 0x12 (VMC)
> > > > > > 2024-06-06T10:26:40.747628+08:00 kernel: amdgpu 0000:03:00.0:
> > > > > > amdgpu:
> > > > > > MMVM_L2_PROTECTION_FAULT_STATUS:0x00203811
> > > > > > 2024-06-06T10:26:40.747629+08:00 kernel: amdgpu 0000:03:00.0:
> > > > > > amdgpu:          Faulty UTCL2 client ID: VCN (0x1c)
> > > > > > 2024-06-06T10:26:40.747631+08:00 kernel: amdgpu 0000:03:00.0:
> > > > > > amdgpu:          MORE_FAULTS: 0x1
> > > > > > 2024-06-06T10:26:40.747651+08:00 kernel: amdgpu 0000:03:00.0:
> > > > > > amdgpu:          WALKER_ERROR: 0x0
> > > > > > 2024-06-06T10:26:40.747653+08:00 kernel: amdgpu 0000:03:00.0:
> > > > > > amdgpu:          PERMISSION_FAULTS: 0x1
> > > > > > 2024-06-06T10:26:40.747655+08:00 kernel: amdgpu 0000:03:00.0:
> > > > > > amdgpu:          MAPPING_ERROR: 0x0
> > > > > > 2024-06-06T10:26:40.747656+08:00 kernel: amdgpu 0000:03:00.0:
> > > > > > amdgpu:          RW: 0x0
> > > > > > 2024-06-06T10:26:40.747658+08:00 kernel: amdgpu 0000:03:00.0:
> > > > > > amdgpu:
> > > > > > [mmhub] page fault (src_id:0 ring:8 vmid:2
> > > > > > pasid:32789)
> > > > > > 2024-06-06T10:26:40.747660+08:00 kernel: amdgpu 0000:03:00.0:
> > > > > > amdgpu:
> > > > > > in process RDD Process pid 39524 thread
> > > > > > firefox-bi:cs0 pid 40342
> > > > > > 2024-06-06T10:26:40.747662+08:00 kernel: amdgpu 0000:03:00.0:
> > > > > > amdgpu:   in page starting at address
> > > > > > 0x0000800106e00000 from client 0x12 (VMC)
> > > > > > 2024-06-06T10:26:40.747663+08:00 kernel: amdgpu 0000:03:00.0:
> > > > > > amdgpu:
> > > > > > MMVM_L2_PROTECTION_FAULT_STATUS:0x00000000
> > > > > > 2024-06-06T10:26:40.747664+08:00 kernel: amdgpu 0000:03:00.0:
> > > > > > amdgpu:          Faulty UTCL2 client ID: MP0 (0x0)
> > > > > > 2024-06-06T10:26:40.747666+08:00 kernel: amdgpu 0000:03:00.0:
> > > > > > amdgpu:          MORE_FAULTS: 0x0
> > > > > > 2024-06-06T10:26:40.747667+08:00 kernel: amdgpu 0000:03:00.0:
> > > > > > amdgpu:          WALKER_ERROR: 0x0
> > > > > > 2024-06-06T10:26:40.747668+08:00 kernel: amdgpu 0000:03:00.0:
> > > > > > amdgpu:          PERMISSION_FAULTS: 0x0
> > > > > > 2024-06-06T10:26:40.747670+08:00 kernel: amdgpu 0000:03:00.0:
> > > > > > amdgpu:          MAPPING_ERROR: 0x0
> > > > > > 2024-06-06T10:26:40.747671+08:00 kernel: amdgpu 0000:03:00.0:
> > > > > > amdgpu:          RW: 0x0
> > > > > > 2024-06-06T10:26:40.747674+08:00 kernel: amdgpu 0000:03:00.0:
> > > > > > amdgpu:
> > > > > > [mmhub] page fault (src_id:0 ring:8 vmid:2
> > > > > > pasid:32789)
> > > > > > 2024-06-06T10:26:40.747677+08:00 kernel: amdgpu 0000:03:00.0:
> > > > > > amdgpu:
> > > > > > in process RDD Process pid 39524 thread
> > > > > > firefox-bi:cs0 pid 40342
> > > > > > 2024-06-06T10:26:40.747680+08:00 kernel: amdgpu 0000:03:00.0:
> > > > > > amdgpu:   in page starting at address
> > > > > > 0x0000800106e07000 from client 0x12 (VMC)
> > > > > > 2024-06-06T10:26:40.747683+08:00 kernel: amdgpu 0000:03:00.0:
> > > > > > amdgpu:
> > > > > > MMVM_L2_PROTECTION_FAULT_STATUS:0x00000000
> > > > > > 2024-06-06T10:26:40.747686+08:00 kernel: amdgpu 0000:03:00.0:
> > > > > > amdgpu:          Faulty UTCL2 client ID: MP0 (0x0)
> > > > > > 2024-06-06T10:26:40.747688+08:00 kernel: amdgpu 0000:03:00.0:
> > > > > > amdgpu:          MORE_FAULTS: 0x0
> > > > > > 2024-06-06T10:26:40.747691+08:00 kernel: amdgpu 0000:03:00.0:
> > > > > > amdgpu:          WALKER_ERROR: 0x0
> > > > > > 2024-06-06T10:26:40.747693+08:00 kernel: amdgpu 0000:03:00.0:
> > > > > > amdgpu:          PERMISSION_FAULTS: 0x0
> > > > > > 2024-06-06T10:26:40.747696+08:00 kernel: amdgpu 0000:03:00.0:
> > > > > > amdgpu:          MAPPING_ERROR: 0x0
> > > > > > 2024-06-06T10:26:40.747698+08:00 kernel: amdgpu 0000:03:00.0:
> > > > > > amdgpu:          RW: 0x0
> > > > > > 2024-06-06T10:26:40.747700+08:00 kernel: amdgpu 0000:03:00.0:
> > > > > > amdgpu:
> > > > > > [mmhub] page fault (src_id:0 ring:8 vmid:2
> > > > > > pasid:32789)
> > > > > > 2024-06-06T10:26:40.747703+08:00 kernel: amdgpu 0000:03:00.0:
> > > > > > amdgpu:
> > > > > > in process RDD Process pid 39524 thread
> > > > > > firefox-bi:cs0 pid 40342
> > > > > > 2024-06-06T10:26:40.747705+08:00 kernel: amdgpu 0000:03:00.0:
> > > > > > amdgpu:   in page starting at address
> > > > > > 0x0000800107001000 from client 0x12 (VMC)
> > > > > > 2024-06-06T10:26:40.747707+08:00 kernel: amdgpu 0000:03:00.0:
> > > > > > amdgpu:
> > > > > > MMVM_L2_PROTECTION_FAULT_STATUS:0x00000000
> > > > > > 2024-06-06T10:26:40.747710+08:00 kernel: amdgpu 0000:03:00.0:
> > > > > > amdgpu:          Faulty UTCL2 client ID: MP0 (0x0)
> > > > > > 2024-06-06T10:26:40.747713+08:00 kernel: amdgpu 0000:03:00.0:
> > > > > > amdgpu:          MORE_FAULTS: 0x0
> > > > > > 2024-06-06T10:26:40.747716+08:00 kernel: amdgpu 0000:03:00.0:
> > > > > > amdgpu:          WALKER_ERROR: 0x0
> > > > > > 2024-06-06T10:26:40.747718+08:00 kernel: amdgpu 0000:03:00.0:
> > > > > > amdgpu:          PERMISSION_FAULTS: 0x0
> > > > > > 2024-06-06T10:26:40.747721+08:00 kernel: amdgpu 0000:03:00.0:
> > > > > > amdgpu:          MAPPING_ERROR: 0x0
> > > > > > 2024-06-06T10:26:40.747723+08:00 kernel: amdgpu 0000:03:00.0:
> > > > > > amdgpu:          RW: 0x0
> > > > > > 2024-06-06T10:26:51.094991+08:00 kernel: [drm:amdgpu_job_timedout
> > > > > > [amdgpu]] *ERROR* ring vcn_dec_0 timeout,
> > > > > > signaled seq=24897, emitted seq=24898
> > > > > > 2024-06-06T10:26:51.095023+08:00 kernel: [drm:amdgpu_job_timedout
> > > > > > [amdgpu]] *ERROR* Process information: process
> > > > > > RDD Process pid 39524 thread firefox-bi:cs0 pid 40342
> > > > > > 2024-06-06T10:26:51.095025+08:00 kernel: amdgpu 0000:03:00.0:
> > > > > > amdgpu:
> > > > > > GPU reset begin!
> > > > > > 2024-06-06T10:26:52.305509+08:00 kernel: [drm] Register(0)
> > > > > > [mmUVD_POWER_STATUS] failed to reach value 0x00000001
> > > > > > != 0x00000002n
> > > > > > 2024-06-06T10:26:52.586019+08:00 kernel: [drm] Register(0)
> > > > > > [mmUVD_RBC_RB_RPTR] failed to reach value 0x000003c0 !=
> > > > > > 0x00000360n
> > > > > > 2024-06-06T10:26:52.639506+08:00 kernel: [drm] Register(0)
> > > > > > [mmUVD_POWER_STATUS] failed to reach value 0x00000001
> > > > > > != 0x00000002n
> > > > > > 2024-06-06T10:26:52.639521+08:00 kernel: amdgpu 0000:03:00.0:
> > > > > > amdgpu:
> > > > > > MODE2 reset
> > > > > > 2024-06-06T10:26:52.650614+08:00 kernel: amdgpu 0000:03:00.0:
> > > > > > amdgpu:
> > > > > > GPU reset succeeded, trying to resume
> > > > > > 2024-06-06T10:26:52.650633+08:00 kernel: [drm] PCIE GART of 1024M
> > > > > > enabled (table at 0x000000F41FC00000).
> > > > > > 2024-06-06T10:26:52.650637+08:00 kernel: [drm] VRAM is lost due to
> > > > > > GPU
> > > > > > reset!
> > > > > > 2024-06-06T10:26:52.650641+08:00 kernel: amdgpu 0000:03:00.0:
> > > > > > amdgpu:
> > > > > > PSP is resuming...
> > > > > > 2024-06-06T10:26:52.673474+08:00 kernel: amdgpu 0000:03:00.0:
> > > > > > amdgpu:
> > > > > > reserve 0xa00000 from 0xf41e000000 for PSP
> > > > > > TMR
> > > > > > 2024-06-06T10:26:53.001513+08:00 kernel: amdgpu 0000:03:00.0:
> > > > > > amdgpu:
> > > > > > RAS: optional ras ta ucode is not available
> > > > > > 2024-06-06T10:26:53.013802+08:00 kernel: amdgpu 0000:03:00.0:
> > > > > > amdgpu:
> > > > > > RAP: optional rap ta ucode is not available
> > > > > > 2024-06-06T10:26:53.013816+08:00 kernel: amdgpu 0000:03:00.0:
> > > > > > amdgpu:
> > > > > > SECUREDISPLAY: securedisplay ta ucode is not
> > > > > > available
> > > > > > 2024-06-06T10:26:53.013819+08:00 kernel: amdgpu 0000:03:00.0:
> > > > > > amdgpu:
> > > > > > SMU is resuming...
> > > > > > 2024-06-06T10:26:53.016519+08:00 kernel: amdgpu 0000:03:00.0:
> > > > > > amdgpu:
> > > > > > SMU is resumed successfully!
> > > > > > 2024-06-06T10:26:53.017502+08:00 kernel: [drm] DMUB hardware
> > > > > > initialized: version=0x04000044
> > > > > > 2024-06-06T10:26:53.677511+08:00 kernel: [drm] kiq ring mec 2 pipe
> > > > > > 1 q
> > > > > > 0
> > > > > > 2024-06-06T10:26:53.958512+08:00 kernel: amdgpu 0000:03:00.0:
> > > > > > [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring
> > > > > > vcn_dec_0 test failed (-110)
> > > > > > 2024-06-06T10:26:53.958536+08:00 kernel:
> > > > > > [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP
> > > > > > block
> > > > > > <vcn_v3_0> failed -110
> > > > > > 2024-06-06T10:26:53.958539+08:00 kernel: amdgpu 0000:03:00.0:
> > > > > > amdgpu:
> > > > > > GPU reset(1) failed
> > > > > > 2024-06-06T10:26:53.958541+08:00 kernel: amdgpu 0000:03:00.0:
> > > > > > amdgpu:
> > > > > > GPU reset end with ret = -110
> > > > > > 2024-06-06T10:26:53.959180+08:00 kernel: [drm:amdgpu_job_timedout
> > > > > > [amdgpu]] *ERROR* GPU Recovery Failed: -110
> > > > > > 2024-06-06T10:26:55.261509+08:00 kernel: [drm] Register(0)
> > > > > > [mmUVD_POWER_STATUS] failed to reach value 0x00000001
> > > > > > != 0x00000002n
> > > > > > 2024-06-06T10:26:55.540507+08:00 kernel: [drm] Register(0)
> > > > > > [mmUVD_RBC_RB_RPTR] failed to reach value 0x00000010 !=
> > > > > > 0x00000000n
> > > > > > 2024-06-06T10:27:04.407149+08:00 kernel: [drm] Register(0)
> > > > > > [mmUVD_POWER_STATUS] failed to reach value 0x00000001
> > > > > > != 0x00000002n
> > > > > > 2024-06-06T10:27:04.407252+08:00 kernel: [drm:amdgpu_job_timedout
> > > > > > [amdgpu]] *ERROR* ring vcn_dec_0 timeout,
> > > > > > signaled seq=24898, emitted seq=24898
> > > > > > 2024-06-06T10:27:04.407257+08:00 kernel: [drm:amdgpu_job_timedout
> > > > > > [amdgpu]] *ERROR* Process information: process
> > > > > > RDD Process pid 39524 thread firefox-bi:cs0 pid 40342
> > > > > > 2024-06-06T10:27:04.407259+08:00 kernel: amdgpu 0000:03:00.0:
> > > > > > amdgpu:
> > > > > > GPU reset begin!
> > > > > > 2024-06-06T10:27:05.033745+08:00 kernel: ------------[ cut here ]-
> > > > > > ----
> > > > > > -------
> > > > > > 2024-06-06T10:27:05.033773+08:00 kernel: WARNING: CPU: 8 PID:
> > > > > > 47039 at
> > > > > > drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:630
> > > > > > amdgpu_irq_put+0x9c/0xb0 [amdgpu]
> > > > > > 2024-06-06T10:27:05.033777+08:00 kernel: Modules linked in:
> > > > > > nft_reject_inet nf_reject_ipv4 nf_reject_ipv6
> > > > > > nft_reject xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat
> > > > > > nf_conntrack_netlink nf_conntrack nf_defrag_ipv6
> > > > > > nf_defrag_ipv4 xt_addrtype nft_compat nf_tables libcrc32c
> > > > > > br_netfilter
> > > > > > bridge stp llc hid_logitech_hidpp usbhid
> > > > > > xfrm_interface xfrm6_tunnel tunnel4 tunnel6 xfrm_user xfrm_algo
> > > > > > uhid
> > > > > > rfcomm snd_seq_dummy snd_hrtimer cmac
> > > > > > algif_hash algif_skcipher af_alg overlay qrtr bnep binfmt_misc
> > > > > > uvcvideo videobuf2_vmalloc uvc videobuf2_memops
> > > > > > videobuf2_v4l2 btusb btrtl videodev btintel btbcm
> > > > > > snd_acp6x_pdm_dma
> > > > > > snd_soc_dmic snd_soc_acp6x_mach amd_atl
> > > > > > intel_rapl_msr btmtk videobuf2_common bluetooth mc
> > > > > > intel_rapl_common
> > > > > > snd_sof_amd_acp63 snd_sof_amd_vangogh
> > > > > > snd_sof_amd_rembrandt iwlmvm snd_sof_amd_renoir snd_sof_amd_acp
> > > > > > snd_sof_pci snd_sof_xtensa_dsp amdgpu snd_sof
> > > > > > edac_mce_amd mac80211 snd_sof_utils snd_pci_ps
> > > > > > snd_hda_codec_realtek
> > > > > > snd_amd_sdw_acpi kvm_amd soundwire_amd
> > > > > > snd_hda_codec_generic soundwire_generic_allocation soundwire_bus
> > > > > > 2024-06-06T10:27:05.033782+08:00 kernel: 
> > > > > > snd_hda_scodec_cs35l41_spi
> > > > > > nls_iso8859_1 snd_hda_codec_hdmi
> > > > > > snd_hda_scodec_component libarc4 kvm snd_soc_core snd_hda_intel
> > > > > > snd_ctl_led snd_intel_dspcfg snd_compress
> > > > > > snd_intel_sdw_acpi amdxcp snd_seq_midi ac97_bus crct10dif_pclmul
> > > > > > drm_exec snd_hda_codec polyval_clmulni
> > > > > > snd_pcm_dmaengine snd_seq_midi_event gpu_sched polyval_generic
> > > > > > iwlwifi
> > > > > > ghash_clmulni_intel snd_rpl_pci_acp6x
> > > > > > drm_buddy sha256_ssse3 snd_hda_core snd_rawmidi snd_acp_pci
> > > > > > drm_suballoc_helper snd_hda_scodec_cs35l41_i2c
> > > > > > sha1_ssse3 drm_ttm_helper snd_acp_legacy_common snd_hwdep
> > > > > > snd_hda_scodec_cs35l41 aesni_intel snd_pci_acp6x amd_pmf
> > > > > > snd_hda_cs_dsp_ctls ttm crypto_simd snd_pci_acp5x
> > > > > > snd_soc_cs_amp_lib
> > > > > > asus_nb_wmi cs_dsp cryptd amdtee snd_seq
> > > > > > snd_rn_pci_acp3x drm_display_helper snd_pcm asus_wmi
> > > > > > snd_acp_config
> > > > > > rapl wmi_bmof sparse_keymap snd_seq_device
> > > > > > cfg80211 snd_soc_cs35l41_lib cec snd_soc_acpi ccp rc_core
> > > > > > snd_timer
> > > > > > i2c_algo_bit i2c_piix4 snd_pci_acp3x k10temp
> > > > > > amd_sfh tee snd platform_profile soundcore
> > > > > > serial_multi_instantiate
> > > > > > amd_pmc acpi_tad
> > > > > > 2024-06-06T10:27:05.033784+08:00 kernel:  joydev input_leds
> > > > > > mac_hid
> > > > > > serio_raw parport_pc ppdev lp parport
> > > > > > efi_pstore nfnetlink dmi_sysfs ip_tables x_tables autofs4
> > > > > > hid_multitouch nvme video ucsi_acpi hid_generic
> > > > > > crc32_pclmul nvme_core typec_ucsi xhci_pci i2c_hid_acpi
> > > > > > xhci_pci_renesas nvme_auth typec i2c_hid wmi hid 8250_dw
> > > > > > 2024-06-06T10:27:05.033785+08:00 kernel: CPU: 8 PID: 47039 Comm:
> > > > > > kworker/u64:0 Tainted: G        W
> > > > > > 6.10.0-061000rc2-generic #202406022333
> > > > > > 2024-06-06T10:27:05.033787+08:00 kernel: Hardware name: ASUSTeK
> > > > > > COMPUTER INC. Zenbook UM5302TA_UM5302TA/UM5302TA,
> > > > > > BIOS UM5302TA.311 01/17/2023
> > > > > > 2024-06-06T10:27:05.033788+08:00 kernel: Workqueue: amdgpu-reset-
> > > > > > dev
> > > > > > drm_sched_job_timedout [gpu_sched]
> > > > > > 2024-06-06T10:27:05.033789+08:00 kernel: RIP:
> > > > > > 0010:amdgpu_irq_put+0x9c/0xb0 [amdgpu]
> > > > > > 2024-06-06T10:27:05.033790+08:00 kernel: Code: 31 f6 31 ff e9 c0
> > > > > > 05 2f
> > > > > > e6 44 89 e2 48 89 de 4c 89 f7 e8 97 fc ff
> > > > > > ff 5b 41 5c 41 5d 41 5e 5d 31 d2 31 f6 31 ff e9 9f 05 2f e6 <0f>
> > > > > > 0b b8
> > > > > > ea ff ff ff eb c3 b8 fe ff ff ff eb bc 0f
> > > > > > 1f 40 00 90 90
> > > > > > 2024-06-06T10:27:05.033791+08:00 kernel: RSP:
> > > > > > 0018:ffffb65847227c18
> > > > > > EFLAGS: 00010246
> > > > > > 2024-06-06T10:27:05.033793+08:00 kernel: RAX: 0000000000000000
> > > > > > RBX:
> > > > > > ffff9ac0a0280c60 RCX: 0000000000000000
> > > > > > 2024-06-06T10:27:05.033794+08:00 kernel: RDX: 0000000000000000
> > > > > > RSI:
> > > > > > 0000000000000000 RDI: 0000000000000000
> > > > > > 2024-06-06T10:27:05.033796+08:00 kernel: RBP: ffffb65847227c38
> > > > > > R08:
> > > > > > 0000000000000000 R09: 0000000000000000
> > > > > > 2024-06-06T10:27:05.033797+08:00 kernel: R10: 0000000000000000
> > > > > > R11:
> > > > > > 0000000000000000 R12: 0000000000000000
> > > > > > 2024-06-06T10:27:05.033798+08:00 kernel: R13: 0000000000000001
> > > > > > R14:
> > > > > > ffff9ac0a0280000 R15: ffff9ac0a0280000
> > > > > > 2024-06-06T10:27:05.033799+08:00 kernel: FS: 
> > > > > > 0000000000000000(0000)
> > > > > > GS:ffff9ac38e600000(0000)
> > > > > > knlGS:0000000000000000
> > > > > > 2024-06-06T10:27:05.033800+08:00 kernel: CS:  0010 DS: 0000 ES:
> > > > > > 0000
> > > > > > CR0: 0000000080050033
> > > > > > 2024-06-06T10:27:05.033802+08:00 kernel: CR2: 00007d1a5edfe000
> > > > > > CR3:
> > > > > > 000000001863c000 CR4: 0000000000f50ef0
> > > > > > 2024-06-06T10:27:05.033803+08:00 kernel: PKRU: 55555554
> > > > > > 2024-06-06T10:27:05.033805+08:00 kernel: Call Trace:
> > > > > > 2024-06-06T10:27:05.033806+08:00 kernel:  <TASK>
> > > > > > 2024-06-06T10:27:05.033807+08:00 kernel:  ? show_regs+0x6c/0x80
> > > > > > 2024-06-06T10:27:05.033845+08:00 kernel:  ? __warn+0x88/0x140
> > > > > > 2024-06-06T10:27:05.034598+08:00 kernel:  ?
> > > > > > amdgpu_irq_put+0x9c/0xb0
> > > > > > [amdgpu]
> > > > > > 2024-06-06T10:27:05.034615+08:00 kernel:  ? report_bug+0x182/0x1b0
> > > > > > 2024-06-06T10:27:05.034618+08:00 kernel:  ? handle_bug+0x51/0xa0
> > > > > > 2024-06-06T10:27:05.034619+08:00 kernel:  ?
> > > > > > exc_invalid_op+0x18/0x80
> > > > > > 2024-06-06T10:27:05.034620+08:00 kernel:  ?
> > > > > > asm_exc_invalid_op+0x1b/0x20
> > > > > > 2024-06-06T10:27:05.034621+08:00 kernel:  ?
> > > > > > amdgpu_irq_put+0x9c/0xb0
> > > > > > [amdgpu]
> > > > > > 2024-06-06T10:27:05.034623+08:00 kernel:  ?
> > > > > > amdgpu_irq_put+0x55/0xb0
> > > > > > [amdgpu]
> > > > > > 2024-06-06T10:27:05.035573+08:00 kernel: 
> > > > > > gmc_v10_0_hw_fini+0x67/0xe0
> > > > > > [amdgpu]
> > > > > > 2024-06-06T10:27:05.035580+08:00 kernel: 
> > > > > > gmc_v10_0_suspend+0xe/0x20
> > > > > > [amdgpu]
> > > > > > 2024-06-06T10:27:05.035581+08:00 kernel:
> > > > > > amdgpu_device_ip_suspend_phase2+0x251/0x480 [amdgpu]
> > > > > > 2024-06-06T10:27:05.035582+08:00 kernel:
> > > > > > amdgpu_device_ip_suspend+0x49/0x80 [amdgpu]
> > > > > > 2024-06-06T10:27:05.036529+08:00 kernel:
> > > > > > amdgpu_device_pre_asic_reset+0xd1/0x490 [amdgpu]
> > > > > > 2024-06-06T10:27:05.036546+08:00 kernel:
> > > > > > amdgpu_device_gpu_recover+0x406/0xa30 [amdgpu]
> > > > > > 2024-06-06T10:27:05.036548+08:00 kernel:
> > > > > > amdgpu_job_timedout+0x141/0x200 [amdgpu]
> > > > > > 2024-06-06T10:27:05.036550+08:00 kernel:
> > > > > > drm_sched_job_timedout+0x70/0x110 [gpu_sched]
> > > > > > 2024-06-06T10:27:05.036551+08:00 kernel: 
> > > > > > process_one_work+0x186/0x3e0
> > > > > > 2024-06-06T10:27:05.036552+08:00 kernel: 
> > > > > > worker_thread+0x304/0x440
> > > > > > 2024-06-06T10:27:05.036554+08:00 kernel:  ?
> > > > > > srso_alias_return_thunk+0x5/0xfbef5
> > > > > > 2024-06-06T10:27:05.036555+08:00 kernel:  ?
> > > > > > _raw_spin_lock_irqsave+0xe/0x20
> > > > > > 2024-06-06T10:27:05.036556+08:00 kernel:  ?
> > > > > > __pfx_worker_thread+0x10/0x10
> > > > > > 2024-06-06T10:27:05.036557+08:00 kernel:  kthread+0xe4/0x110
> > > > > > 2024-06-06T10:27:05.036558+08:00 kernel:  ?
> > > > > > __pfx_kthread+0x10/0x10
> > > > > > 2024-06-06T10:27:05.036559+08:00 kernel:  ret_from_fork+0x47/0x70
> > > > > > 2024-06-06T10:27:05.036561+08:00 kernel:  ?
> > > > > > __pfx_kthread+0x10/0x10
> > > > > > 2024-06-06T10:27:05.036562+08:00 kernel: 
> > > > > > ret_from_fork_asm+0x1a/0x30
> > > > > > 2024-06-06T10:27:05.036563+08:00 kernel:  </TASK>
> > > > > > 2024-06-06T10:27:05.036564+08:00 kernel: ---[ end trace
> > > > > > 0000000000000000 ]---
> > > > > > 2024-06-06T10:27:05.036565+08:00 kernel: amdgpu 0000:03:00.0:
> > > > > > amdgpu:
> > > > > > MODE2 reset
> > > > > > 2024-06-06T10:27:05.046502+08:00 kernel: amdgpu 0000:03:00.0:
> > > > > > amdgpu:
> > > > > > GPU reset succeeded, trying to resume
> > > > > > 2024-06-06T10:27:05.047516+08:00 kernel: [drm] PCIE GART of 1024M
> > > > > > enabled (table at 0x000000F41FC00000).
> > > > > > 2024-06-06T10:27:05.047533+08:00 kernel: [drm] VRAM is lost due to
> > > > > > GPU
> > > > > > reset!
> > > > > > 2024-06-06T10:27:05.047538+08:00 kernel: amdgpu 0000:03:00.0:
> > > > > > amdgpu:
> > > > > > PSP is resuming...
> > > > > > 2024-06-06T10:27:05.070481+08:00 kernel: amdgpu 0000:03:00.0:
> > > > > > amdgpu:
> > > > > > reserve 0xa00000 from 0xf41e000000 for PSP
> > > > > > TMR
> > > > > > 2024-06-06T10:27:05.397519+08:00 kernel: amdgpu 0000:03:00.0:
> > > > > > amdgpu:
> > > > > > RAS: optional ras ta ucode is not available
> > > > > > 2024-06-06T10:27:05.409509+08:00 kernel: amdgpu 0000:03:00.0:
> > > > > > amdgpu:
> > > > > > RAP: optional rap ta ucode is not available
> > > > > > 2024-06-06T10:27:05.409517+08:00 kernel: amdgpu 0000:03:00.0:
> > > > > > amdgpu:
> > > > > > SECUREDISPLAY: securedisplay ta ucode is not
> > > > > > available
> > > > > > 2024-06-06T10:27:05.409518+08:00 kernel: amdgpu 0000:03:00.0:
> > > > > > amdgpu:
> > > > > > SMU is resuming...
> > > > > > 2024-06-06T10:27:05.411482+08:00 kernel: amdgpu 0000:03:00.0:
> > > > > > amdgpu:
> > > > > > SMU is resumed successfully!
> > > > > > 2024-06-06T10:27:05.413504+08:00 kernel: [drm] DMUB hardware
> > > > > > initialized: version=0x04000044
> > > > > > 2024-06-06T10:27:06.055474+08:00 kernel: [drm] kiq ring mec 2 pipe
> > > > > > 1 q
> > > > > > 0
> > > > > > 2024-06-06T10:27:06.335476+08:00 kernel: amdgpu 0000:03:00.0:
> > > > > > [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring
> > > > > > vcn_dec_0 test failed (-110)
> > > > > > 2024-06-06T10:27:06.335495+08:00 kernel:
> > > > > > [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP
> > > > > > block
> > > > > > <vcn_v3_0> failed -110
> > > > > > 2024-06-06T10:27:06.335498+08:00 kernel: amdgpu 0000:03:00.0:
> > > > > > amdgpu:
> > > > > > GPU reset(2) failed
> > > > > > 2024-06-06T10:27:06.335499+08:00 kernel: amdgpu 0000:03:00.0:
> > > > > > amdgpu:
> > > > > > GPU reset end with ret = -110
> > > > > > 2024-06-06T10:27:06.335631+08:00 kernel: [drm:amdgpu_job_timedout
> > > > > > [amdgpu]] *ERROR* GPU Recovery Failed: -110
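
For anyone reading the fault lines in the log above: the decoded fields appear
to be plain bit slices of MMVM_L2_PROTECTION_FAULT_STATUS. Below is a minimal
sketch that reproduces them for the first fault (0x00203811). The bit layout in
FIELDS is my assumption, not something stated in this thread; I only checked
that it yields the same values the kernel printed (VCN client 0x1c,
MORE_FAULTS 0x1, PERMISSION_FAULTS 0x1, everything else 0x0).

```python
# Assumed layout: field name -> (shift, width in bits).
# Hypothetical table for illustration; it matches the decoded values printed
# in the quoted log but is not taken from the driver source.
FIELDS = {
    "MORE_FAULTS": (0, 1),
    "WALKER_ERROR": (1, 3),
    "PERMISSION_FAULTS": (4, 4),
    "MAPPING_ERROR": (8, 1),
    "CID": (9, 9),  # "Faulty UTCL2 client ID" in the log
    "RW": (18, 1),
}


def decode_fault_status(status):
    """Slice a raw fault status value into the named bit fields."""
    return {
        name: (status >> shift) & ((1 << width) - 1)
        for name, (shift, width) in FIELDS.items()
    }


if __name__ == "__main__":
    # Value from the first page fault in the quoted log.
    for name, value in decode_fault_status(0x00203811).items():
        print(f"{name}: {value:#x}")
```

The later faults in the same burst report a status of 0x00000000, so every
field decodes to zero there, which is why the log shows client MP0 (0x0) for
them.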
> > 
> > I'm limited by time and computing power, can Winston do a bisect?
> > 
> > If Winston can't I can do a bisect, but don't expect results before
> > days...
> > I've only got this laptop and it's heavily used, so it really takes time.
> > 
> > Best,
> > Yunchen

I've since switched to a rather obscure setup, so sorry, I can't test that
kernel. If it looks okay on your side, then maybe this problem is now fixed,
considering some APU-specific fixes pulled from drm-next over the past weeks.
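
To make it easier to tell whether a given kernel still shows the problem, here
is a rough sketch that counts the signature messages from the quoted report in
a saved kernel log (for example, text captured with journalctl -k). The
function name and the choice of patterns are mine and only cover the messages
seen in this thread; a clean result does not prove the bug is gone.

```python
import re

# Patterns copied from messages in the quoted report. The selection is an
# assumption about which lines are characteristic of this failure.
SIGNATURES = {
    "page_fault": re.compile(r"\[mmhub\] page fault"),
    "vcn_timeout": re.compile(r"ring vcn_dec_0 timeout"),
    "reset_failed": re.compile(r"GPU reset\(\d+\) failed"),
    "recovery_failed": re.compile(r"GPU Recovery Failed"),
}


def count_signatures(log_text):
    """Return how often each signature message occurs in log_text."""
    # Collapse whitespace so messages wrapped across lines still match.
    flat = " ".join(log_text.split())
    return {name: len(rx.findall(flat)) for name, rx in SIGNATURES.items()}


if __name__ == "__main__":
    sample = """
    kernel: amdgpu 0000:03:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:8 vmid:2 pasid:32789)
    kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring vcn_dec_0 timeout, signaled seq=24897, emitted seq=24898
    kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset(1) failed
    kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* GPU Recovery Failed: -110
    """
    print(count_signatures(sample))
```

Running it over a log taken after the fullscreen-toggle test should give all
zeros on an unaffected kernel.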

If there aren't any more case reports for the next few days then maybe we can
regard this as resolved?

Best,
Yunchen

