[Bug 213145] AMDGPU resets, timesout and crashes after "*ERROR* Waiting for fences timed out!"

bugzilla-daemon at kernel.org bugzilla-daemon at kernel.org
Fri Sep 30 15:00:10 UTC 2022


https://bugzilla.kernel.org/show_bug.cgi?id=213145

Taras (halturin at gmail.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |halturin at gmail.com

--- Comment #22 from Taras (halturin at gmail.com) ---
I am experiencing the same issue on kernel 5.19.11 (NixOS
22.11pre411613.7e52b35fe98) with an RX 6800. The system freezes at random
while I use the Vivaldi browser. Journal excerpt from one freeze:


 vivaldi-stable.desktop[49450]:
[49444:49444:0930/100113.311398:ERROR:CONSOLE(0)] "Uncaught (in promise) Error:
A listener indicated an asynchronous response by returning true>
 vivaldi-stable.desktop[49450]:
[49444:49444:0930/100116.501866:ERROR:CONSOLE(0)] "Uncaught (in promise) Error:
A listener indicated an asynchronous response by returning true>
 kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences
timed out!
 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma3 timeout,
signaled seq=114786, emitted seq=114788
 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information:
process  pid 0 thread  pid 0
 kernel: amdgpu 0000:4c:00.0: amdgpu: GPU reset begin!
 kernel: amdgpu 0000:4c:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR*
ring kiq_2.1.0 test failed (-110)
 kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KGQ disable failed
 kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* failed to halt cp gfx
 kernel: [drm] free PSP TMR buffer
 kernel: amdgpu 0000:4c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0038
address=0xf7d00e3bb00 flags=0x0010]
 kernel: amdgpu 0000:4c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0038
address=0xf7d00e22300 flags=0x0010]
 kernel: amdgpu 0000:4c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0038
address=0xf7d00e30c00 flags=0x0010]
 kernel: amdgpu 0000:4c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0038
address=0xf7d00e16000 flags=0x0010]
 kernel: amdgpu 0000:4c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0038
address=0xf7d00e38600 flags=0x0010]
 kernel: amdgpu 0000:4c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0038
address=0xf7d00e2ea00 flags=0x0010]
 kernel: amdgpu 0000:4c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0038
address=0xf7d00e3d000 flags=0x0010]
 kernel: amdgpu 0000:4c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0038
address=0xf7d00e37700 flags=0x0010]
 kernel: amdgpu 0000:4c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0038
address=0xf7d00e32400 flags=0x0010]
 kernel: amdgpu 0000:4c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0038
address=0xf7d00e31c00 flags=0x0010]
 kernel: CPU: 12 PID: 96188 Comm: kworker/u256:1 Tainted: G        W        
5.19.11 #1-NixOS
 kernel: Hardware name: Micro-Star International Co., Ltd. MS-7C60/TRX40 PRO
WIFI (MS-7C60), BIOS 2.80 05/17/2022
 kernel: Workqueue: amdgpu-reset-dev drm_sched_job_timedout [gpu_sched]
 kernel: Call Trace:
 kernel:  <TASK>
 kernel:  dump_stack_lvl+0x45/0x5e
 kernel:  amdgpu_do_asic_reset+0x28/0x438 [amdgpu]
 kernel:  amdgpu_device_gpu_recover_imp.cold+0x5ad/0x90a [amdgpu]
 kernel:  amdgpu_job_timedout+0x153/0x190 [amdgpu]
 kernel:  drm_sched_job_timedout+0x76/0x110 [gpu_sched]
 kernel:  process_one_work+0x1e5/0x3b0
 kernel:  worker_thread+0x50/0x3a0
 kernel:  ? rescuer_thread+0x390/0x390
 kernel:  kthread+0xe8/0x110
 kernel:  ? kthread_complete_and_exit+0x20/0x20
 kernel:  ret_from_fork+0x22/0x30
 kernel:  </TASK>
 kernel: amdgpu 0000:4c:00.0: amdgpu: MODE1 reset
 kernel: amdgpu 0000:4c:00.0: amdgpu: GPU mode1 reset
 kernel: amdgpu 0000:4c:00.0: amdgpu: GPU smu mode1 reset
 kernel: amdgpu 0000:4c:00.0: amdgpu: GPU reset succeeded, trying to resume
 kernel: [drm] PCIE GART of 512M enabled (table at 0x0000008000300000).
 kernel: [drm] VRAM is lost due to GPU reset!
 kernel: [drm] PSP is resuming...
 kernel: [drm] reserve 0xa00000 from 0x83fe000000 for PSP TMR
 kernel: amdgpu 0000:4c:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is
not available
 kernel: amdgpu 0000:4c:00.0: amdgpu: SMU is resuming...
 kernel: amdgpu 0000:4c:00.0: amdgpu: smu driver if version = 0x00000040, smu
fw if version = 0x00000041, smu fw program = 0, version = 0x003a5400 (58.84.0)
 kernel: amdgpu 0000:4c:00.0: amdgpu: SMU driver if version not matched
 kernel: amdgpu 0000:4c:00.0: amdgpu: use vbios provided pptable
 kernel: amdgpu 0000:4c:00.0: amdgpu: SMU is resumed successfully!
 kernel: [drm] DMUB hardware initialized: version=0x02020013
 kernel: [drm] kiq ring mec 2 pipe 1 q 0
 kernel: [drm] VCN decode and encode initialized successfully(under DPG Mode).
 kernel: [drm] JPEG decode initialized successfully.
 kernel: amdgpu 0000:4c:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
 kernel: amdgpu 0000:4c:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub
0
 kernel: amdgpu 0000:4c:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub
0
 kernel: amdgpu 0000:4c:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub
0
 kernel: amdgpu 0000:4c:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub
0
 kernel: amdgpu 0000:4c:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub
0
 kernel: amdgpu 0000:4c:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub
0
 kernel: amdgpu 0000:4c:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub
0
 kernel: amdgpu 0000:4c:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub
0
 kernel: amdgpu 0000:4c:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub
0
 kernel: amdgpu 0000:4c:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
 kernel: amdgpu 0000:4c:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0
 kernel: amdgpu 0000:4c:00.0: amdgpu: ring sdma2 uses VM inv eng 14 on hub 0
 kernel: amdgpu 0000:4c:00.0: amdgpu: ring sdma3 uses VM inv eng 15 on hub 0
 kernel: amdgpu 0000:4c:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 1
 kernel: amdgpu 0000:4c:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub
1
 kernel: amdgpu 0000:4c:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub
1
 kernel: amdgpu 0000:4c:00.0: amdgpu: ring vcn_dec_1 uses VM inv eng 5 on hub 1
 kernel: amdgpu 0000:4c:00.0: amdgpu: ring vcn_enc_1.0 uses VM inv eng 6 on hub
1
 kernel: amdgpu 0000:4c:00.0: amdgpu: ring vcn_enc_1.1 uses VM inv eng 7 on hub
1
 kernel: amdgpu 0000:4c:00.0: amdgpu: ring jpeg_dec uses VM inv eng 8 on hub 1
 kernel: amdgpu 0000:4c:00.0: amdgpu: recover vram bo from shadow start
 kernel: amdgpu 0000:4c:00.0: amdgpu: recover vram bo from shadow done
 kernel: [drm] Skip scheduling IBs!
 kernel: [drm] Skip scheduling IBs!
 kernel: [drm] Skip scheduling IBs!
 kernel: [drm] Skip scheduling IBs!
 kernel: [drm] Skip scheduling IBs!
 kernel: [drm] Skip scheduling IBs!
 kernel: amdgpu 0000:4c:00.0: amdgpu: GPU reset(1) succeeded!
 kernel: [drm] Skip scheduling IBs!
 kernel: [drm] Skip scheduling IBs!
 kernel: [drm] Skip scheduling IBs!
 kernel: [drm] Skip scheduling IBs!
 kernel: [drm] Skip scheduling IBs!
 kernel: [drm] Skip scheduling IBs!
 kernel: [drm] Skip scheduling IBs!
 kernel: [drm] Skip scheduling IBs!
 kernel: [drm] Skip scheduling IBs!
 kernel: [drm] Skip scheduling IBs!
 kernel: [drm] Skip scheduling IBs!
 kernel: [drm] Skip scheduling IBs!
 kernel: [drm] Skip scheduling IBs!
 kernel: [drm] Skip scheduling IBs!
 kernel: [drm] Skip scheduling IBs!
 kernel: [drm] Skip scheduling IBs!
 kernel: [drm] Skip scheduling IBs!
 kernel: [drm] Skip scheduling IBs!
 kernel: [drm] Skip scheduling IBs!
 kernel: [drm] Skip scheduling IBs!
 kernel: [drm] Skip scheduling IBs!
 kernel: [drm] Skip scheduling IBs!
 kernel: [drm] Skip scheduling IBs!
 kernel: [drm] Skip scheduling IBs!
 kernel: [drm] Skip scheduling IBs!
 kernel: [drm] Skip scheduling IBs!
 kernel: [drm] Skip scheduling IBs!
 kernel: [drm] Skip scheduling IBs!
 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser
-125!
 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser
-125!
 gnome-shell[3076]: amdgpu: The CS has been cancelled because the context is
lost.
 gnome-shell[3076]: amdgpu: The CS has been cancelled because the context is
lost.
 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser
-125!
 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser
-125!
 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser
-125!
 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser
-125!
 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser
-125!
 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser
-125!
 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser
-125!
 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser
-125!
 gnome-shell[3076]: amdgpu: The CS has been cancelled because the context is
lost.
 gnome-shell[3076]: amdgpu: The CS has been cancelled because the context is
lost.
 gnome-shell[3076]: amdgpu: The CS has been cancelled because the context is
lost.
 gnome-shell[3076]: amdgpu: The CS has been cancelled because the context is
lost.
 gnome-shell[3076]: amdgpu: The CS has been cancelled because the context is
lost.
 gnome-shell[3076]: amdgpu: The CS has been cancelled because the context is
lost.
 gnome-shell[3076]: amdgpu: The CS has been cancelled because the context is
lost.
 gnome-shell[3076]: amdgpu: The CS has been cancelled because the context is
lost.
 gnome-shell[3076]: amdgpu: The CS has been cancelled because the context is
lost.
 gnome-shell[3076]: amdgpu: The CS has been cancelled because the context is
lost.
 gnome-shell[3076]: amdgpu: The CS has been cancelled because the context is
lost.
 gnome-shell[3076]: amdgpu: The CS has been cancelled because the context is
lost.
 gnome-shell[3076]: amdgpu: The CS has been cancelled because the context is
lost.
 gnome-shell[3076]: amdgpu: The CS has been cancelled because the context is
lost.
 gnome-shell[3076]: amdgpu: The CS has been cancelled because the context is
lost.
 gnome-shell[3076]: amdgpu: The CS has been cancelled because the context is
lost.
 vivaldi-stable.desktop[49450]:
[49657:49664:0930/100759.348288:ERROR:display.cc(286)] Frame latency is
negative: -210.699 ms
 gnome-shell[3076]: amdgpu: The CS has been cancelled because the context is
lost.
 gnome-shell[3076]: amdgpu: The CS has been cancelled because the context is
lost.
 gnome-shell[2555]: amdgpu: The CS has been cancelled because the context is
lost.
 gnome-shell[3076]: amdgpu: The CS has been cancelled because the context is
lost.
 gnome-shell[3076]: amdgpu: The CS has been cancelled because the context is
lost.
 gnome-shell[2555]: amdgpu: amdgpu_cs_query_fence_status failed.
 org.gnome.Totem[67100]: amdgpu: The CS has been cancelled because the context
is lost.
 org.gnome.Totem[67100]: amdgpu: The CS has been cancelled because the context
is lost.
 org.gnome.Totem[67100]: amdgpu: The CS has been cancelled because the context
is lost.
 org.gnome.Totem[67100]: amdgpu: The CS has been cancelled because the context
is lost.
 org.gnome.Totem[67100]: amdgpu: The CS has been cancelled because the context
is lost.
 org.gnome.Totem[67100]: amdgpu: The CS has been cancelled because the context
is lost.
 gnome-shell[2555]: amdgpu: The CS has been cancelled because the context is
lost.
 gnome-shell[2555]: amdgpu: The CS has been cancelled because the context is
lost.
 gnome-shell[2555]: amdgpu: The CS has been cancelled because the context is
lost.
 gnome-shell[2555]: amdgpu: The CS has been cancelled because the context is
lost.
 gnome-shell[2555]: amdgpu: The CS has been cancelled because the context is
lost.
 gnome-shell[2555]: amdgpu: The CS has been cancelled because the context is
lost.
 gnome-shell[2555]: amdgpu: The CS has been cancelled because the context is
lost.
 gnome-shell[2555]: amdgpu: The CS has been cancelled because the context is
lost.
 gnome-shell[2555]: amdgpu: The CS has been cancelled because the context is
lost.
 gnome-shell[2555]: amdgpu: The CS has been cancelled because the context is
lost.
 gnome-shell[2555]: amdgpu: The CS has been cancelled because the context is
lost.
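
For anyone triaging a similar log, here is a minimal sketch (not part of any
amdgpu tooling) that pulls the two most useful numbers out of an excerpt like
the one above: the gap between the emitted and signaled fence sequence numbers
on the ring that timed out, and the errno names behind the negative return
codes. The helper names (unwrap, ring_timeouts, error_names) are made up for
illustration, and the errno table is hard-coded from the Linux generic errno
values rather than read from the running system. Reading the seq gap as the
number of fences still outstanding is my interpretation of the message.

```python
import re

# Linux errno names for the two codes seen in the log. Hard-coded on purpose
# so the result does not depend on the OS the script runs on.
LINUX_ERRNO = {110: "ETIMEDOUT", 125: "ECANCELED"}

RING_TIMEOUT_RE = re.compile(
    r"ring (\S+) timeout, signaled seq=(\d+), emitted seq=(\d+)"
)
ERROR_CODE_RE = re.compile(r"(?:test failed \(|initialize parser )-(\d+)")


def unwrap(text):
    """Collapse whitespace runs so mail-wrapped log lines match as one line."""
    return " ".join(text.split())


def ring_timeouts(text):
    """Return (ring, signaled, emitted, outstanding) for each ring timeout."""
    results = []
    for ring, signaled, emitted in RING_TIMEOUT_RE.findall(unwrap(text)):
        signaled, emitted = int(signaled), int(emitted)
        results.append((ring, signaled, emitted, emitted - signaled))
    return results


def error_names(text):
    """Map each distinct negative return code in the text to its errno name."""
    codes = sorted({int(c) for c in ERROR_CODE_RE.findall(unwrap(text))})
    return {code: LINUX_ERRNO.get(code, "unknown") for code in codes}


# Three wrapped lines copied from the excerpt above.
SAMPLE = """
 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma3 timeout,
signaled seq=114786, emitted seq=114788
 kernel: amdgpu 0000:4c:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR*
ring kiq_2.1.0 test failed (-110)
 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser
-125!
"""

if __name__ == "__main__":
    print(ring_timeouts(SAMPLE))
    print(error_names(SAMPLE))
```

On the lines quoted in SAMPLE this reports ring sdma3 with a gap of 2 between
emitted and signaled, and decodes -110 as ETIMEDOUT (the KIQ ring test) and
-125 as ECANCELED (the command submissions rejected after the reset).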


