[Bug 216173] New: amdgpu [gfxhub] page fault (src_id:0 ring:173 vmid:1 pasid:32769, for process Xorg pid 2994 thread Xorg:cs0 pid 3237)
bugzilla-daemon at kernel.org
Sat Jun 25 23:52:53 UTC 2022
https://bugzilla.kernel.org/show_bug.cgi?id=216173
Bug ID: 216173
Summary: amdgpu [gfxhub] page fault (src_id:0 ring:173 vmid:1 pasid:32769, for process Xorg pid 2994 thread Xorg:cs0 pid 3237)
Product: Drivers
Version: 2.5
Kernel Version: 5.19-rc3
Hardware: i386
OS: Linux
Tree: Mainline
Status: NEW
Severity: high
Priority: P1
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri at kernel-bugs.osdl.org
Reporter: witold.baryluk+kernel at gmail.com
Regression: No
This appears to be a regression in 5.19-rc3 (also present in rc2; I did not test earlier release candidates).
It works fine on 5.18.7; both kernels are custom builds. There are also no issues on 5.18.0.
Debian, amd64.
44:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21 [Radeon RX 6800/6800 XT / 6900 XT] (rev c0)
CPU: AMD Threadripper 2950X, stock
Memory: 8x32GB ECC
Motherboard: MSI MEG Creation X399
Booting looks fine, but when the Xorg server starts, the screen is corrupted,
and within a few seconds it freezes and stops responding.
Dmesg output:
[ 140.683672] amdgpu 0000:44:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:173 vmid:1 pasid:32769, for process Xorg pid 2994 thread Xorg:cs0 pid 3237)
[ 140.683678] amdgpu 0000:44:00.0: amdgpu: in page starting at address 0x0000800106ef5000 from client 0x1b (UTCL2)
[ 140.683681] amdgpu 0000:44:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x0014115B
[ 140.683682] amdgpu 0000:44:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
[ 140.683684] amdgpu 0000:44:00.0: amdgpu: MORE_FAULTS: 0x1
[ 140.683685] amdgpu 0000:44:00.0: amdgpu: WALKER_ERROR: 0x5
[ 140.683686] amdgpu 0000:44:00.0: amdgpu: PERMISSION_FAULTS: 0x5
[ 140.683686] amdgpu 0000:44:00.0: amdgpu: MAPPING_ERROR: 0x1
[ 140.683687] amdgpu 0000:44:00.0: amdgpu: RW: 0x1
...
[ 151.015508] gmc_v10_0_process_interrupt: 699 callbacks suppressed
...
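The GCVM_L2_PROTECTION_FAULT_STATUS value above packs the individual fields that the driver then prints one per line. The Python sketch below splits the raw value back into those fields. The (shift, width) table is an assumption based on my reading of the GC 10.3 register definitions in the kernel headers (gc_10_3_0_sh_mask.h); it is not stated in this report, and decode_fault_status is a hypothetical helper, not part of any existing tool. As a sanity check, it reproduces every field printed in the log, and its VMID field (1) agrees with vmid:1 in the first line.

```python
# ASSUMED bit layout of GCVM_L2_PROTECTION_FAULT_STATUS on GC 10.3 (Navi 21),
# given as field name -> (shift, width). Verify against the kernel headers.
FIELDS = {
    "MORE_FAULTS": (0, 1),
    "WALKER_ERROR": (1, 3),
    "PERMISSION_FAULTS": (4, 4),
    "MAPPING_ERROR": (8, 1),
    "CID": (9, 9),        # UTCL2 client ID; 0x8 is printed as TCP in the log
    "RW": (18, 1),
    "ATOMIC": (19, 1),
    "VMID": (20, 4),
}


def decode_fault_status(status: int) -> dict:
    """Hypothetical helper: split a raw status value into its bit fields."""
    return {
        name: (status >> shift) & ((1 << width) - 1)
        for name, (shift, width) in FIELDS.items()
    }


if __name__ == "__main__":
    # Raw value copied from the dmesg line above.
    for name, value in decode_fault_status(0x0014115B).items():
        print(f"{name}: 0x{value:x}")
```

Run against 0x0014115B, this gives the same MORE_FAULTS, WALKER_ERROR, PERMISSION_FAULTS, MAPPING_ERROR, CID and RW values as the driver's own output, plus ATOMIC 0x0 and VMID 0x1, which the driver does not print here.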
The GPU eventually resets, but the system is still not usable:
[ 161.261520] amdgpu 0000:44:00.0: amdgpu: IH ring buffer overflow (0x0008D620, 0x00002680, 0x0000D640)
[ 161.270648] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=100, emitted seq=103
[ 161.270854] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 2994 thread Xorg:cs0 pid 3237
[ 161.271004] amdgpu 0000:44:00.0: amdgpu: GPU reset begin!
[ 161.830407] amdgpu 0000:44:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
[ 161.830517] [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KGQ disable failed
[ 162.084366] [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* failed to halt cp gfx
[ 162.101328] [drm] free PSP TMR buffer
[ 162.149879] CPU: 15 PID: 188 Comm: kworker/u128:14 Tainted: G W E 5.19.0-rc3 #1
[ 162.149883] Hardware name: Micro-Star International Co., Ltd. MS-7B92/MEG X399 CREATION (MS-7B92), BIOS 1.30 03/25/2019
[ 162.149884] Workqueue: amdgpu-reset-dev drm_sched_job_timedout [gpu_sched]
[ 162.149890] Call Trace:
[ 162.149892] <TASK>
[ 162.149893] dump_stack_lvl+0x34/0x45
[ 162.149898] amdgpu_do_asic_reset+0x1b/0x3db [amdgpu]
[ 162.150047] amdgpu_device_gpu_recover_imp.cold+0x57e/0x910 [amdgpu]
[ 162.150194] amdgpu_job_timedout+0x14b/0x180 [amdgpu]
[ 162.150323] ? finish_task_switch.isra.0+0x7d/0x270
[ 162.150326] drm_sched_job_timedout+0x5b/0xf0 [gpu_sched]
[ 162.150330] process_one_work+0x1ab/0x300
[ 162.150332] worker_thread+0x48/0x3c0
[ 162.150334] ? rescuer_thread+0x3c0/0x3c0
[ 162.150336] kthread+0xd1/0x100
[ 162.150338] ? kthread_complete_and_exit+0x20/0x20
[ 162.150339] ret_from_fork+0x1f/0x30
[ 162.150342] </TASK>
[ 162.150351] amdgpu 0000:44:00.0: amdgpu: MODE1 reset
[ 162.150354] amdgpu 0000:44:00.0: amdgpu: GPU mode1 reset
[ 162.150417] amdgpu 0000:44:00.0: amdgpu: GPU smu mode1 reset
[ 162.653371] amdgpu 0000:44:00.0: amdgpu: GPU reset succeeded, trying to resume
[ 162.653516] [drm] PCIE GART of 512M enabled (table at 0x0000008000300000).
[ 162.653537] [drm] VRAM is lost due to GPU reset!
[ 162.653541] [drm] PSP is resuming...
[ 162.834166] [drm] reserve 0xa00000 from 0x8001000000 for PSP TMR
[ 162.948850] amdgpu 0000:44:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[ 162.948853] amdgpu 0000:44:00.0: amdgpu: SMU is resuming...
[ 162.948884] amdgpu 0000:44:00.0: amdgpu: use vbios provided pptable
[ 163.025704] amdgpu 0000:44:00.0: amdgpu: SMU is resumed successfully!
[ 163.027473] [drm] DMUB hardware initialized: version=0x02020003
[ 163.280274] [drm] kiq ring mec 2 pipe 1 q 0
[ 163.284624] [drm] VCN decode and encode initialized successfully(under DPG Mode).
[ 163.284906] [drm] JPEG decode initialized successfully.
[ 163.284926] amdgpu 0000:44:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[ 163.284928] amdgpu 0000:44:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[ 163.284930] amdgpu 0000:44:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[ 163.284931] amdgpu 0000:44:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[ 163.284932] amdgpu 0000:44:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[ 163.284934] amdgpu 0000:44:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[ 163.284935] amdgpu 0000:44:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[ 163.284936] amdgpu 0000:44:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[ 163.284937] amdgpu 0000:44:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
[ 163.284938] amdgpu 0000:44:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
[ 163.284940] amdgpu 0000:44:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[ 163.284941] amdgpu 0000:44:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0
[ 163.284942] amdgpu 0000:44:00.0: amdgpu: ring sdma2 uses VM inv eng 14 on hub 0
[ 163.284943] amdgpu 0000:44:00.0: amdgpu: ring sdma3 uses VM inv eng 15 on hub 0
[ 163.284944] amdgpu 0000:44:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 1
[ 163.284945] amdgpu 0000:44:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 1
[ 163.284947] amdgpu 0000:44:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 1
[ 163.284948] amdgpu 0000:44:00.0: amdgpu: ring vcn_dec_1 uses VM inv eng 5 on hub 1
[ 163.284949] amdgpu 0000:44:00.0: amdgpu: ring vcn_enc_1.0 uses VM inv eng 6 on hub 1
[ 163.284950] amdgpu 0000:44:00.0: amdgpu: ring vcn_enc_1.1 uses VM inv eng 7 on hub 1
[ 163.284951] amdgpu 0000:44:00.0: amdgpu: ring jpeg_dec uses VM inv eng 8 on hub 1
[ 163.292565] amdgpu 0000:44:00.0: amdgpu: recover vram bo from shadow start
[ 163.292579] amdgpu 0000:44:00.0: amdgpu: recover vram bo from shadow done
[ 163.292582] [drm] Skip scheduling IBs!
[ 163.292583] [drm] Skip scheduling IBs!
[ 163.292598] amdgpu 0000:44:00.0: amdgpu: GPU reset(3) succeeded!
[ 163.292618] [drm] Skip scheduling IBs!
[ 163.292626] [drm] Skip scheduling IBs!
[ 163.292629] [drm] Skip scheduling IBs!
[ 163.989966] usb usb8-port1: Cannot enable. Maybe the USB cable is bad?
[ 166.265393] amdgpu_cs_ioctl: 3200 callbacks suppressed
[ 166.265397] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[ 166.265812] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[ 166.282284] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[ 166.283327] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[ 171.486759] amdgpu_cs_ioctl: 65 callbacks suppressed
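The two negative return codes in this log are standard Linux errno values: -110 in the KIQ ring test failure is -ETIMEDOUT, and -125 from amdgpu_cs_ioctl is -ECANCELED. The repeated -ECANCELED after the reset suggests the driver keeps rejecting Xorg's command submissions, plausibly because its context was invalidated by the reset; that reading is an interpretation, not something the log states. The Python sketch below only maps the codes to names. The table is hardcoded to the Linux asm-generic values so the result does not depend on the host OS, and describe_kernel_error is a hypothetical helper.

```python
# Linux asm-generic errno values for the codes that appear in this log.
# Hardcoded on purpose: other OSes (e.g. macOS) use different numbers.
LINUX_ERRNO = {
    110: ("ETIMEDOUT", "Connection timed out"),
    125: ("ECANCELED", "Operation canceled"),
}


def describe_kernel_error(ret: int) -> str:
    """Hypothetical helper: turn a negative kernel return code into its errno name."""
    name, text = LINUX_ERRNO.get(-ret, ("UNKNOWN", "not in this table"))
    return f"{ret} = -{name} ({text})"


if __name__ == "__main__":
    print(describe_kernel_error(-110))  # from "ring kiq_2.1.0 test failed (-110)"
    print(describe_kernel_error(-125))  # from "Failed to initialize parser -125!"
```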
More information about the dri-devel mailing list