[Bug 216173] New: amdgpu [gfxhub] page fault (src_id:0 ring:173 vmid:1 pasid:32769, for process Xorg pid 2994 thread Xorg:cs0 pid 3237)

Sat Jun 25 23:52:53 UTC 2022

https://bugzilla.kernel.org/show_bug.cgi?id=216173

            Bug ID: 216173
           Summary: amdgpu [gfxhub] page fault (src_id:0 ring:173 vmid:1
                    pasid:32769, for process Xorg pid 2994 thread Xorg:cs0
                    pid 3237)
           Product: Drivers
           Version: 2.5
    Kernel Version: 5.19-rc3
          Hardware: i386
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: high
          Priority: P1
         Component: Video(DRI - non Intel)
          Assignee: drivers_video-dri at kernel-bugs.osdl.org
          Reporter: witold.baryluk+kernel at gmail.com
        Regression: No

This appears to be a regression in 5.19-rc3 (also present in rc2; I didn't
test anything earlier in the 5.19 cycle). It works fine on 5.18.7, and there
are no issues on 5.18.0 either. Both are custom builds.

Debian, amd64.

44:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21 [Radeon RX 6800/6800 XT / 6900 XT] (rev c0)

CPU: AMD Threadripper 2950X, stock
Memory: 8x32GB ECC
Motherboard: MSI MEG Creation X399

Booting looks fine, but when the Xorg server starts, the screen looks
corrupted, and within seconds it freezes and stops responding.

Dmesg output:

[  140.683672] amdgpu 0000:44:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:173 vmid:1 pasid:32769, for process Xorg pid 2994 thread Xorg:cs0 pid 3237)
[  140.683678] amdgpu 0000:44:00.0: amdgpu:   in page starting at address 0x0000800106ef5000 from client 0x1b (UTCL2)
[  140.683681] amdgpu 0000:44:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x0014115B
[  140.683682] amdgpu 0000:44:00.0: amdgpu:      Faulty UTCL2 client ID: TCP (0x8)
[  140.683684] amdgpu 0000:44:00.0: amdgpu:      MORE_FAULTS: 0x1
[  140.683685] amdgpu 0000:44:00.0: amdgpu:      WALKER_ERROR: 0x5
[  140.683686] amdgpu 0000:44:00.0: amdgpu:      PERMISSION_FAULTS: 0x5
[  140.683686] amdgpu 0000:44:00.0: amdgpu:      MAPPING_ERROR: 0x1
[  140.683687] amdgpu 0000:44:00.0: amdgpu:      RW: 0x1
...
[  151.015508] gmc_v10_0_process_interrupt: 699 callbacks suppressed
...
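
For anyone reading the fault lines above: the decoded fields are bit slices of
the raw GCVM_L2_PROTECTION_FAULT_STATUS value. A minimal sketch in Python,
assuming the gfx10 field layout (MORE_FAULTS bit 0, WALKER_ERROR bits 1-3,
PERMISSION_FAULTS bits 4-7, MAPPING_ERROR bit 8, CID bits 9-17, RW bit 18,
VMID bits 20-23). That layout is an assumption rather than something stated in
this report, though it does reproduce the values dmesg printed. The names
FIELDS and decode_fault_status are hypothetical.

```python
# Assumed gfx10 layout of GCVM_L2_PROTECTION_FAULT_STATUS: name -> (shift, width).
FIELDS = {
    "MORE_FAULTS": (0, 1),
    "WALKER_ERROR": (1, 3),
    "PERMISSION_FAULTS": (4, 4),
    "MAPPING_ERROR": (8, 1),
    "CID": (9, 9),
    "RW": (18, 1),
    "VMID": (20, 4),
}


def decode_fault_status(status):
    """Slice each assumed field out of the raw 32-bit status value."""
    return {
        name: (status >> shift) & ((1 << width) - 1)
        for name, (shift, width) in FIELDS.items()
    }


if __name__ == "__main__":
    # Raw value taken from the dmesg output in this report.
    for name, value in decode_fault_status(0x0014115B).items():
        print(f"{name}: {value:#x}")
```

Under this layout the CID field comes out as 0x8 and VMID as 1, matching the
"Faulty UTCL2 client ID: TCP (0x8)" and "vmid:1" lines.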

The GPU eventually resets, but the system is still not usable:

[  161.261520] amdgpu 0000:44:00.0: amdgpu: IH ring buffer overflow (0x0008D620, 0x00002680, 0x0000D640)
[  161.270648] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=100, emitted seq=103
[  161.270854] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 2994 thread Xorg:cs0 pid 3237
[  161.271004] amdgpu 0000:44:00.0: amdgpu: GPU reset begin!
[  161.830407] amdgpu 0000:44:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
[  161.830517] [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KGQ disable failed
[  162.084366] [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* failed to halt cp gfx
[  162.101328] [drm] free PSP TMR buffer
[  162.149879] CPU: 15 PID: 188 Comm: kworker/u128:14 Tainted: G        W   E     5.19.0-rc3 #1
[  162.149883] Hardware name: Micro-Star International Co., Ltd. MS-7B92/MEG X399 CREATION (MS-7B92), BIOS 1.30 03/25/2019
[  162.149884] Workqueue: amdgpu-reset-dev drm_sched_job_timedout [gpu_sched]
[  162.149890] Call Trace:
[  162.149892]  <TASK>
[  162.149893]  dump_stack_lvl+0x34/0x45
[  162.149898]  amdgpu_do_asic_reset+0x1b/0x3db [amdgpu]
[  162.150047]  amdgpu_device_gpu_recover_imp.cold+0x57e/0x910 [amdgpu]
[  162.150194]  amdgpu_job_timedout+0x14b/0x180 [amdgpu]
[  162.150323]  ? finish_task_switch.isra.0+0x7d/0x270
[  162.150326]  drm_sched_job_timedout+0x5b/0xf0 [gpu_sched]
[  162.150330]  process_one_work+0x1ab/0x300
[  162.150332]  worker_thread+0x48/0x3c0
[  162.150334]  ? rescuer_thread+0x3c0/0x3c0
[  162.150336]  kthread+0xd1/0x100
[  162.150338]  ? kthread_complete_and_exit+0x20/0x20
[  162.150339]  ret_from_fork+0x1f/0x30
[  162.150342]  </TASK>
[  162.150351] amdgpu 0000:44:00.0: amdgpu: MODE1 reset
[  162.150354] amdgpu 0000:44:00.0: amdgpu: GPU mode1 reset
[  162.150417] amdgpu 0000:44:00.0: amdgpu: GPU smu mode1 reset
[  162.653371] amdgpu 0000:44:00.0: amdgpu: GPU reset succeeded, trying to resume
[  162.653516] [drm] PCIE GART of 512M enabled (table at 0x0000008000300000).
[  162.653537] [drm] VRAM is lost due to GPU reset!
[  162.653541] [drm] PSP is resuming...
[  162.834166] [drm] reserve 0xa00000 from 0x8001000000 for PSP TMR
[  162.948850] amdgpu 0000:44:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[  162.948853] amdgpu 0000:44:00.0: amdgpu: SMU is resuming...
[  162.948884] amdgpu 0000:44:00.0: amdgpu: use vbios provided pptable
[  163.025704] amdgpu 0000:44:00.0: amdgpu: SMU is resumed successfully!
[  163.027473] [drm] DMUB hardware initialized: version=0x02020003
[  163.280274] [drm] kiq ring mec 2 pipe 1 q 0
[  163.284624] [drm] VCN decode and encode initialized successfully(under DPG Mode).
[  163.284906] [drm] JPEG decode initialized successfully.
[  163.284926] amdgpu 0000:44:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[  163.284928] amdgpu 0000:44:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[  163.284930] amdgpu 0000:44:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[  163.284931] amdgpu 0000:44:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[  163.284932] amdgpu 0000:44:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[  163.284934] amdgpu 0000:44:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[  163.284935] amdgpu 0000:44:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[  163.284936] amdgpu 0000:44:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[  163.284937] amdgpu 0000:44:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
[  163.284938] amdgpu 0000:44:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
[  163.284940] amdgpu 0000:44:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[  163.284941] amdgpu 0000:44:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0
[  163.284942] amdgpu 0000:44:00.0: amdgpu: ring sdma2 uses VM inv eng 14 on hub 0
[  163.284943] amdgpu 0000:44:00.0: amdgpu: ring sdma3 uses VM inv eng 15 on hub 0
[  163.284944] amdgpu 0000:44:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 1
[  163.284945] amdgpu 0000:44:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 1
[  163.284947] amdgpu 0000:44:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 1
[  163.284948] amdgpu 0000:44:00.0: amdgpu: ring vcn_dec_1 uses VM inv eng 5 on hub 1
[  163.284949] amdgpu 0000:44:00.0: amdgpu: ring vcn_enc_1.0 uses VM inv eng 6 on hub 1
[  163.284950] amdgpu 0000:44:00.0: amdgpu: ring vcn_enc_1.1 uses VM inv eng 7 on hub 1
[  163.284951] amdgpu 0000:44:00.0: amdgpu: ring jpeg_dec uses VM inv eng 8 on hub 1
[  163.292565] amdgpu 0000:44:00.0: amdgpu: recover vram bo from shadow start
[  163.292579] amdgpu 0000:44:00.0: amdgpu: recover vram bo from shadow done
[  163.292582] [drm] Skip scheduling IBs!
[  163.292583] [drm] Skip scheduling IBs!
[  163.292598] amdgpu 0000:44:00.0: amdgpu: GPU reset(3) succeeded!
[  163.292618] [drm] Skip scheduling IBs!
[  163.292626] [drm] Skip scheduling IBs!
[  163.292629] [drm] Skip scheduling IBs!
[  163.989966] usb usb8-port1: Cannot enable. Maybe the USB cable is bad?
[  166.265393] amdgpu_cs_ioctl: 3200 callbacks suppressed
[  166.265397] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[  166.265812] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[  166.282284] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[  166.283327] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[  171.486759] amdgpu_cs_ioctl: 65 callbacks suppressed
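
The negative numbers in the log above are Linux errno values. A small lookup
sketch in Python covering only the two codes seen here; the table is
hardcoded from the generic Linux errno numbering, and the names LINUX_ERRNO
and errno_name are illustrative, not part of any existing tool.

```python
# Generic Linux errno numbers for the two codes in this log.
# Hardcoded so the sketch does not depend on the host OS.
LINUX_ERRNO = {110: "ETIMEDOUT", 125: "ECANCELED"}


def errno_name(code):
    """Map a negative kernel return code such as -110 to its symbolic name."""
    return LINUX_ERRNO.get(-code, "unknown")


if __name__ == "__main__":
    for code in (-110, -125):
        print(code, errno_name(code))
```

So the kiq_2.1.0 ring test timed out, and the later amdgpu_cs_ioctl
submissions were cancelled. Cancelled submissions after a reset would fit
contexts created before the reset no longer being accepted, but that reading
is an assumption, not something the log states.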
