[Bug 217514] New: [amdgpu] system doesn't boot after linux-firmware 2023-05-23 ffe1a41e

bugzilla-daemon at kernel.org bugzilla-daemon at kernel.org
Wed May 31 16:35:25 UTC 2023


https://bugzilla.kernel.org/show_bug.cgi?id=217514

            Bug ID: 217514
           Summary: [amdgpu] system doesn't boot after linux-firmware
                    2023-05-23 ffe1a41e
           Product: Drivers
           Version: 2.5
          Hardware: All
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P3
         Component: Video(DRI - non Intel)
          Assignee: drivers_video-dri at kernel-bugs.osdl.org
          Reporter: rly at hotmail.hu
        Regression: No

Created attachment 304361
  --> https://bugzilla.kernel.org/attachment.cgi?id=304361&action=edit
softlockup

Updating linux-firmware to the latest git version causes my PC to lock up
during boot. I have a 3900X paired with a 7900 XTX, running Arch Linux with the
6.3.4 xanmod kernel (this also happens with the kernel from the core repo) and
mesa 23.1.1, if that matters.
During boot I see the following error printed and the system is completely
locked up; only a hard reset helps:
`May 31 07:20:40 valhalla kernel: watchdog: BUG: soft lockup - CPU#5 stuck for 26s! [swapper/5:0]`

accompanied by lots of amdgpu errors in the journal (both are followed by a
stack trace):
```
May 31 07:20:44 valhalla kernel: amdgpu 0000:0c:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:9 pasid:32768, for process  pid 0 thread  pid 0)
May 31 07:20:44 valhalla kernel: amdgpu 0000:0c:00.0: amdgpu:   in page starting at address 0x0000ffff0021a000 from client 10
May 31 07:20:44 valhalla kernel: amdgpu 0000:0c:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00900831
May 31 07:20:44 valhalla kernel: amdgpu 0000:0c:00.0: amdgpu:          Faulty UTCL2 client ID: CPF (0x4)
May 31 07:20:44 valhalla kernel: amdgpu 0000:0c:00.0: amdgpu:          MORE_FAULTS: 0x1
May 31 07:20:44 valhalla kernel: amdgpu 0000:0c:00.0: amdgpu:          WALKER_ERROR: 0x0
May 31 07:20:44 valhalla kernel: amdgpu 0000:0c:00.0: amdgpu:          PERMISSION_FAULTS: 0x3
May 31 07:20:44 valhalla kernel: amdgpu 0000:0c:00.0: amdgpu:          MAPPING_ERROR: 0x0
May 31 07:20:44 valhalla kernel: amdgpu 0000:0c:00.0: amdgpu:          RW: 0x0
```

The full journal log is in the "softlockup" attachment.
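
For readers cross-checking the fault dump, the sub-fields printed under
GCVM_L2_PROTECTION_FAULT_STATUS can be recomputed from the raw value
0x00900831. The sketch below is only illustrative: the bit positions in
`FIELDS` are an assumption (they are not stated in this report), chosen because
they reproduce every value shown in the log, including vmid:9 and client ID 0x4.
The names `FIELDS` and `decode_fault_status` are hypothetical helpers, not part
of any driver API.

```python
# Assumed bit layout of GCVM_L2_PROTECTION_FAULT_STATUS as (shift, width).
# This layout is an assumption; it is consistent with the values in the log.
FIELDS = {
    "MORE_FAULTS": (0, 1),
    "WALKER_ERROR": (1, 3),
    "PERMISSION_FAULTS": (4, 4),
    "MAPPING_ERROR": (8, 1),
    "CID": (9, 9),    # printed in the log as "Faulty UTCL2 client ID"
    "RW": (18, 1),
    "VMID": (20, 4),  # printed in the log as "vmid:9"
}


def decode_fault_status(status: int) -> dict:
    """Split a raw status word into named fields using the assumed layout."""
    return {
        name: (status >> shift) & ((1 << width) - 1)
        for name, (shift, width) in FIELDS.items()
    }


if __name__ == "__main__":
    # Raw value taken from the journal excerpt above.
    for name, value in decode_fault_status(0x00900831).items():
        print(f"{name}: {hex(value)}")
```

Decoding the value this way yields the same MORE_FAULTS, WALKER_ERROR,
PERMISSION_FAULTS, MAPPING_ERROR and RW values that the kernel printed.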

The issues start with [this commit,
ffe1a41e](https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/commit/?id=ffe1a41e2ddbc39109b12d95dcac282d90eba8fc).

With that commit I do not get the soft lockup mentioned above; instead, after
the initramfs loads, the BIOS splash screen comes back and the system is stuck
there.
There are different amdgpu errors (followed by a stack trace) during this:
```
May 31 09:18:37 valhalla kernel: amdgpu 0000:0c:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000006 SMN_C2PMSG_82:0x00000000
May 31 09:18:37 valhalla kernel: amdgpu 0000:0c:00.0: amdgpu: Failed to enable requested dpm features!
May 31 09:18:37 valhalla kernel: amdgpu 0000:0c:00.0: amdgpu: Failed to setup smc hw!
May 31 09:18:37 valhalla kernel: [drm:amdgpu_device_init [amdgpu]] *ERROR* hw_init of IP block <smu> failed -62
May 31 09:18:37 valhalla kernel: amdgpu 0000:0c:00.0: amdgpu: amdgpu_device_ip_init failed
May 31 09:18:37 valhalla kernel: amdgpu 0000:0c:00.0: amdgpu: Fatal error during GPU init
May 31 09:18:37 valhalla kernel: amdgpu 0000:0c:00.0: amdgpu: amdgpu: finishing device.
```
The logs for this case are in "amdgpu_error".
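
As a reading aid for the `failed -62` line: kernel functions conventionally
return negated errno values, and on Linux errno 62 is ETIME ("Timer expired"),
which would be consistent with the SMU not answering the previous command.
The following is a minimal sketch with a small hardcoded table (a hypothetical
helper, limited to a few common codes from the generic Linux errno list) so it
behaves the same on any platform.

```python
# A few Linux errno values from the generic errno list, hardcoded so the
# result does not depend on the platform running this script.
LINUX_ERRNO = {
    5: ("EIO", "I/O error"),
    12: ("ENOMEM", "Out of memory"),
    22: ("EINVAL", "Invalid argument"),
    62: ("ETIME", "Timer expired"),
    110: ("ETIMEDOUT", "Connection timed out"),
}


def describe_kernel_error(code: int) -> str:
    """Map a (usually negative) kernel return code to its errno name."""
    name, text = LINUX_ERRNO.get(abs(code), ("UNKNOWN", "not in this table"))
    return f"{code}: {name} ({text})"


if __name__ == "__main__":
    # -62 is the hw_init failure code from the journal excerpt above.
    print(describe_kernel_error(-62))
```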

Note that at the end the system appears to be running, but since I only saw the
BIOS splash screen I rebooted via SysRq (REISUB).

The commit after ffe1a41
([56832557](https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/commit/?id=568325574a3b6148f3296984aa24fcd1fb4b912c),
or possibly the one after that,
[39d6fcc](https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/commit/?id=39d6fcc73100ae4aeeec0194bbf102c672673edd);
I am not sure at the moment) gets past the splash screen, but that is where the
soft lockup starts to happen.


