amdgpu didn't start with pci=nocrs parameter, get error "Fatal error during GPU init"

Mikhail Gavrilov mikhail.v.gavrilov at gmail.com
Thu Feb 23 23:40:49 UTC 2023


Hi,
I have an ASUS ROG Strix G15 Advantage Edition G513QY-HQ007 laptop, but
it is impossible to use without AC power because the system loses the
NVMe drive when I disconnect the power adapter.

Messages from the kernel log when it happens:
nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10
nvme nvme0: Does your device have a faulty power saving mode enabled?
nvme nvme0: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off" and report a bug
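
For reference, the two register values in that message can be decoded with a small Python sketch. The bit positions follow the NVMe CSTS register and the PCI Status register as I understand them, and the helper names are made up for illustration. A CSTS value of all ones is what a failed register read returns, so it suggests the controller was no longer reachable, while PCI_STATUS=0x10 carries only the capabilities-list bit and no abort or error bits.

```python
# Illustrative helpers; the function names are hypothetical and not part of
# any kernel or nvme-cli tooling.

def decode_csts(csts: int) -> dict:
    """Decode an NVMe Controller Status (CSTS) register value."""
    if csts == 0xFFFFFFFF:
        # All ones is what a failed register read returns, so the
        # individual bits carry no meaning here.
        return {"reachable": False}
    return {
        "reachable": True,
        "ready": bool(csts & 0x1),             # RDY, bit 0
        "fatal": bool(csts & 0x2),             # CFS, bit 1
        "shutdown_status": (csts >> 2) & 0x3,  # SHST, bits 3:2
    }


# Bit positions of the PCI Status register (DEVSEL timing bits 9-10 omitted).
PCI_STATUS_BITS = {
    3: "interrupt status",
    4: "capabilities list",
    5: "66 MHz capable",
    7: "fast back-to-back capable",
    8: "master data parity error",
    11: "signaled target abort",
    12: "received target abort",
    13: "received master abort",
    14: "signaled system error",
    15: "detected parity error",
}


def pci_status_flags(status: int) -> list:
    """Return the names of the flag bits set in a PCI Status value."""
    return [name for bit, name in PCI_STATUS_BITS.items() if status & (1 << bit)]


if __name__ == "__main__":
    # Values copied from the kernel message above.
    print(decode_csts(0xFFFFFFFF))
    print(pci_status_flags(0x10))
```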

I tried the recommended parameters
(nvme_core.default_ps_max_latency_us=0 and pcie_aspm=off) to resolve
this issue, but without success.
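
One way to rule out a typo is to confirm that both parameters actually reached the running kernel. Below is a rough Python sketch of such a check. It assumes the command line would be read from /proc/cmdline on the affected machine; the sample string and the helper names are invented for illustration and are not taken from my system.

```python
# Hypothetical helpers for checking kernel command-line parameters.

WANTED = {
    "nvme_core.default_ps_max_latency_us": "0",
    "pcie_aspm": "off",
}

# Made-up sample; on a real system read the contents of /proc/cmdline instead.
SAMPLE = ("BOOT_IMAGE=/vmlinuz root=/dev/nvme0n1p2 ro "
          "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off")


def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into {name: value}; bare flags map to None.

    Quoted values that contain spaces are not handled by this simple split.
    """
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def missing_params(cmdline: str, wanted: dict) -> list:
    """Return the wanted 'name=value' pairs that are absent or set differently."""
    have = parse_cmdline(cmdline)
    return [f"{k}={v}" for k, v in wanted.items() if have.get(k) != v]


if __name__ == "__main__":
    print(missing_params(SAMPLE, WANTED))
```

An empty list means both parameters are present with the expected values.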

On the linux-nvme mailing list, the last advice was to try the
"pci=nocrs" parameter.

But with this parameter the amdgpu driver refuses to work and makes
the system unbootable. I can work around the boot problem by
blacklisting the driver, but that is not a good solution because I
don't want to lose the GPU.

Why does amdgpu not work with "pci=nocrs"?
And is it possible to solve this incompatibility?
It is very important because when I boot the system with "pci=nocrs"
and without the amdgpu driver, the NVMe drive is not lost when I
disconnect the power adapter. So "pci=nocrs" really helps.

Below is what I see in the kernel log when I add the "pci=nocrs" parameter:

amdgpu 0000:03:00.0: amdgpu: Fetched VBIOS from ATRM
amdgpu: ATOM BIOS: SWBRT77321.001
[drm] VCN(0) decode is enabled in VM mode
[drm] VCN(0) encode is enabled in VM mode
[drm] JPEG decode is enabled in VM mode
Console: switching to colour dummy device 80x25
amdgpu 0000:03:00.0: amdgpu: Trusted Memory Zone (TMZ) feature disabled as experimental (default)
[drm] GPU posting now...
[drm] vm size is 262144 GB, 4 levels, block size is 9-bit, fragment size is 9-bit
amdgpu 0000:03:00.0: amdgpu: VRAM: 12272M 0x0000008000000000 - 0x00000082FEFFFFFF (12272M used)
amdgpu 0000:03:00.0: amdgpu: GART: 512M 0x0000000000000000 - 0x000000001FFFFFFF
amdgpu 0000:03:00.0: amdgpu: AGP: 267894784M 0x0000008400000000 - 0x0000FFFFFFFFFFFF
[drm] Detected VRAM RAM=12272M, BAR=16384M
[drm] RAM width 192bits GDDR6
[drm] amdgpu: 12272M of VRAM memory ready
[drm] amdgpu: 31774M of GTT memory ready.
amdgpu 0000:03:00.0: amdgpu: (-14) failed to allocate kernel bo
[drm] Debug VRAM access will use slowpath MM access
amdgpu 0000:03:00.0: amdgpu: Failed to DMA MAP the dummy page
[drm:amdgpu_device_init [amdgpu]] *ERROR* sw_init of IP block <gmc_v10_0> failed -12
amdgpu 0000:03:00.0: amdgpu: amdgpu_device_ip_init failed
amdgpu 0000:03:00.0: amdgpu: Fatal error during GPU init
amdgpu 0000:03:00.0: amdgpu: amdgpu: finishing device.
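
To make the numbers in this log easier to read, here is a small Python sketch that maps the negative return codes to their errno names and recomputes the sizes of the reported address ranges. The helper names are hypothetical, and the regular expression only covers the two message shapes that appear above. On Linux, -14 is EFAULT and -12 is ENOMEM; the sketch does not say why the allocations fail, only what the codes mean.

```python
import errno
import os
import re

# Two lines copied from the log above.
LOG = """\
amdgpu 0000:03:00.0: amdgpu: (-14) failed to allocate kernel bo
[drm:amdgpu_device_init [amdgpu]] *ERROR* sw_init of IP block <gmc_v10_0> failed -12
"""


def decode_errors(log: str) -> list:
    """Find '(-N)' and 'failed -N' codes and return (N, errno name, message)."""
    found = []
    for match in re.finditer(r"\(-(\d+)\)|failed -(\d+)", log):
        code = int(match.group(1) or match.group(2))
        name = errno.errorcode.get(code, "unknown")
        found.append((code, name, os.strerror(code)))
    return found


def range_mib(start: int, end: int) -> int:
    """Size in MiB of an inclusive address range [start, end]."""
    return (end - start + 1) >> 20


if __name__ == "__main__":
    for code, name, text in decode_errors(LOG):
        print(f"-{code}: {name} ({text})")
    # Address ranges as printed by the driver.
    print("VRAM", range_mib(0x0000008000000000, 0x00000082FEFFFFFF), "M")
    print("GART", range_mib(0x0000000000000000, 0x000000001FFFFFFF), "M")
    print("AGP ", range_mib(0x0000008400000000, 0x0000FFFFFFFFFFFF), "M")
```

The recomputed sizes agree with the 12272M, 512M and 267894784M figures the driver prints, so the ranges themselves are internally consistent.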

Of course a full system log is also attached.

-- 
Best Regards,
Mike Gavrilov.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: system-log-Fatal-error-during-GPU-init.tar.xz
Type: application/x-xz
Size: 40988 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20230224/86526758/attachment-0001.xz>

