amdgpu doesn't start with the pci=nocrs parameter, fails with "Fatal error during GPU init"

Mikhail Gavrilov mikhail.v.gavrilov at gmail.com
Tue Dec 19 09:45:43 UTC 2023


On Fri, Dec 15, 2023 at 5:37 PM Christian König
<ckoenig.leichtzumerken at gmail.com> wrote:
>
> I have no idea :)
>
> From the logs I can see that the AMDGPU now has the proper BARs assigned:
>
> [    5.722015] pci 0000:03:00.0: [1002:73df] type 00 class 0x038000
> [    5.722051] pci 0000:03:00.0: reg 0x10: [mem 0xf800000000-0xfbffffffff 64bit pref]
> [    5.722081] pci 0000:03:00.0: reg 0x18: [mem 0xfc00000000-0xfc0fffffff 64bit pref]
> [    5.722112] pci 0000:03:00.0: reg 0x24: [mem 0xfca00000-0xfcafffff]
> [    5.722134] pci 0000:03:00.0: reg 0x30: [mem 0xfcb00000-0xfcb1ffff pref]
> [    5.722368] pci 0000:03:00.0: PME# supported from D1 D2 D3hot D3cold
> [    5.722484] pci 0000:03:00.0: 63.008 Gb/s available PCIe bandwidth, limited by 8.0 GT/s PCIe x8 link at 0000:00:01.1 (capable of 252.048 Gb/s with 16.0 GT/s PCIe x16 link)
>
> And with that the driver can work perfectly fine.
>
> Have you updated the BIOS or added/removed some other hardware? Maybe
> somebody added a quirk for your BIOS into the PCIe code or something
> like that.

No, nothing changed in the hardware.
But I found the commit that fixes it.

> git bisect unfixed
92e2bd56a5f9fc44313fda802a43a63cc2a9c8f6 is the first fixed commit
commit 92e2bd56a5f9fc44313fda802a43a63cc2a9c8f6
Author: Vasant Hegde <vasant.hegde at amd.com>
Date:   Thu Sep 21 09:21:45 2023 +0000

    iommu/amd: Introduce iommu_dev_data.flags to track device capabilities

    Currently we use struct iommu_dev_data.iommu_v2 to keep track of the device
    ATS, PRI, and PASID capabilities. But these capabilities can be enabled
    independently (except PRI requires ATS support). Hence, replace
    the iommu_v2 variable with a flags variable, which keep track of the device
    capabilities.

    From commit 9bf49e36d718 ("PCI/ATS: Handle sharing of PF PRI Capability
    with all VFs"), device PRI/PASID is shared between PF and any associated
    VFs. Hence use pci_pri_supported() and pci_pasid_features() instead of
    pci_find_ext_capability() to check device PRI/PASID support.

    Signed-off-by: Vasant Hegde <vasant.hegde at amd.com>
    Reviewed-by: Jason Gunthorpe <jgg at nvidia.com>
    Reviewed-by: Jerry Snitselaar <jsnitsel at redhat.com>
    Link: https://lore.kernel.org/r/20230921092147.5930-13-vasant.hegde@amd.com
    Signed-off-by: Joerg Roedel <jroedel at suse.de>

 drivers/iommu/amd/amd_iommu_types.h |  3 ++-
 drivers/iommu/amd/iommu.c           | 46 ++++++++++++++++++++++---------------
 2 files changed, 30 insertions(+), 19 deletions(-)


> git bisect log
git bisect start '--term-new=fixed' '--term-old=unfixed'
# status: waiting for both good and bad commits
# fixed: [33cc938e65a98f1d29d0a18403dbbee050dcad9a] Linux 6.7-rc4
git bisect fixed 33cc938e65a98f1d29d0a18403dbbee050dcad9a
# status: waiting for good commit(s), bad commit known
# unfixed: [ffc253263a1375a65fa6c9f62a893e9767fbebfa] Linux 6.6
git bisect unfixed ffc253263a1375a65fa6c9f62a893e9767fbebfa
# unfixed: [7d461b291e65938f15f56fe58da2303b07578a76] Merge tag 'drm-next-2023-10-31-1' of git://anongit.freedesktop.org/drm/drm
git bisect unfixed 7d461b291e65938f15f56fe58da2303b07578a76
# unfixed: [e14aec23025eeb1f2159ba34dbc1458467c4c347] s390/ap: fix AP bus crash on early config change callback invocation
git bisect unfixed e14aec23025eeb1f2159ba34dbc1458467c4c347
# unfixed: [be3ca57cfb777ad820c6659d52e60bbdd36bf5ff] Merge tag 'media/v6.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
git bisect unfixed be3ca57cfb777ad820c6659d52e60bbdd36bf5ff
# fixed: [c0d12d769299e1e08338988c7745009e0db2a4a0] Merge tag 'drm-next-2023-11-10' of git://anongit.freedesktop.org/drm/drm
git bisect fixed c0d12d769299e1e08338988c7745009e0db2a4a0
# fixed: [4bbdb725a36b0d235f3b832bd0c1e885f0442d9f] Merge tag 'iommu-updates-v6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
git bisect fixed 4bbdb725a36b0d235f3b832bd0c1e885f0442d9f
# unfixed: [25b6377007ebe1c3ede773fd6979f613386db000] Merge tag 'drm-next-2023-11-07' of git://anongit.freedesktop.org/drm/drm
git bisect unfixed 25b6377007ebe1c3ede773fd6979f613386db000
# unfixed: [67c0afb6424fee94238d9a32b97c407d0c97155e] Merge tag 'exfat-for-6.7-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat
git bisect unfixed 67c0afb6424fee94238d9a32b97c407d0c97155e
# unfixed: [3613047280ec42a4e1350fdc1a6dd161ff4008cc] Merge tag 'v6.6-rc7' into core
git bisect unfixed 3613047280ec42a4e1350fdc1a6dd161ff4008cc
# fixed: [cedc811c76778bdef91d405717acee0de54d8db5] iommu/amd: Remove DMA_FQ type from domain allocation path
git bisect fixed cedc811c76778bdef91d405717acee0de54d8db5
# unfixed: [b0cc5dae1ac0c18748706a4beb636e3b726dd744] iommu/amd: Rename ats related variables
git bisect unfixed b0cc5dae1ac0c18748706a4beb636e3b726dd744
# fixed: [5a0b11a180a9b82b4437a4be1cf73530053f139b] iommu/amd: Remove iommu_v2 module
git bisect fixed 5a0b11a180a9b82b4437a4be1cf73530053f139b
# fixed: [92e2bd56a5f9fc44313fda802a43a63cc2a9c8f6] iommu/amd: Introduce iommu_dev_data.flags to track device capabilities
git bisect fixed 92e2bd56a5f9fc44313fda802a43a63cc2a9c8f6
# unfixed: [739eb25514c90aa8ea053ed4d2b971f531e63ded] iommu/amd: Introduce iommu_dev_data.ppr
git bisect unfixed 739eb25514c90aa8ea053ed4d2b971f531e63ded
# first fixed commit: [92e2bd56a5f9fc44313fda802a43a63cc2a9c8f6] iommu/amd: Introduce iommu_dev_data.flags to track device capabilities
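
For anyone who wants to repeat this kind of "reverse" bisect (looking for the commit that fixed something rather than the one that broke it), here is a minimal sketch on a throwaway repository. It is not the kernel bisect itself: the ten toy commits, the `state` and `counter` files, and the assumption that the fix lands in "commit 7" are all made up for illustration. Only the `--term-new=fixed` / `--term-old=unfixed` usage mirrors the log above.

```shell
#!/bin/sh
# Sketch: locate the first commit that FIXES a problem with custom bisect terms.
# Everything about the repository below is hypothetical demo data.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

# Create ten commits; the (made-up) fix arrives in commit 7.
i=1
while [ "$i" -le 10 ]; do
    if [ "$i" -ge 7 ]; then echo fixed > state; else echo broken > state; fi
    echo "$i" > counter
    git add state counter
    git commit -q -m "commit $i"
    i=$((i + 1))
done

first=$(git rev-list --max-parents=0 HEAD)

# "fixed" replaces the usual "bad"/"new" term, "unfixed" replaces "good"/"old".
git bisect start --term-new=fixed --term-old=unfixed >/dev/null
git bisect fixed HEAD >/dev/null
git bisect unfixed "$first" >/dev/null

# For "git bisect run", exit 0 marks the old term (unfixed) and
# exit 1 marks the new term (fixed). grep -q exits 0 while "broken" is present.
git bisect run sh -c 'grep -q broken state' >/dev/null

# After the run, refs/bisect/fixed points at the first fixed commit.
result=$(git log -1 --format=%s refs/bisect/fixed)
echo "first fixed commit: $result"

git bisect reset >/dev/null 2>&1
```

On a real kernel tree each step means building and booting the candidate kernel, so the marking is done by hand with `git bisect fixed` or `git bisect unfixed`, as in the log above, instead of `git bisect run`.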

-- 
Best Regards,
Mike Gavrilov.

