[Nouveau] [REGRESSION] GM20B probe fails after commit 2541626cfb79
Nicolas Chauvet
kwizart at gmail.com
Tue Jan 17 15:55:41 UTC 2023
On Sun, Jan 15, 2023 at 22:45, David Airlie <airlied at redhat.com> wrote:
>
> On Thu, Dec 29, 2022 at 12:58 AM Diogo Ivo <diogo.ivo at tecnico.ulisboa.pt> wrote:
> >
> > Hello,
> >
> > Commit 2541626cfb79 breaks GM20B probe with
> > the following kernel log:
> >
> > [ 2.153892] ------------[ cut here ]------------
> > [ 2.153897] WARNING: CPU: 1 PID: 36 at drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgf100.c:273 gf100_vmm_valid+0x2c4/0x390
> > [ 2.153916] Modules linked in:
> > [ 2.153922] CPU: 1 PID: 36 Comm: kworker/u8:1 Not tainted 6.1.0+ #1
> > [ 2.153929] Hardware name: Google Pixel C (DT)
> > [ 2.153933] Workqueue: events_unbound deferred_probe_work_func
> > [ 2.153943] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > [ 2.153950] pc : gf100_vmm_valid+0x2c4/0x390
> > [ 2.153959] lr : gf100_vmm_valid+0xb4/0x390
> > [ 2.153966] sp : ffffffc009e134b0
> > [ 2.153969] x29: ffffffc009e134b0 x28: 0000000000000000 x27: ffffffc008fd44c8
> > [ 2.153979] x26: 00000000ffffffea x25: ffffffc0087b98d0 x24: ffffff8080f89038
> > [ 2.153987] x23: ffffff8081fadc08 x22: 0000000000000000 x21: 0000000000000000
> > [ 2.153995] x20: ffffff8080f8a000 x19: ffffffc009e13678 x18: 0000000000000000
> > [ 2.154003] x17: f37a8b93418958e6 x16: ffffffc009f0d000 x15: 0000000000000000
> > [ 2.154011] x14: 0000000000000002 x13: 000000000003a020 x12: ffffffc008000000
> > [ 2.154019] x11: 0000000102913000 x10: 0000000000000000 x9 : 0000000000000000
> > [ 2.154026] x8 : ffffffc009e136d8 x7 : ffffffc008fd44c8 x6 : ffffff80803d0f00
> > [ 2.154034] x5 : 0000000000000000 x4 : ffffff8080f88c00 x3 : 0000000000000010
> > [ 2.154041] x2 : 000000000000000c x1 : 00000000ffffffea x0 : 00000000ffffffea
> > [ 2.154050] Call trace:
> > [ 2.154053] gf100_vmm_valid+0x2c4/0x390
> > [ 2.154061] nvkm_vmm_map_valid+0xd4/0x204
> > [ 2.154069] nvkm_vmm_map_locked+0xa4/0x344
> > [ 2.154076] nvkm_vmm_map+0x50/0x84
> > [ 2.154083] nvkm_firmware_mem_map+0x84/0xc4
> > [ 2.154094] nvkm_falcon_fw_oneinit+0xc8/0x320
> > [ 2.154101] nvkm_acr_oneinit+0x428/0x5b0
> > [ 2.154109] nvkm_subdev_oneinit_+0x50/0x104
> > [ 2.154114] nvkm_subdev_init_+0x3c/0x12c
> > [ 2.154119] nvkm_subdev_init+0x60/0xa0
> > [ 2.154125] nvkm_device_init+0x14c/0x2a0
> > [ 2.154133] nvkm_udevice_init+0x60/0x9c
> > [ 2.154140] nvkm_object_init+0x48/0x1b0
> > [ 2.154144] nvkm_ioctl_new+0x168/0x254
> > [ 2.154149] nvkm_ioctl+0xd0/0x220
> > [ 2.154153] nvkm_client_ioctl+0x10/0x1c
> > [ 2.154162] nvif_object_ctor+0xf4/0x22c
> > [ 2.154168] nvif_device_ctor+0x28/0x70
> > [ 2.154174] nouveau_cli_init+0x150/0x590
> > [ 2.154180] nouveau_drm_device_init+0x60/0x2a0
> > [ 2.154187] nouveau_platform_device_create+0x90/0xd0
> > [ 2.154193] nouveau_platform_probe+0x3c/0x9c
> > [ 2.154200] platform_probe+0x68/0xc0
> > [ 2.154207] really_probe+0xbc/0x2dc
> > [ 2.154211] __driver_probe_device+0x78/0xe0
> > [ 2.154216] driver_probe_device+0xd8/0x160
> > [ 2.154221] __device_attach_driver+0xb8/0x134
> > [ 2.154226] bus_for_each_drv+0x78/0xd0
> > [ 2.154230] __device_attach+0x9c/0x1a0
> > [ 2.154234] device_initial_probe+0x14/0x20
> > [ 2.154239] bus_probe_device+0x98/0xa0
> > [ 2.154243] deferred_probe_work_func+0x88/0xc0
> > [ 2.154247] process_one_work+0x204/0x40c
> > [ 2.154256] worker_thread+0x230/0x450
> > [ 2.154261] kthread+0xc8/0xcc
> > [ 2.154266] ret_from_fork+0x10/0x20
> > [ 2.154273] ---[ end trace 0000000000000000 ]---
> > [ 2.154278] nouveau 57000000.gpu: pmu: map -22
> > [ 2.154285] nouveau 57000000.gpu: acr: one-time init failed, -22
> > [ 2.154559] nouveau 57000000.gpu: init failed with -22
> > [ 2.154564] nouveau: DRM-master:00000000:00000080: init failed with -22
> > [ 2.154574] nouveau 57000000.gpu: DRM-master: Device allocation failed: -22
> > [ 2.162905] nouveau: probe of 57000000.gpu failed with error -22
> >
> > #regzbot introduced: 2541626cfb79
>
> As a quick check can you try changing
>
> drivers/gpu/drm/nouveau/nvkm/core/firmware.c:nvkm_firmware_mem_target
> from NVKM_MEM_TARGET_HOST to NVKM_MEM_TARGET_NCOH ?
>
> Dave.
I can also reproduce the error on a jetson-tx1 running 6.2-rc4 (rawhide
nodebug kernel) with a Fedora 37 userspace.
With this change applied, the error is different:
nouveau: loading out-of-tree module taints kernel.
nouveau: module verification failed: signature and/or required key missing - tainting kernel
Failed to set up IOMMU for device 57000000.gpu; retaining platform DMA ops
nouveau 57000000.gpu: NVIDIA GM20B (12b000a1)
nouveau 57000000.gpu: imem: using IOMMU
Unable to handle kernel execution of user memory at virtual address 0000000000000000
Mem abort info:
ESR = 0x0000000086000004
EC = 0x21: IABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
FSC = 0x04: level 0 translation fault
user pgtable: 4k pages, 48-bit VAs, pgdp=000000013e136000
[0000000000000000] pgd=0000000000000000, p4d=0000000000000000
Internal error: Oops: 0000000086000004 [#1] SMP
Modules linked in: nouveau(OE+) drm_ttm_helper ttm snd_seq_dummy
snd_hrtimer nf_conntrack_netbios_ns nf_conntrack_broadcast
nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet
nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat
nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set nf_tables
nfnetlink qrtr sunrpc snd_soc_tegra_audio_graph_card
snd_soc_audio_graph_card snd_soc_simple_card_utils snd_soc_core
snd_compress snd_hda_codec_hdmi ac97_bus snd_hda_tegra
snd_pcm_dmaengine snd_hda_codec max77620_thermal vfat snd_hda_core fat
snd_hwdep snd_seq cpufreq_dt tegra_xudc snd_seq_device usb_conn_gpio
tegra_soctherm snd_pcm udc_core snd_timer snd at24 soundcore zram
r8152 mii panel_simple mmc_block tegra_drm drm_dp_aux_bus rtc_max77686
drm_display_helper lp855x_bl cec crct10dif_ce polyval_ce
polyval_generic xhci_tegra sdhci_tegra ghash_ce sdhci_pltfm
phy_tegra_xusb sdhci host1x ahci_tegra gpio_keys tegra210_emc cqhci
rtc_tegra i2c_tegra ip6_tables
abrt-dump-journal-oops: Found oopses: 1
abrt-dump-journal-oops: Creating problem directories
Can't find a meaningful backtrace for hashing in '.'
Preserving oops '.' because DropNotReportableOopses is 'no'
Reported 1 kernel oopses to Abrt
System encountered a non-fatal error in ??()
ip_tables fuse
CPU: 3 PID: 15789 Comm: insmod Tainted: G OE ------- --- 6.2.0-0.rc4.31.fc38.aarch64 #1
Hardware name: nvidia,p2371-2180 NVIDIA P2371-2180/NVIDIA P2371-2180, BIOS 2022.10 10/01/2022
pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : 0x0
lr : nvkm_falcon_load_dmem+0x60/0x80 [nouveau]
sp : ffff8000141034c0
x29: ffff8000141034c0 x28: ffff0000865d2008 x27: ffff0000c87b8638
x26: 00000000000000ff x25: ffff0000865d20a0 x24: 0000000000000000
x23: ffff0000f4ef34f8 x22: 000000000000002c x21: 0000000000005fd4
x20: ffff800014103534 x19: ffff0000f4ef34b8 x18: ffffffffffffffff
x17: 000000040044ffff x16: 00500074b5503510 x15: ffff8000141035d8
x14: ffff0000c87b8506 x13: 0000000000000001 x12: 0000000000000018
x11: 0000000000000001 x10: fffffffffffffec0 x9 : ffff8000091efa90
x8 : ffff800014103560 x7 : 0000000000000000 x6 : 0000000434da8f65
x5 : 0000000000000000 x4 : 0000000000000000 x3 : 000000000000002c
x2 : 0000000000005fd4 x1 : ffff800014103534 x0 : ffff0000f4ef34b8
Call trace:
0x0
gm20b_pmu_init+0x70/0x90 [nouveau]
nvkm_pmu_init+0x28/0x40 [nouveau]
nvkm_subdev_init_+0x68/0x134 [nouveau]
nvkm_subdev_init+0x68/0xb0 [nouveau]
nvkm_device_init+0x154/0x2ac [nouveau]
nvkm_udevice_init+0x68/0xa0 [nouveau]
nvkm_object_init+0x50/0x1b4 [nouveau]
nvkm_ioctl_new+0x154/0x280 [nouveau]
nvkm_ioctl+0xd8/0x230 [nouveau]
nvkm_client_ioctl+0x18/0x24 [nouveau]
nvif_object_ctor+0xec/0x1a0 [nouveau]
nvif_device_ctor+0x30/0x7c [nouveau]
nouveau_cli_init+0x144/0x574 [nouveau]
nouveau_drm_device_init+0x68/0x2b0 [nouveau]
nouveau_platform_device_create+0x98/0xd0 [nouveau]
nouveau_platform_probe+0x34/0x90 [nouveau]
platform_probe+0x70/0xd0
really_probe+0xc8/0x3e4
__driver_probe_device+0x84/0x190
driver_probe_device+0x44/0x11c
__driver_attach+0xf8/0x200
bus_for_each_dev+0x6c/0xac
driver_attach+0x2c/0x40
bus_add_driver+0x188/0x250
driver_register+0x80/0x13c
__platform_driver_register+0x30/0x3c
nouveau_drm_init+0x9c/0x1000 [nouveau]
do_one_initcall+0x4c/0x2a0
do_init_module+0x50/0x200
load_module+0x9b0/0xb10
__do_sys_finit_module+0x98/0x100
__arm64_sys_finit_module+0x28/0x34
invoke_syscall+0x78/0x100
el0_svc_common.constprop.0+0x4c/0xf4
do_el0_svc+0x34/0x4c
el0_svc+0x34/0x10c
el0t_64_sync_handler+0x114/0x120
el0t_64_sync+0x194/0x198
Code: bad PC value
---[ end trace 0000000000000000 ]---
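
For anyone reading the two traces side by side, the raw numbers can be
decoded with a few lines of Python. This is only a rough sketch: the
helper names as_s32 and decode_esr are made up for illustration, and the
bit positions are the AArch64 ESR_ELx layout as I understand it (EC in
bits [31:26], IL in bit 25, fault status code in bits [5:0]). It shows
that x0 = 00000000ffffffea in the first trace is the same -22 that the
"map -22" and "init failed with -22" lines report, and that the ESR
fields recomputed from 0x86000004 agree with the "Mem abort info" block.

```python
import errno


def as_s32(reg: int) -> int:
    """Interpret the low 32 bits of a register value as a signed integer."""
    low = reg & 0xFFFFFFFF
    return low - (1 << 32) if low & 0x80000000 else low


def decode_esr(esr: int) -> dict:
    """Split an AArch64 ESR_ELx value into the fields the kernel prints."""
    return {
        "EC": (esr >> 26) & 0x3F,  # exception class, bits [31:26]
        "IL": (esr >> 25) & 0x1,   # instruction length flag, bit 25
        "FSC": esr & 0x3F,         # fault status code, bits [5:0]
    }


# x0 / x1 / x26 from the first (WARNING) trace
err = as_s32(0x00000000FFFFFFEA)
print(err, errno.errorcode[-err])

# ESR from the second (Oops) trace
print({name: hex(value) for name, value in decode_esr(0x86000004).items()})
```

The second trace has pc at 0x0 with lr inside nvkm_falcon_load_dmem,
which looks like a call through a NULL function pointer rather than the
original mapping failure, but that is a guess from the trace alone.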