[REGRESSION] GM20B pmu timeout

Linux regression tracking (Thorsten Leemhuis) regressions at leemhuis.info
Tue Oct 15 18:13:00 UTC 2024


Hi, Thorsten here, the Linux kernel's regression tracker.

On 10.10.24 15:32, Diogo Ivo wrote:
> 
> Somewhere between 6.11-rc4 and 6.11-rc5 the following error message is displayed
> when trying to initialize a nvc0_screen on the Tegra X1's GM20B:
> 
> [ 34.431210] nouveau 57000000.gpu: pmu:hpq: timeout waiting for queue ready
> [ 34.438145] nouveau 57000000.gpu: gr: init failed, -110
> nvc0_screen_create:1075 - Error allocating PGRAPH context for M2MF: -110
> failed to create GPU screen

Thx for the report. Hmmm. No reply so far. :-/

Diogo, maybe report this here as well:
https://gitlab.freedesktop.org/drm/nouveau/-/issues/

Afterwards drop a link to the ticket here. Reporting nouveau issues via
email should work, but maybe you'll have more luck there.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

> If we then try a second time we get a more detailed error message:
> 
> [   27.432391] ------------[ cut here ]------------
> [   27.437019] nouveau 57000000.gpu: timeout
> [   27.441083] WARNING: CPU: 2 PID: 307 at drivers/gpu/drm/nouveau/nvkm/engine/gr/gf100.c:840 gf100_gr_fecs_bind_pointer+0x140/0x158 [nouveau]
> [   27.453897] Modules linked in: nouveau drm_ttm_helper ttm backlight gpu_sched i2c_algo_bit drm_gpuvm drm_exec efivarfs
> [   27.464592] CPU: 2 UID: 0 PID: 307 Comm: loadjpeg Not tainted 6.11.0-rc4+ #1
> [   27.471628] Hardware name: nvidia NVIDIA P2371-2180/NVIDIA P2371-2180, BIOS 2024.10-rc5-00018-g56b47b8b6a09 10/01/2024
> [   27.482303] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [   27.489251] pc : gf100_gr_fecs_bind_pointer+0x140/0x158 [nouveau]
> [   27.495535] lr : gf100_gr_fecs_bind_pointer+0x140/0x158 [nouveau]
> [   27.501794] sp : ffffffc082473810
> [   27.505100] x29: ffffffc082473840 x28: ffffff80c56fe500 x27: ffffff80c6f3be40
> [   27.512227] x26: 00000000804001ea x25: 0000000000000001 x24: 0000000000000000
> [   27.519351] x23: ffffff80c5516808 x22: ffffffc079d08350 x21: ffffff80c16bae40
> [   27.526476] x20: 0000000000409800 x19: ffffff80c5516808 x18: ffffffffffffffff
> [   27.533599] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000006
> [   27.540724] x14: ffffffc0817defc8 x13: 74756f656d697420 x12: 3a7570672e303030
> [   27.547848] x11: ffffffc0817defc8 x10: 00000000000003f1 x9 : ffffffc081836fc8
> [   27.554972] x8 : 0000000000017fe8 x7 : 00000000fffff000 x6 : 0000000000000001
> [   27.562096] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
> [   27.569218] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffffff80d578c600
> [   27.576341] Call trace:
> [   27.578780]  gf100_gr_fecs_bind_pointer+0x140/0x158 [nouveau]
> [   27.584698]  gf100_grctx_generate+0x54c/0x6f4 [nouveau]
> [   27.590093]  gf100_gr_chan_new+0x3f8/0x430 [nouveau]
> [   27.595223]  nvkm_gr_cclass_new+0x34/0x48 [nouveau]
> [   27.600269]  nvkm_cgrp_ectx_get+0x134/0x224 [nouveau]
> [   27.605485]  nvkm_cgrp_vctx_get+0x11c/0x300 [nouveau]
> [   27.610704]  nvkm_chan_cctx_get+0x144/0x25c [nouveau]
> [   27.615920]  nvkm_uchan_object_new+0xd8/0x1e0 [nouveau]
> [   27.621311]  nvkm_ioctl_new+0x14c/0x24c [nouveau]
> [   27.626167]  nvkm_ioctl+0xd0/0x280 [nouveau]
> [   27.630590]  nvkm_client_ioctl+0x10/0x1c [nouveau]
> [   27.635551]  nvif_client_ioctl+0x20/0x2c [nouveau]
> [   27.640493]  usif_ioctl+0x294/0x420 [nouveau]
> [   27.645021]  nouveau_drm_ioctl+0xb0/0xe0 [nouveau]
> [   27.649982]  __arm64_sys_ioctl+0xac/0xf0
> [   27.653900]  invoke_syscall+0x48/0x104
> [   27.657645]  el0_svc_common.constprop.0+0x40/0xe0
> [   27.662341]  do_el0_svc+0x1c/0x28
> [   27.665650]  el0_svc+0x3c/0x108
> [   27.668787]  el0t_64_sync_handler+0x120/0x12c
> [   27.673133]  el0t_64_sync+0x190/0x194
> [   27.676789] ---[ end trace 0000000000000000 ]---
> [   27.681937] nouveau 57000000.gpu: gr: failed to construct context
> [   27.688126] nouveau 57000000.gpu: fifo:000000:0002:[loadjpeg[307]] ectx 0[gr]: -110
> [   27.695786] nouveau 57000000.gpu: fifo:000000:0002:0002:[loadjpeg[307]] vctx 0[gr]: -110
> nvc0_screen_create:1075 - Error allocating PGRAPH context for M2MF: -110
> failed to create GPU screen
> 
> but I am not sure if this is connected to the fact that the first attempt
> failed or not.
> 
> When trying to bisect the issue the "bad" commit I obtained was 9b340aeb26d5.
> However, checking out this commit and compiling the kernel leads to a different
> error where we have a boot regression:
> 
> [   19.146693] nouveau 57000000.gpu: Adding to iommu group 3
> [   19.155581] nouveau 57000000.gpu: NVIDIA GM20B (12b000a1)
> [   19.161025] nouveau 57000000.gpu: imem: using IOMMU
> [   22.451833] ------------[ cut here ]------------
> [   22.456460] nouveau 57000000.gpu: timeout
> [   22.460508] WARNING: CPU: 0 PID: 201 at drivers/gpu/drm/nouveau/nvkm/falcon/gm200.c:231 gm200_flcn_fw_boot+0x2a4/0x428 [nouveau]
> [   22.472384] Modules linked in: nouveau(+) drm_ttm_helper ttm backlight gpu_sched i2c_algo_bit drm_gpuvm drm_exec efivarfs
> [   22.483342] CPU: 0 UID: 0 PID: 201 Comm: (udev-worker) Not tainted 6.11.0-rc1+ #4
> [   22.490811] Hardware name: nvidia NVIDIA P2371-2180/NVIDIA P2371-2180, BIOS 2024.10-rc5-00018-g56b47b8b6a09 10/01/2024
> [   22.501485] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [   22.508434] pc : gm200_flcn_fw_boot+0x2a4/0x428 [nouveau]
> [   22.514063] lr : gm200_flcn_fw_boot+0x2a4/0x428 [nouveau]
> [   22.519656] sp : ffffffc0822fb3e0
> [   22.522961] x29: ffffffc0822fb410 x28: ffffff80c7bf0008 x27: ffffff80d5625208
> [   22.530088] x26: 0000000000000001 x25: 0000000000000010 x24: 0000000000000000
> [   22.537213] x23: ffffff80c4e920b8 x22: 0000000000000000 x21: 0000000000000000
> [   22.544336] x20: 0000000000000010 x19: ffffff80c4e920b8 x18: ffffffffffffffff
> [   22.551460] x17: 000000000000d000 x16: 0000000000000000 x15: 0000000000000006
> [   22.558585] x14: ffffffc08181efa8 x13: 74756f656d697420 x12: 3a7570672e303030
> [   22.565709] x11: ffffffc08181efa8 x10: 00000000000003fd x9 : ffffffc081876fa8
> [   22.572834] x8 : 0000000000017fe8 x7 : 00000000fffff000 x6 : 0000000000000001
> [   22.579958] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
> [   22.587083] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffffff80c2f98000
> [   22.594208] Call trace:
> [   22.596647]  gm200_flcn_fw_boot+0x2a4/0x428 [nouveau]
> [   22.601904]  nvkm_falcon_fw_boot+0x1b4/0x598 [nouveau]
> [   22.607237]  nvkm_acr_hsfw_boot+0x78/0xa4 [nouveau]
> [   22.612309]  gm200_acr_init+0x18/0x24 [nouveau]
> [   22.617034]  nvkm_acr_load+0x7c/0x18c [nouveau]
> [   22.621760]  nvkm_acr_init+0x54/0x70 [nouveau]
> [   22.626400]  nvkm_subdev_init_+0x5c/0x12c [nouveau]
> [   22.631471]  nvkm_subdev_init+0x60/0xa0 [nouveau]
> [   22.636370]  nvkm_device_init+0x160/0x2a0 [nouveau]
> [   22.641448]  nvkm_udevice_init+0x60/0xa0 [nouveau]
> [   22.646439]  nvkm_object_init+0x48/0x1c0 [nouveau]
> [   22.651426]  nvkm_ioctl_new+0x164/0x24c [nouveau]
> [   22.656323]  nvkm_ioctl+0xd0/0x280 [nouveau]
> [   22.660787]  nvkm_client_ioctl+0x10/0x1c [nouveau]
> [   22.665784]  nvif_object_ctor+0xe8/0x1b8 [nouveau]
> [   22.670769]  nvif_device_ctor+0x28/0x78 [nouveau]
> [   22.675667]  nouveau_cli_init+0x154/0x5e0 [nouveau]
> [   22.680749]  nouveau_drm_device_init+0x84/0x2e0 [nouveau]
> [   22.686352]  nouveau_platform_device_create+0x90/0xe0 [nouveau]
> [   22.692476]  nouveau_platform_probe+0x40/0xc0 [nouveau]
> [   22.697904]  platform_probe+0x68/0xd8
> [   22.701564]  really_probe+0xbc/0x2c0
> [   22.705133]  __driver_probe_device+0x78/0x120
> [   22.709480]  driver_probe_device+0x3c/0x160
> [   22.713654]  __driver_attach+0x90/0x1a0
> [   22.717481]  bus_for_each_dev+0x78/0xd8
> [   22.721309]  driver_attach+0x24/0x30
> [   22.724875]  bus_add_driver+0xe4/0x208
> [   22.728615]  driver_register+0x68/0x124
> [   22.732443]  __platform_driver_register+0x28/0x40
> [   22.737137]  nouveau_drm_init+0x90/0x1000 [nouveau]
> [   22.742217]  do_one_initcall+0x44/0x230
> [   22.746047]  do_init_module+0x5c/0x220
> [   22.749788]  load_module+0x748/0x87c
> [   22.753355]  init_module_from_file+0x88/0xcc
> [   22.757617]  __arm64_sys_finit_module+0x164/0x328
> [   22.762310]  invoke_syscall+0x48/0x104
> [   22.766054]  el0_svc_common+0xc8/0xe8
> [   22.769710]  do_el0_svc+0x20/0x34
> [   22.773017]  el0_svc+0x3c/0x108
> [   22.776155]  el0t_64_sync_handler+0x120/0x12c
> [   22.780502]  el0t_64_sync+0x190/0x194
> [   22.784156] ---[ end trace 0000000000000000 ]---
> [   22.788838] nouveau 57000000.gpu: pmu(acr): mbox 00000001 00000000
> [   22.795033] nouveau 57000000.gpu: pmu(acr):load: boot failed: -110
> [   22.801235] nouveau 57000000.gpu: acr: init failed, -110
> [   22.806858] nouveau 57000000.gpu: init failed with -110
> [   22.812084] nouveau: DRM-master:00000000:00000080: init failed with -110
> [   22.818793] nouveau 57000000.gpu: DRM-master: Device allocation failed: -110
> [   22.826368] ------------[ cut here ]------------
> [   22.830980] WARNING: CPU: 2 PID: 201 at drivers/gpu/drm/nouveau/nvkm/subdev/mmu/base.c:239 nvkm_mmu_dtor+0xac/0xc0 [nouveau]
> [   22.842573] Modules linked in: nouveau(+) drm_ttm_helper ttm backlight gpu_sched i2c_algo_bit drm_gpuvm drm_exec efivarfs
> [   22.853529] CPU: 2 UID: 0 PID: 201 Comm: (udev-worker) Tainted: G        W          6.11.0-rc1+ #4
> [   22.862475] Tainted: [W]=WARN
> [   22.865433] Hardware name: nvidia NVIDIA P2371-2180/NVIDIA P2371-2180, BIOS 2024.10-rc5-00018-g56b47b8b6a09 10/01/2024
> [   22.876107] pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [   22.883055] pc : nvkm_mmu_dtor+0xac/0xc0 [nouveau]
> [   22.888063] lr : nvkm_mmu_dtor+0x24/0xc0 [nouveau]
> [   22.893057] sp : ffffffc0822fb7f0
> [   22.896362] x29: ffffffc0822fb7f0 x28: 0000000000000000 x27: ffffffc079c69a18
> [   22.903488] x26: ffffffc079c69d38 x25: ffffffc081892ce8 x24: ffffff80d5624e00
> [   22.910613] x23: ffffff80d5624e08 x22: dead000000000122 x21: dead000000000100
> [   22.917737] x20: ffffff80d5624f50 x19: ffffff80c4e07500 x18: ffffffffffffffff
> [   22.924861] x17: 0000000000001000 x16: 0000000000000000 x15: 0000000000000000
> [   22.931985] x14: 0000000000000000 x13: dead000000000122 x12: 0000000000000001
> [   22.939109] x11: 0000000080000000 x10: 0000000000000000 x9 : 0000000000000001
> [   22.946233] x8 : 00000000000007e0 x7 : 0000000000000000 x6 : 0000000000000239
> [   22.953357] x5 : 000000000010000c x4 : dead000000000122 x3 : ffffff80c2fa5b38
> [   22.960481] x2 : ffffff80d519a320 x1 : ffffff80d519a2d0 x0 : ffffff80d519a2c0
> [   22.967604] Call trace:
> [   22.970042]  nvkm_mmu_dtor+0xac/0xc0 [nouveau]
> [   22.974690]  nvkm_subdev_del+0x6c/0xf8 [nouveau]
> [   22.979504]  nvkm_device_del+0x78/0x120 [nouveau]
> [   22.984410]  nouveau_platform_device_create+0x54/0xe0 [nouveau]
> [   22.990534]  nouveau_platform_probe+0x40/0xc0 [nouveau]
> [   22.995966]  platform_probe+0x68/0xd8
> [   22.999624]  really_probe+0xbc/0x2c0
> [   23.003192]  __driver_probe_device+0x78/0x120
> [   23.007540]  driver_probe_device+0x3c/0x160
> [   23.011714]  __driver_attach+0x90/0x1a0
> [   23.015542]  bus_for_each_dev+0x78/0xd8
> [   23.019369]  driver_attach+0x24/0x30
> [   23.022937]  bus_add_driver+0xe4/0x208
> [   23.026676]  driver_register+0x68/0x124
> [   23.030503]  __platform_driver_register+0x28/0x40
> [   23.035197]  nouveau_drm_init+0x90/0x1000 [nouveau]
> [   23.040274]  do_one_initcall+0x44/0x230
> [   23.044103]  do_init_module+0x5c/0x220
> [   23.047844]  load_module+0x748/0x87c
> [   23.051412]  init_module_from_file+0x88/0xcc
> [   23.055672]  __arm64_sys_finit_module+0x164/0x328
> [   23.060367]  invoke_syscall+0x48/0x104
> [   23.064110]  el0_svc_common+0xc8/0xe8
> [   23.067765]  do_el0_svc+0x20/0x34
> [   23.071073]  el0_svc+0x3c/0x108
> [   23.074206]  el0t_64_sync_handler+0x120/0x12c
> [   23.078553]  el0t_64_sync+0x190/0x194
> [   23.082206] ---[ end trace 0000000000000000 ]---
> [   23.087065] nouveau 57000000.gpu: imem: instobj LRU not empty!
> [   23.092906] nouveau 57000000.gpu: imem: instobj vmap area not empty! 0x40000 bytes still mapped
> [   23.101958] nvkm: mm not clean!
> [   23.105095] nvkm: node list:
> [   23.107994] nvkm: 	00000000 00000074 0
> [   23.111750] nvkm: 	00400074 00000040 1
> [   23.115496] nvkm: 	000000b4 003fff4c 0
> [   23.119248] nvkm: free list:
> [   23.122128] nvkm: 	00000000 00000074 0
> [   23.125880] nvkm: 	000000b4 003fff4c 0
> [   23.129643] nouveau 57000000.gpu: probe with driver nouveau failed with error -110
> 
> so I am not sure that this is the actual commit that introduces the breakage.
> I have also tried to manually check out some commits to see where the problem
> could be but unfortunately nothing came out of it.
> 
> Best regards,
> Diogo
> 
> #regzbot introduced: v6.11-rc4..v6.11-rc5 
> 
> 
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke

