[REGRESSION] GM20B probe fails after commit 2541626cfb79

Karol Herbst kherbst at redhat.com
Sat Jan 14 03:27:38 UTC 2023


On Fri, Jan 13, 2023 at 2:19 PM Linux kernel regression tracking
(Thorsten Leemhuis) <regressions at leemhuis.info> wrote:
>
> [CCing Daniel]
>
> On 05.01.23 13:28, Thorsten Leemhuis wrote:
> > [adding Karol and Lyude to the list of recipients]
> >
> > On 28.12.22 15:49, Diogo Ivo wrote:
> >> Hello,
> >>
> >> Commit 2541626cfb79 breaks GM20B probe with
> >> the following kernel log:
> > Just wondering: is anyone looking into this? The report was posted more
> > than a week ago and didn't even get a single reply yet afaics. This of
> > course can happen at this time of the year, but I nevertheless thought a
> > quick status inquiry might be a good idea at this point.
>
> Hmmm, the report is now more than two weeks old and didn't get a single
> reply. My prodding about a week ago also didn't help. Then I guess I
> have to bring this to Linus' attention, unless something happens in the
> next 2 days.
>

I tried to look into it, but my Jetson Nano just constantly behaves
in very strange ways. I tried to compile and install a 6.1 kernel onto
it, but every kernel just refuses to boot and I have no idea what's up
with that device. The kernel starts to boot and then just stops in the
middle. From what I can tell, most of the Tegra devices never
worked reliably in the first place and there are a couple of random
and strange bugs around. I've attached my dmesg, so if anybody has any
clues why the kernel just stops doing anything, it would really help
me.

But maybe it would be for the best to just pull Tegra support out of
nouveau, because in the current situation we really can't spare much
time dealing with them and we are already busy enough just dealing
with the desktop GPUs. And the firmware we got from Nvidia is so
ancient and different from the desktop GPU ones that, without actually
having all those boards available and properly tested, we can't be
sure not to break them.

And afaik there are almost no _actual_ users, just distribution folks
wanting to claim "support" for those devices, but then ending up using
Nvidia's out-of-tree Tegra driver in deployments anyway.

If there are actual users using them for their daily life, I'd like to
know, because I'm aware of none.

If there are companies/entities actually caring about those devices
running _nouveau_, I'd be happy to keep supporting them, but then only
with proper kernel CI, because the current situation is just not
sustainable.

Ben, Lyude, Dave, Daniel, any thoughts on that?

> Diogo, for that it would be really helpful to know: is the issue still
> happening with latest mainline? Is it possible to revert 2541626cfb79
> easily? And if so: do things work afterwards again?
>
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> --
> Everything you wanna know about Linux kernel regression tracking:
> https://linux-regtracking.leemhuis.info/about/#tldr
> If I did something stupid, please tell me, as explained on that page.
>
> #regzbot poke
>
> >> [    2.153892] ------------[ cut here ]------------
> >> [    2.153897] WARNING: CPU: 1 PID: 36 at drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgf100.c:273 gf100_vmm_valid+0x2c4/0x390
> >> [    2.153916] Modules linked in:
> >> [    2.153922] CPU: 1 PID: 36 Comm: kworker/u8:1 Not tainted 6.1.0+ #1
> >> [    2.153929] Hardware name: Google Pixel C (DT)
> >> [    2.153933] Workqueue: events_unbound deferred_probe_work_func
> >> [    2.153943] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> >> [    2.153950] pc : gf100_vmm_valid+0x2c4/0x390
> >> [    2.153959] lr : gf100_vmm_valid+0xb4/0x390
> >> [    2.153966] sp : ffffffc009e134b0
> >> [    2.153969] x29: ffffffc009e134b0 x28: 0000000000000000 x27: ffffffc008fd44c8
> >> [    2.153979] x26: 00000000ffffffea x25: ffffffc0087b98d0 x24: ffffff8080f89038
> >> [    2.153987] x23: ffffff8081fadc08 x22: 0000000000000000 x21: 0000000000000000
> >> [    2.153995] x20: ffffff8080f8a000 x19: ffffffc009e13678 x18: 0000000000000000
> >> [    2.154003] x17: f37a8b93418958e6 x16: ffffffc009f0d000 x15: 0000000000000000
> >> [    2.154011] x14: 0000000000000002 x13: 000000000003a020 x12: ffffffc008000000
> >> [    2.154019] x11: 0000000102913000 x10: 0000000000000000 x9 : 0000000000000000
> >> [    2.154026] x8 : ffffffc009e136d8 x7 : ffffffc008fd44c8 x6 : ffffff80803d0f00
> >> [    2.154034] x5 : 0000000000000000 x4 : ffffff8080f88c00 x3 : 0000000000000010
> >> [    2.154041] x2 : 000000000000000c x1 : 00000000ffffffea x0 : 00000000ffffffea
> >> [    2.154050] Call trace:
> >> [    2.154053]  gf100_vmm_valid+0x2c4/0x390
> >> [    2.154061]  nvkm_vmm_map_valid+0xd4/0x204
> >> [    2.154069]  nvkm_vmm_map_locked+0xa4/0x344
> >> [    2.154076]  nvkm_vmm_map+0x50/0x84
> >> [    2.154083]  nvkm_firmware_mem_map+0x84/0xc4
> >> [    2.154094]  nvkm_falcon_fw_oneinit+0xc8/0x320
> >> [    2.154101]  nvkm_acr_oneinit+0x428/0x5b0
> >> [    2.154109]  nvkm_subdev_oneinit_+0x50/0x104
> >> [    2.154114]  nvkm_subdev_init_+0x3c/0x12c
> >> [    2.154119]  nvkm_subdev_init+0x60/0xa0
> >> [    2.154125]  nvkm_device_init+0x14c/0x2a0
> >> [    2.154133]  nvkm_udevice_init+0x60/0x9c
> >> [    2.154140]  nvkm_object_init+0x48/0x1b0
> >> [    2.154144]  nvkm_ioctl_new+0x168/0x254
> >> [    2.154149]  nvkm_ioctl+0xd0/0x220
> >> [    2.154153]  nvkm_client_ioctl+0x10/0x1c
> >> [    2.154162]  nvif_object_ctor+0xf4/0x22c
> >> [    2.154168]  nvif_device_ctor+0x28/0x70
> >> [    2.154174]  nouveau_cli_init+0x150/0x590
> >> [    2.154180]  nouveau_drm_device_init+0x60/0x2a0
> >> [    2.154187]  nouveau_platform_device_create+0x90/0xd0
> >> [    2.154193]  nouveau_platform_probe+0x3c/0x9c
> >> [    2.154200]  platform_probe+0x68/0xc0
> >> [    2.154207]  really_probe+0xbc/0x2dc
> >> [    2.154211]  __driver_probe_device+0x78/0xe0
> >> [    2.154216]  driver_probe_device+0xd8/0x160
> >> [    2.154221]  __device_attach_driver+0xb8/0x134
> >> [    2.154226]  bus_for_each_drv+0x78/0xd0
> >> [    2.154230]  __device_attach+0x9c/0x1a0
> >> [    2.154234]  device_initial_probe+0x14/0x20
> >> [    2.154239]  bus_probe_device+0x98/0xa0
> >> [    2.154243]  deferred_probe_work_func+0x88/0xc0
> >> [    2.154247]  process_one_work+0x204/0x40c
> >> [    2.154256]  worker_thread+0x230/0x450
> >> [    2.154261]  kthread+0xc8/0xcc
> >> [    2.154266]  ret_from_fork+0x10/0x20
> >> [    2.154273] ---[ end trace 0000000000000000 ]---
> >> [    2.154278] nouveau 57000000.gpu: pmu: map -22
> >> [    2.154285] nouveau 57000000.gpu: acr: one-time init failed, -22
> >> [    2.154559] nouveau 57000000.gpu: init failed with -22
> >> [    2.154564] nouveau: DRM-master:00000000:00000080: init failed with -22
> >> [    2.154574] nouveau 57000000.gpu: DRM-master: Device allocation failed: -22
> >> [    2.162905] nouveau: probe of 57000000.gpu failed with error -22
> >>
> >> #regzbot introduced: 2541626cfb79
> >>
> >> Thanks,
> >>
> >> Diogo Ivo
> >>
> >>
> >
> > #regzbot poke
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dmesg
Type: application/octet-stream
Size: 18397 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/dri-devel/attachments/20230114/eb4f9645/attachment-0001.obj>


More information about the dri-devel mailing list