Issues with trying to boot falcons from sgt memory + Possible firmware SG_DEBUG fix?
Ben Skeggs
bskeggs at nvidia.com
Fri Apr 19 13:52:54 UTC 2024
On 19/4/24 06:27, Lyude Paul wrote:
> So - first some context here for Ben and anyone else who hasn't been
> following. A little while ago I got a Slimbook Executive 16 with a
> Nvidia RTX 4060 in it, and I've unfortunately been running into a kind
> of annoying issue. Currently this laptop only has 16 gigs of ram, and
> as it turns out - this can easily lead the system to having pretty
> heavy memory fragmentation once it starts swapping pages out.
>
> Normally this wouldn't matter, but I unfortunately discovered that when
> we're runtime suspending the GPU in Nouveau - we actually appear to
> allocate some of the memory we use for migrating using
> dma_alloc_coherent. This starts to fail on my system once memory
> fragmentation goes up like so:
>
> kworker/18:0: page allocation failure: order:7, mode:0xcc0(GFP_KERNEL),
> nodemask=(null),cpuset=/,mems_allowed=0
> CPU: 18 PID: 287012 Comm: kworker/18:0 Not tainted
> 6.8.4-200.ChopperV1.fc39.x86_64 #1
> Hardware name: SLIMBOOK Executive/Executive, BIOS N.1.10GRU06 02/02/2024
> Workqueue: pm pm_runtime_work
> Call Trace:
> <TASK>
> dump_stack_lvl+0x47/0x60
> warn_alloc+0x165/0x1e0
> ? __alloc_pages_direct_compact+0x1ad/0x2b0
> __alloc_pages_slowpath.constprop.0+0xd7d/0xde0
> __alloc_pages+0x32d/0x350
> __dma_direct_alloc_pages.isra.0+0x16a/0x2b0
> dma_direct_alloc+0x70/0x280
> nvkm_gsp_radix3_sg+0x5e/0x130 [nouveau]
> r535_gsp_fini+0x1d4/0x350 [nouveau]
> nvkm_subdev_fini+0x67/0x150 [nouveau]
> nvkm_device_fini+0x95/0x1e0 [nouveau]
> nvkm_udevice_fini+0x53/0x70 [nouveau]
> nvkm_object_fini+0xb9/0x240 [nouveau]
> nvkm_object_fini+0x75/0x240 [nouveau]
> nouveau_do_suspend+0xf5/0x280 [nouveau]
> nouveau_pmops_runtime_suspend+0x3e/0xb0 [nouveau]
> pci_pm_runtime_suspend+0x67/0x1e0
> ? __pfx_pci_pm_runtime_suspend+0x10/0x10
> __rpm_callback+0x41/0x170
> ? __pfx_pci_pm_runtime_suspend+0x10/0x10
> rpm_callback+0x5d/0x70
> ? __pfx_pci_pm_runtime_suspend+0x10/0x10
> rpm_suspend+0x120/0x6a0
> pm_runtime_work+0x98/0xb0
> process_one_work+0x171/0x340
> worker_thread+0x27b/0x3a0
> ? __pfx_worker_thread+0x10/0x10
> kthread+0xe5/0x120
> ? __pfx_kthread+0x10/0x10
> ret_from_fork+0x31/0x50
> ? __pfx_kthread+0x10/0x10
> ret_from_fork_asm+0x1b/0x30
>
> nouveau 0000:01:00.0: gsp: suspend failed, -12
> nouveau: DRM-master:00000000:00000080: suspend failed with -12
> nouveau 0000:01:00.0: can't suspend (nouveau_pmops_runtime_suspend
> [nouveau] returned -12)
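
For reference, the "order:7" in the allocation failure above means the page allocator was asked for 2^7 = 128 physically contiguous pages in a single request. A minimal sketch of that arithmetic, assuming 4 KiB base pages (the x86_64 default); the helper name order_to_bytes is hypothetical and not taken from the kernel:

```python
PAGE_SIZE = 4096  # assumption: 4 KiB base pages, the x86_64 default


def order_to_bytes(order: int, page_size: int = PAGE_SIZE) -> int:
    """Bytes of physically contiguous memory in an order-N page allocation."""
    # An order-N request covers 2**N contiguous pages.
    return (1 << order) * page_size


if __name__ == "__main__":
    size = order_to_bytes(7)
    print(f"order:7 -> {1 << 7} pages, {size // 1024} KiB contiguous")
```

Under that assumption the failing request is for 512 KiB of contiguous memory, which is the kind of allocation that becomes hard to satisfy once memory is fragmented.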
>
> Keep in mind, I don't dive into memory management related stuff like
> this very often! But I'd very much like to know how to help out
> anywhere around the driver, including outside of my usual domains, so
> I've been trying to write up a patch for this. The original suggestion
> for a fix that Dave Airlie had given me was (unless I misunderstood,
> which isn't unlikely) to try to see if we could get nvkm_gsp_mem_ctor()
> to start allocating memory with vmalloc() and map that onto the GPU
> using the SG helpers instead. So - I gave a shot at writing up a patch
> for doing that:
>
> https://gitlab.freedesktop.org/lyudess/linux/-/commit/b5a41ac2bd948979815d262d8d20b4f3333f9c26
>
> As you can probably guess - the patch does not really seem to work, and
> I've been trying to figure out why. There's already a couple of issues
> I'm aware of: the most glaring one being that as Timur pointed out, a
> lot of GSP hardware expects contiguous memory allocations - but
> according to them the allocation that's specifically failing should be
> small enough that it'd be allocated in a contiguous page anyway:
>
> [ 9.429884] Lyude:r535_gsp_init:2186: (mbox1) == 0
> [ 9.429898] Lyude:r535_gsp_init:2186: (mbox0) == dbdfe000
> [ 9.491300] ------------[ cut here ]------------
> [ 9.491308] WARNING: CPU: 5 PID: 921 at drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c:1713 r535_gsp_init+0x75e/0x7c0 [nouveau]
> [ 9.491533] Modules linked in: nouveau(+) rfkill binfmt_misc vfat fat snd_hda_codec_realtek snd_hda_codec_generic snd_hda_scodec_component snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hwdep wmi_bmof ppdev snd_hda_core drm_ttm_helper intel_rapl_msr snd_seq ttm snd_seq_device snd_pcm video gpu_sched snd_timer i2c_algo_bit drm_gpuvm drm_exec intel_rapl_common mxm_wmi rapl snd drm_display_helper acpi_cpufreq soundcore k10temp i2c_piix4 parport_pc wmi parport gpio_amdpt gpio_generic loop dm_multipath nfnetlink zram crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 sha256_ssse3 r8169 realtek sha1_ssse3 ccp w83627hf_wdt scsi_dh_rdac scsi_dh_emc scsi_dh_alua ip6_tables ip_tables fuse
> [ 9.491670] CPU: 5 PID: 921 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #22
> [ 9.491681] Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
> [ 9.491690] RIP: 0010:r535_gsp_init+0x75e/0x7c0 [nouveau]
> [ 9.491885] Code: 8b 83 10 0d 00 00 48 89 ef 41 bf e4 ff ff ff 48 8b 40 18 48 8b 80 48 0f 00 00 48 8b 40 28 e8 b9 5e 89 ee 0f 0b e9 73 f9 ff ff <0f> 0b 41 bf fb ff ff ff e9 5a f9 ff ff 41 89 ef 0f 0b e9 5c f9 ff
> [ 9.491905] RSP: 0018:ffffb271c175f748 EFLAGS: 00010246
> [ 9.491914] RAX: 0000000000000000 RBX: ffffa098e192f000 RCX: ffffa098ca2768c8
> [ 9.491922] RDX: ffffa098e191d400 RSI: ffffb271cc110080 RDI: ffffb271cc111388
> [ 9.491930] RBP: 00000000dbdfe000 R08: 0000000000000003 R09: 0000000000000000
> [ 9.491938] R10: 0000000000000000 R11: ffffa098ca276828 R12: ffffa098e192f008
> [ 9.491946] R13: 000000022b906452 R14: ffffa098e192f008 R15: 0000000000000000
> [ 9.491956] FS: 00007f4de98cc980(0000) GS:ffffa099c4a80000(0000) knlGS:0000000000000000
> [ 9.491966] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 9.491974] CR2: 00007f7bd8d18ea0 CR3: 0000000104e58000 CR4: 00000000003506f0
> [ 9.491989] Call Trace:
> [ 9.491996] <TASK>
> [ 9.492002] ? __warn+0x80/0x120
> [ 9.492012] ? r535_gsp_init+0x75e/0x7c0 [nouveau]
> [ 9.492200] ? report_bug+0x164/0x190
> [ 9.492211] ? handle_bug+0x3c/0x80
> [ 9.492218] ? exc_invalid_op+0x17/0x70
> [ 9.492227] ? asm_exc_invalid_op+0x1a/0x20
> [ 9.492241] ? r535_gsp_init+0x75e/0x7c0 [nouveau]
> [ 9.492429] ? r535_gsp_init+0x18e/0x7c0 [nouveau]
> [ 9.492616] ? srso_return_thunk+0x5/0x5f
> [ 9.492626] nvkm_subdev_init_+0x48/0x130 [nouveau]
> [ 9.492802] ? srso_return_thunk+0x5/0x5f
> [ 9.492810] nvkm_subdev_init+0x44/0x90 [nouveau]
> [ 9.492988] nvkm_device_init+0x166/0x2e0 [nouveau]
> [ 9.493189] nvkm_udevice_init+0x47/0x70 [nouveau]
> [ 9.493391] nvkm_object_init+0x41/0x1c0 [nouveau]
> [ 9.493567] nvkm_ioctl_new+0x16a/0x290 [nouveau]
> [ 9.493740] ? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
> [ 9.493912] ? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
> [ 9.494121] nvkm_ioctl+0x10e/0x250 [nouveau]
> [ 9.494288] nvif_object_ctor+0x112/0x190 [nouveau]
> [ 9.494456] nvif_device_ctor+0x23/0x60 [nouveau]
> [ 9.494625] nouveau_cli_init+0x164/0x5d0 [nouveau]
> [ 9.494820] nouveau_drm_device_init+0x97/0xe00 [nouveau]
> [ 9.495022] ? srso_return_thunk+0x5/0x5f
> [ 9.495030] ? pci_bus_read_config_word+0x4d/0x90
> [ 9.495039] ? srso_return_thunk+0x5/0x5f
> [ 9.495047] ? pci_update_current_state+0x72/0xb0
> [ 9.495059] nouveau_drm_probe+0x12c/0x280 [nouveau]
> [ 9.495245] ? srso_return_thunk+0x5/0x5f
> [ 9.495254] local_pci_probe+0x45/0xa0
> [ 9.495263] pci_device_probe+0xc7/0x240
> [ 9.495272] really_probe+0xd6/0x390
> [ 9.495282] ? __pfx___driver_attach+0x10/0x10
> [ 9.495290] __driver_probe_device+0x78/0x150
> [ 9.495301] driver_probe_device+0x1f/0x90
> [ 9.495308] __driver_attach+0xd2/0x1c0
> [ 9.495316] bus_for_each_dev+0x88/0xd0
> [ 9.495325] bus_add_driver+0x116/0x220
> [ 9.495334] driver_register+0x59/0x100
> [ 9.495342] ? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
> [ 9.495512] do_one_initcall+0x5b/0x320
> [ 9.495524] do_init_module+0x60/0x240
> [ 9.495536] init_module_from_file+0x86/0xc0
> [ 9.495550] idempotent_init_module+0x120/0x2b0
> [ 9.495562] __x64_sys_finit_module+0x5e/0xb0
> [ 9.495571] do_syscall_64+0x88/0x170
> [ 9.495581] ? srso_return_thunk+0x5/0x5f
> [ 9.495589] ? syscall_exit_to_user_mode_prepare+0x15d/0x190
> [ 9.495600] ? srso_return_thunk+0x5/0x5f
> [ 9.495607] ? syscall_exit_to_user_mode+0x60/0x210
> [ 9.495615] ? srso_return_thunk+0x5/0x5f
> [ 9.495622] ? do_syscall_64+0x95/0x170
> [ 9.495630] ? srso_return_thunk+0x5/0x5f
> [ 9.495636] ? do_syscall_64+0x95/0x170
> [ 9.495644] ? srso_return_thunk+0x5/0x5f
> [ 9.495653] entry_SYSCALL_64_after_hwframe+0x71/0x79
> [ 9.495663] RIP: 0033:0x7f4de9b2919d
> [ 9.495680] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 4b cc 0c 00 f7 d8 64 89 01 48
> [ 9.495697] RSP: 002b:00007ffc56bfe468 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
> [ 9.495707] RAX: ffffffffffffffda RBX: 00005644a0432350 RCX: 00007f4de9b2919d
> [ 9.495717] RDX: 0000000000000000 RSI: 00005644a042ef30 RDI: 0000000000000031
> [ 9.495726] RBP: 00007ffc56bfe520 R08: 00007f4de9bf6b20 R09: 00007ffc56bfe4b0
> [ 9.495734] R10: 00005644a04346a0 R11: 0000000000000246 R12: 00005644a042ef30
> [ 9.495742] R13: 0000000000020000 R14: 00005644a0432d10 R15: 00005644a0434660
> [ 9.495754] </TASK>
> [ 9.495759] ---[ end trace 0000000000000000 ]---
> [ 9.495778] ------------[ cut here ]------------
> [ 9.495784] WARNING: CPU: 5 PID: 921 at drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c:2187 r535_gsp_init+0xc5/0x7c0 [nouveau]
> [ 9.495981] Modules linked in: nouveau(+) rfkill binfmt_misc vfat fat snd_hda_codec_realtek snd_hda_codec_generic snd_hda_scodec_component snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hwdep wmi_bmof ppdev snd_hda_core drm_ttm_helper intel_rapl_msr snd_seq ttm snd_seq_device snd_pcm video gpu_sched snd_timer i2c_algo_bit drm_gpuvm drm_exec intel_rapl_common mxm_wmi rapl snd drm_display_helper acpi_cpufreq soundcore k10temp i2c_piix4 parport_pc wmi parport gpio_amdpt gpio_generic loop dm_multipath nfnetlink zram crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 sha256_ssse3 r8169 realtek sha1_ssse3 ccp w83627hf_wdt scsi_dh_rdac scsi_dh_emc scsi_dh_alua ip6_tables ip_tables fuse
> [ 9.496112] CPU: 5 PID: 921 Comm: (udev-worker) Tainted: G W 6.9.0-rc3Lyude-Test+ #22
> [ 9.496123] Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
> [ 9.496132] RIP: 0010:r535_gsp_init+0xc5/0x7c0 [nouveau]
> [ 9.496317] Code: 24 18 4c 8d 63 08 89 6c 24 14 4c 89 e6 6a 00 4c 8d 44 24 20 48 8d 4c 24 1c e8 b7 c3 fa ff 5f 41 89 c7 85 c0 0f 84 97 00 00 00 <0f> 0b 48 83 bb 20 0a 00 00 00 75 37 48 8b 44 24 20 65 48 2b 04 25
> [ 9.496333] RSP: 0018:ffffb271c175f748 EFLAGS: 00010246
> [ 9.496341] RAX: 0000000000000000 RBX: ffffa098e192f000 RCX: ffffa098ca2768c8
> [ 9.496351] RDX: ffffa098e191d400 RSI: ffffb271cc110080 RDI: ffffb271cc111388
> [ 9.496360] RBP: 00000000dbdfe000 R08: 0000000000000003 R09: 0000000000000000
> [ 9.496368] R10: 0000000000000000 R11: ffffa098ca276828 R12: ffffa098e192f008
> [ 9.496375] R13: 000000022b906452 R14: ffffa098e192f008 R15: 00000000fffffffb
> [ 9.496383] FS: 00007f4de98cc980(0000) GS:ffffa099c4a80000(0000) knlGS:0000000000000000
> [ 9.496393] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 9.496400] CR2: 00007f7bd8d18ea0 CR3: 0000000104e58000 CR4: 00000000003506f0
> [ 9.496410] Call Trace:
> [ 9.496416] <TASK>
> [ 9.496422] ? __warn+0x80/0x120
> [ 9.496429] ? r535_gsp_init+0xc5/0x7c0 [nouveau]
> [ 9.496622] ? report_bug+0x164/0x190
> [ 9.496631] ? handle_bug+0x3c/0x80
> [ 9.496638] ? exc_invalid_op+0x17/0x70
> [ 9.496647] ? asm_exc_invalid_op+0x1a/0x20
> [ 9.496660] ? r535_gsp_init+0xc5/0x7c0 [nouveau]
> [ 9.496851] ? r535_gsp_init+0x18e/0x7c0 [nouveau]
> [ 9.497044] ? srso_return_thunk+0x5/0x5f
> [ 9.497055] nvkm_subdev_init_+0x48/0x130 [nouveau]
> [ 9.497227] ? srso_return_thunk+0x5/0x5f
> [ 9.497236] nvkm_subdev_init+0x44/0x90 [nouveau]
> [ 9.497405] nvkm_device_init+0x166/0x2e0 [nouveau]
> [ 9.497608] nvkm_udevice_init+0x47/0x70 [nouveau]
> [ 9.497808] nvkm_object_init+0x41/0x1c0 [nouveau]
> [ 9.497983] nvkm_ioctl_new+0x16a/0x290 [nouveau]
> [ 9.498154] ? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
> [ 9.498326] ? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
> [ 9.498531] nvkm_ioctl+0x10e/0x250 [nouveau]
> [ 9.498702] nvif_object_ctor+0x112/0x190 [nouveau]
> [ 9.498873] nvif_device_ctor+0x23/0x60 [nouveau]
> [ 9.499049] nouveau_cli_init+0x164/0x5d0 [nouveau]
> [ 9.499244] nouveau_drm_device_init+0x97/0xe00 [nouveau]
> [ 9.499430] ? srso_return_thunk+0x5/0x5f
> [ 9.499437] ? pci_bus_read_config_word+0x4d/0x90
> [ 9.499445] ? srso_return_thunk+0x5/0x5f
> [ 9.499452] ? pci_update_current_state+0x72/0xb0
> [ 9.499461] nouveau_drm_probe+0x12c/0x280 [nouveau]
> [ 9.499657] ? srso_return_thunk+0x5/0x5f
> [ 9.499666] local_pci_probe+0x45/0xa0
> [ 9.499674] pci_device_probe+0xc7/0x240
> [ 9.499683] really_probe+0xd6/0x390
> [ 9.499692] ? __pfx___driver_attach+0x10/0x10
> [ 9.499699] __driver_probe_device+0x78/0x150
> [ 9.499709] driver_probe_device+0x1f/0x90
> [ 9.499718] __driver_attach+0xd2/0x1c0
> [ 9.499726] bus_for_each_dev+0x88/0xd0
> [ 9.499735] bus_add_driver+0x116/0x220
> [ 9.499744] driver_register+0x59/0x100
> [ 9.499751] ? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
> [ 9.499915] do_one_initcall+0x5b/0x320
> [ 9.499926] do_init_module+0x60/0x240
> [ 9.499934] init_module_from_file+0x86/0xc0
> [ 9.499948] idempotent_init_module+0x120/0x2b0
> [ 9.499962] __x64_sys_finit_module+0x5e/0xb0
> [ 9.499971] do_syscall_64+0x88/0x170
> [ 9.499987] ? srso_return_thunk+0x5/0x5f
> [ 9.499996] ? syscall_exit_to_user_mode_prepare+0x15d/0x190
> [ 9.500004] ? srso_return_thunk+0x5/0x5f
> [ 9.500011] ? syscall_exit_to_user_mode+0x60/0x210
> [ 9.500019] ? srso_return_thunk+0x5/0x5f
> [ 9.500026] ? do_syscall_64+0x95/0x170
> [ 9.500034] ? srso_return_thunk+0x5/0x5f
> [ 9.500041] ? do_syscall_64+0x95/0x170
> [ 9.500050] ? srso_return_thunk+0x5/0x5f
> [ 9.500058] entry_SYSCALL_64_after_hwframe+0x71/0x79
> [ 9.500067] RIP: 0033:0x7f4de9b2919d
> [ 9.500075] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 4b cc 0c 00 f7 d8 64 89 01 48
> [ 9.500091] RSP: 002b:00007ffc56bfe468 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
> [ 9.500103] RAX: ffffffffffffffda RBX: 00005644a0432350 RCX: 00007f4de9b2919d
> [ 9.500112] RDX: 0000000000000000 RSI: 00005644a042ef30 RDI: 0000000000000031
> [ 9.500121] RBP: 00007ffc56bfe520 R08: 00007f4de9bf6b20 R09: 00007ffc56bfe4b0
> [ 9.500128] R10: 00005644a04346a0 R11: 0000000000000246 R12: 00005644a042ef30
> [ 9.500136] R13: 0000000000020000 R14: 00005644a0432d10 R15: 00005644a0434660
> [ 9.500149] </TASK>
> [ 9.500154] ---[ end trace 0000000000000000 ]---
> [ 9.500162] nouveau 0000:1f:00.0: gsp: init failed, -5
> [ 9.500189] nouveau 0000:1f:00.0: init failed with -5
> [ 9.500196] nouveau: DRM-master:00000000:00000080: init failed with -5
> [ 9.500207] nouveau 0000:1f:00.0: DRM-master: Device allocation failed: -5
> [ 9.502661] nouveau 0000:1f:00.0: probe with driver nouveau failed with error -5
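
The return codes in the two logs are negative errno values: the runtime suspend failure reports -12 and the probe failure here reports -5. A small sketch for decoding them, assuming the userspace errno table matches the kernel's numbering for these values (it does on Linux); decode_kernel_errno is a hypothetical helper name:

```python
import errno
import os


def decode_kernel_errno(ret: int) -> str:
    """Map a negative kernel return code such as -12 to its symbolic name."""
    code = -ret
    # errno.errorcode maps numeric values to names like 'ENOMEM'.
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{ret} = -{name} ({os.strerror(code)})"


if __name__ == "__main__":
    for ret in (-12, -5):
        print(decode_kernel_errno(ret))
```

So the suspend path failed with -ENOMEM (the allocation failure itself), while the patched init path failed with -EIO.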
>
>
> Which brings me to the second part - Timur had me enable CONFIG_SG_DEBUG,
> which quickly hit a different issue:
>
> [ 8.992320] RIP: 0010:sg_init_one (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/./include/linux/scatterlist.h:187 (discriminator 1) /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/lib/scatterlist.c:143 (discriminator 1))
> [ 8.992331] Code: 71 93 37 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54 24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b 94 7d 00 <0f> 0b 0f 0b 0f 0b 48 8b 05 5e ae 9f 01 eb b2 66 66 2e 0f 1f 84 00
> [ 8.992428] Call Trace:
> [ 8.992433] <TASK>
> [ 8.992439] ? die (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/arch/x86/kernel/dumpstack.c:421 /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/arch/x86/kernel/dumpstack.c:434 /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/arch/x86/kernel/dumpstack.c:447)
> [ 8.992448] ? do_trap (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/arch/x86/kernel/traps.c:114 /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/arch/x86/kernel/traps.c:155)
> [ 8.992455] ? sg_init_one (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/./include/linux/scatterlist.h:187 (discriminator 1) /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/lib/scatterlist.c:143 (discriminator 1))
> [ 8.992464] ? do_error_trap (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/./arch/x86/include/asm/traps.h:58 /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/arch/x86/kernel/traps.c:176)
> [ 8.992472] ? sg_init_one (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/./include/linux/scatterlist.h:187 (discriminator 1) /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/lib/scatterlist.c:143 (discriminator 1))
> [ 8.992481] ? exc_invalid_op (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/arch/x86/kernel/traps.c:267)
> [ 8.992489] ? sg_init_one (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/./include/linux/scatterlist.h:187 (discriminator 1) /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/lib/scatterlist.c:143 (discriminator 1))
> [ 8.992496] ? asm_exc_invalid_op (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/./arch/x86/include/asm/idtentry.h:621)
> [ 8.992509] ? sg_init_one (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/./include/linux/scatterlist.h:187 (discriminator 1) /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/lib/scatterlist.c:143 (discriminator 1))
> [ 8.992518] nvkm_firmware_ctor (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nvkm/core/firmware.c:249) nouveau
> [ 8.992722] nvkm_falcon_fw_ctor (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nvkm/falcon/fw.c:199) nouveau
> [ 8.992898] ga102_gsp_booter_ctor (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/ga102.c:62) nouveau
> [ 8.993095] r535_gsp_oneinit (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c:2309) nouveau
> [ 8.993292] ? srso_return_thunk (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/arch/x86/lib/retpoline.S:224)
> [ 8.993302] ? kmem_cache_alloc_lru (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/mm/slub.c:3748 (discriminator 2) /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/mm/slub.c:3827 (discriminator 2) /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/mm/slub.c:3864 (discriminator 2))
> [ 8.993311] ? srso_return_thunk (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/arch/x86/lib/retpoline.S:224)
> [ 8.993317] ? srso_return_thunk (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/arch/x86/lib/retpoline.S:224)
> [ 8.993324] ? ktime_get (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/kernel/time/timekeeping.c:292 /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/kernel/time/timekeeping.c:388 /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/kernel/time/timekeeping.c:848)
> [ 8.993334] nvkm_subdev_oneinit_ (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nvkm/core/subdev.c:113) nouveau
> [ 8.993510] nvkm_subdev_init_ (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nvkm/core/subdev.c:139) nouveau
> [ 8.993685] ? srso_return_thunk (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/arch/x86/lib/retpoline.S:224)
> [ 8.993693] nvkm_subdev_init (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nvkm/core/subdev.c:170) nouveau
> [ 8.993867] nvkm_device_init (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nvkm/engine/device/base.c:3023) nouveau
> [ 8.994079] nvkm_udevice_init (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nvkm/engine/device/user.c:295) nouveau
> [ 8.994281] nvkm_object_init (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nvkm/core/object.c:245) nouveau
> [ 8.994457] nvkm_ioctl_new (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nvkm/core/ioctl.c:149) nouveau
> [ 8.994630] ? __pfx_nvkm_client_child_new (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nvkm/core/client.c:125) nouveau
> [ 8.994803] ? __pfx_nvkm_udevice_new (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nvkm/engine/device/user.c:386) nouveau
> [ 8.995013] nvkm_ioctl (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nvkm/core/ioctl.c:354 /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nvkm/core/ioctl.c:376) nouveau
> [ 8.995187] nvif_object_ctor (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nvif/object.c:298 (discriminator 1)) nouveau
> [ 8.995356] nvif_device_ctor (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nvif/device.c:56) nouveau
> [ 8.995524] nouveau_cli_init (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nouveau_drm.c:270) nouveau
> [ 8.995721] nouveau_drm_device_init (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nouveau_drm.c:602) nouveau
> [ 8.995915] ? srso_return_thunk (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/arch/x86/lib/retpoline.S:224)
> [ 8.995923] ? pci_bus_read_config_word (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/pci/access.c:67 (discriminator 1))
> [ 8.995932] ? srso_return_thunk (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/arch/x86/lib/retpoline.S:224)
> [ 8.995939] ? pci_update_current_state (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/pci/pci.c:1195 /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/pci/pci.c:1187)
> [ 8.995949] nouveau_drm_probe (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nouveau_drm.c:841) nouveau
> [ 8.996145] ? srso_return_thunk (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/arch/x86/lib/retpoline.S:224)
> [ 8.996154] local_pci_probe (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/pci/pci-driver.c:325)
> [ 8.996163] pci_device_probe (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/pci/pci-driver.c:392 (discriminator 1) /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/pci/pci-driver.c:417 (discriminator 1) /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/pci/pci-driver.c:451 (discriminator 1))
> [ 8.996174] really_probe (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/base/dd.c:578 /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/base/dd.c:656)
> [ 8.996185] ? __pfx___driver_attach (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/base/dd.c:1155)
> [ 8.996192] __driver_probe_device (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/base/dd.c:798)
> [ 8.996201] driver_probe_device (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/base/dd.c:828)
> [ 8.996209] __driver_attach (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/base/dd.c:1215)
> [ 8.996217] bus_for_each_dev (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/base/bus.c:368)
> [ 8.996228] bus_add_driver (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/base/bus.c:673)
> [ 8.996238] driver_register (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/base/driver.c:246)
> [ 8.996246] ? __pfx_nouveau_drm_init (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nvif/object.c:32) nouveau
> [ 8.996415] do_one_initcall (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/init/main.c:1238)
> [ 8.996428] do_init_module (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/kernel/module/main.c:2538)
> [ 8.996437] init_module_from_file (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/kernel/module/main.c:3168)
> [ 8.996450] idempotent_init_module (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/./include/linux/spinlock.h:351 /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/kernel/module/main.c:3131 /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/kernel/module/main.c:3185)
> [ 8.996462] __x64_sys_finit_module (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/./include/linux/file.h:47 /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/kernel/module/main.c:3207 /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/kernel/module/main.c:3189 /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/kernel/module/main.c:3189)
> [ 8.996473] do_syscall_64 (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/arch/x86/entry/common.c:52 (discriminator 1) /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/arch/x86/entry/common.c:83 (discriminator 1))
> [ 8.996482] ? srso_return_thunk (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/arch/x86/lib/retpoline.S:224)
> [ 8.996490] entry_SYSCALL_64_after_hwframe (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/arch/x86/entry/entry_64.S:129)
> [ 8.996499] RIP: 0033:0x7fd12f52919d
>
> I think Timur actually mentioned this bug to you previously, but in
> hopes of getting something more useful out of SG_DEBUG I dug into this
> problem a bit and ended up with what I believe is an actually correct
> patch:
>
> https://gitlab.freedesktop.org/lyudess/linux/-/commit/485f1fb62ddd4b42b60848eeb48206fef4376161
I think this patch is fine, and does solve the issue for me here if I
enable SG_DEBUG.
Ben.
>
> ...unfortunately, fixing that issue on my system did not get SG_DEBUG
> to give me any useful info.
>
> Anyway - that brings me to ask 1: do you have any idea what might be
> going on with the falcon boot issue I mentioned, or if I might just be
> doing something wrong/silly with how I'm setting up memory in
> nvkm_gsp_mem_ctor()?
>
> And 2: if you have the time does that patch look correct? I'm happy to
> submit it :)
>
> Also 3: welcome back again :)
>