BUG [RESEND][NEW BUG]: kernel NULL pointer dereference, address: 0000000000000008

Mirsad Todorovac mirsad.todorovac at alu.unizg.hr
Thu Jan 25 09:29:21 UTC 2024


Hi Ma Jun,

Copy that. This appears to be the exact problem, and thank you for
reviewing the bug report at such short notice.

I apologise for the wrong assertion.

The patch you sent then just exposed another bug, which does not
manifest without the patch (a NULL pointer dereference occurs instead).
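
As a side note on the dereference itself, the values in the quoted Oops can be
cross-checked with a few lines of C. This is only a sketch under stated
assumptions: `fw_layout` is a hypothetical stand-in that mirrors just the first
two fields of the kernel's `struct firmware`, and the errno values are
hard-coded from errno-base.h. R10 (written by `mov %eax,%r10d` right after the
call) decodes to -22, the `cmp` immediate decodes to -19, and the faulting
address 0x8 with RAX = 0 appears consistent with reading `->data` through a
NULL firmware pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Linux errno values (include/uapi/asm-generic/errno-base.h). */
#define K_ENODEV 19
#define K_EINVAL 22

/* Hypothetical stand-in for the leading fields of struct firmware. */
struct fw_layout {
	size_t size;
	const uint8_t *data;
};

/* Interpret the low 32 bits of a register value as a signed int. */
int32_t reg_to_errno(uint64_t reg)
{
	uint32_t low = (uint32_t)reg;

	if (low >= 0x80000000u)
		return -(int32_t)(~low) - 1;
	return (int32_t)low;
}

/* Address that a read of ->data touches when the struct pointer is NULL. */
uintptr_t null_data_fault_address(void)
{
	return (uintptr_t)offsetof(struct fw_layout, data);
}

/* Self-check against the values quoted in the report. */
void check_report_values(void)
{
	assert(reg_to_errno(0x00000000ffffffeaULL) == -K_EINVAL); /* R10 */
	assert(reg_to_errno(0xffffffedULL) == -K_ENODEV); /* cmp immediate */
}
```

On a 64-bit build `null_data_fault_address()` equals `sizeof(size_t)`, i.e. 8,
which matches the address reported in CR2.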

But of course, removing your patch and going back to the NULL pointer
dereference is not a solution; a proper fix is required.
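
To illustrate the general pattern only (this is not the actual driver fix), a
minimal user-space sketch follows. All names in it (`mock_firmware`,
`mock_request`, `mock_init_microcode`) are hypothetical; the only point is
that every error from the loader is handled before `fw->data` is touched,
since the loader may leave the pointer NULL on failure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define K_EINVAL 22 /* Linux errno value, hard-coded for the sketch */

/* Hypothetical stand-in for a loaded firmware image. */
struct mock_firmware {
	size_t size;
	const uint8_t *data;
};

/* Hypothetical loader: on failure it leaves *fw NULL and returns -errno. */
int mock_request(const struct mock_firmware **fw,
		 const struct mock_firmware *image)
{
	if (image == NULL || image->data == NULL || image->size < 4) {
		*fw = NULL;
		return -K_EINVAL;
	}
	*fw = image;
	return 0;
}

/* Read a little-endian 16-bit value without alignment assumptions. */
uint16_t read_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/* Bail out on any loader error before dereferencing the firmware. */
int mock_init_microcode(const struct mock_firmware *image,
			uint16_t *ver_major, uint16_t *ver_minor)
{
	const struct mock_firmware *fw = NULL;
	int err;

	assert(ver_major != NULL && ver_minor != NULL);

	err = mock_request(&fw, image);
	if (err)
		return err; /* fw is NULL here, so it must not be used */

	*ver_major = read_le16(fw->data);
	*ver_minor = read_le16(fw->data + 2);
	return 0;
}
```

Calling `mock_init_microcode()` with a truncated or missing image returns the
negative error code and never reads through the NULL pointer.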

Thanks again.

Best regards,
Mirsad Todorovac

On 1/25/2024 8:38 AM, Ma, Jun wrote:
> Hi Mirsad,
> 
> 
> On 1/25/2024 1:48 AM, Mirsad Todorovac wrote:
>> Hi, Ma Jun,
>>
>> Normally, I would reply under the quoted text, but I will adjust to your convention.
>>
>> I have just discovered that your patch causes the Ubuntu 22.04 LTS GNOME XWayland session
>> to hang after typing the password and pressing ENTER at the graphical login screen (tested several times).
>>
> This problem is not caused by my patch.
> Based on your syslog, it looks more like a scheduling issue.
> I just saw a similar problem; please refer to the link below:
> https://gitlab.freedesktop.org/drm/amd/-/issues/3124
> 
> Regards,
> Ma Jun
>> After that, I was not even able to log in from another box with ssh, or the session would
>> block (it failed on the first and second attempts; the third time it worked, after I connected
>> before attempting to log in on the XWayland console).
>>
>> You might find the syslog and dmesg of the freeze at this link useful (they were over 100K):
>>
>> https://magrf.grf.hr/~mtodorov/linux/bugreports/6.7.0/amdgpu/6.7.0-xway-09721-g61da593f4458/
>>
>> The exact applied patch was this:
>>
>> marvin at defiant:~/linux/kernel/linux_torvalds$ git diff
>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>> index 73f6d7e72c73..6ef333df9adf 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>> @@ -3996,16 +3996,13 @@ static int gfx_v10_0_init_microcode(struct amdgpu_device *adev)
>>     
>>            if (!amdgpu_sriov_vf(adev)) {
>>                    snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_rlc.bin", ucode_prefix);
>> -               err = amdgpu_ucode_request(adev, &adev->gfx.rlc_fw, fw_name);
>> -               /* don't check this.  There are apparently firmwares in the wild with
>> -                * incorrect size in the header
>> -                */
>> -               if (err == -ENODEV)
>> -                       goto out;
>> +               err = request_firmware(&adev->gfx.rlc_fw, fw_name, adev->dev);
>>                    if (err)
>> -                       dev_dbg(adev->dev,
>> -                               "gfx10: amdgpu_ucode_request() failed \"%s\"\n",
>> -                               fw_name);
>> +                       goto out;
>> +
>> +               /* don't validate this firmware.  There are apparently firmwares
>> +                * in the wild with incorrect size in the header
>> +                */
>>                    rlc_hdr = (const struct rlc_firmware_header_v2_0 *)adev->gfx.rlc_fw->data;
>>                    version_major = le16_to_cpu(rlc_hdr->header.header_version_major);
>>                    version_minor = le16_to_cpu(rlc_hdr->header.header_version_minor);
>> marvin at defiant:~/linux/kernel/linux_torvalds$ uname -rms
>> Linux 6.7.0-xway-09721-g61da593f4458 x86_64
>> marvin at defiant:~/linux/kernel/linux_torvalds$
>>
>> So, there seems to be a problem with the way the patch affects XWayland.
>>
>> I checked the exact commit multiple times, with and without the diff.
>>
>> Hope this helps, because I am not familiar with the amdgpu driver.
>>
>> Best regards,
>> Mirsad Todorovac
>>
>> On 1/22/24 09:34, Ma, Jun wrote:
>>> This is perhaps similar to a problem I encountered earlier; you can
>>> try the following patch:
>>>
>>> https://lists.freedesktop.org/archives/amd-gfx/2024-January/103259.html
>>>
>>> Regards,
>>> Ma Jun
>>>
>>> On 1/21/2024 3:54 AM, Mirsad Todorovac wrote:
>>>> Hi,
>>>>
>>>> The last email did not reach most of the recipients due to the banned .xz attachment.
>>>>
>>>> As the .config is too big to send either inline or uncompressed, I will omit it in this
>>>> attempt. In the meantime, I had some success in decoding the stack trace, though sadly not
>>>> completely.
>>>>
>>>> I don't think this Oops is deterministic, but I am working on a reproducer.
>>>>
>>>> The platform is Ubuntu 22.04 LTS.
>>>>
>>>> Complete list of hardware and .config is available here:
>>>>
>>>> https://domac.alu.unizg.hr/~mtodorov/linux/bugreports/amdgpu/6.7.0-rtl-v02-nokcsan-09928-g052d534373b7/
>>>>
>>>> Best regards,
>>>> Mirsad
>>>>
>>>> -------------------------------------------------------------------------------------------
>>>> kernel: [    5.576702] BUG: kernel NULL pointer dereference, address: 0000000000000008
>>>> kernel: [    5.576707] #PF: supervisor read access in kernel mode
>>>> kernel: [    5.576710] #PF: error_code(0x0000) - not-present page
>>>> kernel: [    5.576712] PGD 0 P4D 0
>>>> kernel: [    5.576715] Oops: 0000 [#1] PREEMPT SMP NOPTI
>>>> kernel: [    5.576718] CPU: 9 PID: 650 Comm: systemd-udevd Not tainted 6.7.0-rtl-v0.2-nokcsan-09928-g052d534373b7 #2
>>>> kernel: [    5.576723] Hardware name: ASRock X670E PG Lightning/X670E PG Lightning, BIOS 1.21 04/26/2023
>>>> kernel: [    5.576726] RIP: 0010:gfx_v10_0_early_init (drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:4009 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:7478) amdgpu
>>>> kernel: [ 5.576872] Code: 8d 55 a8 4c 89 ff e8 e4 83 ec ff 41 89 c2 83 f8 ed 0f 84 b3 fd ff ff 85 c0 74 05 0f 1f 44 00 00 49 8b 87 08 87 01 00 4c 89 ff <48> 8b 40 08 0f b7 50 0a 0f b7 70 08 e8 e4 42 fb ff 41 89 c2 85 c0
>>>> All code
>>>> ========
>>>>       0:	8d 55 a8             	lea    -0x58(%rbp),%edx
>>>>       3:	4c 89 ff             	mov    %r15,%rdi
>>>>       6:	e8 e4 83 ec ff       	call   0xffffffffffec83ef
>>>>       b:	41 89 c2             	mov    %eax,%r10d
>>>>       e:	83 f8 ed             	cmp    $0xffffffed,%eax
>>>>      11:	0f 84 b3 fd ff ff    	je     0xfffffffffffffdca
>>>>      17:	85 c0                	test   %eax,%eax
>>>>      19:	74 05                	je     0x20
>>>>      1b:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
>>>>      20:	49 8b 87 08 87 01 00 	mov    0x18708(%r15),%rax
>>>>      27:	4c 89 ff             	mov    %r15,%rdi
>>>>      2a:*	48 8b 40 08          	mov    0x8(%rax),%rax		<-- trapping instruction
>>>>      2e:	0f b7 50 0a          	movzwl 0xa(%rax),%edx
>>>>      32:	0f b7 70 08          	movzwl 0x8(%rax),%esi
>>>>      36:	e8 e4 42 fb ff       	call   0xfffffffffffb431f
>>>>      3b:	41 89 c2             	mov    %eax,%r10d
>>>>      3e:	85 c0                	test   %eax,%eax
>>>>
>>>> Code starting with the faulting instruction
>>>> ===========================================
>>>>       0:	48 8b 40 08          	mov    0x8(%rax),%rax
>>>>       4:	0f b7 50 0a          	movzwl 0xa(%rax),%edx
>>>>       8:	0f b7 70 08          	movzwl 0x8(%rax),%esi
>>>>       c:	e8 e4 42 fb ff       	call   0xfffffffffffb42f5
>>>>      11:	41 89 c2             	mov    %eax,%r10d
>>>>      14:	85 c0                	test   %eax,%eax
>>>> kernel: [    5.576878] RSP: 0018:ffffa5b3c103f720 EFLAGS: 00010282
>>>> kernel: [    5.576881] RAX: 0000000000000000 RBX: ffffffffc1d73489 RCX: 0000000000000000
>>>> kernel: [    5.576884] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff91ae4fa80000
>>>> kernel: [    5.576886] RBP: ffffa5b3c103f7b0 R08: 0000000000000000 R09: 0000000000000000
>>>> kernel: [    5.576889] R10: 00000000ffffffea R11: 0000000000000000 R12: ffff91ae4fa986e8
>>>> kernel: [    5.576892] R13: ffff91ae4fa986d8 R14: ffff91ae4fa986f8 R15: ffff91ae4fa80000
>>>> kernel: [    5.576895] FS:  00007fdaa343c8c0(0000) GS:ffff91bd58440000(0000) knlGS:0000000000000000
>>>> kernel: [    5.576898] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> kernel: [    5.576900] CR2: 0000000000000008 CR3: 00000001222d0000 CR4: 0000000000750ef0
>>>> kernel: [    5.576903] PKRU: 55555554
>>>> kernel: [    5.576905] Call Trace:
>>>> kernel: [    5.576907]  <TASK>
>>>> kernel: [    5.576909] ? show_regs (arch/x86/kernel/dumpstack.c:479)
>>>> kernel: [    5.576914] ? __die (arch/x86/kernel/dumpstack.c:421 arch/x86/kernel/dumpstack.c:434)
>>>> kernel: [    5.576917] ? page_fault_oops (arch/x86/mm/fault.c:707)
>>>> kernel: [    5.576921] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>>>> kernel: [    5.576925] ? crypto_alloc_tfmmem.isra.0 (crypto/api.c:497)
>>>> kernel: [    5.576930] ? do_user_addr_fault (arch/x86/mm/fault.c:1264)
>>>> kernel: [    5.576934] ? exc_page_fault (./arch/x86/include/asm/paravirt.h:693 arch/x86/mm/fault.c:1515 arch/x86/mm/fault.c:1563)
>>>> kernel: [    5.576937] ? asm_exc_page_fault (./arch/x86/include/asm/idtentry.h:570)
>>>> kernel: [    5.576942] ? gfx_v10_0_early_init (drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:4009 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:7478) amdgpu
>>>> kernel: [    5.577056] amdgpu_device_init (drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:2465 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:4042) amdgpu
>>>> kernel: [    5.577158] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>>>> kernel: [    5.577161] ? pci_bus_read_config_word (drivers/pci/access.c:67 (discriminator 2))
>>>> kernel: [    5.577166] ? pci_read_config_word (drivers/pci/access.c:563)
>>>> kernel: [    5.577168] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>>>> kernel: [    5.577171] ? do_pci_enable_device (drivers/pci/pci.c:1975 drivers/pci/pci.c:1949)
>>>> kernel: [    5.577176] amdgpu_driver_load_kms (drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c:146) amdgpu
>>>> kernel: [    5.577275] amdgpu_pci_probe (drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c:2237) amdgpu
>>>> kernel: [    5.577373] local_pci_probe (drivers/pci/pci-driver.c:324)
>>>> kernel: [    5.577377] pci_device_probe (drivers/pci/pci-driver.c:392 drivers/pci/pci-driver.c:417 drivers/pci/pci-driver.c:460)
>>>> kernel: [    5.577381] really_probe (drivers/base/dd.c:579 drivers/base/dd.c:658)
>>>> kernel: [    5.577386] __driver_probe_device (drivers/base/dd.c:800)
>>>> kernel: [    5.577389] driver_probe_device (drivers/base/dd.c:830)
>>>> kernel: [    5.577392] __driver_attach (drivers/base/dd.c:1217)
>>>> kernel: [    5.577396] ? __pfx___driver_attach (drivers/base/dd.c:1157)
>>>> kernel: [    5.577399] bus_for_each_dev (drivers/base/bus.c:368)
>>>> kernel: [    5.577402] driver_attach (drivers/base/dd.c:1234)
>>>> kernel: [    5.577405] bus_add_driver (drivers/base/bus.c:674)
>>>> kernel: [    5.577409] driver_register (drivers/base/driver.c:246)
>>>> kernel: [    5.577411] ? __pfx_amdgpu_init (drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c:2497) amdgpu
>>>> kernel: [    5.577521] __pci_register_driver (drivers/pci/pci-driver.c:1456)
>>>> kernel: [    5.577524] amdgpu_init (drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c:2805) amdgpu
>>>> kernel: [    5.577628] do_one_initcall (init/main.c:1236)
>>>> kernel: [    5.577632] ? kmalloc_trace (mm/slub.c:3816 mm/slub.c:3860 mm/slub.c:4007)
>>>> kernel: [    5.577637] do_init_module (kernel/module/main.c:2533)
>>>> kernel: [    5.577640] load_module (kernel/module/main.c:2984)
>>>> kernel: [    5.577647] init_module_from_file (kernel/module/main.c:3151)
>>>> kernel: [    5.577649] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>>>> kernel: [    5.577652] ? init_module_from_file (kernel/module/main.c:3151)
>>>> kernel: [    5.577657] idempotent_init_module (kernel/module/main.c:3168)
>>>> kernel: [    5.577661] __x64_sys_finit_module (./include/linux/file.h:45 kernel/module/main.c:3190 kernel/module/main.c:3172 kernel/module/main.c:3172)
>>>> kernel: [    5.577664] do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83)
>>>> kernel: [    5.577668] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>>>> kernel: [    5.577671] ? ksys_mmap_pgoff (mm/mmap.c:1428)
>>>> kernel: [    5.577675] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>>>> kernel: [    5.577678] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>>>> kernel: [    5.577681] ? syscall_exit_to_user_mode (kernel/entry/common.c:215)
>>>> kernel: [    5.577684] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>>>> kernel: [    5.577687] ? do_syscall_64 (./arch/x86/include/asm/cpufeature.h:171 arch/x86/entry/common.c:98)
>>>> kernel: [    5.577689] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>>>> kernel: [    5.577692] ? do_syscall_64 (./arch/x86/include/asm/cpufeature.h:171 arch/x86/entry/common.c:98)
>>>> kernel: [    5.577695] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>>>> kernel: [    5.577698] ? do_syscall_64 (./arch/x86/include/asm/cpufeature.h:171 arch/x86/entry/common.c:98)
>>>> kernel: [    5.577700] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>>>> kernel: [    5.577703] ? sysvec_call_function (arch/x86/kernel/smp.c:253 (discriminator 69))
>>>> kernel: [    5.577707] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129)
>>>> kernel: [    5.577709] RIP: 0033:0x7fdaa331e88d
>>>> kernel: [ 5.577724] Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 73 b5 0f 00 f7 d8 64 89 01 48
>>>> All code
>>>> ========
>>>>       0:	5b                   	pop    %rbx
>>>>       1:	41 5c                	pop    %r12
>>>>       3:	c3                   	ret
>>>>       4:	66 0f 1f 84 00 00 00 	nopw   0x0(%rax,%rax,1)
>>>>       b:	00 00
>>>>       d:	f3 0f 1e fa          	endbr64
>>>>      11:	48 89 f8             	mov    %rdi,%rax
>>>>      14:	48 89 f7             	mov    %rsi,%rdi
>>>>      17:	48 89 d6             	mov    %rdx,%rsi
>>>>      1a:	48 89 ca             	mov    %rcx,%rdx
>>>>      1d:	4d 89 c2             	mov    %r8,%r10
>>>>      20:	4d 89 c8             	mov    %r9,%r8
>>>>      23:	4c 8b 4c 24 08       	mov    0x8(%rsp),%r9
>>>>      28:	0f 05                	syscall
>>>>      2a:*	48 3d 01 f0 ff ff    	cmp    $0xfffffffffffff001,%rax		<-- trapping instruction
>>>>      30:	73 01                	jae    0x33
>>>>      32:	c3                   	ret
>>>>      33:	48 8b 0d 73 b5 0f 00 	mov    0xfb573(%rip),%rcx        # 0xfb5ad
>>>>      3a:	f7 d8                	neg    %eax
>>>>      3c:	64 89 01             	mov    %eax,%fs:(%rcx)
>>>>      3f:	48                   	rex.W
>>>>
>>>> Code starting with the faulting instruction
>>>> ===========================================
>>>>       0:	48 3d 01 f0 ff ff    	cmp    $0xfffffffffffff001,%rax
>>>>       6:	73 01                	jae    0x9
>>>>       8:	c3                   	ret
>>>>       9:	48 8b 0d 73 b5 0f 00 	mov    0xfb573(%rip),%rcx        # 0xfb583
>>>>      10:	f7 d8                	neg    %eax
>>>>      12:	64 89 01             	mov    %eax,%fs:(%rcx)
>>>>      15:	48                   	rex.W
>>>> kernel: [    5.577729] RSP: 002b:00007ffeb4f87d28 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
>>>> kernel: [    5.577733] RAX: ffffffffffffffda RBX: 000055aedf3eeeb0 RCX: 00007fdaa331e88d
>>>> kernel: [    5.577736] RDX: 0000000000000000 RSI: 000055aedf3efb80 RDI: 000000000000001a
>>>> kernel: [    5.577738] RBP: 0000000000020000 R08: 0000000000000000 R09: 0000000000000002
>>>> kernel: [    5.577741] R10: 000000000000001a R11: 0000000000000246 R12: 000055aedf3efb80
>>>> kernel: [    5.577744] R13: 000055aedf3f2060 R14: 0000000000000000 R15: 000055aedf2b1220
>>>> kernel: [    5.577748]  </TASK>
>>>> kernel: [    5.577750] Modules linked in: intel_rapl_msr intel_rapl_common amdgpu(+) edac_mce_amd kvm_amd kvm snd_hda_codec_realtek snd_hda_codec_generic irqbypass ledtrig_audio crct10dif_pclmul polyval_clmulni polyval_generic snd_hda_codec_hdmi ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 amdxcp snd_hda_intel aesni_intel drm_exec snd_intel_dspcfg crypto_simd gpu_sched snd_intel_sdw_acpi cryptd nls_iso8859_1 drm_buddy snd_hda_codec snd_seq_midi drm_suballoc_helper snd_seq_midi_event drm_ttm_helper joydev snd_hda_core input_leds ttm rapl snd_rawmidi snd_hwdep drm_display_helper snd_seq snd_pcm wmi_bmof cec k10temp snd_seq_device ccp rc_core snd_timer snd drm_kms_helper i2c_algo_bit soundcore mac_hid tcp_bbr sch_fq msr parport_pc ppdev lp drm parport efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq libcrc32c hid_generic usbhid hid crc32_pclmul nvme r8169 ahci nvme_core i2c_piix4 xhci_pci libahci xhci_pci_renesas realtek video wmi gpio_amdpt
>>>> kernel: [    5.577817] CR2: 0000000000000008
>>>> kernel: [    5.577820] ---[ end trace 0000000000000000 ]---
>>>> kernel: [    5.914230] RIP: 0010:gfx_v10_0_early_init (drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:4009 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:7478) amdgpu
>>>> kernel: [ 5.914388] Code: 8d 55 a8 4c 89 ff e8 e4 83 ec ff 41 89 c2 83 f8 ed 0f 84 b3 fd ff ff 85 c0 74 05 0f 1f 44 00 00 49 8b 87 08 87 01 00 4c 89 ff <48> 8b 40 08 0f b7 50 0a 0f b7 70 08 e8 e4 42 fb ff 41 89 c2 85 c0
>>>> All code
>>>> ========
>>>>       0:	8d 55 a8             	lea    -0x58(%rbp),%edx
>>>>       3:	4c 89 ff             	mov    %r15,%rdi
>>>>       6:	e8 e4 83 ec ff       	call   0xffffffffffec83ef
>>>>       b:	41 89 c2             	mov    %eax,%r10d
>>>>       e:	83 f8 ed             	cmp    $0xffffffed,%eax
>>>>      11:	0f 84 b3 fd ff ff    	je     0xfffffffffffffdca
>>>>      17:	85 c0                	test   %eax,%eax
>>>>      19:	74 05                	je     0x20
>>>>      1b:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
>>>>      20:	49 8b 87 08 87 01 00 	mov    0x18708(%r15),%rax
>>>>      27:	4c 89 ff             	mov    %r15,%rdi
>>>>      2a:*	48 8b 40 08          	mov    0x8(%rax),%rax		<-- trapping instruction
>>>>      2e:	0f b7 50 0a          	movzwl 0xa(%rax),%edx
>>>>      32:	0f b7 70 08          	movzwl 0x8(%rax),%esi
>>>>      36:	e8 e4 42 fb ff       	call   0xfffffffffffb431f
>>>>      3b:	41 89 c2             	mov    %eax,%r10d
>>>>      3e:	85 c0                	test   %eax,%eax
>>>>
>>>> Code starting with the faulting instruction
>>>> ===========================================
>>>>       0:	48 8b 40 08          	mov    0x8(%rax),%rax
>>>>       4:	0f b7 50 0a          	movzwl 0xa(%rax),%edx
>>>>       8:	0f b7 70 08          	movzwl 0x8(%rax),%esi
>>>>       c:	e8 e4 42 fb ff       	call   0xfffffffffffb42f5
>>>>      11:	41 89 c2             	mov    %eax,%r10d
>>>>      14:	85 c0                	test   %eax,%eax
>>>> rsyslogd: rsyslogd's groupid changed to 111
>>>> kernel: [    5.914394] RSP: 0018:ffffa5b3c103f720 EFLAGS: 00010282
>>>> kernel: [    5.914397] RAX: 0000000000000000 RBX: ffffffffc1d73489 RCX: 0000000000000000
>>>> kernel: [    5.914399] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff91ae4fa80000
>>>> kernel: [    5.914402] RBP: ffffa5b3c103f7b0 R08: 0000000000000000 R09: 0000000000000000
>>>> kernel: [    5.914405] R10: 00000000ffffffea R11: 0000000000000000 R12: ffff91ae4fa986e8
>>>> kernel: [    5.914408] R13: ffff91ae4fa986d8 R14: ffff91ae4fa986f8 R15: ffff91ae4fa80000
>>>> kernel: [    5.914410] FS:  00007fdaa343c8c0(0000) GS:ffff91bd58440000(0000) knlGS:0000000000000000
>>>> kernel: [    5.914414] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> kernel: [    5.914416] CR2: 0000000000000008 CR3: 00000001222d0000 CR4: 0000000000750ef0
>>>> kernel: [    5.914419] PKRU: 55555554
>>>>
>>>> Best regards,
>>>> Mirsad
>>>>
>>>> On 1/18/24 18:23, Mirsad Todorovac wrote:
>>>>> Hi,
>>>>>
>>>>> Unfortunately, I was not able to reboot into this kernel again to do the stack decode, but I thought
>>>>> that any information about the NULL pointer dereference is better than no info.
>>>>>
>>>>> The system is Ubuntu 23.10 Mantic with AMD product: Navi 23 [Radeon RX 6600/6600 XT/6600M]
>>>>> graphics card.
>>>>>
>>>>> Please find the config and the hw listing attached.
>>>>>
>>>>> Best regards,
>>>>> Mirsad
>>>>
>>>>
>>>>
>>>>> kernel: [    5.576702] BUG: kernel NULL pointer dereference, address: 0000000000000008
>>>>> kernel: [    5.576707] #PF: supervisor read access in kernel mode
>>>>> kernel: [    5.576710] #PF: error_code(0x0000) - not-present page
>>>>> kernel: [    5.576712] PGD 0 P4D 0
>>>>> kernel: [    5.576715] Oops: 0000 [#1] PREEMPT SMP NOPTI
>>>>> kernel: [    5.576718] CPU: 9 PID: 650 Comm: systemd-udevd Not tainted 6.7.0-rtl-v0.2-nokcsan-09928-g052d534373b7 #2
>>>>> kernel: [    5.576723] Hardware name: ASRock X670E PG Lightning/X670E PG Lightning, BIOS 1.21 04/26/2023
>>>>> kernel: [    5.576726] RIP: 0010:gfx_v10_0_early_init+0x5ab/0x8d0 [amdgpu]
>>>>> kernel: [    5.576872] Code: 8d 55 a8 4c 89 ff e8 e4 83 ec ff 41 89 c2 83 f8 ed 0f 84 b3 fd ff ff 85 c0 74 05 0f 1f 44 00 00 49 8b 87 08 87 01 00 4c 89 ff <48> 8b 40 08 0f b7 50 0a 0f b7 70 08 e8 e4 42 fb ff 41 89 c2 85 c0
>>>>> kernel: [    5.576878] RSP: 0018:ffffa5b3c103f720 EFLAGS: 00010282
>>>>> kernel: [    5.576881] RAX: 0000000000000000 RBX: ffffffffc1d73489 RCX: 0000000000000000
>>>>> kernel: [    5.576884] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff91ae4fa80000
>>>>> kernel: [    5.576886] RBP: ffffa5b3c103f7b0 R08: 0000000000000000 R09: 0000000000000000
>>>>> kernel: [    5.576889] R10: 00000000ffffffea R11: 0000000000000000 R12: ffff91ae4fa986e8
>>>>> kernel: [    5.576892] R13: ffff91ae4fa986d8 R14: ffff91ae4fa986f8 R15: ffff91ae4fa80000
>>>>> kernel: [    5.576895] FS:  00007fdaa343c8c0(0000) GS:ffff91bd58440000(0000) knlGS:0000000000000000
>>>>> kernel: [    5.576898] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>> kernel: [    5.576900] CR2: 0000000000000008 CR3: 00000001222d0000 CR4: 0000000000750ef0
>>>>> kernel: [    5.576903] PKRU: 55555554
>>>>> kernel: [    5.576905] Call Trace:
>>>>> kernel: [    5.576907]  <TASK>
>>>>> kernel: [    5.576909]  ? show_regs+0x72/0x90
>>>>> kernel: [    5.576914]  ? __die+0x25/0x80
>>>>> kernel: [    5.576917]  ? page_fault_oops+0x154/0x4c0
>>>>> kernel: [    5.576921]  ? srso_alias_return_thunk+0x5/0xfbef5
>>>>> kernel: [    5.576925]  ? crypto_alloc_tfmmem.isra.0+0x35/0x70
>>>>> kernel: [    5.576930]  ? do_user_addr_fault+0x30e/0x6e0
>>>>> kernel: [    5.576934]  ? exc_page_fault+0x84/0x1b0
>>>>> kernel: [    5.576937]  ? asm_exc_page_fault+0x27/0x30
>>>>> kernel: [    5.576942]  ? gfx_v10_0_early_init+0x5ab/0x8d0 [amdgpu]
>>>>> kernel: [    5.577056]  amdgpu_device_init+0xefa/0x2de0 [amdgpu]
>>>>> kernel: [    5.577158]  ? srso_alias_return_thunk+0x5/0xfbef5
>>>>> kernel: [    5.577161]  ? pci_bus_read_config_word+0x47/0x90
>>>>> kernel: [    5.577166]  ? pci_read_config_word+0x27/0x60
>>>>> kernel: [    5.577168]  ? srso_alias_return_thunk+0x5/0xfbef5
>>>>> kernel: [    5.577171]  ? do_pci_enable_device+0xe1/0x110
>>>>> kernel: [    5.577176]  amdgpu_driver_load_kms+0x1a/0x1c0 [amdgpu]
>>>>> kernel: [    5.577275]  amdgpu_pci_probe+0x1a8/0x5e0 [amdgpu]
>>>>> kernel: [    5.577373]  local_pci_probe+0x48/0xb0
>>>>> kernel: [    5.577377]  pci_device_probe+0xc8/0x290
>>>>> kernel: [    5.577381]  really_probe+0x1d2/0x440
>>>>> kernel: [    5.577386]  __driver_probe_device+0x8a/0x190
>>>>> kernel: [    5.577389]  driver_probe_device+0x23/0xd0
>>>>> kernel: [    5.577392]  __driver_attach+0x10f/0x220
>>>>> kernel: [    5.577396]  ? __pfx___driver_attach+0x10/0x10
>>>>> kernel: [    5.577399]  bus_for_each_dev+0x7a/0xe0
>>>>> kernel: [    5.577402]  driver_attach+0x1e/0x30
>>>>> kernel: [    5.577405]  bus_add_driver+0x127/0x240
>>>>> kernel: [    5.577409]  driver_register+0x64/0x140
>>>>> kernel: [    5.577411]  ? __pfx_amdgpu_init+0x10/0x10 [amdgpu]
>>>>> kernel: [    5.577521]  __pci_register_driver+0x68/0x80
>>>>> kernel: [    5.577524]  amdgpu_init+0x69/0xff0 [amdgpu]
>>>>> kernel: [    5.577628]  do_one_initcall+0x46/0x330
>>>>> kernel: [    5.577632]  ? kmalloc_trace+0x136/0x370
>>>>> kernel: [    5.577637]  do_init_module+0x6a/0x280
>>>>> kernel: [    5.577640]  load_module+0x2419/0x2500
>>>>> kernel: [    5.577647]  init_module_from_file+0x9c/0xf0
>>>>> kernel: [    5.577649]  ? srso_alias_return_thunk+0x5/0xfbef5
>>>>> kernel: [    5.577652]  ? init_module_from_file+0x9c/0xf0
>>>>> kernel: [    5.577657]  idempotent_init_module+0x184/0x240
>>>>> kernel: [    5.577661]  __x64_sys_finit_module+0x64/0xd0
>>>>> kernel: [    5.577664]  do_syscall_64+0x76/0x140
>>>>> kernel: [    5.577668]  ? srso_alias_return_thunk+0x5/0xfbef5
>>>>> kernel: [    5.577671]  ? ksys_mmap_pgoff+0x123/0x270
>>>>> kernel: [    5.577675]  ? srso_alias_return_thunk+0x5/0xfbef5
>>>>> kernel: [    5.577678]  ? srso_alias_return_thunk+0x5/0xfbef5
>>>>> kernel: [    5.577681]  ? syscall_exit_to_user_mode+0x97/0x1e0
>>>>> kernel: [    5.577684]  ? srso_alias_return_thunk+0x5/0xfbef5
>>>>> kernel: [    5.577687]  ? do_syscall_64+0x85/0x140
>>>>> kernel: [    5.577689]  ? srso_alias_return_thunk+0x5/0xfbef5
>>>>> kernel: [    5.577692]  ? do_syscall_64+0x85/0x140
>>>>> kernel: [    5.577695]  ? srso_alias_return_thunk+0x5/0xfbef5
>>>>> kernel: [    5.577698]  ? do_syscall_64+0x85/0x140
>>>>> kernel: [    5.577700]  ? srso_alias_return_thunk+0x5/0xfbef5
>>>>> kernel: [    5.577703]  ? sysvec_call_function+0x4e/0xb0
>>>>> kernel: [    5.577707]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
>>>>> kernel: [    5.577709] RIP: 0033:0x7fdaa331e88d
>>>>> kernel: [    5.577724] Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 73 b5 0f 00 f7 d8 64 89 01 48
>>>>> kernel: [    5.577729] RSP: 002b:00007ffeb4f87d28 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
>>>>> kernel: [    5.577733] RAX: ffffffffffffffda RBX: 000055aedf3eeeb0 RCX: 00007fdaa331e88d
>>>>> kernel: [    5.577736] RDX: 0000000000000000 RSI: 000055aedf3efb80 RDI: 000000000000001a
>>>>> kernel: [    5.577738] RBP: 0000000000020000 R08: 0000000000000000 R09: 0000000000000002
>>>>> kernel: [    5.577741] R10: 000000000000001a R11: 0000000000000246 R12: 000055aedf3efb80
>>>>> kernel: [    5.577744] R13: 000055aedf3f2060 R14: 0000000000000000 R15: 000055aedf2b1220
>>>>> kernel: [    5.577748]  </TASK>
>>>>> kernel: [    5.577750] Modules linked in: intel_rapl_msr intel_rapl_common amdgpu(+) edac_mce_amd kvm_amd kvm snd_hda_codec_realtek snd_hda_codec_generic irqbypass ledtrig_audio crct10dif_pclmul polyval_clmulni polyval_generic snd_hda_codec_hdmi ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 amdxcp snd_hda_intel aesni_intel drm_exec snd_intel_dspcfg crypto_simd gpu_sched snd_intel_sdw_acpi cryptd nls_iso8859_1 drm_buddy snd_hda_codec snd_seq_midi drm_suballoc_helper snd_seq_midi_event drm_ttm_helper joydev snd_hda_core input_leds ttm rapl snd_rawmidi snd_hwdep drm_display_helper snd_seq snd_pcm wmi_bmof cec k10temp snd_seq_device ccp rc_core snd_timer snd drm_kms_helper i2c_algo_bit soundcore mac_hid tcp_bbr sch_fq msr parport_pc ppdev lp drm parport efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq libcrc32c hid_generic usbhid hid crc32_pclmul nvme r8169 ahci nvme_core i2c_piix4 xhci_pci libahci xhci_pci_renesas realtek video wmi gpio_amdpt
>>>>> kernel: [    5.577817] CR2: 0000000000000008
>>>>> kernel: [    5.577820] ---[ end trace 0000000000000000 ]---
>>>>> kernel: [    5.914230] RIP: 0010:gfx_v10_0_early_init+0x5ab/0x8d0 [amdgpu]
>>>>> kernel: [    5.914388] Code: 8d 55 a8 4c 89 ff e8 e4 83 ec ff 41 89 c2 83 f8 ed 0f 84 b3 fd ff ff 85 c0 74 05 0f 1f 44 00 00 49 8b 87 08 87 01 00 4c 89 ff <48> 8b 40 08 0f b7 50 0a 0f b7 70 08 e8 e4 42 fb ff 41 89 c2 85 c0
>>>>> rsyslogd: rsyslogd's groupid changed to 111
>>>>> kernel: [    5.914394] RSP: 0018:ffffa5b3c103f720 EFLAGS: 00010282
>>>>> kernel: [    5.914397] RAX: 0000000000000000 RBX: ffffffffc1d73489 RCX: 0000000000000000
>>>>> kernel: [    5.914399] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff91ae4fa80000
>>>>> kernel: [    5.914402] RBP: ffffa5b3c103f7b0 R08: 0000000000000000 R09: 0000000000000000
>>>>> kernel: [    5.914405] R10: 00000000ffffffea R11: 0000000000000000 R12: ffff91ae4fa986e8
>>>>> kernel: [    5.914408] R13: ffff91ae4fa986d8 R14: ffff91ae4fa986f8 R15: ffff91ae4fa80000
>>>>> kernel: [    5.914410] FS:  00007fdaa343c8c0(0000) GS:ffff91bd58440000(0000) knlGS:0000000000000000
>>>>> kernel: [    5.914414] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>> kernel: [    5.914416] CR2: 0000000000000008 CR3: 00000001222d0000 CR4: 0000000000750ef0
>>>>> kernel: [    5.914419] PKRU: 55555554
> 

