BUG [RESEND][NEW BUG]: kernel NULL pointer dereference, address: 0000000000000008

Ma, Jun majun at amd.com
Thu Jan 25 07:38:10 UTC 2024


Hi Mirsad,


On 1/25/2024 1:48 AM, Mirsad Todorovac wrote:
> Hi, Ma Jun,
> 
> Normally, I would reply under the quoted text, but I will adjust to your convention.
> 
> I have just discovered that your patch causes the Ubuntu 22.04 LTS GNOME XWayland session
> to hang after typing the password and pressing ENTER in the graphical login screen (tested several times).
> 
This problem is not caused by my patch.
Based on your syslog, it looks more like a scheduling issue.
I just saw a similar problem; please refer to the link below:
https://gitlab.freedesktop.org/drm/amd/-/issues/3124

Regards,
Ma Jun
> After that, I was not able to even log in from another box with ssh, or the session would
> block (tested one time, second time too, third time it passed after I connected before
> attempting to log in on the XWayland console).
> 
> You might find useful syslog and dmesg of the freeze on this link (they were +100K):
> 
> https://magrf.grf.hr/~mtodorov/linux/bugreports/6.7.0/amdgpu/6.7.0-xway-09721-g61da593f4458/
> 
> The exact applied patch was this:
> 
> marvin at defiant:~/linux/kernel/linux_torvalds$ git diff
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> index 73f6d7e72c73..6ef333df9adf 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> @@ -3996,16 +3996,13 @@ static int gfx_v10_0_init_microcode(struct amdgpu_device *adev)
>    
>           if (!amdgpu_sriov_vf(adev)) {
>                   snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_rlc.bin", ucode_prefix);
> -               err = amdgpu_ucode_request(adev, &adev->gfx.rlc_fw, fw_name);
> -               /* don't check this.  There are apparently firmwares in the wild with
> -                * incorrect size in the header
> -                */
> -               if (err == -ENODEV)
> -                       goto out;
> +               err = request_firmware(&adev->gfx.rlc_fw, fw_name, adev->dev);
>                   if (err)
> -                       dev_dbg(adev->dev,
> -                               "gfx10: amdgpu_ucode_request() failed \"%s\"\n",
> -                               fw_name);
> +                       goto out;
> +
> +               /* don't validate this firmware.  There are apparently firmwares
> +                * in the wild with incorrect size in the header
> +                */
>                   rlc_hdr = (const struct rlc_firmware_header_v2_0 *)adev->gfx.rlc_fw->data;
>                   version_major = le16_to_cpu(rlc_hdr->header.header_version_major);
>                   version_minor = le16_to_cpu(rlc_hdr->header.header_version_minor);
> marvin at defiant:~/linux/kernel/linux_torvalds$ uname -rms
> Linux 6.7.0-xway-09721-g61da593f4458 x86_64
> marvin at defiant:~/linux/kernel/linux_torvalds$
> 
> So, there seems to be a problem with the way the patch affects XWayland.
> 
> Checked multiple times the exact commit with and without the diff.
> 
> Hope this helps, because I am not familiar with the amdgpu driver.
> 
> Best regards,
> Mirsad Todorovac
> 
> On 1/22/24 09:34, Ma, Jun wrote:
>> Perhaps similar to the problem I encountered earlier, you can
>> try the following patch
>>
>> https://lists.freedesktop.org/archives/amd-gfx/2024-January/103259.html
>>
>> Regards,
>> Ma Jun
>>
>> On 1/21/2024 3:54 AM, Mirsad Todorovac wrote:
>>> Hi,
>>>
>>> The last email did not reach most of the recipients because of the banned .xz attachment.
>>>
>>> As the .config is too big to send inline or uncompressed either, I will omit it in this
>>> attempt. In the meantime, I had some success in decoding the stack trace, but sadly not
>>> complete.
>>>
>>> I don't think this Oops is deterministic, but I am working on a reproducer.
>>>
>>> The platform is Ubuntu 22.04 LTS.
>>>
>>> Complete list of hardware and .config is available here:
>>>
>>> https://domac.alu.unizg.hr/~mtodorov/linux/bugreports/amdgpu/6.7.0-rtl-v02-nokcsan-09928-g052d534373b7/
>>>
>>> Best regards,
>>> Mirsad
>>>
>>> -------------------------------------------------------------------------------------------
>>> kernel: [    5.576702] BUG: kernel NULL pointer dereference, address: 0000000000000008
>>> kernel: [    5.576707] #PF: supervisor read access in kernel mode
>>> kernel: [    5.576710] #PF: error_code(0x0000) - not-present page
>>> kernel: [    5.576712] PGD 0 P4D 0
>>> kernel: [    5.576715] Oops: 0000 [#1] PREEMPT SMP NOPTI
>>> kernel: [    5.576718] CPU: 9 PID: 650 Comm: systemd-udevd Not tainted 6.7.0-rtl-v0.2-nokcsan-09928-g052d534373b7 #2
>>> kernel: [    5.576723] Hardware name: ASRock X670E PG Lightning/X670E PG Lightning, BIOS 1.21 04/26/2023
>>> kernel: [    5.576726] RIP: 0010:gfx_v10_0_early_init (drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:4009 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:7478) amdgpu
>>> kernel: [ 5.576872] Code: 8d 55 a8 4c 89 ff e8 e4 83 ec ff 41 89 c2 83 f8 ed 0f 84 b3 fd ff ff 85 c0 74 05 0f 1f 44 00 00 49 8b 87 08 87 01 00 4c 89 ff <48> 8b 40 08 0f b7 50 0a 0f b7 70 08 e8 e4 42 fb ff 41 89 c2 85 c0
>>> All code
>>> ========
>>>      0:	8d 55 a8             	lea    -0x58(%rbp),%edx
>>>      3:	4c 89 ff             	mov    %r15,%rdi
>>>      6:	e8 e4 83 ec ff       	call   0xffffffffffec83ef
>>>      b:	41 89 c2             	mov    %eax,%r10d
>>>      e:	83 f8 ed             	cmp    $0xffffffed,%eax
>>>     11:	0f 84 b3 fd ff ff    	je     0xfffffffffffffdca
>>>     17:	85 c0                	test   %eax,%eax
>>>     19:	74 05                	je     0x20
>>>     1b:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
>>>     20:	49 8b 87 08 87 01 00 	mov    0x18708(%r15),%rax
>>>     27:	4c 89 ff             	mov    %r15,%rdi
>>>     2a:*	48 8b 40 08          	mov    0x8(%rax),%rax		<-- trapping instruction
>>>     2e:	0f b7 50 0a          	movzwl 0xa(%rax),%edx
>>>     32:	0f b7 70 08          	movzwl 0x8(%rax),%esi
>>>     36:	e8 e4 42 fb ff       	call   0xfffffffffffb431f
>>>     3b:	41 89 c2             	mov    %eax,%r10d
>>>     3e:	85 c0                	test   %eax,%eax
>>>
>>> Code starting with the faulting instruction
>>> ===========================================
>>>      0:	48 8b 40 08          	mov    0x8(%rax),%rax
>>>      4:	0f b7 50 0a          	movzwl 0xa(%rax),%edx
>>>      8:	0f b7 70 08          	movzwl 0x8(%rax),%esi
>>>      c:	e8 e4 42 fb ff       	call   0xfffffffffffb42f5
>>>     11:	41 89 c2             	mov    %eax,%r10d
>>>     14:	85 c0                	test   %eax,%eax
>>> kernel: [    5.576878] RSP: 0018:ffffa5b3c103f720 EFLAGS: 00010282
>>> kernel: [    5.576881] RAX: 0000000000000000 RBX: ffffffffc1d73489 RCX: 0000000000000000
>>> kernel: [    5.576884] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff91ae4fa80000
>>> kernel: [    5.576886] RBP: ffffa5b3c103f7b0 R08: 0000000000000000 R09: 0000000000000000
>>> kernel: [    5.576889] R10: 00000000ffffffea R11: 0000000000000000 R12: ffff91ae4fa986e8
>>> kernel: [    5.576892] R13: ffff91ae4fa986d8 R14: ffff91ae4fa986f8 R15: ffff91ae4fa80000
>>> kernel: [    5.576895] FS:  00007fdaa343c8c0(0000) GS:ffff91bd58440000(0000) knlGS:0000000000000000
>>> kernel: [    5.576898] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> kernel: [    5.576900] CR2: 0000000000000008 CR3: 00000001222d0000 CR4: 0000000000750ef0
>>> kernel: [    5.576903] PKRU: 55555554
>>> kernel: [    5.576905] Call Trace:
>>> kernel: [    5.576907]  <TASK>
>>> kernel: [    5.576909] ? show_regs (arch/x86/kernel/dumpstack.c:479)
>>> kernel: [    5.576914] ? __die (arch/x86/kernel/dumpstack.c:421 arch/x86/kernel/dumpstack.c:434)
>>> kernel: [    5.576917] ? page_fault_oops (arch/x86/mm/fault.c:707)
>>> kernel: [    5.576921] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>>> kernel: [    5.576925] ? crypto_alloc_tfmmem.isra.0 (crypto/api.c:497)
>>> kernel: [    5.576930] ? do_user_addr_fault (arch/x86/mm/fault.c:1264)
>>> kernel: [    5.576934] ? exc_page_fault (./arch/x86/include/asm/paravirt.h:693 arch/x86/mm/fault.c:1515 arch/x86/mm/fault.c:1563)
>>> kernel: [    5.576937] ? asm_exc_page_fault (./arch/x86/include/asm/idtentry.h:570)
>>> kernel: [    5.576942] ? gfx_v10_0_early_init (drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:4009 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:7478) amdgpu
>>> kernel: [    5.577056] amdgpu_device_init (drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:2465 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:4042) amdgpu
>>> kernel: [    5.577158] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>>> kernel: [    5.577161] ? pci_bus_read_config_word (drivers/pci/access.c:67 (discriminator 2))
>>> kernel: [    5.577166] ? pci_read_config_word (drivers/pci/access.c:563)
>>> kernel: [    5.577168] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>>> kernel: [    5.577171] ? do_pci_enable_device (drivers/pci/pci.c:1975 drivers/pci/pci.c:1949)
>>> kernel: [    5.577176] amdgpu_driver_load_kms (drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c:146) amdgpu
>>> kernel: [    5.577275] amdgpu_pci_probe (drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c:2237) amdgpu
>>> kernel: [    5.577373] local_pci_probe (drivers/pci/pci-driver.c:324)
>>> kernel: [    5.577377] pci_device_probe (drivers/pci/pci-driver.c:392 drivers/pci/pci-driver.c:417 drivers/pci/pci-driver.c:460)
>>> kernel: [    5.577381] really_probe (drivers/base/dd.c:579 drivers/base/dd.c:658)
>>> kernel: [    5.577386] __driver_probe_device (drivers/base/dd.c:800)
>>> kernel: [    5.577389] driver_probe_device (drivers/base/dd.c:830)
>>> kernel: [    5.577392] __driver_attach (drivers/base/dd.c:1217)
>>> kernel: [    5.577396] ? __pfx___driver_attach (drivers/base/dd.c:1157)
>>> kernel: [    5.577399] bus_for_each_dev (drivers/base/bus.c:368)
>>> kernel: [    5.577402] driver_attach (drivers/base/dd.c:1234)
>>> kernel: [    5.577405] bus_add_driver (drivers/base/bus.c:674)
>>> kernel: [    5.577409] driver_register (drivers/base/driver.c:246)
>>> kernel: [    5.577411] ? __pfx_amdgpu_init (drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c:2497) amdgpu
>>> kernel: [    5.577521] __pci_register_driver (drivers/pci/pci-driver.c:1456)
>>> kernel: [    5.577524] amdgpu_init (drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c:2805) amdgpu
>>> kernel: [    5.577628] do_one_initcall (init/main.c:1236)
>>> kernel: [    5.577632] ? kmalloc_trace (mm/slub.c:3816 mm/slub.c:3860 mm/slub.c:4007)
>>> kernel: [    5.577637] do_init_module (kernel/module/main.c:2533)
>>> kernel: [    5.577640] load_module (kernel/module/main.c:2984)
>>> kernel: [    5.577647] init_module_from_file (kernel/module/main.c:3151)
>>> kernel: [    5.577649] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>>> kernel: [    5.577652] ? init_module_from_file (kernel/module/main.c:3151)
>>> kernel: [    5.577657] idempotent_init_module (kernel/module/main.c:3168)
>>> kernel: [    5.577661] __x64_sys_finit_module (./include/linux/file.h:45 kernel/module/main.c:3190 kernel/module/main.c:3172 kernel/module/main.c:3172)
>>> kernel: [    5.577664] do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83)
>>> kernel: [    5.577668] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>>> kernel: [    5.577671] ? ksys_mmap_pgoff (mm/mmap.c:1428)
>>> kernel: [    5.577675] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>>> kernel: [    5.577678] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>>> kernel: [    5.577681] ? syscall_exit_to_user_mode (kernel/entry/common.c:215)
>>> kernel: [    5.577684] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>>> kernel: [    5.577687] ? do_syscall_64 (./arch/x86/include/asm/cpufeature.h:171 arch/x86/entry/common.c:98)
>>> kernel: [    5.577689] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>>> kernel: [    5.577692] ? do_syscall_64 (./arch/x86/include/asm/cpufeature.h:171 arch/x86/entry/common.c:98)
>>> kernel: [    5.577695] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>>> kernel: [    5.577698] ? do_syscall_64 (./arch/x86/include/asm/cpufeature.h:171 arch/x86/entry/common.c:98)
>>> kernel: [    5.577700] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>>> kernel: [    5.577703] ? sysvec_call_function (arch/x86/kernel/smp.c:253 (discriminator 69))
>>> kernel: [    5.577707] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129)
>>> kernel: [    5.577709] RIP: 0033:0x7fdaa331e88d
>>> kernel: [ 5.577724] Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 73 b5 0f 00 f7 d8 64 89 01 48
>>> All code
>>> ========
>>>      0:	5b                   	pop    %rbx
>>>      1:	41 5c                	pop    %r12
>>>      3:	c3                   	ret
>>>      4:	66 0f 1f 84 00 00 00 	nopw   0x0(%rax,%rax,1)
>>>      b:	00 00
>>>      d:	f3 0f 1e fa          	endbr64
>>>     11:	48 89 f8             	mov    %rdi,%rax
>>>     14:	48 89 f7             	mov    %rsi,%rdi
>>>     17:	48 89 d6             	mov    %rdx,%rsi
>>>     1a:	48 89 ca             	mov    %rcx,%rdx
>>>     1d:	4d 89 c2             	mov    %r8,%r10
>>>     20:	4d 89 c8             	mov    %r9,%r8
>>>     23:	4c 8b 4c 24 08       	mov    0x8(%rsp),%r9
>>>     28:	0f 05                	syscall
>>>     2a:*	48 3d 01 f0 ff ff    	cmp    $0xfffffffffffff001,%rax		<-- trapping instruction
>>>     30:	73 01                	jae    0x33
>>>     32:	c3                   	ret
>>>     33:	48 8b 0d 73 b5 0f 00 	mov    0xfb573(%rip),%rcx        # 0xfb5ad
>>>     3a:	f7 d8                	neg    %eax
>>>     3c:	64 89 01             	mov    %eax,%fs:(%rcx)
>>>     3f:	48                   	rex.W
>>>
>>> Code starting with the faulting instruction
>>> ===========================================
>>>      0:	48 3d 01 f0 ff ff    	cmp    $0xfffffffffffff001,%rax
>>>      6:	73 01                	jae    0x9
>>>      8:	c3                   	ret
>>>      9:	48 8b 0d 73 b5 0f 00 	mov    0xfb573(%rip),%rcx        # 0xfb583
>>>     10:	f7 d8                	neg    %eax
>>>     12:	64 89 01             	mov    %eax,%fs:(%rcx)
>>>     15:	48                   	rex.W
>>> kernel: [    5.577729] RSP: 002b:00007ffeb4f87d28 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
>>> kernel: [    5.577733] RAX: ffffffffffffffda RBX: 000055aedf3eeeb0 RCX: 00007fdaa331e88d
>>> kernel: [    5.577736] RDX: 0000000000000000 RSI: 000055aedf3efb80 RDI: 000000000000001a
>>> kernel: [    5.577738] RBP: 0000000000020000 R08: 0000000000000000 R09: 0000000000000002
>>> kernel: [    5.577741] R10: 000000000000001a R11: 0000000000000246 R12: 000055aedf3efb80
>>> kernel: [    5.577744] R13: 000055aedf3f2060 R14: 0000000000000000 R15: 000055aedf2b1220
>>> kernel: [    5.577748]  </TASK>
>>> kernel: [    5.577750] Modules linked in: intel_rapl_msr intel_rapl_common amdgpu(+) edac_mce_amd kvm_amd kvm snd_hda_codec_realtek snd_hda_codec_generic irqbypass ledtrig_audio crct10dif_pclmul polyval_clmulni polyval_generic snd_hda_codec_hdmi ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 amdxcp snd_hda_intel aesni_intel drm_exec snd_intel_dspcfg crypto_simd gpu_sched snd_intel_sdw_acpi cryptd nls_iso8859_1 drm_buddy snd_hda_codec snd_seq_midi drm_suballoc_helper snd_seq_midi_event drm_ttm_helper joydev snd_hda_core input_leds ttm rapl snd_rawmidi snd_hwdep drm_display_helper snd_seq snd_pcm wmi_bmof cec k10temp snd_seq_device ccp rc_core snd_timer snd drm_kms_helper i2c_algo_bit soundcore mac_hid tcp_bbr sch_fq msr parport_pc ppdev lp drm parport efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq libcrc32c hid_generic usbhid hid crc32_pclmul nvme r8169 ahci nvme_core i2c_piix4 xhci_pci libahci xhci_pci_renesas realtek video wmi gpio_amdpt
>>> kernel: [    5.577817] CR2: 0000000000000008
>>> kernel: [    5.577820] ---[ end trace 0000000000000000 ]---
>>> kernel: [    5.914230] RIP: 0010:gfx_v10_0_early_init (drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:4009 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:7478) amdgpu
>>> kernel: [ 5.914388] Code: 8d 55 a8 4c 89 ff e8 e4 83 ec ff 41 89 c2 83 f8 ed 0f 84 b3 fd ff ff 85 c0 74 05 0f 1f 44 00 00 49 8b 87 08 87 01 00 4c 89 ff <48> 8b 40 08 0f b7 50 0a 0f b7 70 08 e8 e4 42 fb ff 41 89 c2 85 c0
>>> All code
>>> ========
>>>      0:	8d 55 a8             	lea    -0x58(%rbp),%edx
>>>      3:	4c 89 ff             	mov    %r15,%rdi
>>>      6:	e8 e4 83 ec ff       	call   0xffffffffffec83ef
>>>      b:	41 89 c2             	mov    %eax,%r10d
>>>      e:	83 f8 ed             	cmp    $0xffffffed,%eax
>>>     11:	0f 84 b3 fd ff ff    	je     0xfffffffffffffdca
>>>     17:	85 c0                	test   %eax,%eax
>>>     19:	74 05                	je     0x20
>>>     1b:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
>>>     20:	49 8b 87 08 87 01 00 	mov    0x18708(%r15),%rax
>>>     27:	4c 89 ff             	mov    %r15,%rdi
>>>     2a:*	48 8b 40 08          	mov    0x8(%rax),%rax		<-- trapping instruction
>>>     2e:	0f b7 50 0a          	movzwl 0xa(%rax),%edx
>>>     32:	0f b7 70 08          	movzwl 0x8(%rax),%esi
>>>     36:	e8 e4 42 fb ff       	call   0xfffffffffffb431f
>>>     3b:	41 89 c2             	mov    %eax,%r10d
>>>     3e:	85 c0                	test   %eax,%eax
>>>
>>> Code starting with the faulting instruction
>>> ===========================================
>>>      0:	48 8b 40 08          	mov    0x8(%rax),%rax
>>>      4:	0f b7 50 0a          	movzwl 0xa(%rax),%edx
>>>      8:	0f b7 70 08          	movzwl 0x8(%rax),%esi
>>>      c:	e8 e4 42 fb ff       	call   0xfffffffffffb42f5
>>>     11:	41 89 c2             	mov    %eax,%r10d
>>>     14:	85 c0                	test   %eax,%eax
>>> rsyslogd: rsyslogd's groupid changed to 111
>>> kernel: [    5.914394] RSP: 0018:ffffa5b3c103f720 EFLAGS: 00010282
>>> kernel: [    5.914397] RAX: 0000000000000000 RBX: ffffffffc1d73489 RCX: 0000000000000000
>>> kernel: [    5.914399] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff91ae4fa80000
>>> kernel: [    5.914402] RBP: ffffa5b3c103f7b0 R08: 0000000000000000 R09: 0000000000000000
>>> kernel: [    5.914405] R10: 00000000ffffffea R11: 0000000000000000 R12: ffff91ae4fa986e8
>>> kernel: [    5.914408] R13: ffff91ae4fa986d8 R14: ffff91ae4fa986f8 R15: ffff91ae4fa80000
>>> kernel: [    5.914410] FS:  00007fdaa343c8c0(0000) GS:ffff91bd58440000(0000) knlGS:0000000000000000
>>> kernel: [    5.914414] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> kernel: [    5.914416] CR2: 0000000000000008 CR3: 00000001222d0000 CR4: 0000000000750ef0
>>> kernel: [    5.914419] PKRU: 55555554
>>>
>>> Best regards,
>>> Mirsad
>>>
>>> On 1/18/24 18:23, Mirsad Todorovac wrote:
>>>> Hi,
>>>>
>>>> Unfortunately, I was not able to reboot in this kernel again to do the stack decode, but I thought
>>>> that any information about the NULL pointer dereference is better than no info.
>>>>
>>>> The system is Ubuntu 23.10 Mantic with AMD product: Navi 23 [Radeon RX 6600/6600 XT/6600M]
>>>> graphic card.
>>>>
>>>> Please find the config and the hw listing attached.
>>>>
>>>> Best regards,
>>>> Mirsad
>>>
>>>
>>>
>>>> kernel: [    5.576702] BUG: kernel NULL pointer dereference, address: 0000000000000008
>>>> kernel: [    5.576707] #PF: supervisor read access in kernel mode
>>>> kernel: [    5.576710] #PF: error_code(0x0000) - not-present page
>>>> kernel: [    5.576712] PGD 0 P4D 0
>>>> kernel: [    5.576715] Oops: 0000 [#1] PREEMPT SMP NOPTI
>>>> kernel: [    5.576718] CPU: 9 PID: 650 Comm: systemd-udevd Not tainted 6.7.0-rtl-v0.2-nokcsan-09928-g052d534373b7 #2
>>>> kernel: [    5.576723] Hardware name: ASRock X670E PG Lightning/X670E PG Lightning, BIOS 1.21 04/26/2023
>>>> kernel: [    5.576726] RIP: 0010:gfx_v10_0_early_init+0x5ab/0x8d0 [amdgpu]
>>>> kernel: [    5.576872] Code: 8d 55 a8 4c 89 ff e8 e4 83 ec ff 41 89 c2 83 f8 ed 0f 84 b3 fd ff ff 85 c0 74 05 0f 1f 44 00 00 49 8b 87 08 87 01 00 4c 89 ff <48> 8b 40 08 0f b7 50 0a 0f b7 70 08 e8 e4 42 fb ff 41 89 c2 85 c0
>>>> kernel: [    5.576878] RSP: 0018:ffffa5b3c103f720 EFLAGS: 00010282
>>>> kernel: [    5.576881] RAX: 0000000000000000 RBX: ffffffffc1d73489 RCX: 0000000000000000
>>>> kernel: [    5.576884] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff91ae4fa80000
>>>> kernel: [    5.576886] RBP: ffffa5b3c103f7b0 R08: 0000000000000000 R09: 0000000000000000
>>>> kernel: [    5.576889] R10: 00000000ffffffea R11: 0000000000000000 R12: ffff91ae4fa986e8
>>>> kernel: [    5.576892] R13: ffff91ae4fa986d8 R14: ffff91ae4fa986f8 R15: ffff91ae4fa80000
>>>> kernel: [    5.576895] FS:  00007fdaa343c8c0(0000) GS:ffff91bd58440000(0000) knlGS:0000000000000000
>>>> kernel: [    5.576898] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> kernel: [    5.576900] CR2: 0000000000000008 CR3: 00000001222d0000 CR4: 0000000000750ef0
>>>> kernel: [    5.576903] PKRU: 55555554
>>>> kernel: [    5.576905] Call Trace:
>>>> kernel: [    5.576907]  <TASK>
>>>> kernel: [    5.576909]  ? show_regs+0x72/0x90
>>>> kernel: [    5.576914]  ? __die+0x25/0x80
>>>> kernel: [    5.576917]  ? page_fault_oops+0x154/0x4c0
>>>> kernel: [    5.576921]  ? srso_alias_return_thunk+0x5/0xfbef5
>>>> kernel: [    5.576925]  ? crypto_alloc_tfmmem.isra.0+0x35/0x70
>>>> kernel: [    5.576930]  ? do_user_addr_fault+0x30e/0x6e0
>>>> kernel: [    5.576934]  ? exc_page_fault+0x84/0x1b0
>>>> kernel: [    5.576937]  ? asm_exc_page_fault+0x27/0x30
>>>> kernel: [    5.576942]  ? gfx_v10_0_early_init+0x5ab/0x8d0 [amdgpu]
>>>> kernel: [    5.577056]  amdgpu_device_init+0xefa/0x2de0 [amdgpu]
>>>> kernel: [    5.577158]  ? srso_alias_return_thunk+0x5/0xfbef5
>>>> kernel: [    5.577161]  ? pci_bus_read_config_word+0x47/0x90
>>>> kernel: [    5.577166]  ? pci_read_config_word+0x27/0x60
>>>> kernel: [    5.577168]  ? srso_alias_return_thunk+0x5/0xfbef5
>>>> kernel: [    5.577171]  ? do_pci_enable_device+0xe1/0x110
>>>> kernel: [    5.577176]  amdgpu_driver_load_kms+0x1a/0x1c0 [amdgpu]
>>>> kernel: [    5.577275]  amdgpu_pci_probe+0x1a8/0x5e0 [amdgpu]
>>>> kernel: [    5.577373]  local_pci_probe+0x48/0xb0
>>>> kernel: [    5.577377]  pci_device_probe+0xc8/0x290
>>>> kernel: [    5.577381]  really_probe+0x1d2/0x440
>>>> kernel: [    5.577386]  __driver_probe_device+0x8a/0x190
>>>> kernel: [    5.577389]  driver_probe_device+0x23/0xd0
>>>> kernel: [    5.577392]  __driver_attach+0x10f/0x220
>>>> kernel: [    5.577396]  ? __pfx___driver_attach+0x10/0x10
>>>> kernel: [    5.577399]  bus_for_each_dev+0x7a/0xe0
>>>> kernel: [    5.577402]  driver_attach+0x1e/0x30
>>>> kernel: [    5.577405]  bus_add_driver+0x127/0x240
>>>> kernel: [    5.577409]  driver_register+0x64/0x140
>>>> kernel: [    5.577411]  ? __pfx_amdgpu_init+0x10/0x10 [amdgpu]
>>>> kernel: [    5.577521]  __pci_register_driver+0x68/0x80
>>>> kernel: [    5.577524]  amdgpu_init+0x69/0xff0 [amdgpu]
>>>> kernel: [    5.577628]  do_one_initcall+0x46/0x330
>>>> kernel: [    5.577632]  ? kmalloc_trace+0x136/0x370
>>>> kernel: [    5.577637]  do_init_module+0x6a/0x280
>>>> kernel: [    5.577640]  load_module+0x2419/0x2500
>>>> kernel: [    5.577647]  init_module_from_file+0x9c/0xf0
>>>> kernel: [    5.577649]  ? srso_alias_return_thunk+0x5/0xfbef5
>>>> kernel: [    5.577652]  ? init_module_from_file+0x9c/0xf0
>>>> kernel: [    5.577657]  idempotent_init_module+0x184/0x240
>>>> kernel: [    5.577661]  __x64_sys_finit_module+0x64/0xd0
>>>> kernel: [    5.577664]  do_syscall_64+0x76/0x140
>>>> kernel: [    5.577668]  ? srso_alias_return_thunk+0x5/0xfbef5
>>>> kernel: [    5.577671]  ? ksys_mmap_pgoff+0x123/0x270
>>>> kernel: [    5.577675]  ? srso_alias_return_thunk+0x5/0xfbef5
>>>> kernel: [    5.577678]  ? srso_alias_return_thunk+0x5/0xfbef5
>>>> kernel: [    5.577681]  ? syscall_exit_to_user_mode+0x97/0x1e0
>>>> kernel: [    5.577684]  ? srso_alias_return_thunk+0x5/0xfbef5
>>>> kernel: [    5.577687]  ? do_syscall_64+0x85/0x140
>>>> kernel: [    5.577689]  ? srso_alias_return_thunk+0x5/0xfbef5
>>>> kernel: [    5.577692]  ? do_syscall_64+0x85/0x140
>>>> kernel: [    5.577695]  ? srso_alias_return_thunk+0x5/0xfbef5
>>>> kernel: [    5.577698]  ? do_syscall_64+0x85/0x140
>>>> kernel: [    5.577700]  ? srso_alias_return_thunk+0x5/0xfbef5
>>>> kernel: [    5.577703]  ? sysvec_call_function+0x4e/0xb0
>>>> kernel: [    5.577707]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
>>>> kernel: [    5.577709] RIP: 0033:0x7fdaa331e88d
>>>> kernel: [    5.577724] Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 73 b5 0f 00 f7 d8 64 89 01 48
>>>> kernel: [    5.577729] RSP: 002b:00007ffeb4f87d28 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
>>>> kernel: [    5.577733] RAX: ffffffffffffffda RBX: 000055aedf3eeeb0 RCX: 00007fdaa331e88d
>>>> kernel: [    5.577736] RDX: 0000000000000000 RSI: 000055aedf3efb80 RDI: 000000000000001a
>>>> kernel: [    5.577738] RBP: 0000000000020000 R08: 0000000000000000 R09: 0000000000000002
>>>> kernel: [    5.577741] R10: 000000000000001a R11: 0000000000000246 R12: 000055aedf3efb80
>>>> kernel: [    5.577744] R13: 000055aedf3f2060 R14: 0000000000000000 R15: 000055aedf2b1220
>>>> kernel: [    5.577748]  </TASK>
>>>> kernel: [    5.577750] Modules linked in: intel_rapl_msr intel_rapl_common amdgpu(+) edac_mce_amd kvm_amd kvm snd_hda_codec_realtek snd_hda_codec_generic irqbypass ledtrig_audio crct10dif_pclmul polyval_clmulni polyval_generic snd_hda_codec_hdmi ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 amdxcp snd_hda_intel aesni_intel drm_exec snd_intel_dspcfg crypto_simd gpu_sched snd_intel_sdw_acpi cryptd nls_iso8859_1 drm_buddy snd_hda_codec snd_seq_midi drm_suballoc_helper snd_seq_midi_event drm_ttm_helper joydev snd_hda_core input_leds ttm rapl snd_rawmidi snd_hwdep drm_display_helper snd_seq snd_pcm wmi_bmof cec k10temp snd_seq_device ccp rc_core snd_timer snd drm_kms_helper i2c_algo_bit soundcore mac_hid tcp_bbr sch_fq msr parport_pc ppdev lp drm parport efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq libcrc32c hid_generic usbhid hid crc32_pclmul nvme r8169 ahci nvme_core i2c_piix4 xhci_pci libahci xhci_pci_renesas realtek video wmi gpio_amdpt
>>>> kernel: [    5.577817] CR2: 0000000000000008
>>>> kernel: [    5.577820] ---[ end trace 0000000000000000 ]---
>>>> kernel: [    5.914230] RIP: 0010:gfx_v10_0_early_init+0x5ab/0x8d0 [amdgpu]
>>>> kernel: [    5.914388] Code: 8d 55 a8 4c 89 ff e8 e4 83 ec ff 41 89 c2 83 f8 ed 0f 84 b3 fd ff ff 85 c0 74 05 0f 1f 44 00 00 49 8b 87 08 87 01 00 4c 89 ff <48> 8b 40 08 0f b7 50 0a 0f b7 70 08 e8 e4 42 fb ff 41 89 c2 85 c0
>>>> rsyslogd: rsyslogd's groupid changed to 111
>>>> kernel: [    5.914394] RSP: 0018:ffffa5b3c103f720 EFLAGS: 00010282
>>>> kernel: [    5.914397] RAX: 0000000000000000 RBX: ffffffffc1d73489 RCX: 0000000000000000
>>>> kernel: [    5.914399] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff91ae4fa80000
>>>> kernel: [    5.914402] RBP: ffffa5b3c103f7b0 R08: 0000000000000000 R09: 0000000000000000
>>>> kernel: [    5.914405] R10: 00000000ffffffea R11: 0000000000000000 R12: ffff91ae4fa986e8
>>>> kernel: [    5.914408] R13: ffff91ae4fa986d8 R14: ffff91ae4fa986f8 R15: ffff91ae4fa80000
>>>> kernel: [    5.914410] FS:  00007fdaa343c8c0(0000) GS:ffff91bd58440000(0000) knlGS:0000000000000000
>>>> kernel: [    5.914414] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> kernel: [    5.914416] CR2: 0000000000000008 CR3: 00000001222d0000 CR4: 0000000000750ef0
>>>> kernel: [    5.914419] PKRU: 55555554
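
Reading note on the register dump above: the trapping instruction is
mov 0x8(%rax),%rax with RAX = 0, and CR2 = 0x8. That is what a read of
adev->gfx.rlc_fw->data would look like if rlc_fw were NULL and the data
pointer sat 8 bytes into the firmware structure. R10 holds 0xffffffea, which
is -22 (EINVAL on Linux) as a signed 32-bit value, and the cmp immediate
0xffffffed is -19 (ENODEV), matching the "if (err == -ENODEV)" test in the
lines the patch removes. The sketch below only checks that arithmetic in user
space. struct fw_like is a hypothetical stand-in whose layout is assumed to
mirror the kernel's struct firmware, and both helper names are made up for
illustration. The conclusion that the firmware request returned an error and
left rlc_fw NULL is an inference from the dump, not something confirmed in
this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace mirror of the kernel's struct firmware.
 * Assumption: the size field comes first, followed by the data pointer. */
struct fw_like {
	size_t size;
	const uint8_t *data;
	void *priv;
};

/* Address the CPU would touch when reading fw->data through a NULL fw:
 * it equals the offset of the data member inside the structure. */
static inline uintptr_t null_deref_address(void)
{
	return (uintptr_t)offsetof(struct fw_like, data);
}

/* Reinterpret a 32-bit register value (e.g. R10 from the Oops) as a
 * signed kernel-style error code. */
static inline int reg_to_errno(uint32_t reg)
{
	return (int)(int32_t)reg;
}
```

On an LP64 target such as x86_64, null_deref_address() evaluates to 8, which
lines up with "address: 0000000000000008" in the BUG line.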