[PATCH] drm/amdgpu: grab extra fence reference for drm_sched_job_add_dependency
Christian König
ckoenig.leichtzumerken at gmail.com
Mon Jan 9 13:40:45 UTC 2023
On 09.01.23 at 14:13, Mikhail Gavrilov wrote:
> On Fri, Jan 6, 2023 at 8:27 PM Christian König
> <ckoenig.leichtzumerken at gmail.com> wrote:
>>
>> And it looks like Dmitry submitted it initially to the wrong branch.
>>
>> Because of this it wasn't scheduled as a fix for 6.2, but rather queued up
>> as a new feature for 6.3.
>>
>> This is fixed by now and the patch should show up in the next -rc.
>>
>> Regards,
>> Christian.
>>
> Hi,
> Not sure whether this is related to this patch, but I caught a kernel oops
> this weekend.
> It is very hard to reproduce, and I don't know exactly which actions trigger it,
> but I'm sure that it happens when I launch "Cyberpunk 2077" while
> Google Chrome is running with a huge number of open windows and tabs.
> Even these two conditions are not enough, though.
> In a way that is not entirely clear to me, a memory leak must also occur.
That looks like an out-of-memory situation that is not handled gracefully.
In other words, we are missing a NULL check in drm_sched_job_cleanup().
I'm going to take a look.
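
As a rough user-space illustration only of the kind of guard meant here
(fake_fence, fake_job and fake_job_cleanup are hypothetical stand-ins, not
the real scheduler structures and not the actual fix), a cleanup routine
that tolerates a job whose fence was never allocated could look like this:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are NOT the kernel's real structures. */
struct fake_fence {
	int refcount;
};

struct fake_job {
	struct fake_fence *s_fence;	/* stays NULL if setup failed early */
};

/*
 * Returns 0 if there was no fence to clean up, 1 if a reference was dropped.
 * The early return is the NULL check; without it, a job whose setup failed
 * (for example because memory ran out) would be dereferenced through NULL.
 */
int fake_job_cleanup(struct fake_job *job)
{
	assert(job != NULL);

	if (!job->s_fence)
		return 0;

	/* Drop our reference and free the fence once nobody holds it. */
	job->s_fence->refcount--;
	if (job->s_fence->refcount == 0)
		free(job->s_fence);
	job->s_fence = NULL;
	return 1;
}
```

With this shape, cleaning up a half-initialized job is a no-op instead of
a crash.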
Thanks,
Christian.
>
> The trace looks like:
> BUG: kernel NULL pointer dereference, address: 0000000000000078
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 39818f067 P4D 39818f067 PUD 35bbd6067 PMD 4f8438067 PTE 0
> Oops: 0000 [#1] PREEMPT SMP NOPTI
> CPU: 21 PID: 100830 Comm: GameThread Tainted: G W L
> ------- --- 6.2.0-0.rc2.20230105git41c03ba9beea.20.fc38.x86_64 #1
> Hardware name: System manufacturer System Product Name/ROG STRIX
> X570-I GAMING, BIOS 4408 10/28/2022
> RIP: 0010:drm_sched_job_cleanup+0x1a/0x110 [gpu_sched]
> Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00
> 55 53 48 89 fb 48 83 ec 08 48 8b 7f 20 48 c7 04 24 00 00 00 00 <8b> 47
> 78 85 c0 0f 84 b5 00 00 00 48 83 ff c0 74 1f 48 8d 57 78 b8
> RSP: 0018:ffffae3e16c0b9d0 EFLAGS: 00010282
> RAX: 0000000000000001 RBX: ffff91de6f7bc000 RCX: 00000000012a8976
> RDX: 0000000000000000 RSI: ffffffffadbda69b RDI: 0000000000000000
> RBP: ffff91de6f7bc000 R08: 0000000000000001 R09: 0000000000000001
> R10: 0000000000000001 R11: 0000000000000000 R12: 00000000ffffffff
> R13: 0000000000000018 R14: ffff91e259275000 R15: 0000000000000001
> FS: 000000007bcff6c0(0000) GS:ffff91e667e00000(0000) knlGS:000000007abe0000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000078 CR3: 0000000297a24000 CR4: 0000000000350ee0
> Call Trace:
> <TASK>
> amdgpu_job_free+0x1d/0x120 [amdgpu]
> amdgpu_cs_parser_fini+0x119/0x170 [amdgpu]
> amdgpu_cs_ioctl+0x3f4/0x2000 [amdgpu]
> ? __pfx_amdgpu_cs_ioctl+0x10/0x10 [amdgpu]
> drm_ioctl_kernel+0xac/0x160
> drm_ioctl+0x1e7/0x450
> ? __pfx_amdgpu_cs_ioctl+0x10/0x10 [amdgpu]
> amdgpu_drm_ioctl+0x4a/0x80 [amdgpu]
> __x64_sys_ioctl+0x90/0xd0
> do_syscall_64+0x5b/0x80
> ? do_syscall_64+0x67/0x80
> ? lock_is_held_type+0xe8/0x140
> ? asm_sysvec_call_function+0x16/0x20
> ? lockdep_hardirqs_on+0x7d/0x100
> entry_SYSCALL_64_after_hwframe+0x72/0xdc
> RIP: 0033:0x7fe30905e65f
> Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48
> 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2
> 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00
> RSP: 002b:000000007bcfd410 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> RAX: ffffffffffffffda RBX: 000000007bcfd738 RCX: 00007fe30905e65f
> RDX: 000000007bcfd520 RSI: 00000000c0186444 RDI: 00000000000000b6
> RBP: 000000007bcfd520 R08: 00007fe2800a6b80 R09: 000000007bcfd4b0
> R10: 000000007e22b350 R11: 0000000000000246 R12: 00000000c0186444
> R13: 00000000000000b6 R14: 000000000000000d R15: 00007fe2800a6ab0
> </TASK>
> Modules linked in: uinput rfcomm snd_seq_dummy snd_hrtimer netconsole
> nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet
> nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4
> nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack
> nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink qrtr bnep
> sunrpc binfmt_misc mt76x2u mt76x2_common mt76x02_usb mt76_usb iwlmvm
> mt76x02_lib mt76 mac80211 btusb iwlwifi libarc4 btrtl btbcm btintel
> btmtk hid_logitech_hidpp xpad bluetooth cfg80211 ff_memless joydev
> intel_rapl_msr intel_rapl_common edac_mce_amd eeepc_wmi
> snd_hda_codec_realtek kvm_amd asus_wmi snd_hda_codec_generic
> snd_seq_midi snd_seq_midi_event ledtrig_audio vfat asus_ec_sensors kvm
> sparse_keymap platform_profile snd_hda_codec_hdmi fat snd_usb_audio
> snd_hda_intel snd_intel_dspcfg snd_usbmidi_lib snd_intel_sdw_acpi
> irqbypass snd_rawmidi snd_hda_codec rapl rfkill mc snd_hda_core
> wmi_bmof pcspkr i2c_piix4 k10temp snd_hwdep snd_seq snd_seq_device
> [19447.812785] snd_pcm acpi_cpufreq hid_logitech_dj snd_timer snd
> soundcore zram amdgpu drm_ttm_helper ttm video crct10dif_pclmul
> iommu_v2 crc32_pclmul crc32c_intel drm_buddy polyval_clmulni gpu_sched
> polyval_generic igb drm_display_helper nvme ucsi_ccg typec_ucsi
> ghash_clmulni_intel ccp typec sha512_ssse3 nvme_core cec sp5100_tco
> dca nvme_common wmi ip6_tables ip_tables fuse
> CR2: 0000000000000078
> ---[ end trace 0000000000000000 ]---
> RIP: 0010:drm_sched_job_cleanup+0x1a/0x110 [gpu_sched]
> Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00
> 55 53 48 89 fb 48 83 ec 08 48 8b 7f 20 48 c7 04 24 00 00 00 00 <8b> 47
> 78 85 c0 0f 84 b5 00 00 00 48 83 ff c0 74 1f 48 8d 57 78 b8
> RSP: 0018:ffffae3e16c0b9d0 EFLAGS: 00010282
> RAX: 0000000000000001 RBX: ffff91de6f7bc000 RCX: 00000000012a8976
> RDX: 0000000000000000 RSI: ffffffffadbda69b RDI: 0000000000000000
> RBP: ffff91de6f7bc000 R08: 0000000000000001 R09: 0000000000000001
> R10: 0000000000000001 R11: 0000000000000000 R12: 00000000ffffffff
> R13: 0000000000000018 R14: ffff91e259275000 R15: 0000000000000001
> FS: 000000007bcff6c0(0000) GS:ffff91e667e00000(0000) knlGS:000000007abe0000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000078 CR3: 0000000297a24000 CR4: 0000000000350ee0
>
>
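
For readers of the trace above: the "#PF" lines and CR2 can be decoded
mechanically. The sketch below uses the architectural x86 page-fault error
code bits; the helper names (pf_info, decode_pf_error_code, member_address)
are made up for this example. With RDI at 0 and CR2 at 0x78, the numbers
are consistent with a read of a member at offset 0x78 through a NULL
pointer, although which member that is cannot be told from the trace alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* x86 page-fault error code bits (architectural definition). */
#define PF_PROT  (1u << 0)	/* 0: page not present, 1: protection fault */
#define PF_WRITE (1u << 1)	/* 0: read access,      1: write access     */
#define PF_USER  (1u << 2)	/* 0: supervisor mode,  1: user mode        */

/* Hypothetical helper type for this example. */
struct pf_info {
	bool not_present;
	bool is_write;
	bool user_mode;
};

/* Split a page-fault error code into the three flags printed by the oops. */
struct pf_info decode_pf_error_code(uint32_t code)
{
	struct pf_info info = {
		.not_present = !(code & PF_PROT),
		.is_write    = (code & PF_WRITE) != 0,
		.user_mode   = (code & PF_USER) != 0,
	};
	return info;
}

/* Address touched when accessing a member at member_offset through base. */
uint64_t member_address(uint64_t base, uint64_t member_offset)
{
	assert(base <= UINT64_MAX - member_offset);	/* no wrap-around */
	return base + member_offset;
}
```

Decoding error_code(0x0000) this way gives a supervisor-mode read of a
not-present page, and member_address(0, 0x78) gives the reported CR2 value.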