Radeon RX 6800 does not work properly on Think-Force 7140 ARM server (generates Oops and causes system deadlock)

Lang Yu Lang.Yu at amd.com
Fri Jun 9 08:37:25 UTC 2023


Try disabling CONFIG_HSA_AMD_SVM in your kernel config.
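
To confirm whether a given build still has the option enabled, a small check against the config file may help. This is only a sketch: the helper name `kconfig_state` is made up for illustration, and it assumes the usual `.config` syntax (`CONFIG_FOO=y` or `# CONFIG_FOO is not set`).

```python
import re

def kconfig_state(config_text, option):
    """Return the assigned value (e.g. 'y' or 'm'), 'n' if the option is
    explicitly marked "is not set", or None if it does not appear at all.
    `option` is given without the CONFIG_ prefix (hypothetical helper)."""
    set_re = re.compile(r"^CONFIG_%s=(.*)$" % re.escape(option))
    unset_re = re.compile(r"^# CONFIG_%s is not set$" % re.escape(option))
    for line in config_text.splitlines():
        line = line.strip()
        m = set_re.match(line)
        if m:
            return m.group(1)
        if unset_re.match(line):
            return "n"
    return None

# Illustrative config fragment, not taken from the attached config file.
sample = """\
CONFIG_HSA_AMD=y
# CONFIG_HSA_AMD_SVM is not set
"""
print(kconfig_state(sample, "HSA_AMD_SVM"))  # n
```

In a kernel source tree, running `scripts/config --disable HSA_AMD_SVM` followed by `make olddefconfig` is one way to turn the option off before rebuilding.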

Regards,
Lang

On 06/09/ , 彭逸豪 wrote:
> I have a Radeon RX 6800 and want to use it on my ARM server (Hardware name: Think-Force Technology Universal Server/7140 Advanced, BIOS 1.1.7 20230216). However, with this GPU installed the kernel Oopses (or panics, on some older versions). Even when the kernel does not panic, the system ends up in a "deadlock" state and I cannot log in normally.
> 
> Below is the Oops from 6.4.0-rc5 (the full log from the serial port is attached). After the Oops the GPU is unusable and the system is stuck in a "deadlock" state: login does not proceed after the user name is entered. If the GPU is removed or replaced with another GPU, such as an older Radeon RX 560, login works normally. The Radeon RX 560 works fine on 6.4.0-rc5.
> 
> I have tried multiple kernel versions from 5.15 to 6.4.0-rc5. All of them show a similar Oops or panic, the GPU cannot be used, and I cannot log in normally.
> 
> The attachment contains the full kernel log captured from the serial port, and my 6.4-rc5 config file. Please let me know if additional information is needed.
> 
> (Note: The previous email did not have the correct subject, so I retracted it. I am sorry if you have received duplicate emails.)
> 
> [    6.535108] cma: cma_alloc: reserved: alloc failed, req-size: 2 pages, ret: -12
> [    9.824070] Unable to handle kernel paging request at virtual address ffffffffffe00034
> [    9.831955] Mem abort info:
> [    9.834737]   ESR = 0x0000000096000046
> [    9.838469]   EC = 0x25: DABT (current EL), IL = 32 bits
> [    9.843756]   SET = 0, FnV = 0
> [    9.846794]   EA = 0, S1PTW = 0
> [    9.849919]   FSC = 0x06: level 2 translation fault
> [    9.854773] Data abort info:
> [    9.857638]   ISV = 0, ISS = 0x00000046
> [    9.861454]   CM = 0, WnR = 1
> [    9.864405] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000001ff5489000
> [    9.871074] [ffffffffffe00034] pgd=10000001b078b003, p4d=10000001b078b003, pud=10000001b078a003, pmd=0000000000000000
> [    9.881637] Internal error: Oops: 0000000096000046 [#1] SMP
> [    9.887180] Modules linked in: input_leds hid_generic amdgpu(+) usbhid hid cdc_ether usbnet snd_hda_codec_hdmi binfmt_misc snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hda_core gpu_sched drm_buddy video snd_hwdep drm_suballoc_helper drm_ttm_helper snd_pcm ttm onboard_usb_hub nls_iso8859_1 drm_display_helper snd_seq_midi snd_seq_midi_event ast snd_rawmidi cec rc_core snd_seq drm_shmem_helper drm_kms_helper snd_seq_device snd_timer snd ipmi_ssif ipmi_devintf syscopyarea crct10dif_ce sysfillrect ipmi_msghandler soundcore arm_spe_pmu sysimgblt sch_fq_codel drm pstore_blk ramoops reed_solomon pstore_zone efi_pstore ip_tables x_tables autofs4 nvme igb nvme_core nvme_common i2c_algo_bit xhci_plat_hcd
> [    9.948668] CPU: 0 PID: 305 Comm: kworker/0:2 Tainted: G        W          6.4.0-rc5 #1
> [    9.956630] Hardware name: Think-Force Technology Universal Server/7140 Advanced, BIOS 1.1.7 20230216
> [    9.965801] Workqueue: events work_for_cpu_fn
> [    9.970137] pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [    9.977060] pc : __init_zone_device_page (/home/ubuntu/kernel-6.4-rc5/./include/linux/atomic/atomic-instrumented.h:42 /home/ubuntu/kernel-6.4-rc5/./include/linux/page_ref.h:99 /home/ubuntu/kernel-6.4-rc5/./include/linux/page_ref.h:115 /home/ubuntu/kernel-6.4-rc5/mm/mm_init.c:557 /home/ubuntu/kernel-6.4-rc5/mm/mm_init.c:966) 
> [    9.981826] lr : memmap_init_zone_device (/home/ubuntu/kernel-6.4-rc5/mm/mm_init.c:1084) 
> [    9.986677] sp : ffff80000aea3980
> [    9.989971] x29: ffff80000aea3980 x28: 0000000000000000 x27: 0000000fffff8000
> [    9.997068] x26: ffff80000a8c5f70 x25: ffff0001c118d6a0 x24: fffffc0000000000
> [   10.004165] x23: 0000001000000000 x22: ffff800009bc5e98 x21: 0000000000000001
> [   10.011262] x20: 0000000000000001 x19: ffffffffffe00000 x18: 0000000000000000
> [   10.018360] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
> [   10.025457] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
> [   10.032554] x11: 0000000000000000 x10: 0000000000000000 x9 : ffff8000094032cc
> [   10.039651] x8 : 0000000000000000 x7 : 00000000ffffffff x6 : 0000000000000001
> [   10.046748] x5 : 0000000000000000 x4 : ffff0001c118d6a0 x3 : 0000000000000000
> [   10.053845] x2 : 0200000000000000 x1 : 0000000fffff8000 x0 : ffffffffffe00000
> [   10.060943] Call trace:
> [   10.063373] __init_zone_device_page (/home/ubuntu/kernel-6.4-rc5/./include/linux/atomic/atomic-instrumented.h:42 /home/ubuntu/kernel-6.4-rc5/./include/linux/page_ref.h:99 /home/ubuntu/kernel-6.4-rc5/./include/linux/page_ref.h:115 /home/ubuntu/kernel-6.4-rc5/mm/mm_init.c:557 /home/ubuntu/kernel-6.4-rc5/mm/mm_init.c:966) 
> [   10.067791] memmap_init_zone_device (/home/ubuntu/kernel-6.4-rc5/mm/mm_init.c:1084) 
> [   10.072297] memremap_pages (/home/ubuntu/kernel-6.4-rc5/mm/memremap.c:270 /home/ubuntu/kernel-6.4-rc5/mm/memremap.c:366) 
> [   10.076111] devm_memremap_pages (/home/ubuntu/kernel-6.4-rc5/mm/memremap.c:407) 
> [   10.080183] svm_migrate_init (/home/ubuntu/kernel-6.4-rc5/drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_migrate.c:1029) amdgpu
> [   10.085112] kgd2kfd_device_init (/home/ubuntu/kernel-6.4-rc5/drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_device.c:647) amdgpu
> [   10.090318] amdgpu_amdkfd_device_init (/home/ubuntu/kernel-6.4-rc5/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c:197) amdgpu
> [   10.096039] amdgpu_device_init (/home/ubuntu/kernel-6.4-rc5/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:2537 /home/ubuntu/kernel-6.4-rc5/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:3871) amdgpu
> [   10.101329] amdgpu_driver_load_kms (/home/ubuntu/kernel-6.4-rc5/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c:146) amdgpu
> [   10.106704] amdgpu_pci_probe (/home/ubuntu/kernel-6.4-rc5/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c:2149) amdgpu
> [   10.111648] local_pci_probe (/home/ubuntu/kernel-6.4-rc5/drivers/pci/pci-driver.c:325) 
> [   10.115462] work_for_cpu_fn (/home/ubuntu/kernel-6.4-rc5/kernel/workqueue.c:5370) 
> [   10.119190] process_one_work (/home/ubuntu/kernel-6.4-rc5/kernel/workqueue.c:2410) 
> [   10.123177] worker_thread (/home/ubuntu/kernel-6.4-rc5/./include/linux/list.h:292 /home/ubuntu/kernel-6.4-rc5/kernel/workqueue.c:2465 /home/ubuntu/kernel-6.4-rc5/kernel/workqueue.c:2557) 
> [   10.126905] kthread (/home/ubuntu/kernel-6.4-rc5/kernel/kthread.c:379) 
> [   10.130114] ret_from_fork (/home/ubuntu/kernel-6.4-rc5/arch/arm64/kernel/entry.S:871) 
> [ 10.133670] Code: 910003fd a90153f3 12800007 d3490842 (b9003406)
> All code
> ========
>    0:	910003fd 	mov	x29, sp
>    4:	a90153f3 	stp	x19, x20, [sp, #16]
>    8:	12800007 	mov	w7, #0xffffffff            	// #-1
>    c:	d3490842 	ubfiz	x2, x2, #55, #3
>   10:*	b9003406 	str	w6, [x0, #52]		<-- trapping instruction
> 
> Code starting with the faulting instruction
> ===========================================
>    0:	b9003406 	str	w6, [x0, #52]
> [   10.139730] ---[ end trace 0000000000000000 ]---
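
For readers following the Oops above, the "Mem abort info" fields can be reproduced from the raw ESR value. The sketch below assumes the ESR_EL1 data-abort layout as I understand it from the Arm architecture documentation (EC in bits 31:26, IL in bit 25, ISS in bits 24:0); the function name is made up for illustration. It also checks that the faulting address is simply x0 plus the 52-byte offset of the trapping `str w6, [x0, #52]`.

```python
def decode_data_abort_esr(esr):
    """Split an arm64 ESR value into the fields the kernel prints for a
    data abort. Bit positions are an assumption based on the ESR_EL1
    data-abort ISS encoding; the helper itself is hypothetical."""
    iss = esr & 0x1FFFFFF            # ISS: bits 24:0
    return {
        "EC": (esr >> 26) & 0x3F,    # exception class
        "IL": (esr >> 25) & 1,       # 1 means 32-bit instruction
        "ISS": iss,
        "ISV": (iss >> 24) & 1,
        "SET": (iss >> 11) & 0x3,
        "FnV": (iss >> 10) & 1,
        "EA": (iss >> 9) & 1,
        "CM": (iss >> 8) & 1,
        "S1PTW": (iss >> 7) & 1,
        "WnR": (iss >> 6) & 1,       # 1 means the access was a write
        "FSC": iss & 0x3F,           # fault status code
    }

fields = decode_data_abort_esr(0x96000046)   # ESR from the log
for name, value in fields.items():
    print(f"{name:5s} = {value:#x}")

# x0 from the register dump plus the #52 offset of the trapping store.
fault_addr = (0xFFFFFFFFFFE00000 + 52) & 0xFFFFFFFFFFFFFFFF
print(f"fault address = {fault_addr:#018x}")
```

With the values from this log, the decode gives EC = 0x25, WnR = 1 and FSC = 0x06, matching the "DABT (current EL)", write, and "level 2 translation fault" lines, and the computed address matches ffffffffffe00034.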




