[PATCH] drm/amdkfd: fix shift out of bounds about gpu debug
Kim, Jonathan
Jonathan.Kim at amd.com
Thu Feb 29 15:58:28 UTC 2024
[Public]
I think this was discussed in another thread.
Exception codes should be range checked prior to applying the mask. Raising null events to the debugger or runtime isn't useful.
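
As a rough illustration of the range check described above, here is a minimal user-space sketch. The names example_ecode_valid, example_raise and EXAMPLE_EC_LIMIT are hypothetical and not part of the driver. The limit of 64 is only an assumption based on the mask being 64 bits wide; the real bound would come from the kfd_dbg_trap_exception_code enum. KFD_EC_MASK is the unguarded macro from the quoted diff.

```c
#include <stdbool.h>
#include <stdint.h>

/* Unguarded mask macro, as it appears before the quoted patch. */
#define KFD_EC_MASK(ecode) (1ULL << (ecode - 1))

/* Hypothetical upper bound: a 64-bit mask has room for codes 1..64. */
#define EXAMPLE_EC_LIMIT 64u

/* Hypothetical helper: true only for codes that map to a bit in the mask. */
static bool example_ecode_valid(uint32_t ecode)
{
	return ecode >= 1 && ecode <= EXAMPLE_EC_LIMIT;
}

/*
 * Hypothetical caller: check the code first, and only then build the mask.
 * A null (0) or out-of-range code is dropped instead of being raised.
 */
static bool example_raise(uint32_t ecode, uint64_t *pending)
{
	if (!example_ecode_valid(ecode))
		return false;

	*pending |= KFD_EC_MASK(ecode);
	return true;
}
```

With the check placed in the caller, the macro never sees a code of 0, so no null event is produced and the shift amount stays within 0..63.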
I haven't gotten around to fixing this yet. I should have time this week.
Just to double check: is the out-of-bounds shift caused by a CP interrupt that generates a null exception code?
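
For reference, the exponent in the quoted UBSAN report (4294967295) is what a code of 0 would produce if it reaches the macro as a 32-bit unsigned value, which is an assumption here. The sketch below uses hypothetical helper names to show the wraparound next to a guarded mask in the style of the quoted patch.

```c
#include <stdint.h>

/* Hypothetical helper: the shift amount the unguarded macro would use. */
static uint32_t example_shift_exponent(uint32_t ecode)
{
	/* Unsigned arithmetic wraps, so 0 - 1 becomes UINT32_MAX. */
	return ecode - 1;
}

/* Hypothetical helper: mask with the same zero guard as the quoted patch. */
static uint64_t example_guarded_mask(uint32_t ecode)
{
	return ecode ? (1ULL << (ecode - 1)) : 0ULL;
}
```

The guard only covers a code of 0. A code above 64 would still shift out of bounds, which is one more reason to range check before applying the mask.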
Jon
> -----Original Message-----
> From: Jesse Zhang <jesse.zhang at amd.com>
> Sent: Thursday, February 29, 2024 3:45 AM
> To: amd-gfx at lists.freedesktop.org
> Cc: Deucher, Alexander <Alexander.Deucher at amd.com>; Kuehling, Felix
> <Felix.Kuehling at amd.com>; Kim, Jonathan <Jonathan.Kim at amd.com>;
> Zhang, Yifan <Yifan1.Zhang at amd.com>; Zhang, Jesse(Jie)
> <Jesse.Zhang at amd.com>; Zhang, Jesse(Jie) <Jesse.Zhang at amd.com>
> Subject: [PATCH] drm/amdkfd: fix shift out of bounds about gpu debug
>
> the issue is :
> [ 388.151802] UBSAN: shift-out-of-bounds in
> drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_int_process_v10.c:346:5
> [ 388.151807] shift exponent 4294967295 is too large for 64-bit type 'long
> long unsigned int'
> [ 388.151812] CPU: 6 PID: 347 Comm: kworker/6:1H Tainted: G E
> 6.7.0+ #1
> [ 388.151814] Hardware name: AMD Splinter/Splinter-GNR, BIOS
> WS54117N_140 01/16/2024
> [ 388.151816] Workqueue: KFD IH interrupt_wq [amdgpu]
> [ 388.152084] Call Trace:
> [ 388.152086] <TASK>
> [ 388.152089] dump_stack_lvl+0x4c/0x70
> [ 388.152096] dump_stack+0x14/0x20
> [ 388.152098] ubsan_epilogue+0x9/0x40
> [ 388.152101] __ubsan_handle_shift_out_of_bounds+0x113/0x170
> [ 388.152103] ? vprintk+0x40/0x70
> [ 388.152106] ? swsusp_check+0x131/0x190
> [ 388.152110] event_interrupt_wq_v10.cold+0x16/0x1e [amdgpu]
> [ 388.152411] ? raw_spin_rq_unlock+0x14/0x40
> [ 388.152415] ? finish_task_switch+0x85/0x2a0
> [ 388.152417] ? kfifo_copy_out+0x5f/0x70
> [ 388.152420] interrupt_wq+0xb2/0x120 [amdgpu]
> [ 388.152642] ? interrupt_wq+0xb2/0x120 [amdgpu]
> [ 388.152728] process_scheduled_works+0x9a/0x3a0
> [ 388.152731] ? __pfx_worker_thread+0x10/0x10
> [ 388.152732] worker_thread+0x15f/0x2d0
> [ 388.152733] ? __pfx_worker_thread+0x10/0x10
> [ 388.152734] kthread+0xfb/0x130
> [ 388.152735] ? __pfx_kthread+0x10/0x10
> [ 388.152736] ret_from_fork+0x3d/0x60
> [ 388.152738] ? __pfx_kthread+0x10/0x10
> [ 388.152739] ret_from_fork_asm+0x1b/0x30
> [ 388.152742] </TASK>
>
> Signed-off-by: Jesse Zhang <Jesse.Zhang at amd.com>
> ---
> include/uapi/linux/kfd_ioctl.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
> index 9ce46edc62a5..3d5867df17e8 100644
> --- a/include/uapi/linux/kfd_ioctl.h
> +++ b/include/uapi/linux/kfd_ioctl.h
> @@ -887,7 +887,7 @@ enum kfd_dbg_trap_exception_code {
> };
>
> /* Mask generated by ecode in kfd_dbg_trap_exception_code */
> -#define KFD_EC_MASK(ecode) (1ULL << (ecode - 1))
> +#define KFD_EC_MASK(ecode) (ecode ? (1ULL << (ecode - 1)) : 0ULL)
>
> /* Masks for exception code type checks below */
> #define KFD_EC_MASK_QUEUE (KFD_EC_MASK(EC_QUEUE_WAVE_ABORT) | \
> --
> 2.25.1