[PATCH] drm/amdkfd: fix shift out of bounds about gpu debug

Kim, Jonathan Jonathan.Kim at amd.com
Thu Feb 29 15:58:28 UTC 2024


[Public]

I think this was discussed in another thread.
Exception codes should be range-checked before the mask is applied.  Raising null events to the debugger or runtime isn't useful.
I haven't gotten around to fixing this yet, but I should have time this week.
Just to double check: is the out-of-bounds shift caused by a CP interrupt that generates a null exception code?
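
A minimal sketch of the range check described above, not the actual driver fix. All names here (example_ec_is_valid, example_ec_mask, example_raise, EXAMPLE_EC_NONE, EXAMPLE_EC_LAST) are hypothetical stand-ins for illustration, and the bound of 64 is an assumption that simply matches the width of a 64-bit mask. With an unsigned 32-bit code of 0, (ecode - 1) wraps to 4294967295, which is consistent with the shift exponent in the UBSAN report quoted below.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bounds; the real enum lives in kfd_ioctl.h. */
#define EXAMPLE_EC_NONE 0u
#define EXAMPLE_EC_LAST 64u /* assumption: highest code that fits a 64-bit mask */

/* True only for codes that map to a bit inside a 64-bit mask. */
static bool example_ec_is_valid(uint32_t ecode)
{
	return ecode > EXAMPLE_EC_NONE && ecode <= EXAMPLE_EC_LAST;
}

/* Same shape as KFD_EC_MASK(); callers must validate ecode first. */
static uint64_t example_ec_mask(uint32_t ecode)
{
	return 1ULL << (ecode - 1);
}

/*
 * Reject null or out-of-range codes before shifting, so no event is
 * raised for them. Returns 0 on success, -1 if the code was dropped.
 */
static int example_raise(uint32_t ecode, uint64_t *pending)
{
	if (!example_ec_is_valid(ecode))
		return -1;

	*pending |= example_ec_mask(ecode);
	return 0;
}
```

Checking at the call site, rather than returning a zero mask from the macro, means a null code is dropped outright and never reaches the debugger or runtime as an empty event.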

Jon

> -----Original Message-----
> From: Jesse Zhang <jesse.zhang at amd.com>
> Sent: Thursday, February 29, 2024 3:45 AM
> To: amd-gfx at lists.freedesktop.org
> Cc: Deucher, Alexander <Alexander.Deucher at amd.com>; Kuehling, Felix
> <Felix.Kuehling at amd.com>; Kim, Jonathan <Jonathan.Kim at amd.com>;
> Zhang, Yifan <Yifan1.Zhang at amd.com>; Zhang, Jesse(Jie)
> <Jesse.Zhang at amd.com>; Zhang, Jesse(Jie) <Jesse.Zhang at amd.com>
> Subject: [PATCH] drm/amdkfd: fix shift out of bounds about gpu debug
>
> The issue is:
> [  388.151802] UBSAN: shift-out-of-bounds in
> drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_int_process_v10.c:346:5
> [  388.151807] shift exponent 4294967295 is too large for 64-bit type 'long
> long unsigned int'
> [  388.151812] CPU: 6 PID: 347 Comm: kworker/6:1H Tainted: G            E
> 6.7.0+ #1
> [  388.151814] Hardware name: AMD Splinter/Splinter-GNR, BIOS
> WS54117N_140 01/16/2024
> [  388.151816] Workqueue: KFD IH interrupt_wq [amdgpu]
> [  388.152084] Call Trace:
> [  388.152086]  <TASK>
> [  388.152089]  dump_stack_lvl+0x4c/0x70
> [  388.152096]  dump_stack+0x14/0x20
> [  388.152098]  ubsan_epilogue+0x9/0x40
> [  388.152101]  __ubsan_handle_shift_out_of_bounds+0x113/0x170
> [  388.152103]  ? vprintk+0x40/0x70
> [  388.152106]  ? swsusp_check+0x131/0x190
> [  388.152110]  event_interrupt_wq_v10.cold+0x16/0x1e [amdgpu]
> [  388.152411]  ? raw_spin_rq_unlock+0x14/0x40
> [  388.152415]  ? finish_task_switch+0x85/0x2a0
> [  388.152417]  ? kfifo_copy_out+0x5f/0x70
> [  388.152420]  interrupt_wq+0xb2/0x120 [amdgpu]
> [  388.152642]  ? interrupt_wq+0xb2/0x120 [amdgpu]
> [  388.152728]  process_scheduled_works+0x9a/0x3a0
> [  388.152731]  ? __pfx_worker_thread+0x10/0x10
> [  388.152732]  worker_thread+0x15f/0x2d0
> [  388.152733]  ? __pfx_worker_thread+0x10/0x10
> [  388.152734]  kthread+0xfb/0x130
> [  388.152735]  ? __pfx_kthread+0x10/0x10
> [  388.152736]  ret_from_fork+0x3d/0x60
> [  388.152738]  ? __pfx_kthread+0x10/0x10
> [  388.152739]  ret_from_fork_asm+0x1b/0x30
> [  388.152742]  </TASK>
>
> Signed-off-by: Jesse Zhang <Jesse.Zhang at amd.com>
> ---
>  include/uapi/linux/kfd_ioctl.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
> index 9ce46edc62a5..3d5867df17e8 100644
> --- a/include/uapi/linux/kfd_ioctl.h
> +++ b/include/uapi/linux/kfd_ioctl.h
> @@ -887,7 +887,7 @@ enum kfd_dbg_trap_exception_code {
>  };
>
>  /* Mask generated by ecode in kfd_dbg_trap_exception_code */
> -#define KFD_EC_MASK(ecode)   (1ULL << (ecode - 1))
> +#define KFD_EC_MASK(ecode)   (ecode ? (1ULL << (ecode - 1)) : 0ULL)
>
>  /* Masks for exception code type checks below */
>  #define KFD_EC_MASK_QUEUE
>       (KFD_EC_MASK(EC_QUEUE_WAVE_ABORT) |     \
> --
> 2.25.1


