[PATCH] drm/amdkfd: Fix shift out-of-bounds issue

Chen, Xiaogang xiaogang.chen at amd.com
Wed Oct 18 16:57:12 UTC 2023


On 10/18/2023 9:53 AM, Philip Yang wrote:
>
> 	
> Caution: This message originated from an External Source. Use proper 
> caution when opening attachments, clicking links, or responding.
>
>
> The 255 granularity is from recent Thunk change to increase CWSR area 
> granularity.
>
I think we also need to do same restriction at correspondent parts in 
Thunk when set range granularity. Setting 255 granularity is far big as 
granularity here is log value of page number.

Regards

Xiaogang

> Thanks for catching this with kernel debug option 
> CC_HAS_UBSAN_ARRAY_BOUNDS enabled. Because 1<<prange->granularity is 
> used in many places, I think the proper fix should be in function 
> svm_range_apply_attrs
>
>         case KFD_IOCTL_SVM_ATTR_GRANULARITY:
> -            prange->granlarity = attrs[i].value;
>
> +            prange->granlarity = attrs[i].value & 0x3F;
>
> BTW, function svm_range_split_by_granularity() is not used anymore, 
> forgot the remove it, maybe you are testing on older source code?
>
> Regards,
>
> Philip
>
> On 2023-10-18 09:36, Zhang, Yifan wrote:
>> [AMD Official Use Only - General]
>>
>> Hi Jesse,
>>
>> This patch is only a WA for the error log. How is this issue reproduced ? 255 looks like an invalid value for a prange->granularity, it is better to root cause who set it in the first place.
>>
>> BRs,
>> Yifan
>>
>> -----Original Message-----
>> From: Jesse Zhang<jesse.zhang at amd.com>
>> Sent: Wednesday, October 18, 2023 3:50 PM
>> To:amd-gfx at lists.freedesktop.org
>> Cc: Deucher, Alexander<Alexander.Deucher at amd.com>; Kuehling, Felix<Felix.Kuehling at amd.com>; Yang, Philip<Philip.Yang at amd.com>; Zhang, Yifan<Yifan1.Zhang at amd.com>; Zhang, Jesse(Jie)<Jesse.Zhang at amd.com>; Zhang, Jesse(Jie)<Jesse.Zhang at amd.com>
>> Subject: [PATCH] drm/amdkfd: Fix shift out-of-bounds issue
>>
>> [  567.613292] shift exponent 255 is too large for 64-bit type 'long unsigned int'
>> [  567.614498] CPU: 5 PID: 238 Comm: kworker/5:1 Tainted: G           OE      6.2.0-34-generic #34~22.04.1-Ubuntu
>> [  567.614502] Hardware name: AMD Splinter/Splinter-RPL, BIOS WS43927N_871 09/25/2023 [  567.614504] Workqueue: events send_exception_work_handler [amdgpu] [  567.614748] Call Trace:
>> [  567.614750]  <TASK>
>> [  567.614753]  dump_stack_lvl+0x48/0x70 [  567.614761]  dump_stack+0x10/0x20 [  567.614763]  __ubsan_handle_shift_out_of_bounds+0x156/0x310
>> [  567.614769]  ? srso_alias_return_thunk+0x5/0x7f [  567.614773]  ? update_sd_lb_stats.constprop.0+0xf2/0x3c0
>> [  567.614780]  svm_range_split_by_granularity.cold+0x2b/0x34 [amdgpu] [  567.615047]  ? srso_alias_return_thunk+0x5/0x7f [  567.615052]  svm_migrate_to_ram+0x185/0x4d0 [amdgpu] [  567.615286]  do_swap_page+0x7b6/0xa30 [  567.615291]  ? srso_alias_return_thunk+0x5/0x7f [  567.615294]  ? __free_pages+0x119/0x130 [  567.615299]  handle_pte_fault+0x227/0x280 [  567.615303]  __handle_mm_fault+0x3c0/0x720 [  567.615311]  handle_mm_fault+0x119/0x330 [  567.615314]  ? lock_mm_and_find_vma+0x44/0x250 [  567.615318]  do_user_addr_fault+0x1a9/0x640 [  567.615323]  exc_page_fault+0x81/0x1b0 [  567.615328]  asm_exc_page_fault+0x27/0x30 [  567.615332] RIP: 0010:__get_user_8+0x1c/0x30
>>
>> Signed-off-by: Jesse Zhang<Jesse.Zhang at amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>> index 7b81233bc9ae..f5e0bccc6d71 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>> @@ -1169,7 +1169,7 @@ svm_range_split_by_granularity(struct kfd_process *p, struct mm_struct *mm,
>>           * PTE will be used for whole range, this reduces the number of PTE
>>           * updated and the L1 TLB space used for translation.
>>           */
>> -       size = 1UL << prange->granularity;
>> +       size = 1UL << (prange->granularity & 0x3f);
>>          start = ALIGN_DOWN(addr, size);
>>          last = ALIGN(addr + 1, size) - 1;
>>
>> --
>> 2.25.1
>>


More information about the amd-gfx mailing list