[PATCH] drm/amdkfd: Fix shift out-of-bounds issue

Zhang, Jesse(Jie) Jesse.Zhang at amd.com
Thu Oct 19 01:34:22 UTC 2023


[AMD Official Use Only - General]

-----Original Message-----
From: Chen, Xiaogang <Xiaogang.Chen at amd.com>
Sent: Thursday, October 19, 2023 12:57 AM
To: Yang, Philip <Philip.Yang at amd.com>; Zhang, Yifan <Yifan1.Zhang at amd.com>; Zhang, Jesse(Jie) <Jesse.Zhang at amd.com>; amd-gfx at lists.freedesktop.org
Cc: Deucher, Alexander <Alexander.Deucher at amd.com>; Yang, Philip <Philip.Yang at amd.com>; Kuehling, Felix <Felix.Kuehling at amd.com>
Subject: Re: [PATCH] drm/amdkfd: Fix shift out-of-bounds issue


On 10/18/2023 9:53 AM, Philip Yang wrote:
>
>
> Caution: This message originated from an External Source. Use proper
> caution when opening attachments, clicking links, or responding.
>
>
> The 255 granularity is from recent Thunk change to increase CWSR area
> granularity.
>
I think we also need to do same restriction at correspondent parts in Thunk when set range granularity. Setting 255 granularity is far big as granularity here is log value of page number.

Regards

Xiaogang

> Thanks for catching this with kernel debug option
> CC_HAS_UBSAN_ARRAY_BOUNDS enabled. Because 1<<prange->granularity is
> used in many places, I think the proper fix should be in function
> svm_range_apply_attrs
>
>         case KFD_IOCTL_SVM_ATTR_GRANULARITY:
> -            prange->granlarity = attrs[i].value;
>
> +            prange->granlarity = attrs[i].value & 0x3F;
>
> BTW, function svm_range_split_by_granularity() is not used anymore,
> forgot the remove it, maybe you are testing on older source code?
>[Zhang, Jesse(Jie)] Thanks Philip, yes, my code is a bit older and  I will remove the unused code.
        Thanks
        Jesse
> Regards,
>
> Philip
>
> On 2023-10-18 09:36, Zhang, Yifan wrote:
>> [AMD Official Use Only - General]
>>
>> Hi Jesse,
>>
>> This patch is only a WA for the error log. How is this issue reproduced ? 255 looks like an invalid value for a prange->granularity, it is better to root cause who set it in the first place.
>>
>> BRs,
>> Yifan
>>
>> -----Original Message-----
>> From: Jesse Zhang<jesse.zhang at amd.com>
>> Sent: Wednesday, October 18, 2023 3:50 PM
>> To:amd-gfx at lists.freedesktop.org
>> Cc: Deucher, Alexander<Alexander.Deucher at amd.com>; Kuehling,
>> Felix<Felix.Kuehling at amd.com>; Yang, Philip<Philip.Yang at amd.com>;
>> Zhang, Yifan<Yifan1.Zhang at amd.com>; Zhang,
>> Jesse(Jie)<Jesse.Zhang at amd.com>; Zhang,
>> Jesse(Jie)<Jesse.Zhang at amd.com>
>> Subject: [PATCH] drm/amdkfd: Fix shift out-of-bounds issue
>>
>> [  567.613292] shift exponent 255 is too large for 64-bit type 'long unsigned int'
>> [  567.614498] CPU: 5 PID: 238 Comm: kworker/5:1 Tainted: G           OE      6.2.0-34-generic #34~22.04.1-Ubuntu
>> [  567.614502] Hardware name: AMD Splinter/Splinter-RPL, BIOS WS43927N_871 09/25/2023 [  567.614504] Workqueue: events send_exception_work_handler [amdgpu] [  567.614748] Call Trace:
>> [  567.614750]  <TASK>
>> [  567.614753]  dump_stack_lvl+0x48/0x70 [  567.614761]
>> dump_stack+0x10/0x20 [  567.614763]
>> __ubsan_handle_shift_out_of_bounds+0x156/0x310
>> [  567.614769]  ? srso_alias_return_thunk+0x5/0x7f [  567.614773]  ?
>> update_sd_lb_stats.constprop.0+0xf2/0x3c0
>> [  567.614780]  svm_range_split_by_granularity.cold+0x2b/0x34
>> [amdgpu] [  567.615047]  ? srso_alias_return_thunk+0x5/0x7f [
>> 567.615052]  svm_migrate_to_ram+0x185/0x4d0 [amdgpu] [  567.615286]
>> do_swap_page+0x7b6/0xa30 [  567.615291]  ?
>> srso_alias_return_thunk+0x5/0x7f [  567.615294]  ?
>> __free_pages+0x119/0x130 [  567.615299]  handle_pte_fault+0x227/0x280
>> [  567.615303]  __handle_mm_fault+0x3c0/0x720 [  567.615311]
>> handle_mm_fault+0x119/0x330 [  567.615314]  ?
>> lock_mm_and_find_vma+0x44/0x250 [  567.615318]
>> do_user_addr_fault+0x1a9/0x640 [  567.615323]
>> exc_page_fault+0x81/0x1b0 [  567.615328]
>> asm_exc_page_fault+0x27/0x30 [  567.615332] RIP:
>> 0010:__get_user_8+0x1c/0x30
>>
>> Signed-off-by: Jesse Zhang<Jesse.Zhang at amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>> b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>> index 7b81233bc9ae..f5e0bccc6d71 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>> @@ -1169,7 +1169,7 @@ svm_range_split_by_granularity(struct kfd_process *p, struct mm_struct *mm,
>>           * PTE will be used for whole range, this reduces the number of PTE
>>           * updated and the L1 TLB space used for translation.
>>           */
>> -       size = 1UL << prange->granularity;
>> +       size = 1UL << (prange->granularity & 0x3f);
>>          start = ALIGN_DOWN(addr, size);
>>          last = ALIGN(addr + 1, size) - 1;
>>
>> --
>> 2.25.1
>>


More information about the amd-gfx mailing list