[PATCH] drm/amdkfd: Fix the shift-out-of-bounds warning
Felix Kuehling
felix.kuehling at amd.com
Thu Jan 11 16:03:23 UTC 2024
[+Jon]
On 2024-01-11 01:05, Ma, Jun wrote:
> Hi Felix,
>
> On 1/10/2024 11:57 PM, Felix Kuehling wrote:
>> On 2024-01-10 04:39, Ma Jun wrote:
>>> There is following shift-out-of-bounds warning if ecode=0.
>>> "shift exponent 4294967295 is too large for 64-bit type 'long long unsigned int'"
>>>
>>> Signed-off-by: Ma Jun <Jun.Ma2 at amd.com>
>>> ---
>>> include/uapi/linux/kfd_ioctl.h | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
>>> index 2aa88afe305b..129325b02a91 100644
>>> --- a/include/uapi/linux/kfd_ioctl.h
>>> +++ b/include/uapi/linux/kfd_ioctl.h
>>> @@ -1004,7 +1004,7 @@ enum kfd_dbg_trap_exception_code {
>>> };
>>>
>>> /* Mask generated by ecode in kfd_dbg_trap_exception_code */
>>> -#define KFD_EC_MASK(ecode) (1ULL << (ecode - 1))
>>> +#define KFD_EC_MASK(ecode) (BIT(ecode) - 1)
>> This is not the same thing. We want a bit mask with one bit set. And
>> ecode=1 should set bit 0. ecode=0 is not a valid code and doesn't have a
>> valid mask. You could use BIT((ecode) - 1), but I think that would give
>> you the same warning for ecode=0. I also don't see BIT defined anywhere
>> under include/uapi, so I think using this in the API header would break
>> the build in user mode.
>>
>> Where are you seeing the warning about the bad shift exponent? Looks
>> like someone is using the KFD_EC_MASK macro incorrectly. Or if there is
>> a legitimate use of it with ecode=0, then the correct fix would be
>>
> This warning is caused by following code in function event_interrupt_wq_v10()
>
> else if (source_id == SOC15_INTSRC_CP_BAD_OPCODE) {
> kfd_set_dbg_ev_from_interrupt(dev, pasid,
> KFD_DEBUG_DOORBELL_ID(context_id0),
> KFD_EC_MASK(KFD_DEBUG_CP_BAD_OP_ECODE(context_id0)),
> NULL,
> 0);
> }
This looks OK. The compiler must be warning about a potential problem
here, not a definite one.
Question for Jon, how does the firmware encode the error code in the
context ID? I see these macros:
#define KFD_DEBUG_CP_BAD_OP_ECODE_MASK 0x3fffc00
#define KFD_DEBUG_CP_BAD_OP_ECODE_SHIFT 10
#define KFD_DEBUG_CP_BAD_OP_ECODE(ctxid0) (((ctxid0) & \
KFD_DEBUG_CP_BAD_OP_ECODE_MASK) \
>> KFD_DEBUG_CP_BAD_OP_ECODE_SHIFT)
It looks like we have 16 bits for the ECODE. That's enough to have a bit
mask. Do we really need KFD_EC_MASK to convert an error number into a
bitmask here?
>
>
>> #define KFD_EC_MASK(ecode) ((ecode) ? 1ULL << (ecode - 1) : 0ULL)
> This can fix the warning.
In the code above, if ecode is 0, that would lead to calling
kfd_set_dbg_ev_from_interrupt with a event mask of 0. Not sure if that
even makes sense. Jon, so we need special handling of cases where the
error code is 0 or out of range, so we can warn about buggy firmware
rather than creating nonsensical events for the debugger?
Regards,
Felix
>
> Regards
> Ma Jun
>> Regards,
>> Felix
>>
>>
>>>
>>> /* Masks for exception code type checks below */
>>> #define KFD_EC_MASK_QUEUE (KFD_EC_MASK(EC_QUEUE_WAVE_ABORT) | \
More information about the amd-gfx
mailing list