[PATCH 1/2] drm/xe/guc: Fix out-of-bound while enabling engine activity stats

Michal Wajdeczko michal.wajdeczko at intel.com
Tue Apr 15 08:44:27 UTC 2025



On 15.04.2025 09:29, Riana Tauro wrote:
> Hi Michal
> 
> On 4/15/2025 1:53 AM, Michal Wajdeczko wrote:
>> In the PF mode we allocate array of struct engine_activity_group
>> that holds activity data split for the PF and all potential VFs.
>> But while preparing data for use by VFs we ended with bad index.
>>
>>   [ ] BUG: KASAN: slab-out-of-bounds in
>> xe_guc_engine_activity_function_stats+0x41e/0x4f0 [xe]
>>   [ ] Call Trace:
>>   [ ]  <TASK>
>>   [ ]  dump_stack_lvl+0x91/0xf0
>>   [ ]  print_report+0xd1/0x680
>>   [ ]  ? __virt_addr_valid+0x23a/0x440
>>   [ ]  ? kasan_addr_to_slab+0xd/0xb0
>>   [ ]  kasan_report+0xe7/0x130
>>   [ ]  ? xe_guc_engine_activity_function_stats+0x41e/0x4f0 [xe]
>>   [ ]  ? xe_guc_engine_activity_function_stats+0x41e/0x4f0 [xe]
>>   [ ]  __asan_report_store8_noabort+0x17/0x30
>>   [ ]  xe_guc_engine_activity_function_stats+0x41e/0x4f0 [xe]
>>   [ ]  pf_engine_activity_stats+0x1b6/0x7f0 [xe]
>>   [ ]  ? kobject_put+0x5f/0x470
>>   [ ]  xe_pci_sriov_configure+0x28c9/0x3270 [xe]
>>   [ ]  ? __pfx_dev_attr_store+0x10/0x10
>>   [ ]  ? kstrtoull+0x3b/0x70
>>   [ ]  ? __pfx___lock_acquire+0x10/0x10
>>   [ ]  ? kstrtou16+0x65/0xf0
>>   [ ]  sriov_numvfs_store+0x20c/0x400
>>   [ ]  ? __pfx_sriov_numvfs_store+0x10/0x10
>>   [ ]  ? __pfx__copy_from_iter+0x10/0x10
>>   [ ]  ? __pfx_dev_attr_store+0x10/0x10
>>   [ ]  dev_attr_store+0x3b/0x80
>>   [ ]  ? sysfs_file_ops+0x135/0x190
>>
>> Fixes: 2de3f38fbf89 ("drm/xe: Add support for per-function engine
>> activity")
>> Signed-off-by: Michal Wajdeczko <michal.wajdeczko at intel.com>
>> Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa at intel.com>
>> Cc: Riana Tauro <riana.tauro at intel.com>
>> ---
>>   drivers/gpu/drm/xe/xe_guc_engine_activity.c | 7 +++++--
>>   1 file changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_guc_engine_activity.c b/drivers/
>> gpu/drm/xe/xe_guc_engine_activity.c
>> index b96fea78df8b..0fb48f8f05d8 100644
>> --- a/drivers/gpu/drm/xe/xe_guc_engine_activity.c
>> +++ b/drivers/gpu/drm/xe/xe_guc_engine_activity.c
>> @@ -304,6 +304,8 @@ static void engine_activity_set_cpu_ts(struct
>> xe_guc *guc, unsigned int index)
>>       struct engine_activity_group *eag = &engine_activity->eag[index];
>>       int i, j;
>>   +    xe_gt_assert(guc_to_gt(guc), index < engine_activity-
>> >num_activity_group);
> This would happen only when num_vfs is greater than total_vfs since
> num_activity_group is total_vfs+1. enable_vfs has that check

true in general

but in this function you can't guarantee that the caller will be doing
such check or it doesn't make any mistake (like it was already done)
while passing the index

OTOH this assert just make sure that passed index fits the eag[] array

> +
>>       for (i = 0; i < GUC_MAX_ENGINE_CLASSES; i++)
>>           for (j = 0; j < GUC_MAX_INSTANCES_PER_CLASS; j++)
>>               eag->engine[i][j].last_cpu_ts = ktime_get();
>> @@ -374,8 +376,9 @@ static int
>> engine_activity_enable_function_stats(struct xe_guc *guc, int num_vfs
>>           return ret;
>>       }
>>   -    for (i = 0; i < engine_activity->num_functions; i++)
>> -        engine_activity_set_cpu_ts(guc, i + 1);
>> +    /* skip PF as it was already setup */
>> +    for (i = 1; i < engine_activity->num_functions; i++)
>> +        engine_activity_set_cpu_ts(guc, i);
> sorry, missed this when removing global.
> Thanks for fixing this. Assert seems unnecessary

I would keep it as it was nicely firing without this fix
and it will be compiled-out on non-debug build, so no harm

> 
> Reviewed-by: Riana Tauro <riana.tauro at intel.com>

thanks

>
>>       return 0;
>>   }



More information about the Intel-xe mailing list