[PATCH] drm/amd/pm: Vangogh: Fix kernel memory out of bounds write

Tvrtko Ursulin tvrtko.ursulin at igalia.com
Fri Oct 25 14:40:00 UTC 2024


On 25/10/2024 15:23, Mario Limonciello wrote:
> On 10/25/2024 09:15, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin at igalia.com>
>>
>> KASAN reports that the GPU metrics table allocated in
>> vangogh_tables_init() is not large enough for the memset done in
>> smu_cmn_init_soft_gpu_metrics(). Condensed report follows:
>>
>> [   33.861314] BUG: KASAN: slab-out-of-bounds in 
>> smu_cmn_init_soft_gpu_metrics+0x73/0x200 [amdgpu]
>> [   33.861799] Write of size 168 at addr ffff888129f59500 by task 
>> mangoapp/1067
>> ...
>> [   33.861808] CPU: 6 UID: 1000 PID: 1067 Comm: mangoapp Tainted: 
>> G        W          6.12.0-rc4 #356 
>> 1a56f59a8b5182eeaf67eb7cb8b13594dd23b544
>> [   33.861816] Tainted: [W]=WARN
>> [   33.861818] Hardware name: Valve Galileo/Galileo, BIOS F7G0107 
>> 12/01/2023
>> [   33.861822] Call Trace:
>> [   33.861826]  <TASK>
>> [   33.861829]  dump_stack_lvl+0x66/0x90
>> [   33.861838]  print_report+0xce/0x620
>> [   33.861853]  kasan_report+0xda/0x110
>> [   33.862794]  kasan_check_range+0xfd/0x1a0
>> [   33.862799]  __asan_memset+0x23/0x40
>> [   33.862803]  smu_cmn_init_soft_gpu_metrics+0x73/0x200 [amdgpu 
>> 13b1bc364ec578808f676eba412c20eaab792779]
>> [   33.863306]  vangogh_get_gpu_metrics_v2_4+0x123/0xad0 [amdgpu 
>> 13b1bc364ec578808f676eba412c20eaab792779]
>> [   33.864257]  vangogh_common_get_gpu_metrics+0xb0c/0xbc0 [amdgpu 
>> 13b1bc364ec578808f676eba412c20eaab792779]
>> [   33.865682]  amdgpu_dpm_get_gpu_metrics+0xcc/0x110 [amdgpu 
>> 13b1bc364ec578808f676eba412c20eaab792779]
>> [   33.866160]  amdgpu_get_gpu_metrics+0x154/0x2d0 [amdgpu 
>> 13b1bc364ec578808f676eba412c20eaab792779]
>> [   33.867135]  dev_attr_show+0x43/0xc0
>> [   33.867147]  sysfs_kf_seq_show+0x1f1/0x3b0
>> [   33.867155]  seq_read_iter+0x3f8/0x1140
>> [   33.867173]  vfs_read+0x76c/0xc50
>> [   33.867198]  ksys_read+0xfb/0x1d0
>> [   33.867214]  do_syscall_64+0x90/0x160
>> ...
>> [   33.867353] Allocated by task 378 on cpu 7 at 22.794876s:
>> [   33.867358]  kasan_save_stack+0x33/0x50
>> [   33.867364]  kasan_save_track+0x17/0x60
>> [   33.867367]  __kasan_kmalloc+0x87/0x90
>> [   33.867371]  vangogh_init_smc_tables+0x3f9/0x840 [amdgpu]
>> [   33.867835]  smu_sw_init+0xa32/0x1850 [amdgpu]
>> [   33.868299]  amdgpu_device_init+0x467b/0x8d90 [amdgpu]
>> [   33.868733]  amdgpu_driver_load_kms+0x19/0xf0 [amdgpu]
>> [   33.869167]  amdgpu_pci_probe+0x2d6/0xcd0 [amdgpu]
>> [   33.869608]  local_pci_probe+0xda/0x180
>> [   33.869614]  pci_device_probe+0x43f/0x6b0
>>
>> Empirically we can confirm that the former allocates 152 bytes for the
>> table, while the latter memsets the 168 large block.
>>
>> This is somewhat alleviated by the fact that allocation goes into a 192
>> SLAB bucket, but then for v3_0 metrics the table grows to 264 bytes which
>> would definitely be a problem.
>>
>> Root cause appears that when GPU metrics tables for v2_4 parts were added
>> it was not considered to enlarge the table to fit.
>>
>> The fix in this patch is rather "brute force" and perhaps later should be
>> done in a smarter way, by extracting and consolidating the part 
>> version to
>> size logic to a common helper, instead of brute forcing the largest
>> possible allocation. Nevertheless, for now this works and fixes the 
>> out of
>> bounds write.
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at igalia.com>
>> Fixes: 41cec40bc9ba ("drm/amd/pm: Vangogh: Add new gpu_metrics_v2_4 to 
>> acquire gpu_metrics")
>> Cc: Mario Limonciello <mario.limonciello at amd.com>
>> Cc: Evan Quan <evan.quan at amd.com>
>> Cc: Wenyou Yang <WenYou.Yang at amd.com>
>> Cc: Alex Deucher <alexander.deucher at amd.com>
>> Cc: <stable at vger.kernel.org> # v6.6+
>> ---
>>   drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c | 5 ++++-
>>   1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c 
>> b/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c
>> index 22737b11b1bf..36f4a4651918 100644
>> --- a/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c
>> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c
>> @@ -242,7 +242,10 @@ static int vangogh_tables_init(struct smu_context 
>> *smu)
>>           goto err0_out;
>>       smu_table->metrics_time = 0;
>> -    smu_table->gpu_metrics_table_size = max(sizeof(struct 
>> gpu_metrics_v2_3), sizeof(struct gpu_metrics_v2_2));
>> +    smu_table->gpu_metrics_table_size = sizeof(struct gpu_metrics_v2_2);
>> +    smu_table->gpu_metrics_table_size = 
>> max(smu_table->gpu_metrics_table_size, sizeof(struct gpu_metrics_v2_3));
>> +    smu_table->gpu_metrics_table_size = 
>> max(smu_table->gpu_metrics_table_size, sizeof(struct gpu_metrics_v2_4));
>> +    smu_table->gpu_metrics_table_size = 
>> max(smu_table->gpu_metrics_table_size, sizeof(struct gpu_metrics_v3_0));
> 
> AFAICT Van Gogh only supports 2.2, 2.3 or 2.4.  I don't think there is a 
> need to compare to 3.0 to solve this bug this way.

Gotcha.

> But generally yeah moving the initialization to a helper that actually 
> knows the size would be another way to solve this.

Yeah I was looking at smu_cmn_init_soft_gpu_metrics() and how it has the 
data of how large table needs to be for each frev+crev combo. But didn't 
go as far as trying to figure out if frev+crev would be available at 
table allocation time.

> Or looking at it how about moving all the conditional code in 
> vangogh_common_get_gpu_metrics() into vangogh_tables_init() and then 
> assigning a vfunc for vangogh_common_get_gpu_metrics() to call?

I did not quite follow. Happy to work on it though and I can look into 
it with fresh eyes but not next week, but the week after.

Or in the meantime if you want me to respin this fix just with v3_0 line 
removed I can do that today.

Regards,

Tvrtko

>>       smu_table->gpu_metrics_table = 
>> kzalloc(smu_table->gpu_metrics_table_size, GFP_KERNEL);
>>       if (!smu_table->gpu_metrics_table)
>>           goto err1_out;
> 


More information about the amd-gfx mailing list