[PATCH] drm/amd/pm: Vangogh: Fix kernel memory out of bounds write
Tvrtko Ursulin
tvrtko.ursulin at igalia.com
Fri Oct 25 14:40:00 UTC 2024
On 25/10/2024 15:23, Mario Limonciello wrote:
> On 10/25/2024 09:15, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin at igalia.com>
>>
>> KASAN reports that the GPU metrics table allocated in
>> vangogh_tables_init() is not large enough for the memset done in
>> smu_cmn_init_soft_gpu_metrics(). Condensed report follows:
>>
>> [ 33.861314] BUG: KASAN: slab-out-of-bounds in
>> smu_cmn_init_soft_gpu_metrics+0x73/0x200 [amdgpu]
>> [ 33.861799] Write of size 168 at addr ffff888129f59500 by task
>> mangoapp/1067
>> ...
>> [ 33.861808] CPU: 6 UID: 1000 PID: 1067 Comm: mangoapp Tainted:
>> G W 6.12.0-rc4 #356
>> 1a56f59a8b5182eeaf67eb7cb8b13594dd23b544
>> [ 33.861816] Tainted: [W]=WARN
>> [ 33.861818] Hardware name: Valve Galileo/Galileo, BIOS F7G0107
>> 12/01/2023
>> [ 33.861822] Call Trace:
>> [ 33.861826] <TASK>
>> [ 33.861829] dump_stack_lvl+0x66/0x90
>> [ 33.861838] print_report+0xce/0x620
>> [ 33.861853] kasan_report+0xda/0x110
>> [ 33.862794] kasan_check_range+0xfd/0x1a0
>> [ 33.862799] __asan_memset+0x23/0x40
>> [ 33.862803] smu_cmn_init_soft_gpu_metrics+0x73/0x200 [amdgpu
>> 13b1bc364ec578808f676eba412c20eaab792779]
>> [ 33.863306] vangogh_get_gpu_metrics_v2_4+0x123/0xad0 [amdgpu
>> 13b1bc364ec578808f676eba412c20eaab792779]
>> [ 33.864257] vangogh_common_get_gpu_metrics+0xb0c/0xbc0 [amdgpu
>> 13b1bc364ec578808f676eba412c20eaab792779]
>> [ 33.865682] amdgpu_dpm_get_gpu_metrics+0xcc/0x110 [amdgpu
>> 13b1bc364ec578808f676eba412c20eaab792779]
>> [ 33.866160] amdgpu_get_gpu_metrics+0x154/0x2d0 [amdgpu
>> 13b1bc364ec578808f676eba412c20eaab792779]
>> [ 33.867135] dev_attr_show+0x43/0xc0
>> [ 33.867147] sysfs_kf_seq_show+0x1f1/0x3b0
>> [ 33.867155] seq_read_iter+0x3f8/0x1140
>> [ 33.867173] vfs_read+0x76c/0xc50
>> [ 33.867198] ksys_read+0xfb/0x1d0
>> [ 33.867214] do_syscall_64+0x90/0x160
>> ...
>> [ 33.867353] Allocated by task 378 on cpu 7 at 22.794876s:
>> [ 33.867358] kasan_save_stack+0x33/0x50
>> [ 33.867364] kasan_save_track+0x17/0x60
>> [ 33.867367] __kasan_kmalloc+0x87/0x90
>> [ 33.867371] vangogh_init_smc_tables+0x3f9/0x840 [amdgpu]
>> [ 33.867835] smu_sw_init+0xa32/0x1850 [amdgpu]
>> [ 33.868299] amdgpu_device_init+0x467b/0x8d90 [amdgpu]
>> [ 33.868733] amdgpu_driver_load_kms+0x19/0xf0 [amdgpu]
>> [ 33.869167] amdgpu_pci_probe+0x2d6/0xcd0 [amdgpu]
>> [ 33.869608] local_pci_probe+0xda/0x180
>> [ 33.869614] pci_device_probe+0x43f/0x6b0
>>
>> Empirically we can confirm that the former allocates 152 bytes for the
>> table, while the latter memsets the 168 large block.
>>
>> This is somewhat alleviated by the fact that allocation goes into a 192
>> SLAB bucket, but then for v3_0 metrics the table grows to 264 bytes which
>> would definitely be a problem.
>>
>> Root cause appears that when GPU metrics tables for v2_4 parts were added
>> it was not considered to enlarge the table to fit.
>>
>> The fix in this patch is rather "brute force" and perhaps later should be
>> done in a smarter way, by extracting and consolidating the part
>> version to
>> size logic to a common helper, instead of brute forcing the largest
>> possible allocation. Nevertheless, for now this works and fixes the
>> out of
>> bounds write.
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at igalia.com>
>> Fixes: 41cec40bc9ba ("drm/amd/pm: Vangogh: Add new gpu_metrics_v2_4 to
>> acquire gpu_metrics")
>> Cc: Mario Limonciello <mario.limonciello at amd.com>
>> Cc: Evan Quan <evan.quan at amd.com>
>> Cc: Wenyou Yang <WenYou.Yang at amd.com>
>> Cc: Alex Deucher <alexander.deucher at amd.com>
>> Cc: <stable at vger.kernel.org> # v6.6+
>> ---
>> drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c | 5 ++++-
>> 1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c
>> b/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c
>> index 22737b11b1bf..36f4a4651918 100644
>> --- a/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c
>> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c
>> @@ -242,7 +242,10 @@ static int vangogh_tables_init(struct smu_context
>> *smu)
>> goto err0_out;
>> smu_table->metrics_time = 0;
>> - smu_table->gpu_metrics_table_size = max(sizeof(struct
>> gpu_metrics_v2_3), sizeof(struct gpu_metrics_v2_2));
>> + smu_table->gpu_metrics_table_size = sizeof(struct gpu_metrics_v2_2);
>> + smu_table->gpu_metrics_table_size =
>> max(smu_table->gpu_metrics_table_size, sizeof(struct gpu_metrics_v2_3));
>> + smu_table->gpu_metrics_table_size =
>> max(smu_table->gpu_metrics_table_size, sizeof(struct gpu_metrics_v2_4));
>> + smu_table->gpu_metrics_table_size =
>> max(smu_table->gpu_metrics_table_size, sizeof(struct gpu_metrics_v3_0));
>
> AFAICT Van Gogh only supports 2.2, 2.3 or 2.4. I don't think there is a
> need to compare to 3.0 to solve this bug this way.
Gotcha.
> But generally yeah moving the initialization to a helper that actually
> knows the size would be another way to solve this.
Yeah I was looking at smu_cmn_init_soft_gpu_metrics() and how it has the
data of how large table needs to be for each frev+crev combo. But didn't
go as far as trying to figure out if frev+crev would be available at
table allocation time.
> Or looking at it how about moving all the conditional code in
> vangogh_common_get_gpu_metrics() into vangogh_tables_init() and then
> assigning a vfunc for vangogh_common_get_gpu_metrics() to call?
I did not quite follow. Happy to work on it though and I can look into
it with fresh eyes but not next week, but the week after.
Or in the meantime if you want me to respin this fix just with v3_0 line
removed I can do that today.
Regards,
Tvrtko
>> smu_table->gpu_metrics_table =
>> kzalloc(smu_table->gpu_metrics_table_size, GFP_KERNEL);
>> if (!smu_table->gpu_metrics_table)
>> goto err1_out;
>
More information about the amd-gfx
mailing list