[PATCH] drm/amdkfd: Fix an illegal memory access

Felix Kuehling felix.kuehling at amd.com
Tue Feb 21 16:26:37 UTC 2023


On 2023-02-21 06:35, qu.huang at linux.dev wrote:
> From: Qu Huang <qu.huang at linux.dev>
>
> In kfd_wait_on_events(), the kfd_event_waiter structures are
> allocated by alloc_event_waiters(), but the event field of each
> waiter is not initialized. When copy_from_user() fails in
> kfd_wait_on_events(), the error path releases the previously
> allocated waiters. Because free_waiters() accesses the
> uninitialized event field, this results in an illegal memory
> access and a system crash. Here is the crash log:
>
> localhost kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x185/0x1e0
> localhost kernel: RSP: 0018:ffffaa53c362bd60 EFLAGS: 00010082
> localhost kernel: RAX: ff3d3d6bff4007cb RBX: 0000000000000282 RCX: 00000000002c0000
> localhost kernel: RDX: ffff9e855eeacb80 RSI: 000000000000279c RDI: ffffe7088f6a21d0
> localhost kernel: RBP: ffffe7088f6a21d0 R08: 00000000002c0000 R09: ffffaa53c362be64
> localhost kernel: R10: ffffaa53c362bbd8 R11: 0000000000000001 R12: 0000000000000002
> localhost kernel: R13: ffff9e7ead15d600 R14: 0000000000000000 R15: ffff9e7ead15d698
> localhost kernel: FS:  0000152a3d111700(0000) GS:ffff9e855ee80000(0000) knlGS:0000000000000000
> localhost kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> localhost kernel: CR2: 0000152938000010 CR3: 000000044d7a4000 CR4: 00000000003506e0
> localhost kernel: Call Trace:
> localhost kernel: _raw_spin_lock_irqsave+0x30/0x40
> localhost kernel: remove_wait_queue+0x12/0x50
> localhost kernel: kfd_wait_on_events+0x1b6/0x490 [hydcu]
> localhost kernel: ? ftrace_graph_caller+0xa0/0xa0
> localhost kernel: kfd_ioctl+0x38c/0x4a0 [hydcu]
> localhost kernel: ? kfd_ioctl_set_trap_handler+0x70/0x70 [hydcu]
> localhost kernel: ? kfd_ioctl_create_queue+0x5a0/0x5a0 [hydcu]
> localhost kernel: ? ftrace_graph_caller+0xa0/0xa0
> localhost kernel: __x64_sys_ioctl+0x8e/0xd0
> localhost kernel: ? syscall_trace_enter.isra.18+0x143/0x1b0
> localhost kernel: do_syscall_64+0x33/0x80
> localhost kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
> localhost kernel: RIP: 0033:0x152a4dff68d7
>
> Signed-off-by: Qu Huang <qu.huang at linux.dev>
> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_events.c | 1 +
>   1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
> index 729d26d..e5faaad 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
> @@ -787,6 +787,7 @@ static struct kfd_event_waiter *alloc_event_waiters(uint32_t num_events)
>   	for (i = 0; (event_waiters) && (i < num_events) ; i++) {
>   		init_wait(&event_waiters[i].wait);
>   		event_waiters[i].activated = false;
> +		event_waiters[i].event = NULL;

Thank you for catching this. We're often lazy about initializing things 
to NULL or 0 because most of our data structures are allocated with 
kzalloc or similar. I'm not sure why we're not doing this here. If we 
allocated event_waiters with kcalloc, we could also remove the 
initialization of activated. I think that would be the cleaner and safer 
solution.
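
To illustrate the idea, here is a rough userspace sketch, not the actual
driver code. calloc() stands in for kcalloc(), and the struct and function
names (event_sketch, waiter_sketch, alloc_waiters_sketch,
free_waiters_sketch, demo_error_path) are hypothetical, simplified
stand-ins for the real kfd types and helpers. With a zeroing allocator,
the error path can safely skip waiters that were never attached to an
event:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct event_sketch {
	int remove_calls; /* counts simulated remove_wait_queue() calls */
};

struct waiter_sketch {
	struct event_sketch *event; /* stays NULL until attached to an event */
	bool activated;
};

/*
 * calloc() plays the role of kcalloc() here: the memory is zero-filled,
 * so event reads as NULL and activated as false without a per-element
 * loop (assuming, as the kernel does, that all-zero bits are a NULL
 * pointer).
 */
static struct waiter_sketch *alloc_waiters_sketch(uint32_t num_events)
{
	return calloc(num_events, sizeof(struct waiter_sketch));
}

/*
 * Error-path cleanup: only touch waiters that were actually attached.
 * Returns how many waiters were detached.
 */
static uint32_t free_waiters_sketch(uint32_t num_events,
				    struct waiter_sketch *waiters)
{
	uint32_t i, detached = 0;

	for (i = 0; waiters && i < num_events; i++) {
		if (waiters[i].event) {
			waiters[i].event->remove_calls++;
			detached++;
		}
	}
	free(waiters);
	return detached;
}

/* Simulates a failure after only one of four waiters was attached. */
static uint32_t demo_error_path(void)
{
	struct event_sketch ev = { 0 };
	struct waiter_sketch *waiters = alloc_waiters_sketch(4);

	assert(waiters);
	waiters[1].event = &ev;
	return free_waiters_sketch(4, waiters);
}
```

In the driver itself, init_wait() would presumably still need to be called
for each waiter after the allocation; only the explicit assignments of
activated and event would become unnecessary.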

Regards,
   Felix


>   	}
>
>   	return event_waiters;
> --
> 1.8.3.1

