[PATCH] drm/amdgpu: Fix vm free pts race when process exiting

Christian König ckoenig.leichtzumerken at gmail.com
Wed Aug 4 09:01:21 UTC 2021


Am 03.08.21 um 00:33 schrieb Philip Yang:
> Take vm->invalidated_lock spinlock to remove vm pd and pt bos, to avoid
> link list corruption with crash backtrace:
>
> [ 2290.505111] list_del corruption. next->prev should be
>   ffff9b2514ec0018, but was 4e03280211010f04
> [ 2290.505154] kernel BUG at lib/list_debug.c:56!
> [ 2290.505176] invalid opcode: 0000 [#1] SMP NOPTI
> [ 2290.505254] Workqueue: events delayed_fput
> [ 2290.505271] RIP: 0010:__list_del_entry_valid.cold.1+0x20/0x4c
> [ 2290.505513] Call Trace:
> [ 2290.505623]  amdgpu_vm_free_table+0x26/0x80 [amdgpu]
> [ 2290.505705]  amdgpu_vm_free_pts+0x7a/0xf0 [amdgpu]
> [ 2290.505786]  amdgpu_vm_fini+0x1f0/0x440 [amdgpu]
> [ 2290.505864]  amdgpu_driver_postclose_kms+0x172/0x290 [amdgpu]
> [ 2290.505893]  drm_file_free.part.10+0x1d4/0x270 [drm]
> [ 2290.505916]  drm_release+0xa9/0xe0 [drm]
> [ 2290.505930]  __fput+0xb7/0x230
> [ 2290.505942]  delayed_fput+0x1c/0x30
> [ 2290.505957]  process_one_work+0x1a7/0x360
> [ 2290.505971]  worker_thread+0x30/0x390
> [ 2290.505985]  ? create_worker+0x1a0/0x1a0
> [ 2290.505999]  kthread+0x112/0x130
> [ 2290.506011]  ? kthread_flush_work_fn+0x10/0x10
> [ 2290.506027]  ret_from_fork+0x22/0x40

Wow, well this is a big NAK.

The page tables should never ever be on the invalidation list or 
otherwise we would try to point PTEs to them which is a huge security issue.

Taking the lock just workaround that. Can you investigate how it happens 
that a page table ends up on that list?

Thanks in advance,
Christian.

>
> Signed-off-by: Philip Yang <Philip.Yang at amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 ++
>   1 file changed, 2 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index 2a88ed5d983b..5c4c355e7d6b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -1045,7 +1045,9 @@ static void amdgpu_vm_free_table(struct amdgpu_vm_bo_base *entry)
>   		return;
>   	shadow = amdgpu_bo_shadowed(entry->bo);
>   	entry->bo->vm_bo = NULL;
> +	spin_lock(&entry->vm->invalidated_lock);
>   	list_del(&entry->vm_status);
> +	spin_unlock(&entry->vm->invalidated_lock);
>   	amdgpu_bo_unref(&shadow);
>   	amdgpu_bo_unref(&entry->bo);
>   }



More information about the amd-gfx mailing list