[PATCH v2] amd/amdgpu: Fix resv shared fence overflow

Christian König christian.koenig at amd.com
Tue Sep 29 07:00:36 UTC 2020


Philip already stumbled over this issue as well, but amdgpu_vm_init() is
the wrong place to fix it.

dma_resv_reserve_shared() needs to be called after we have reserved the
page tables and before we do the update in amdgpu_vm_handle_fault().

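Reserved slots are freed (in a debug build) as soon as we release the
reservation, so a slot reserved in amdgpu_vm_init() is already gone by
the time the fault handler adds its fence.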
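
To illustrate why the placement matters, here is a toy userspace model
of the slot accounting. It is only a sketch, not kernel code: every
toy_* name is made up for this example, the unlock path models the
debug-build behaviour unconditionally, and details such as fences from
the same context replacing each other are ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a reservation object; all names here are hypothetical. */
struct toy_resv {
	unsigned int shared_count;	/* shared fences currently stored */
	unsigned int shared_max;	/* slots guaranteed to be available */
	bool locked;
};

static void toy_lock(struct toy_resv *r)
{
	r->locked = true;
}

/* Models the debug-build behaviour: unlocking drops unused reserved slots. */
static void toy_unlock(struct toy_resv *r)
{
	assert(r->locked);
	r->shared_max = r->shared_count;
	r->locked = false;
}

/* Stand-in for dma_resv_reserve_shared(): guarantee room for n more fences. */
static int toy_reserve_shared(struct toy_resv *r, unsigned int n)
{
	if (!r->locked)
		return -1;
	if (r->shared_count + n > r->shared_max)
		r->shared_max = r->shared_count + n;
	return 0;
}

/* Stand-in for adding a shared fence; -1 marks where the kernel would BUG. */
static int toy_add_shared_fence(struct toy_resv *r)
{
	if (!r->locked || r->shared_count >= r->shared_max)
		return -1;
	r->shared_count++;
	return 0;
}

/* Slot reserved at init time, fence added in a later locked section. */
static int toy_reserve_at_init(void)
{
	struct toy_resv r = {0};
	int ret;

	toy_lock(&r);
	toy_reserve_shared(&r, 1);
	toy_unlock(&r);			/* the reserved slot is dropped here */

	toy_lock(&r);
	ret = toy_add_shared_fence(&r);	/* no slot left: overflow */
	toy_unlock(&r);
	return ret;
}

/* Slot reserved inside the same locked section that adds the fence. */
static int toy_reserve_in_handler(void)
{
	struct toy_resv r = {0};
	int ret;

	toy_lock(&r);
	ret = toy_reserve_shared(&r, 1);
	if (!ret)
		ret = toy_add_shared_fence(&r);
	toy_unlock(&r);
	return ret;
}
```

In this model toy_reserve_at_init() fails while toy_reserve_in_handler()
succeeds, and bumping the count reserved at init time from 1 to 2 would
not change the first result.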

Christian.

On 29.09.20 at 07:57, xinhui pan wrote:
> [  179.556745] kernel BUG at drivers/dma-buf/dma-resv.c:282!
> [snip]
> [  179.702910] Call Trace:
> [  179.705696]  amdgpu_bo_fence+0x21/0x50 [amdgpu]
> [  179.710707]  amdgpu_vm_sdma_commit+0x299/0x430 [amdgpu]
> [  179.716497]  amdgpu_vm_bo_update_mapping.constprop.0+0x29f/0x390 [amdgpu]
> [  179.723927]  ? find_held_lock+0x38/0x90
> [  179.728183]  amdgpu_vm_handle_fault+0x1af/0x420 [amdgpu]
> [  179.734063]  gmc_v9_0_process_interrupt+0x245/0x2e0 [amdgpu]
> [  179.740347]  ? kgd2kfd_interrupt+0xb8/0x1e0 [amdgpu]
> [  179.745808]  amdgpu_irq_dispatch+0x10a/0x3c0 [amdgpu]
> [  179.751380]  ? amdgpu_irq_dispatch+0x10a/0x3c0 [amdgpu]
> [  179.757159]  amdgpu_ih_process+0xbb/0x1a0 [amdgpu]
> [  179.762466]  amdgpu_irq_handle_ih1+0x27/0x40 [amdgpu]
> [  179.767997]  process_one_work+0x23c/0x580
> [  179.772371]  worker_thread+0x50/0x3b0
> [  179.776356]  ? process_one_work+0x580/0x580
> [  179.780939]  kthread+0x128/0x160
> [  179.784462]  ? kthread_park+0x90/0x90
> [  179.788466]  ret_from_fork+0x1f/0x30
>
> We have two scheduler entities, immediate and delayed, so there are
> two kinds of scheduler finished fences. Both fences might be added to
> the root BO's resv at the same time while only one slot is reserved.
>
> Signed-off-by: xinhui pan <xinhui.pan at amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index 37221b99ca96..9e0116c7f8d1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -2869,7 +2869,7 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>   	if (r)
>   		goto error_free_root;
>   
> -	r = dma_resv_reserve_shared(root->tbo.base.resv, 1);
> +	r = dma_resv_reserve_shared(root->tbo.base.resv, 2);
>   	if (r)
>   		goto error_unreserve;
>   


