[PATCH v3] amd/amdgpu: Fix resv shared fence overflow
Christian König
christian.koenig at amd.com
Mon Oct 12 06:41:58 UTC 2020
On 12.10.20 at 08:14, xinhui pan wrote:
> [ 179.556745] kernel BUG at drivers/dma-buf/dma-resv.c:282!
> [snip]
> [ 179.702910] Call Trace:
> [ 179.705696] amdgpu_bo_fence+0x21/0x50 [amdgpu]
> [ 179.710707] amdgpu_vm_sdma_commit+0x299/0x430 [amdgpu]
> [ 179.716497] amdgpu_vm_bo_update_mapping.constprop.0+0x29f/0x390 [amdgpu]
> [ 179.723927] ? find_held_lock+0x38/0x90
> [ 179.728183] amdgpu_vm_handle_fault+0x1af/0x420 [amdgpu]
> [ 179.734063] gmc_v9_0_process_interrupt+0x245/0x2e0 [amdgpu]
> [ 179.740347] ? kgd2kfd_interrupt+0xb8/0x1e0 [amdgpu]
> [ 179.745808] amdgpu_irq_dispatch+0x10a/0x3c0 [amdgpu]
> [ 179.751380] ? amdgpu_irq_dispatch+0x10a/0x3c0 [amdgpu]
> [ 179.757159] amdgpu_ih_process+0xbb/0x1a0 [amdgpu]
> [ 179.762466] amdgpu_irq_handle_ih1+0x27/0x40 [amdgpu]
> [ 179.767997] process_one_work+0x23c/0x580
> [ 179.772371] worker_thread+0x50/0x3b0
> [ 179.776356] ? process_one_work+0x580/0x580
> [ 179.780939] kthread+0x128/0x160
> [ 179.784462] ? kthread_park+0x90/0x90
> [ 179.788466] ret_from_fork+0x1f/0x30
>
> We have two scheduler entities, immediate and delayed, so there are
> two kinds of scheduler finished fences. Both fences might be added to
> the root BO's resv at the same time.
>
> We reserve the shared fence slot for the delayed entity during VM init
> and BO moves, but it looks like we forgot to reserve the shared fence
> slot for the immediate entity during VM fault handling.
>
> Signed-off-by: xinhui pan <xinhui.pan at amd.com>
Reviewed-by: Christian König <christian.koenig at amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index 3cd949aad500..a737232ceb38 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -3352,6 +3352,9 @@ bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int pasid,
>  	if (!vm)
>  		goto error_unlock;
>
> +	if (dma_resv_reserve_shared(root->tbo.base.resv, 1))
> +		goto error_unlock;
> +
>  	addr /= AMDGPU_GPU_PAGE_SIZE;
>  	flags = AMDGPU_PTE_VALID | AMDGPU_PTE_SNOOPED |
>  		AMDGPU_PTE_SYSTEM;
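
The reserve-before-add rule that the patch restores can be illustrated with a small user-space model. This is only a sketch and not the kernel's dma_resv code: struct model_resv, model_reserve_shared() and model_add_shared() are hypothetical names, and fences are plain ints. It assumes that adding a shared fence is only valid when a free slot was reserved beforehand, which is what the overflow in the trace above suggests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical user-space model; NOT the kernel's struct dma_resv. */
struct model_resv {
	int *fences;		/* fence ids, stand-in for fence pointers */
	size_t shared_count;	/* slots currently in use */
	size_t shared_max;	/* slots currently allocated */
};

/*
 * Make sure at least num_fences free slots exist.
 * Returns 0 on success, -1 if the allocation fails.
 */
int model_reserve_shared(struct model_resv *resv, size_t num_fences)
{
	size_t needed;
	int *grown;

	assert(resv != NULL);
	needed = resv->shared_count + num_fences;
	if (needed <= resv->shared_max)
		return 0;	/* enough free slots already */

	grown = realloc(resv->fences, needed * sizeof(*grown));
	if (!grown)
		return -1;

	resv->fences = grown;
	resv->shared_max = needed;
	return 0;
}

/*
 * Add a fence to a previously reserved slot.
 * Returns false when no slot is free; in this model that is the
 * situation where the kernel would hit the BUG instead.
 */
bool model_add_shared(struct model_resv *resv, int fence)
{
	assert(resv != NULL);
	if (resv->shared_count >= resv->shared_max)
		return false;

	resv->fences[resv->shared_count++] = fence;
	return true;
}
```

In this model, reserving one slot and adding the delayed fence succeeds, but adding the immediate fence afterwards fails until a second slot is reserved. That second reservation corresponds to the call the patch adds to the fault path.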