[Intel-xe] [PATCH 3/3] drm/xe/bo: sync kernel fences for KMD buffers

Matthew Brost matthew.brost@intel.com
Wed Oct 25 21:00:52 UTC 2023


On Wed, Oct 25, 2023 at 06:39:41PM +0100, Matthew Auld wrote:
> With things like pipelined evictions, VRAM pages can be marked as free
> and yet still have some active kernel fences, with the idea that the
> next caller to allocate the memory will respect them. However, it looks
> like we are missing synchronisation for KMD-internal buffers, like
> page-tables, LRCs etc. For userspace objects we should already have the
> required serialisation for CPU access via the fault handler, and
> likewise for GPU access when vm_binding them.
> 
> To fix this, serialise against any kernel fences for all KMD objects at
> creation. This seems to resolve some severe corruption seen during
> evictions.
> 
> Closes: ?
> Testcase: igt@xe-evict-ccs
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Cc: Matthew Brost <matthew.brost@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_bo.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> index 61789c0e88fb..26a103aa5d48 100644
> --- a/drivers/gpu/drm/xe/xe_bo.c
> +++ b/drivers/gpu/drm/xe/xe_bo.c
> @@ -1272,6 +1272,16 @@ struct xe_bo *__xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
>  	else
>  		ttm_bo_move_to_lru_tail_unlocked(&bo->ttm);
>  
> +	/*
> +	 * TTM can give us VRAM that still has active fences, i.e. GPU might still
> +	 * be accessing it. To keep things simple just sync any kernel fences
> +	 * here, if the buffer is KMD internal. For normal userspace objects we
> +	 * should already have the required pipelining or sync waiting.
> +	 */
> +	if (type == ttm_bo_type_kernel)
> +		dma_resv_wait_timeout(bo->ttm.base.resv, DMA_RESV_USAGE_KERNEL,
> +				      false, MAX_SCHEDULE_TIMEOUT);
> +
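
Trying to make sure I follow the failure mode here. My own sketch of the
timeline the commit message describes (pseudo-code comments, not actual
driver code):

/*
 * evict BO A:   GPU copy VRAM -> sysmem is queued and a kernel fence F
 *               is installed in A's resv; TTM marks A's VRAM pages
 *               free before F signals
 * create BO B:  the allocator hands those same VRAM pages to a new
 *               KMD BO (page tables, LRC, ...)
 * use BO B:     B is written while F is still in flight, racing the
 *               pending eviction copy => corruption
 *
 * The dma_resv_wait_timeout() above closes that window by blocking
 * until F has signalled before B is handed back to the caller.
 */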

So this is the case where we create a kernel BO within a VM (e.g. memory
for page tables) and the memory we received is also within the same VM
(bo->ttm.base.resv == the VM's resv object)? The memory we received could
still be in use by the GPU, so we need to wait for it to be idle before
handing it out? Is that right? Also, how did you arrive at the
DMA_RESV_USAGE_KERNEL slot to determine idleness?
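
For reference, my mental model of the slots from
include/linux/dma-resv.h (the comments are my paraphrase):

enum dma_resv_usage {
	DMA_RESV_USAGE_KERNEL,   /* kernel memory management, e.g. eviction copies */
	DMA_RESV_USAGE_WRITE,    /* implicit-sync writers */
	DMA_RESV_USAGE_READ,     /* implicit-sync readers */
	DMA_RESV_USAGE_BOOKKEEP, /* no implicit sync, e.g. VM binds */
};

A wait at a given usage also waits on every stricter (lower) usage, so
DMA_RESV_USAGE_KERNEL is the narrowest wait there is: it only picks up
the migration/clear fences and not any userspace work sharing the VM's
resv. Is that the reasoning for choosing it here?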

Matt 

>  	return bo;
>  }
>  
> -- 
> 2.41.0
> 

