[Intel-xe] [PATCH v3 3/3] drm/xe/bo: sync kernel fences for KMD buffers

Thomas Hellström thomas.hellstrom at linux.intel.com
Mon Oct 30 16:13:52 UTC 2023


On 10/30/23 17:10, Matthew Auld wrote:
> With things like pipelined evictions, VRAM pages can be marked as free
> and yet still have some active kernel fences, with the idea that the
> next caller to allocate the memory will respect them. However, it looks
> like we are missing synchronisation for KMD internal buffers, like
> page-tables, lrc, etc. For userspace objects we should already have the
> required synchronisation for CPU access via the fault handler, and
> likewise for GPU access when vm_binding them.
>
> To fix this, synchronise against any kernel fences for all KMD objects
> at creation. This should resolve some severe corruption seen during
> evictions.
>
> v2 (Matt B):
>    - Revamp the comment explaining this. Also mention why USAGE_KERNEL is
>      correct here.
> v3 (Thomas):
>    - Make sure to use ctx.interruptible for the wait.
>
> Closes: ?
> Testcase: igt@xe-evict-ccs
> Reported-by: Zbigniew Kempczyński <zbigniew.kempczynski at intel.com>
> Signed-off-by: Matthew Auld <matthew.auld at intel.com>
> Cc: Thomas Hellström <thomas.hellstrom at linux.intel.com>
> Cc: Matthew Brost <matthew.brost at intel.com>
> Reviewed-by: Thomas Hellström <thomas.hellstrom at linux.intel.com>

R-B holds for v3.

Thanks,

Thomas
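
As an aside for readers less familiar with dma-resv: the wait being added
below is the stock kernel-fence wait pattern. A minimal sketch, where only
the dma_resv_* calls, DMA_RESV_USAGE_KERNEL and MAX_SCHEDULE_TIMEOUT are
the real API; the helper name is made up for illustration:

static int example_sync_kernel_fences(struct dma_resv *resv, bool intr)
{
	long timeout;

	/* Caller is expected to hold the dma-resv lock. */
	dma_resv_assert_held(resv);

	/*
	 * DMA_RESV_USAGE_KERNEL covers fences the kernel itself attached,
	 * e.g. async TTM moves/clears. An interruptible wait returns
	 * -ERESTARTSYS if a signal arrives, which we propagate, matching
	 * what the patch does with ctx.interruptible.
	 */
	timeout = dma_resv_wait_timeout(resv, DMA_RESV_USAGE_KERNEL,
					intr, MAX_SCHEDULE_TIMEOUT);
	if (timeout < 0)
		return timeout;

	return 0;
}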


> ---
>   drivers/gpu/drm/xe/xe_bo.c | 31 +++++++++++++++++++++++++++++++
>   1 file changed, 31 insertions(+)
>
> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> index 61789c0e88fb..cd043b1308ec 100644
> --- a/drivers/gpu/drm/xe/xe_bo.c
> +++ b/drivers/gpu/drm/xe/xe_bo.c
> @@ -1266,6 +1266,37 @@ struct xe_bo *__xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
>   	if (err)
>   		return ERR_PTR(err);
>   
> +	/*
> +	 * The VRAM pages underneath are potentially still being accessed by the
> +	 * GPU, as per async GPU clearing and async evictions. However, TTM makes
> +	 * sure to add any corresponding move/clear fences into the object's
> +	 * dma-resv using the DMA_RESV_USAGE_KERNEL slot.
> +	 *
> +	 * For KMD internal buffers we don't care about GPU clearing; however, we
> +	 * still need to handle async evictions, where the VRAM is still being
> +	 * accessed by the GPU. Most internal callers are not expecting this,
> +	 * since they are missing the required synchronisation before accessing
> +	 * the memory. To keep things simple, just sync wait any kernel fences
> +	 * here, if the buffer is designated KMD internal.
> +	 *
> +	 * For normal userspace objects we should already have the required
> +	 * pipelining or sync waiting elsewhere, since we already have to deal
> +	 * with things like async GPU clearing.
> +	 */
> +	if (type == ttm_bo_type_kernel) {
> +		long timeout = dma_resv_wait_timeout(bo->ttm.base.resv,
> +						     DMA_RESV_USAGE_KERNEL,
> +						     ctx.interruptible,
> +						     MAX_SCHEDULE_TIMEOUT);
> +
> +		if (timeout < 0) {
> +			if (!resv)
> +				dma_resv_unlock(bo->ttm.base.resv);
> +			xe_bo_put(bo);
> +			return ERR_PTR(timeout);
> +		}
> +	}
> +
>   	bo->created = true;
>   	if (bulk)
>   		ttm_bo_set_bulk_move(&bo->ttm, bulk);

