[Intel-xe] [PATCH v3 3/3] drm/xe/bo: sync kernel fences for KMD buffers
Thomas Hellström
thomas.hellstrom at linux.intel.com
Mon Oct 30 16:13:52 UTC 2023
On 10/30/23 17:10, Matthew Auld wrote:
> With things like pipelined evictions, VRAM pages can be marked as free
> and yet still have some active kernel fences, with the idea that the
> next caller to allocate the memory will respect them. However, it looks
> like we are missing synchronisation for KMD internal buffers, such as
> page-tables, LRCs, etc. For userspace objects we should already have the
> required synchronisation for CPU access via the fault handler, and
> likewise for GPU access when vm_binding them.
>
> To fix this, synchronise against any kernel fences for all KMD objects at
> creation. This should resolve some severe corruption seen during
> evictions.
>
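As extra context for anyone reading along: the kernel fences in question are
the move/clear fences that a pipelined eviction publishes in the object's
dma-resv, in the DMA_RESV_USAGE_KERNEL slot, precisely so the next owner of
the pages can find and respect them. A minimal hand-written sketch of that
producer/consumer pattern (illustration only, not lifted from TTM or xe;
resv, move_fence and intr are placeholders, and the resv lock is assumed
held on the producer side):

	#include <linux/dma-resv.h>
	#include <linux/sched.h>	/* MAX_SCHEDULE_TIMEOUT */

	/* Producer (the pipelined move): publish the fence in the
	 * KERNEL usage slot so every later user of the memory sees it.
	 * A fence slot must be reserved before adding the fence.
	 */
	static int publish_move_fence(struct dma_resv *resv,
				      struct dma_fence *move_fence)
	{
		int err = dma_resv_reserve_fences(resv, 1);

		if (err)
			return err;
		dma_resv_add_fence(resv, move_fence, DMA_RESV_USAGE_KERNEL);
		return 0;
	}

	/* Consumer (the next owner of the pages): wait on kernel fences
	 * only. KERNEL is the lowest usage level, so this does not also
	 * wait on regular userspace read/write fences.
	 */
	static long wait_kernel_fences(struct dma_resv *resv, bool intr)
	{
		return dma_resv_wait_timeout(resv, DMA_RESV_USAGE_KERNEL,
					     intr, MAX_SCHEDULE_TIMEOUT);
	}

The patch below does exactly the consumer half at object creation time for
KMD-internal buffers.
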
> v2 (Matt B):
> - Revamp the comment explaining this. Also mention why USAGE_KERNEL is
> correct here.
> v3 (Thomas):
> - Make sure to use ctx.interruptible for the wait.
>
> Closes: ?
> Testcase: igt@xe-evict-ccs
> Reported-by: Zbigniew Kempczyński <zbigniew.kempczynski at intel.com>
> Signed-off-by: Matthew Auld <matthew.auld at intel.com>
> Cc: Thomas Hellström <thomas.hellstrom at linux.intel.com>
> Cc: Matthew Brost <matthew.brost at intel.com>
> Reviewed-by: Thomas Hellström <thomas.hellstrom at linux.intel.com>
R-B holds for v3.
Thanks,
Thomas
> ---
>  drivers/gpu/drm/xe/xe_bo.c | 31 +++++++++++++++++++++++++++++++
>  1 file changed, 31 insertions(+)
>
> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> index 61789c0e88fb..cd043b1308ec 100644
> --- a/drivers/gpu/drm/xe/xe_bo.c
> +++ b/drivers/gpu/drm/xe/xe_bo.c
> @@ -1266,6 +1266,37 @@ struct xe_bo *__xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
>  	if (err)
>  		return ERR_PTR(err);
>  
> +	/*
> +	 * The VRAM pages underneath are potentially still being accessed by the
> +	 * GPU, as per async GPU clearing and async evictions. However TTM makes
> +	 * sure to add any corresponding move/clear fences into the object's
> +	 * dma-resv using the DMA_RESV_USAGE_KERNEL slot.
> +	 *
> +	 * For KMD internal buffers we don't care about GPU clearing, however we
> +	 * still need to handle async evictions, where the VRAM is still being
> +	 * accessed by the GPU. Most internal callers are not expecting this,
> +	 * since they are missing the required synchronisation before accessing
> +	 * the memory. To keep things simple, just sync wait on any kernel
> +	 * fences here, if the buffer is designated KMD internal.
> +	 *
> +	 * For normal userspace objects we should already have the required
> +	 * pipelining or sync waiting elsewhere, since we already have to deal
> +	 * with things like async GPU clearing.
> +	 */
> +	if (type == ttm_bo_type_kernel) {
> +		long timeout = dma_resv_wait_timeout(bo->ttm.base.resv,
> +						     DMA_RESV_USAGE_KERNEL,
> +						     ctx.interruptible,
> +						     MAX_SCHEDULE_TIMEOUT);
> +
> +		if (timeout < 0) {
> +			if (!resv)
> +				dma_resv_unlock(bo->ttm.base.resv);
> +			xe_bo_put(bo);
> +			return ERR_PTR(timeout);
> +		}
> +	}
> +
>  	bo->created = true;
>  	if (bulk)
>  		ttm_bo_set_bulk_move(&bo->ttm, bulk);
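To spell out the error handling in the new hunk: dma_resv_wait_timeout()
returns a remaining-timeout style value, so only the negative case is an
error here. A hand-written summary of the convention (my paraphrase, not a
quote of the kernel doc):

	/*
	 * Return convention of dma_resv_wait_timeout(), as used above:
	 *
	 *   < 0   error, e.g. -ERESTARTSYS when an interruptible wait
	 *         is broken by a signal; the patch unwinds and returns
	 *         it via ERR_PTR().
	 *   == 0  the wait timed out; with MAX_SCHEDULE_TIMEOUT the
	 *         wait blocks until completion, so not expected here.
	 *   > 0   all DMA_RESV_USAGE_KERNEL fences have signalled and
	 *         the pages are safe to access.
	 */

The unlock-only-when-!resv dance also looks right to me: when the caller
passed in its own reservation object the caller still owns that lock, so
the error path should only drop the lock for a BO-private resv.
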