[Intel-gfx] [PATCH v2 10/37] drm/i915/blt: support copying objects

Thu Jun 27 23:35:27 UTC 2019

Quoting Matthew Auld (2019-06-27 21:56:06)
> We can already clear an object with the blt, so try to do the same to
> support copying from one object backing store to another. Really this is
> just object -> object, which is not that useful yet, what we really want
> is two backing stores, but that will require some vma rework first,
> otherwise we are stuck with "tmp" objects.
> 
> Signed-off-by: Matthew Auld <matthew.auld at intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> Cc: Abdiel Janulgue <abdiel.janulgue at linux.intel.com>
> ---
>  .../gpu/drm/i915/gem/i915_gem_object_blt.c    | 135 ++++++++++++++++++
>  .../gpu/drm/i915/gem/i915_gem_object_blt.h    |   8 ++
>  .../i915/gem/selftests/i915_gem_object_blt.c  | 105 ++++++++++++++
>  drivers/gpu/drm/i915/gt/intel_gpu_commands.h  |   3 +-
>  4 files changed, 250 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_blt.c b/drivers/gpu/drm/i915/gem/i915_gem_object_blt.c
> index cb42e3a312e2..c2b28e06c379 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object_blt.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_blt.c
> @@ -102,6 +102,141 @@ int i915_gem_object_fill_blt(struct drm_i915_gem_object *obj,
>         return err;
>  }
>  
> +int intel_emit_vma_copy_blt(struct i915_request *rq,
> +                           struct i915_vma *src,
> +                           struct i915_vma *dst)
> +{
> +       const int gen = INTEL_GEN(rq->i915);
> +       u32 *cs;
> +
> +       GEM_BUG_ON(src->size != dst->size);

For a low-level interface, I would suggest a little over-engineering:
take src_offset, dst_offset and length. For bonus points, make it 2D, but
I accept that may be too much over-engineering without a user.
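
As a rough illustration of the suggestion above, the sketch below fills the
same ten dwords as the gen9+ branch of the patch, but for an arbitrary
page-aligned range rather than a whole vma. It is a standalone userspace
sketch, not driver code: sketch_emit_copy() and every SKETCH_* name are
hypothetical, the command and depth values are assumed to match
intel_gpu_commands.h, and the 32767-row cap is an assumption about the
height field.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1u << SKETCH_PAGE_SHIFT)

/* Assumed values; verify against intel_gpu_commands.h before reuse. */
#define SKETCH_FAST_COPY_CMD (2u << 29 | 0x42u << 22)
#define SKETCH_DEPTH_32      (3u << 24)

/* Assumption: the height field cannot hold more than 32767 rows. */
#define SKETCH_MAX_ROWS INT16_MAX

/*
 * Hypothetical range-based variant of the emitter: copy `length` bytes
 * from src_start + src_offset to dst_start + dst_offset, one page per row.
 * Returns 0 on success, -1 if the range cannot be encoded.
 */
static int sketch_emit_copy(uint32_t *cs,
			    uint64_t src_start, uint64_t src_offset,
			    uint64_t dst_start, uint64_t dst_offset,
			    uint64_t length)
{
	const uint64_t src = src_start + src_offset;
	const uint64_t dst = dst_start + dst_offset;
	const uint64_t rows = length >> SKETCH_PAGE_SHIFT;

	assert(cs);

	/* Everything must be page aligned, and the range non-empty. */
	if (!length || ((length | src | dst) & (SKETCH_PAGE_SIZE - 1)))
		return -1;
	if (rows > SKETCH_MAX_ROWS)
		return -1;

	*cs++ = SKETCH_FAST_COPY_CMD | (10 - 2);
	*cs++ = SKETCH_DEPTH_32 | SKETCH_PAGE_SIZE;	/* dst pitch */
	*cs++ = 0;					/* dst x1, y1 */
	*cs++ = (uint32_t)rows << 16 | SKETCH_PAGE_SIZE / 4; /* dst y2, x2 */
	*cs++ = (uint32_t)dst;				/* lower 32 bits */
	*cs++ = (uint32_t)(dst >> 32);			/* upper 32 bits */
	*cs++ = 0;					/* src x1, y1 */
	*cs++ = SKETCH_PAGE_SIZE;			/* src pitch */
	*cs++ = (uint32_t)src;
	*cs++ = (uint32_t)(src >> 32);

	return 0;
}
```

With an interface shaped like this, the whole-object copy becomes the
special case of offsets 0 and length equal to the object size, and the
size-equality check moves to the caller.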

> +       cs = intel_ring_begin(rq, 10);
> +       if (IS_ERR(cs))
> +               return PTR_ERR(cs);
> +
> +       if (gen >= 9) {
> +               *cs++ = GEN9_XY_FAST_COPY_BLT_CMD | (10-2);
> +               *cs++ = BLT_DEPTH_32 | PAGE_SIZE;
> +               *cs++ = 0;
> +               *cs++ = src->size >> PAGE_SHIFT << 16 | PAGE_SIZE / 4;
> +               *cs++ = lower_32_bits(dst->node.start);
> +               *cs++ = upper_32_bits(dst->node.start);
> +               *cs++ = 0;
> +               *cs++ = PAGE_SIZE;
> +               *cs++ = lower_32_bits(src->node.start);
> +               *cs++ = upper_32_bits(src->node.start);

Reminds me that we didn't fix the earlier routines to handle more than
32k pages either. Please add a test case :)
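
One possible way to lift that limit is to split a large copy into blocks
that each fit the height field. The sketch below only does the splitting
arithmetic, in plain userspace C. It assumes 4 KiB pages and a cap of 32767
rows per blit (an assumption, taken from the 32k figure above and the
16-bit height field in the quoted code). sketch_split_copy() and
struct sketch_block are hypothetical names, not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12
/* Assumption: at most 32767 page-sized rows per blit. */
#define SKETCH_MAX_ROWS   INT16_MAX

/* One blit's worth of work: addresses plus the number of page rows. */
struct sketch_block {
	uint64_t src;
	uint64_t dst;
	uint32_t rows;
};

/*
 * Split a page-aligned copy of `size` bytes into blocks of at most
 * SKETCH_MAX_ROWS pages. Up to `max_out` blocks are written to `out`;
 * the return value is the total number of blocks required.
 */
static size_t sketch_split_copy(uint64_t src, uint64_t dst, uint64_t size,
				struct sketch_block *out, size_t max_out)
{
	uint64_t pages = size >> SKETCH_PAGE_SHIFT;
	size_t count = 0;

	/* The sketch only handles whole pages. */
	assert((size & ((1ull << SKETCH_PAGE_SHIFT) - 1)) == 0);

	while (pages) {
		uint32_t rows = pages > SKETCH_MAX_ROWS ?
				SKETCH_MAX_ROWS : (uint32_t)pages;

		if (out && count < max_out) {
			out[count].src = src;
			out[count].dst = dst;
			out[count].rows = rows;
		}

		/* Advance both addresses past the rows just covered. */
		src += (uint64_t)rows << SKETCH_PAGE_SHIFT;
		dst += (uint64_t)rows << SKETCH_PAGE_SHIFT;
		pages -= rows;
		count++;
	}

	return count;
}
```

Under these assumptions a 128 MiB object (32768 pages) is the smallest
size that needs a second block, so a test case would want at least one
object at or above that size.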
-Chris