[igt-dev] [PATCH i-g-t v3]tests/i915/gem_userptr_blits: Added XY_FAST_COPY_BLT Command for MTL
Zbigniew Kempczyński
zbigniew.kempczynski at intel.com
Tue Nov 22 08:46:46 UTC 2022
On Mon, Nov 21, 2022 at 02:46:02PM -0800, Lucas De Marchi wrote:
> On Mon, Nov 21, 2022 at 10:32:57PM +0530, Vikas Srivastava wrote:
> > From: Arjun Melkaveri <arjun.melkaveri at intel.com>
> >
> > Test case uses legacy command which is not supported on MTL.
> > Modified test to use XY_FAST_COPY_BLT.
> >
> > Signed-off-by: Arjun Melkaveri <arjun.melkaveri at intel.com>
> > Acked-by: Priyanka Dandamudi <priyanka.dandamudi at intel.com>
>
> $ git grep XY_SRC_COPY_BLT_CMD
> lib/i830_reg.h:#define XY_SRC_COPY_BLT_CMD ((2<<29)|(0x53<<22))
> lib/intel_batchbuffer.c: dword0 |= XY_SRC_COPY_BLT_CMD;
> lib/intel_batchbuffer.c: intel_bb_out(ibb, XY_SRC_COPY_BLT_CMD |
> lib/intel_batchbuffer.h: OUT_BATCH(XY_SRC_COPY_BLT_CMD | \
I would like to get rid of all libdrm stuff in intel_batchbuffer but there's
one test which still references to it - prime_nv_test.c.
+ Petri, Kamil
May I remove this test? There're no users of it (there's no such hw we can
test this on) so I see no reason to keep outdated and dead test.
Anyway, I'm going to send the series, please decide in the meantime to
accept it or reject.
> lib/intel_reg.h:#define XY_SRC_COPY_BLT_CMD ((2<<29)|(0x53<<22))
> tests/i915/api_intel_allocator.c: batch[i++] = XY_SRC_COPY_BLT_CMD |
> tests/i915/gem_blits.c: batch[i] = (XY_SRC_COPY_BLT_CMD |
> tests/i915/gem_blits.c: batch[i] = (XY_SRC_COPY_BLT_CMD |
> tests/i915/gem_blits.c: batch[i] = (XY_SRC_COPY_BLT_CMD |
> tests/i915/gem_caching.c: XY_SRC_COPY_BLT_CMD |
> tests/i915/gem_evict_alignment.c: batch[i++] = (XY_SRC_COPY_BLT_CMD |
> tests/i915/gem_evict_everything.c: batch[i++] = (XY_SRC_COPY_BLT_CMD |
> tests/i915/gem_linear_blits.c: batch[i++] = XY_SRC_COPY_BLT_CMD |
> tests/i915/gem_partial_pwrite_pread.c: XY_SRC_COPY_BLT_CMD |
> tests/i915/gem_tiled_fence_blits.c: batch[i] = (XY_SRC_COPY_BLT_CMD |
> tests/i915/gem_userptr_blits.c: batch[i++] = XY_SRC_COPY_BLT_CMD |
> tests/i915/gem_userptr_blits.c: batch[i++] = XY_SRC_COPY_BLT_CMD |
> tests/i915/gen3_mixed_blits.c: *b++ = (XY_SRC_COPY_BLT_CMD |
>
> Currently we have one abstraction for bb creation, but some tests
> not using it. This is covering just 1 test... maybe we should rather
> convert tests to use a common abstraction and change it there?
>
> +Zbigniew, who I think was working on some common abstraction.
+ Karolina
TL;DR
------
We're working on more universal blitter library, but currently it
supports/will support xy-block-copy-blt and xy-fast-copy-blt. So any
above changes (xy-src-copy-blt -> xy-fast-copy-blt) I still would do
manually.
Long:
-----
Question is - do we really want to move to intel-bb everywhere. Intel-bb
was create to remove libdrm in render-copy/gpgpu-fill/media-fill. Nothing
prevents us from creating batches this way, but this still doesn't
handle conditional code to feed batches. I think we should rather go way
where we prepare data in structures, pass to build/execute batch (see
gem_ccs/i915_blt for reference). We may integrate this with intel-bb too
(I mean feeding intel-bb with instructions produced by the blit library
function is easy). Currently this is standalone blit (with execbuf
itself), but I want to change this - current design doesn't allow to
execute two blits in a row what may reveal some problems (cache flushes,
etc.). I opt for structures because new blit commands are quite big
(xy-block-copy-blt on dg2 has 21 dwords - I don't want to make function
argument list monsters - readability of this is questionable thing).
Another thing we're considering are tilings and compression support in
specific platform. I was surprised that in same gen we have different
tilings/compressions (dg1 and dg2 as an example). If I'm not wrong
main bcs engine also may differ from other engines...
--
Zbigniew
>
>
> Lucas De Marchi
>
>
>
> > ---
> > tests/i915/gem_userptr_blits.c | 129 +++++++++++++++++++++------------
> > 1 file changed, 81 insertions(+), 48 deletions(-)
> >
> > diff --git a/tests/i915/gem_userptr_blits.c b/tests/i915/gem_userptr_blits.c
> > index 698508669..edcc3418b 100644
> > --- a/tests/i915/gem_userptr_blits.c
> > +++ b/tests/i915/gem_userptr_blits.c
> > @@ -99,8 +99,9 @@ static int copy(int fd, uint32_t dst, uint32_t src)
> > struct drm_i915_gem_relocation_entry reloc[2];
> > struct drm_i915_gem_exec_object2 obj[3];
> > struct drm_i915_gem_execbuffer2 exec;
> > + static uint32_t devid;
> > uint32_t handle;
> > - int ret, i=0;
> > + int ret, i = 0;
> > uint64_t dst_offset, src_offset, bb_offset;
> > bool has_relocs = gem_has_relocations(fd);
> >
> > @@ -108,29 +109,44 @@ static int copy(int fd, uint32_t dst, uint32_t src)
> > dst_offset = bb_offset + 4096;
> > src_offset = dst_offset + WIDTH * HEIGHT * sizeof(uint32_t) * (src != dst);
> >
> > - batch[i++] = XY_SRC_COPY_BLT_CMD |
> > - XY_SRC_COPY_BLT_WRITE_ALPHA |
> > - XY_SRC_COPY_BLT_WRITE_RGB;
> > - if (intel_gen(intel_get_drm_devid(fd)) >= 8)
> > - batch[i - 1] |= 8;
> > - else
> > - batch[i - 1] |= 6;
> > -
> > - batch[i++] = (3 << 24) | /* 32 bits */
> > - (0xcc << 16) | /* copy ROP */
> > - WIDTH*4;
> > - batch[i++] = 0; /* dst x1,y1 */
> > - batch[i++] = (HEIGHT << 16) | WIDTH; /* dst x2,y2 */
> > - batch[i++] = dst_offset; /* dst reloc */
> > - if (intel_gen(intel_get_drm_devid(fd)) >= 8)
> > - batch[i++] = dst_offset >> 32;
> > - batch[i++] = 0; /* src x1,y1 */
> > - batch[i++] = WIDTH*4;
> > - batch[i++] = src_offset; /* src reloc */
> > - if (intel_gen(intel_get_drm_devid(fd)) >= 8)
> > - batch[i++] = src_offset >> 32;
> > - batch[i++] = MI_BATCH_BUFFER_END;
> > - batch[i++] = MI_NOOP;
> > + devid = intel_get_drm_devid(fd);
> > +
> > + if (intel_graphics_ver(devid) >= IP_VER(12, 60)) {
> > + batch[i++] = XY_FAST_COPY_BLT;
> > + batch[i++] = XY_FAST_COPY_COLOR_DEPTH_32 | WIDTH*4;
> > + batch[i++] = 0;/* dst x1,y1 */
> > + batch[i++] = (HEIGHT << 16) | WIDTH;/* dst x2,y2 */
> > + batch[i++] = dst_offset;/* dst address */
> > + batch[i++] = 0;/* src x1,y1 */
> > + batch[i++] = WIDTH*4;/* src pitch */
> > + batch[i++] = src_offset;/* src address */
> > + batch[i++] = MI_BATCH_BUFFER_END;
> > + batch[i++] = MI_NOOP;
> > + } else {
> > + batch[i++] = XY_SRC_COPY_BLT_CMD |
> > + XY_SRC_COPY_BLT_WRITE_ALPHA |
> > + XY_SRC_COPY_BLT_WRITE_RGB;
> > + if (intel_gen(intel_get_drm_devid(fd)) >= 8)
> > + batch[i - 1] |= 8;
> > + else
> > + batch[i - 1] |= 6;
> > +
> > + batch[i++] = (3 << 24) | /* 32 bits */
> > + (0xcc << 16) | /* copy ROP */
> > + WIDTH*4;
> > + batch[i++] = 0; /* dst x1,y1 */
> > + batch[i++] = (HEIGHT << 16) | WIDTH; /* dst x2,y2 */
> > + batch[i++] = dst_offset; /* dst reloc */
> > + if (intel_gen(intel_get_drm_devid(fd)) >= 8)
> > + batch[i++] = dst_offset >> 32;
> > + batch[i++] = 0; /* src x1,y1 */
> > + batch[i++] = WIDTH*4;
> > + batch[i++] = src_offset; /* src reloc */
> > + if (intel_gen(intel_get_drm_devid(fd)) >= 8)
> > + batch[i++] = src_offset >> 32;
> > + batch[i++] = MI_BATCH_BUFFER_END;
> > + batch[i++] = MI_NOOP;
> > + }
> >
> > handle = gem_create(fd, 4096);
> > gem_write(fd, handle, 0, batch, sizeof(batch));
> > @@ -204,31 +220,48 @@ blit(int fd, uint32_t dst, uint32_t src, uint32_t *all_bo, int n_bo)
> > struct drm_i915_gem_relocation_entry reloc[2];
> > struct drm_i915_gem_exec_object2 *obj;
> > struct drm_i915_gem_execbuffer2 exec;
> > - uint32_t handle;
> > - int n, ret, i=0;
> > + static uint32_t devid;
> >
> > - batch[i++] = XY_SRC_COPY_BLT_CMD |
> > - XY_SRC_COPY_BLT_WRITE_ALPHA |
> > - XY_SRC_COPY_BLT_WRITE_RGB;
> > - if (intel_gen(intel_get_drm_devid(fd)) >= 8)
> > - batch[i - 1] |= 8;
> > - else
> > - batch[i - 1] |= 6;
> > - batch[i++] = (3 << 24) | /* 32 bits */
> > - (0xcc << 16) | /* copy ROP */
> > - WIDTH*4;
> > - batch[i++] = 0; /* dst x1,y1 */
> > - batch[i++] = (HEIGHT << 16) | WIDTH; /* dst x2,y2 */
> > - batch[i++] = 0; /* dst reloc */
> > - if (intel_gen(intel_get_drm_devid(fd)) >= 8)
> > - batch[i++] = 0;
> > - batch[i++] = 0; /* src x1,y1 */
> > - batch[i++] = WIDTH*4;
> > - batch[i++] = 0; /* src reloc */
> > - if (intel_gen(intel_get_drm_devid(fd)) >= 8)
> > - batch[i++] = 0;
> > - batch[i++] = MI_BATCH_BUFFER_END;
> > - batch[i++] = MI_NOOP;
> > + uint32_t handle;
> > + int n, ret, i = 0;
> > +
> > + devid = intel_get_drm_devid(fd);
> > +
> > + if (intel_graphics_ver(devid) >= IP_VER(12, 60)) {
> > + batch[i++] = XY_FAST_COPY_BLT;
> > + batch[i++] = XY_FAST_COPY_COLOR_DEPTH_32 | WIDTH*4;
> > + batch[i++] = 0;/* dst x1,y1 */
> > + batch[i++] = (HEIGHT << 16) | WIDTH;/* dst x2,y2 */
> > + batch[i++] = ;/* dst address */
> > + batch[i++] = 0;/* src x1,y1 */
> > + batch[i++] = WIDTH*4;/* src pitch */
> > + batch[i++] = 0;// src address lower bits
> > + batch[i++] = MI_BATCH_BUFFER_END;
> > + batch[i++] = MI_NOOP;
> > + } else {
> > + batch[i++] = XY_SRC_COPY_BLT_CMD |
> > + XY_SRC_COPY_BLT_WRITE_ALPHA |
> > + XY_SRC_COPY_BLT_WRITE_RGB;
> > + if (intel_gen(intel_get_drm_devid(fd)) >= 8)
> > + batch[i - 1] |= 8;
> > + else
> > + batch[i - 1] |= 6;
> > + batch[i++] = (3 << 24) | /* 32 bits */
> > + (0xcc << 16) | /* copy ROP */
> > + WIDTH*4;
> > + batch[i++] = 0; /* dst x1,y1 */
> > + batch[i++] = (HEIGHT << 16) | WIDTH; /* dst x2,y2 */
> > + batch[i++] = 0; /* dst reloc */
> > + if (intel_gen(intel_get_drm_devid(fd)) >= 8)
> > + batch[i++] = 0;
> > + batch[i++] = 0; /* src x1,y1 */
> > + batch[i++] = WIDTH*4;
> > + batch[i++] = 0; /* src reloc */
> > + if (intel_gen(intel_get_drm_devid(fd)) >= 8)
> > + batch[i++] = 0;
> > + batch[i++] = MI_BATCH_BUFFER_END;
> > + batch[i++] = MI_NOOP;
> > + }
> >
> > handle = gem_create(fd, 4096);
> > gem_write(fd, handle, 0, batch, sizeof(batch));
> > --
> > 2.25.1
> >
More information about the igt-dev
mailing list