<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Thu, Apr 27, 2017 at 11:32 AM, Nanley Chery <<a href="mailto:nanleychery@gmail.com" target="_blank">nanleychery@gmail.com</a>> wrote: <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">We're now performing a GPU memcpy in more places to copy small amounts of data. Add a path to thrash less state.<br><br>Signed-off-by: Nanley Chery <<a href="mailto:nanley.g.chery@intel.com">nanley.g.chery@intel.com</a>><pre>---
 src/intel/vulkan/genX_gpu_memcpy.c | 38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/src/intel/vulkan/genX_gpu_memcpy.c b/src/intel/vulkan/genX_gpu_memcpy.c
index 3cbc7235cf..f15c2a5f72 100644
--- a/src/intel/vulkan/genX_gpu_memcpy.c
+++ b/src/intel/vulkan/genX_gpu_memcpy.c
@@ -28,6 +28,8 @@
 
 #include "common/gen_l3_config.h"
 
+#define MI_PREDICATE_SRC0 0x2400
+
 /**
  * This file implements some lightweight memcpy/memset operations on the GPU
  * using a vertex buffer and streamout.
@@ -63,6 +65,42 @@ genX(cmd_buffer_gpu_memcpy)(struct anv_cmd_buffer *cmd_buffer,
    assert(dst_offset + size <= dst->size);
    assert(src_offset + size <= src->size);
 
+   /* This memcpy expects DWord aligned memory. */
+   assert(size % 4 == 0);
+   assert(dst_offset % 4 == 0);
+   assert(src_offset % 4 == 0);
+
+   /* Use a simpler memcpy operation when copying 16 bytes or less of data.
+    * This is the size of a surface state's clear value on SKL+.
+    */
</pre></blockquote><div> </div><div>I think I would rather just have a separate function. These two methods have very different characteristics, both in what state they trash (quite a bit vs. none) and in how they perform, so I'd rather we be explicit about which one we use. Feel free to rename cmd_buffer_gpu_memcpy to cmd_buffer_streamout_copy; then you can name the other one cmd_buffer_mem_mem_copy or similar. 
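The following is a minimal, self-contained sketch of what a dedicated command-streamer copy could look like once it is split out from the streamout path. It deliberately avoids anv's real types. mock_batch, mock_cmd, the MOCK_* command kinds and mem_mem_copy() are all hypothetical stand-ins that only record which MI commands would be emitted. Real code would take an anv_cmd_buffer and anv_bo and go through anv_batch_emit. The point it illustrates is that this path emits nothing but MI commands and never touches 3D pipeline state the way the streamout copy does. On gen8+ that means one MI_COPY_MEM_MEM per DWord. On gen7 it means an MI_LOAD_REGISTER_MEM/MI_STORE_REGISTER_MEM pair through MI_PREDICATE_SRC0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical command kinds recorded by a mock batch. They only mirror
 * the names of the real MI_* commands; nothing is actually encoded. */
enum mock_cmd_kind {
   MOCK_MI_COPY_MEM_MEM,
   MOCK_MI_LOAD_REGISTER_MEM,
   MOCK_MI_STORE_REGISTER_MEM,
};

struct mock_cmd {
   enum mock_cmd_kind kind;
   uint64_t src; /* source address, 0 if unused */
   uint64_t dst; /* destination address, 0 if unused */
   uint32_t reg; /* MMIO register offset, 0 if unused */
};

#define MOCK_BATCH_MAX 64
#define MI_PREDICATE_SRC0 0x2400 /* temporary register used by the patch */

struct mock_batch {
   struct mock_cmd cmds[MOCK_BATCH_MAX];
   unsigned count;
};

/* Append one recorded command to the mock batch. */
static void
emit(struct mock_batch *b, struct mock_cmd c)
{
   assert(b->count < MOCK_BATCH_MAX);
   b->cmds[b->count++] = c;
}

/* Hypothetical stand-in for the suggested cmd_buffer_mem_mem_copy():
 * copies `size` bytes one DWord at a time using only command-streamer
 * commands, so no pipeline state is emitted or clobbered. */
static void
mem_mem_copy(struct mock_batch *b, int gen,
             uint64_t dst, uint64_t src, uint32_t size)
{
   /* Same alignment requirements as the quoted patch. */
   assert(size % 4 == 0 && dst % 4 == 0 && src % 4 == 0);

   for (uint32_t i = 0; i < size; i += 4) {
      if (gen >= 8) {
         /* Direct memory-to-memory DWord copy. */
         emit(b, (struct mock_cmd) { .kind = MOCK_MI_COPY_MEM_MEM,
                                     .src = src + i, .dst = dst + i });
      } else {
         /* No MI_COPY_MEM_MEM here: bounce through a register. */
         emit(b, (struct mock_cmd) { .kind = MOCK_MI_LOAD_REGISTER_MEM,
                                     .src = src + i,
                                     .reg = MI_PREDICATE_SRC0 });
         emit(b, (struct mock_cmd) { .kind = MOCK_MI_STORE_REGISTER_MEM,
                                     .dst = dst + i,
                                     .reg = MI_PREDICATE_SRC0 });
      }
   }
}
```

For a 16-byte clear value, calling mem_mem_copy() with gen 8 records 4 commands, and the gen7 path records 8. The command count grows linearly with the copy size, which is presumably why this approach only pays off for small copies. Giving it its own explicitly named entry point keeps that trade-off visible at each call site.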
</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><pre>+   if (size <= 16) {
+      for (uint32_t i = 0; i < size; i += 4) {
+         const struct anv_address src_addr =
+            (struct anv_address) { src, src_offset + i};
+         const struct anv_address dst_addr =
+            (struct anv_address) { dst, dst_offset + i};
+#if GEN_GEN >= 8
+         anv_batch_emit(&cmd_buffer->batch, GENX(MI_COPY_MEM_MEM), cp) {
+            cp.DestinationMemoryAddress = dst_addr;
+            cp.SourceMemoryAddress = src_addr;
+         }
+#else
+         /* IVB does not have a general purpose register for command streamer
+          * commands. Therefore, we use an alternate temporary register.
+          */
+         anv_batch_emit(&cmd_buffer->batch, GENX(MI_LOAD_REGISTER_MEM), load) {
+            load.RegisterAddress = MI_PREDICATE_SRC0;
+            load.MemoryAddress = src_addr;
+         }
+         anv_batch_emit(&cmd_buffer->batch, GENX(MI_STORE_REGISTER_MEM), store) {
+            store.RegisterAddress = MI_PREDICATE_SRC0;
+            store.MemoryAddress = dst_addr;
+         }
+#endif
+      }
+      return;
+   }
+
    /* The maximum copy block size is 4 32-bit components at a time. */
    unsigned bs = 16;
    bs = gcd_pow2_u64(bs, src_offset);
-- 
2.12.2
</pre>_______________________________________________<br>mesa-dev mailing list<br><a href="mailto:mesa-dev@lists.freedesktop.org">mesa-dev@lists.freedesktop.org</a><br><a href="https://lists.freedesktop.org/mailman/listinfo/mesa-dev" rel="noreferrer" target="_blank">https://lists.freedesktop.org/mailman/listinfo/mesa-dev</a> </blockquote></div> </div></div>