<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Thu, Apr 27, 2017 at 11:32 AM, Nanley Chery <span dir="ltr"><<a href="mailto:nanleychery@gmail.com" target="_blank">nanleychery@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">We're now performing a GPU memcpy in more places to copy small amounts<br>
of data. Add a path to trash less state.<br>
<br>
Signed-off-by: Nanley Chery <<a href="mailto:nanley.g.chery@intel.com">nanley.g.chery@intel.com</a>><br>
---<br>
src/intel/vulkan/genX_gpu_<wbr>memcpy.c | 38 ++++++++++++++++++++++++++++++<wbr>++++++++<br>
1 file changed, 38 insertions(+)<br>
<br>
diff --git a/src/intel/vulkan/genX_gpu_<wbr>memcpy.c b/src/intel/vulkan/genX_gpu_<wbr>memcpy.c<br>
index 3cbc7235cf..f15c2a5f72 100644<br>
--- a/src/intel/vulkan/genX_gpu_<wbr>memcpy.c<br>
+++ b/src/intel/vulkan/genX_gpu_<wbr>memcpy.c<br>
@@ -28,6 +28,8 @@<br>
<br>
#include "common/gen_l3_config.h"<br>
<br>
+#define MI_PREDICATE_SRC0 0x2400<br>
+<br>
/**<br>
* This file implements some lightweight memcpy/memset operations on the GPU<br>
* using a vertex buffer and streamout.<br>
@@ -63,6 +65,42 @@ genX(cmd_buffer_gpu_memcpy)(<wbr>struct anv_cmd_buffer *cmd_buffer,<br>
assert(dst_offset + size <= dst->size);<br>
assert(src_offset + size <= src->size);<br>
<br>
+ /* This memcpy expects DWord aligned memory. */<br>
+ assert(size % 4 == 0);<br>
+ assert(dst_offset % 4 == 0);<br>
+ assert(src_offset % 4 == 0);<br>
+<br>
+ /* Use a simpler memcpy operation when copying 16 bytes or less of data.<br>
+ * This is the size of a surface state's clear value on SKL+.<br>
+ */<br></blockquote><div><br></div><div>I would rather have a separate function. These two methods have very different characteristics, both in what state they trash (quite a bit vs. none) and in how they perform, so I'd rather we be explicit about which method we use. Feel free to rename cmd_buffer_gpu_memcpy to cmd_buffer_streamout_copy; the other could then be named cmd_buffer_mem_mem_copy or similar.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
+ if (size <= 16) {<br>
+ for (uint32_t i = 0; i < size; i += 4) {<br>
+ const struct anv_address src_addr =<br>
+ (struct anv_address) { src, src_offset + i};<br>
+ const struct anv_address dst_addr =<br>
+ (struct anv_address) { dst, dst_offset + i};<br>
+#if GEN_GEN >= 8<br>
+ anv_batch_emit(&cmd_buffer-><wbr>batch, GENX(MI_COPY_MEM_MEM), cp) {<br>
+ cp.DestinationMemoryAddress = dst_addr;<br>
+ cp.SourceMemoryAddress = src_addr;<br>
+ }<br>
+#else<br>
+ /* IVB does not have a general purpose register for command streamer<br>
+ * commands. Therefore, we use an alternate temporary register.<br>
+ */<br>
+ anv_batch_emit(&cmd_buffer-><wbr>batch, GENX(MI_LOAD_REGISTER_MEM), load) {<br>
+ load.RegisterAddress = MI_PREDICATE_SRC0;<br>
+ load.MemoryAddress = src_addr;<br>
+ }<br>
+ anv_batch_emit(&cmd_buffer-><wbr>batch, GENX(MI_STORE_REGISTER_MEM), store) {<br>
+ store.RegisterAddress = MI_PREDICATE_SRC0;<br>
+ store.MemoryAddress = dst_addr;<br>
+ }<br>
+#endif<br>
+ }<br>
+ return;<br>
+ }<br>
+<br>
/* The maximum copy block size is 4 32-bit components at a time. */<br>
unsigned bs = 16;<br>
bs = gcd_pow2_u64(bs, src_offset);<br>
<span class="HOEnZb"><font color="#888888">--<br>
2.12.2<br>
<br>
______________________________<wbr>_________________<br>
mesa-dev mailing list<br>
<a href="mailto:mesa-dev@lists.freedesktop.org">mesa-dev@lists.freedesktop.org</a><br>
<a href="https://lists.freedesktop.org/mailman/listinfo/mesa-dev" rel="noreferrer" target="_blank">https://lists.freedesktop.org/<wbr>mailman/listinfo/mesa-dev</a><br>
</font></span></blockquote></div><br></div></div>
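To make the suggested split concrete, here is a minimal CPU-side sketch of what a standalone small-copy function could look like once it is separated from the streamout path. It is not driver code: `struct model_bo` and `model_mem_mem_copy` are hypothetical names invented for illustration, and the 4-byte `memcpy` only stands in for the one-DWord-per-command behaviour of the loop in the patch (one MI_COPY_MEM_MEM, or one load/store register pair, per iteration). The assertions mirror the bounds and alignment checks from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a buffer object; this is NOT the anv type. */
struct model_bo {
   uint8_t *map;   /* CPU-visible backing storage for the model */
   uint32_t size;  /* size of the buffer in bytes */
};

/* Hypothetical model of a dedicated small-copy function. Each loop
 * iteration moves exactly one DWord, as a single copy command would
 * in the patch. Returns the number of DWord copies "emitted" so a
 * caller can see the per-DWord cost of choosing this path.
 */
static uint32_t
model_mem_mem_copy(struct model_bo *dst, uint32_t dst_offset,
                   const struct model_bo *src, uint32_t src_offset,
                   uint32_t size)
{
   /* Same bounds checks as the patch. */
   assert(dst_offset + size <= dst->size);
   assert(src_offset + size <= src->size);

   /* Same DWord alignment requirements as the patch. */
   assert(size % 4 == 0);
   assert(dst_offset % 4 == 0);
   assert(src_offset % 4 == 0);

   uint32_t emitted = 0;
   for (uint32_t i = 0; i < size; i += 4) {
      /* Stand-in for one DWord copy command. */
      memcpy(dst->map + dst_offset + i, src->map + src_offset + i, 4);
      emitted++;
   }
   return emitted;
}
```

With the function split out like this, a caller that only needs to move a clear value picks the small path by name instead of relying on a size threshold hidden inside the general memcpy.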