<html dir="ltr"><head></head><body style="text-align:left; direction:ltr;"><div>On Fri, 2018-12-07 at 13:13 -0600, Jason Ekstrand wrote:</div><blockquote type="cite" style="margin:0 0 0 .8ex; border-left:2px #729fcf solid;padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><div dir="ltr">On Tue, Dec 4, 2018 at 1:18 AM Iago Toral Quiroga &lt;<a href="mailto:itoral@igalia.com">itoral@igalia.com</a>&gt; wrote:<br></div><blockquote type="cite" style="margin:0 0 0 .8ex; border-left:2px #729fcf solid;padding-left:1ex">The implementation of these opcodes in the generator assumes that their<br> arguments are packed, and it generates register regions based on that<br> assumption. While this expectation is reasonable for 32-bit,<br></blockquote><div><br></div><div>Expectation, sure, but if someone does ddx(f2f32(d)) where d is a double, it's broken. Maybe we should back-port? Either way</div></div></div></blockquote><div><br></div><div>Yes, that's a good point. I'll send this separately to -stable.</div><div><br></div><blockquote type="cite" style="margin:0 0 0 .8ex; border-left:2px #729fcf solid;padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><div>Reviewed-by: Jason Ekstrand &lt;<a href="mailto:jason@jlekstrand.net">jason@jlekstrand.net</a>&gt;<br></div><div> </div><blockquote type="cite" style="margin:0 0 0 .8ex; border-left:2px #729fcf solid;padding-left:1ex"> when we<br> load 16-bit elements from UBOs we get them with a stride of 2 that we<br> then need to pack with a stride of 1. 
Copy propagation can see through this<br> and rewrite ddx/ddy operands to use the original, strided register, breaking<br> the implementation in the generator.<br> ---<br> .../compiler/brw_fs_copy_propagation.cpp | 21 +++++++++++++++++++<br> 1 file changed, 21 insertions(+)<br> <br> diff --git a/src/intel/compiler/brw_fs_copy_propagation.cpp b/src/intel/compiler/brw_fs_copy_propagation.cpp<br> index 58d5080b4e9..c01d4ec4a4f 100644<br> --- a/src/intel/compiler/brw_fs_copy_propagation.cpp<br> +++ b/src/intel/compiler/brw_fs_copy_propagation.cpp<br> @@ -361,6 +361,20 @@ can_take_stride(fs_inst *inst, unsigned arg, unsigned stride,<br> return true;<br> }<br> <br> +static bool<br> +instruction_requires_packed_data(fs_inst *inst)<br> +{<br> + switch (inst->opcode) {<br> + case FS_OPCODE_DDX_FINE:<br> + case FS_OPCODE_DDX_COARSE:<br> + case FS_OPCODE_DDY_FINE:<br> + case FS_OPCODE_DDY_COARSE:<br> + return true;<br> + default:<br> + return false;<br> + }<br> +}<br> +<br> bool<br> fs_visitor::try_copy_propagate(fs_inst *inst, int arg, acp_entry *entry)<br> {<br> @@ -407,6 +421,13 @@ fs_visitor::try_copy_propagate(fs_inst *inst, int arg, acp_entry *entry)<br> inst->opcode == SHADER_OPCODE_GEN4_SCRATCH_WRITE)<br> return false;<br> <br> + /* Some instructions implemented in the generator backend, such as<br> + * derivatives, assume that their operands are packed so we can't<br> + * generally propagate strided regions to them.<br> + */<br> + if (instruction_requires_packed_data(inst) && entry->src.stride > 1)<br> + return false;<br> +<br> /* Bail if the result of composing both strides would exceed the<br> * hardware limit.<br> */<br> </blockquote></div></div></blockquote></body></html>