<div dir="ltr"><div class="gmail_quote"><div dir="ltr">On Tue, Dec 4, 2018 at 1:19 AM Iago Toral Quiroga <<a href="mailto:itoral@igalia.com">itoral@igalia.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">We use Align16 mode for this, since it is more convenient, but the PRM<br> for Broadwell states in Volume 3D Media GPGPU, Chapter 'Register region<br> restrictions', Section '1. Special Restrictions':<br> <br> "In Align16 mode, the channel selects and channel enables apply to a<br> pair of half-floats, because these parameters are defined for DWord<br> elements ONLY. This is applicable when both source and destination<br> are half-floats."<br> <br> This means that we cannot select individual HF elements using swizzles<br> like we do with 32-bit floats so we can't implement the required<br> regioning for this.<br> <br> Use the gen11 path for this instead, which uses Align1 mode.<br> <br> The restriction is not present in gen9 of gen10, where the Align16<br></blockquote><div><br></div><div>"or gen10"?</div><div><br></div><div>Reviewed-by: Jason Ekstrand <<a href="mailto:jason@jlekstrand.net">jason@jlekstrand.net</a>><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> implementation seems to work just fine.<br> ---<br> src/intel/compiler/brw_fs_generator.cpp | 10 ++++++++--<br> 1 file changed, 8 insertions(+), 2 deletions(-)<br> <br> diff --git a/src/intel/compiler/brw_fs_generator.cpp b/src/intel/compiler/brw_fs_generator.cpp<br> index d8e4bae17e0..ba7ed07e692 100644<br> --- a/src/intel/compiler/brw_fs_generator.cpp<br> +++ b/src/intel/compiler/brw_fs_generator.cpp<br> @@ -1281,8 +1281,14 @@ fs_generator::generate_ddy(const fs_inst *inst,<br> const uint32_t type_size = type_sz(src.type);<br> <br> if (inst->opcode == FS_OPCODE_DDY_FINE) {<br> - /* produce accurate derivatives */<br> 
- if (devinfo->gen >= 11) {<br> + /* produce accurate derivatives. We can do this easily in Align16,<br> + * but this is not supported in gen11+, and gen8 Align16 swizzles<br> + * for Half-Float operands work in units of 32 bits and always<br> + * select pairs of consecutive half-float elements, so we can't<br> + * use it for this.<br> + */<br> + if (devinfo->gen >= 11 ||<br> + (devinfo->gen == 8 && src.type == BRW_REGISTER_TYPE_HF)) {<br> src = stride(src, 0, 2, 1);<br> struct brw_reg src_0 = byte_offset(src, 0 * type_size);<br> struct brw_reg src_2 = byte_offset(src, 2 * type_size);<br> -- <br> 2.17.1<br> </blockquote></div></div>