<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Thu, Aug 24, 2017 at 6:54 AM, Alejandro Piñeiro <span dir="ltr"><<a href="mailto:apinheiro@igalia.com" target="_blank">apinheiro@igalia.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">From: Jose Maria Casanova Crespo <<a href="mailto:jmcasanova@igalia.com">jmcasanova@igalia.com</a>><br>
<br>
>From Skylake PRM, Volume 07 3D Media GPGPU, Section Register Region<br>
Restrictions, SubSection 5. Special Cases for Word Operations (page<br>
781):<br>
<br>
"There is a relaxed alignment rule for word destinations. When the<br>
destination type is word (UW, W, HF), destination data types can<br>
be aligned to either the lowest word or the second lowest word of<br>
the execution channel. This means the destination data words can<br>
be either all in the even word locations or all in the odd word<br>
locations.<br>
<br>
// Example:<br>
add (8) r10.0<2>:hf r11.0<8;8,1>:f r12.0<8;8,1>:hf<br>
add (8) r10.1<2>:hf r11.0<8;8,1>:f r12.0<8;8,1>:hf<br>
// Note: The destination offset may be .0 or .1 although the destination<br>
subregister<br>
// is required to be aligned to execution datatype."<br>
<br>
In practice, that means that for alu operations like float to half<br>
float conversions, we need to align the 16-bit to 32-bit (so setting<br>
the stride to 2).<br></blockquote><div><br></div><div>I believe your application of this rule is a bit imprecise. My understanding of the hardware is that the rule stated above is a modification to the following rule from the previous page:</div><div><br></div><div>> When the Execution Data Type is wider than the destination data type, the destination must<br>> be aligned as required by the wider execution data type and specify a HorzStride equal to<br>> the ratio in sizes of the two data types. For example, a mov with a D source and B destination<br>> must use a 4-byte aligned destination and a Dst.HorzStride of 4.</div><div><br></div><div>In other words, the rule you stated above only applies in the case of ALU operations with a source type wider than the destination. If all operands to an ALU operation are word types, the destination can be tightly packed. We have a decent bit of code in the driver today which does ALU operations that write to word destinations with a stride of 1; all of it is done with W or DW and not HF but I believe the same principal applies to HF.<br></div><div><br></div><div>In particular, this means that basically the only time we care about this rule is for nir_op_f2f16 and f2f32 (with a 64-bit source). The rest of the ALU operations should be fine as-is. (For nir_op_f2f32 with a 16-bit source, the destination is wider than the source so the restriction does not apply.) I'd prefer we let 16-bit values be tightly packed by default and implement f32->f16 conversions as</div><div><br></div><div>mov (8) r10.0<2>:hf r11.0<8;8,1>f</div><div>mov (8) r12.0<1>:hf r10.0<16;8,2>hf</div><div><br></div><div>The optimizer can probably sort things out for us and get rid of the second mov.</div><div><br></div><div>--Jason<br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
As a first approach, we are setting the stride to 2 in general, even<br>
if it is not an alu operation, as simplifies handling with 16-bit data<br>
types. We can revisit this in the future.<br>
<br>
Signed-off-by: Jose Maria Casanova Crespo <<a href="mailto:jmcasanova@igalia.com">jmcasanova@igalia.com</a>><br>
Signed-off-by: Alejandro Piñeiro <<a href="mailto:apinheiro@igalia.com">apinheiro@igalia.com</a>><br>
---<br>
src/intel/compiler/brw_fs_nir.<wbr>cpp | 35 ++++++++++++++++++++++++++++++<wbr>++++-<br>
1 file changed, 34 insertions(+), 1 deletion(-)<br>
<br>
diff --git a/src/intel/compiler/brw_fs_<wbr>nir.cpp b/src/intel/compiler/brw_fs_<wbr>nir.cpp<br>
index e4eba1401f8..c469baf42a1 100644<br>
--- a/src/intel/compiler/brw_fs_<wbr>nir.cpp<br>
+++ b/src/intel/compiler/brw_fs_<wbr>nir.cpp<br>
@@ -585,6 +585,21 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, nir_alu_instr *instr)<br>
result.type = brw_type_for_nir_type(devinfo,<br>
(nir_alu_type)(nir_op_infos[<wbr>instr->op].output_type |<br>
nir_dest_bit_size(instr->dest.<wbr>dest)));<br>
+ /* According to PRM, Volume 07 3D Media GPGPU, Section Register Region<br>
+ * Restrictions, SubSection 5. Special Cases for Word Operations, GEN >= 8<br>
+ * can only handle writting on 16-bit (UW,HF,W) registers on the lower<br>
+ * word or in the higher word for each double word. It needs to be the<br>
+ * same offset for for all 16-bit elements of the register. So as<br>
+ * consequence all destinations of 16-bit registers should use a stride<br>
+ * equal to 2.<br>
+ *<br>
+ * We also need alignment to 32-bit for conversions operations<br>
+ * form 32-bit to 16-bit. So this is also useful for this operations<br>
+ * that are the only alu operation supported right now.<br>
+ */<br>
+<br>
+ if (nir_dest_bit_size(instr-><wbr>dest.dest) == 16)<br>
+ result.stride = 2;<br>
<br>
fs_reg op[4];<br>
for (unsigned i = 0; i < nir_op_infos[instr->op].num_<wbr>inputs; i++) {<br>
@@ -594,6 +609,9 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, nir_alu_instr *instr)<br>
nir_src_bit_size(instr->src[i]<wbr>.src)));<br>
op[i].abs = instr->src[i].abs;<br>
op[i].negate = instr->src[i].negate;<br>
+ /* Previous comment still aplies here for source elements */<br>
+ if (nir_src_bit_size(instr->src[<wbr>i].src) == 16)<br>
+ op[i].stride = 2;<br>
}<br>
<br>
/* We get a bunch of mov's out of the from_ssa pass and they may still<br>
@@ -2083,6 +2101,9 @@ fs_visitor::emit_gs_input_<wbr>load(const fs_reg &dst,<br>
first_component = first_component / 2;<br>
}<br>
<br>
+ if (type_sz(tmp_dst.type) == 2)<br>
+ tmp_dst.stride = 2;<br>
+<br>
for (unsigned iter = 0; iter < num_iterations; iter++) {<br>
if (offset_const) {<br>
/* Constant indexing - use global offset. */<br>
@@ -2414,6 +2435,9 @@ fs_visitor::nir_emit_tcs_<wbr>intrinsic(const fs_builder &bld,<br>
dst = tmp;<br>
}<br>
<br>
+ if (type_sz(dst.type) == 2)<br>
+ dst.stride = 2;<br>
+<br>
for (unsigned iter = 0; iter < num_iterations; iter++) {<br>
if (indirect_offset.file == BAD_FILE) {<br>
/* Constant indexing - use global offset. */<br>
@@ -2725,6 +2749,9 @@ fs_visitor::nir_emit_tes_<wbr>intrinsic(const fs_builder &bld,<br>
first_component = first_component / 2;<br>
}<br>
<br>
+ if (type_sz(dest.type) == 2)<br>
+ dest.stride = 2;<br>
+<br>
fs_inst *inst;<br>
if (indirect_offset.file == BAD_FILE) {<br>
/* Arbitrarily only push up to 32 vec4 slots worth of data,<br>
@@ -2735,7 +2762,7 @@ fs_visitor::nir_emit_tes_<wbr>intrinsic(const fs_builder &bld,<br>
fs_reg src = fs_reg(ATTR, imm_offset / 2, dest.type);<br>
for (int i = 0; i < instr->num_components; i++) {<br>
unsigned comp = 16 / type_sz(dest.type) * (imm_offset % 2) +<br>
- i + first_component;<br>
+ i * dest.stride + first_component;<br>
bld.MOV(offset(dest, bld, i), component(src, comp));<br>
}<br>
tes_prog_data->base.urb_read_<wbr>length =<br>
@@ -3182,6 +3209,9 @@ fs_visitor::nir_emit_fs_<wbr>intrinsic(const fs_builder &bld,<br>
num_components *= 2;<br>
}<br>
<br>
+ if (type_sz(dest.type) == 2)<br>
+ dest.stride = 2;<br>
+<br>
for (unsigned int i = 0; i < num_components; i++) {<br>
struct brw_reg interp = interp_reg(base, component + i);<br>
interp = suboffset(interp, 3);<br>
@@ -3576,6 +3606,9 @@ fs_visitor::nir_emit_<wbr>intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr<br>
if (nir_intrinsic_infos[instr-><wbr>intrinsic].has_dest)<br>
dest = get_nir_dest(instr->dest);<br>
<br>
+ if (instr->dest.is_ssa && nir_dest_bit_size(instr->dest) == 16)<br>
+ dest.stride = 2;<br>
+<br>
switch (instr->intrinsic) {<br>
case nir_intrinsic_atomic_counter_<wbr>inc:<br>
case nir_intrinsic_atomic_counter_<wbr>dec:<br>
<span class="gmail-HOEnZb"><font color="#888888">--<br>
2.11.0<br>
<br>
______________________________<wbr>_________________<br>
mesa-dev mailing list<br>
<a href="mailto:mesa-dev@lists.freedesktop.org">mesa-dev@lists.freedesktop.org</a><br>
<a href="https://lists.freedesktop.org/mailman/listinfo/mesa-dev" rel="noreferrer" target="_blank">https://lists.freedesktop.org/<wbr>mailman/listinfo/mesa-dev</a><br>
</font></span></blockquote></div><br></div></div>