<html><head></head><body><div>On Thu, 2017-01-05 at 06:41 -0800, Jason Ekstrand wrote:</div><blockquote type="cite"><div dir="auto"><div class="gmail_extra" dir="auto"><div class="gmail_quote">On Jan 5, 2017 3:11 AM, "Juan A. Suarez Romero" <<a href="mailto:jasuarez@igalia.com">jasuarez@igalia.com</a>> wrote:<br type="attribution"><blockquote class="quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="elided-text" dir="auto"><div>On Wed, 2017-01-04 at 07:06 -0800, Jason Ekstrand wrote:</div><blockquote type="cite"><div dir="auto"><div><div class="gmail_extra"><div class="gmail_quote">On Jan 4, 2017 5:46 AM, "Juan A. Suarez Romero" <<a href="mailto:jasuarez@igalia.com" target="_blank">jasuarez@igalia.com</a>> wrote:<br type="attribution"><blockquote class="m_-3362162493489463209quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div class="m_-3362162493489463209quoted-text"><div>On Tue, 2017-01-03 at 14:41 -0800, Jason Ekstrand wrote:</div></div><div class="m_-3362162493489463209elided-text"><blockquote type="cite"><div dir="ltr"><div dir="auto"><div><div class="gmail_extra"><div class="gmail_quote">I made a few pretty trivial comments. With those addressed,<br><br></div><div class="gmail_quote">Reviewed-by: Jason Ekstrand <<a href="mailto:jason@jlekstrand.net" target="_blank">jason@jlekstrand.net</a>><br></div><div class="gmail_quote" dir="auto"><br></div><div class="gmail_quote">On Dec 16, 2016 8:55 AM, "Juan A. Suarez Romero" <<a href="mailto:jasuarez@igalia.com" target="_blank">jasuarez@igalia.com</a>> wrote:<br type="attribution"><blockquote class="m_-3362162493489463209m_8367481713989651032m_6457694654742999625quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">So far, input_reads was a bitmap tracking which vertex input locations<br> were being used.<br> <br> In OpenGL, an attribute bigger than a vec4 (like a dvec3 or dvec4)<br> consumes just one location, any other small attribute. So we mark the<br> proper bit in inputs_read, and also the same bit in double_inputs_read<br> if the attribute is a dvec3/dvec4.<br> <br> But in Vulkan, this is slightly different: a dvec3/dvec4 attribute<br> consumes two locations, not just one. And hence two bits would be marked<br> in inputs_read for the same vertex input attribute.<br> <br> To avoid handling two different situations in NIR, we just choose the<br> latest one: in OpenGL, when creating NIR from GLSL/IR, any dvec3/dvec4<br> vertex input attribute is marked with two bits in the inputs_read bitmap<br> (and also in the double_inputs_read), and following attributes are<br> adjusted accordingly.<br> <br> As example, if in our GLSL/IR shader we have three attributes:<br> <br> layout(location = 0) vec3 attr0;<br> layout(location = 1) dvec4 attr1;<br> layout(location = 2) dvec3 attr2;<br> <br> then in our NIR shader we put attr0 in location 0, attr1 in locations 1<br> and 2, and attr2 in location 3.<br></blockquote></div></div></div><div dir="auto"><br></div><div dir="auto">attr2 goes in locations 3 *and* 4, correct?</div><div dir="auto"><br></div><div dir="auto"><div class="gmail_extra"><div class="gmail_quote"><blockquote class="m_-3362162493489463209m_8367481713989651032m_6457694654742999625quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> Checking carefully, basically we are using slots rather than locations<br> in NIR.<br> <br> When emitting the vertices, we do a inverse map to know the<br> corresponding location for each slot.<br> <br> v2 (Jason):<br> - use two slots from inputs_read for dvec3/dvec4 NIR from GLSL/IR.<br> ---<br> src/compiler/glsl/glsl_to_nir<wbr>.cpp | 28 +++++++++++++<br> src/compiler/nir/nir_gather_i<wbr>nfo.c | 48 ++++++++++-----------<br> src/intel/vulkan/genX_pipelin<wbr>e.c | 63 +++++++++++++++++-----------<br> src/mesa/drivers/dri/i965/brw<wbr>_draw_upload.c | 11 +++--<br> src/mesa/drivers/dri/i965/brw<wbr>_fs.cpp | 13 ------<br> src/mesa/drivers/dri/i965/brw<wbr>_fs_visitor.cpp | 3 +-<br> src/mesa/drivers/dri/i965/brw<wbr>_nir.c | 6 +--<br> src/mesa/drivers/dri/i965/brw<wbr>_nir.h | 1 -<br> src/mesa/drivers/dri/i965/brw<wbr>_vec4.cpp | 11 +++--<br> 9 files changed, 106 insertions(+), 78 deletions(-)<br> <br> diff --git a/src/compiler/glsl/glsl_to_ni<wbr>r.cpp b/src/compiler/glsl/glsl_to_ni<wbr>r.cpp<br> index 4debc37..0814dad 100644<br> --- a/src/compiler/glsl/glsl_to_ni<wbr>r.cpp<br> +++ b/src/compiler/glsl/glsl_to_ni<wbr>r.cpp<br> @@ -129,6 +129,19 @@ private:<br> <br> } /* end of anonymous namespace */<br> <br> +static void<br> +nir_remap_attributes(nir_shad<wbr>er *shader)<br> +{<br> + nir_foreach_variable(var, &shader->inputs) {<br> + var->data.location += _mesa_bitcount_64(shader->info<wbr>->double_inputs_read &<br> + BITFIELD64_MASK(var->data.loca<wbr>tion));<br> + }<br> +<br> + /* Once the remap is done, reset double_inputs_read, so later it will have<br> + * which location/slots are doubles */<br> + shader->info->double_inputs_r<wbr>ead = 0;<br> +}<br> +<br> nir_shader *<br> glsl_to_nir(const struct gl_shader_program *shader_prog,<br> gl_shader_stage stage,<br> @@ -146,6 +159,13 @@ glsl_to_nir(const struct gl_shader_program *shader_prog,<br> <br> nir_lower_constant_initializer<wbr>s(shader, (nir_variable_mode)~0);<br> <br> + /* Remap the locations to slots so those requiring two slots will occupy<br> + * two locations. For instance, if we have in the IR code a dvec3 attr0 in<br> + * location 0 and vec4 attr1 in location 1, in NIR attr0 will use<br> + * locations/slots 0 and 1, and attr1 will use location/slot 2 */<br> + if (shader->stage == MESA_SHADER_VERTEX)<br> + nir_remap_attributes(shader);<br> +<br> shader->info->name = ralloc_asprintf(shader, "GLSL%d", shader_prog->Name);<br> if (shader_prog->Label)<br> shader->info->label = ralloc_strdup(shader, shader_prog->Label);<br> @@ -315,6 +335,14 @@ nir_visitor::visit(ir_variable *ir)<br> } else {<br> var->data.mode = nir_var_shader_in;<br> }<br> +<br> + /* Mark all the locations that require two slots */<br> + if (glsl_type_is_dual_slot(glsl_w<wbr>ithout_array(var->type))) {<br> + for (uint i = 0; i < glsl_count_attribute_slots(var<wbr>->type, true); i++) {<br> + uint64_t bitfield = BITFIELD64_BIT(var->data.locat<wbr>ion + i);<br> + shader->info->double_inputs_re<wbr>ad |= bitfield;<br> + }<br> + }<br> break;<br> <br> case ir_var_shader_out:<br> diff --git a/src/compiler/nir/nir_gather_<wbr>info.c b/src/compiler/nir/nir_gather_<wbr>info.c<br> index 07c9949..35a1ce4 100644<br> --- a/src/compiler/nir/nir_gather_<wbr>info.c<br> +++ b/src/compiler/nir/nir_gather_<wbr>info.c<br> @@ -53,11 +53,6 @@ set_io_mask(nir_shader *shader, nir_variable *var, int offset, int len)<br> else<br> shader->info->inputs_read |= bitfield;<br> <br> - /* double inputs read is only for vertex inputs */<br> - if (shader->stage == MESA_SHADER_VERTEX &&<br> - glsl_type_is_dual_slot(glsl_w<wbr>ithout_array(var->type)))<br> - shader->info->double_inputs_re<wbr>ad |= bitfield;<br> -<br> if (shader->stage == MESA_SHADER_FRAGMENT) {<br> shader->info->fs.uses_sample_<wbr>qualifier |= var->data.sample;<br> }<br> @@ -83,26 +78,21 @@ static void<br> mark_whole_variable(nir_shade<wbr>r *shader, nir_variable *var)<br> {<br> const struct glsl_type *type = var->type;<br> - bool is_vertex_input = false;<br> <br> if (nir_is_per_vertex_io(var, shader->stage)) {<br> assert(glsl_type_is_array(typ<wbr>e));<br> type = glsl_get_array_element(type);<br> }<br> <br> - if (shader->stage == MESA_SHADER_VERTEX &&<br> - var->data.mode == nir_var_shader_in)<br> - is_vertex_input = true;<br> -<br> const unsigned slots =<br> var->data.compact ? DIV_ROUND_UP(glsl_get_length(t<wbr>ype), 4)<br> - : glsl_count_attribute_slots(typ<wbr>e, is_vertex_input);<br> + : glsl_count_attribute_slots(typ<wbr>e, false);<br> <br> set_io_mask(shader, var, 0, slots);<br> }<br> <br> static unsigned<br> -get_io_offset(nir_deref_var *deref, bool is_vertex_input)<br> +get_io_offset(nir_deref_var *deref)<br> {<br> unsigned offset = 0;<br> <br> @@ -117,7 +107,7 @@ get_io_offset(nir_deref_var *deref, bool is_vertex_input)<br> return -1;<br> }<br> <br> - offset += glsl_count_attribute_slots(tai<wbr>l->type, is_vertex_input) *<br> + offset += glsl_count_attribute_slots(tai<wbr>l->type, false) *<br> deref_array->base_offset;<br> }<br> /* TODO: we can get the offset for structs here see nir_lower_io() */<br> @@ -163,12 +153,7 @@ try_mask_partial_io(nir_shader *shader, nir_deref_var *deref)<br> return false;<br> }<br> <br> - bool is_vertex_input = false;<br> - if (shader->stage == MESA_SHADER_VERTEX &&<br> - var->data.mode == nir_var_shader_in)<br> - is_vertex_input = true;<br> -<br> - unsigned offset = get_io_offset(deref, is_vertex_input);<br> + unsigned offset = get_io_offset(deref);<br> if (offset == -1)<br> return false;<br> <br> @@ -184,8 +169,7 @@ try_mask_partial_io(nir_shader *shader, nir_deref_var *deref)<br> }<br> <br> /* double element width for double types that takes two slots */<br> - if (!is_vertex_input &&<br> - glsl_type_is_dual_slot(glsl_w<wbr>ithout_array(type))) {<br> + if (glsl_type_is_dual_slot(glsl_w<wbr>ithout_array(type))) {<br> elem_width *= 2;<br> }<br> <br> @@ -220,13 +204,27 @@ gather_intrinsic_info(nir_intr<wbr>insic_instr *instr, nir_shader *shader)<br> case nir_intrinsic_interp_var_at_sa<wbr>mple:<br> case nir_intrinsic_interp_var_at_of<wbr>fset:<br> case nir_intrinsic_load_var:<br> - case nir_intrinsic_store_var:<br> - if (instr->variables[0]->var->dat<wbr>a.mode == nir_var_shader_in ||<br> - instr->variables[0]->var->data<wbr>.mode == nir_var_shader_out) {<br> + case nir_intrinsic_store_var: {<br> + nir_variable *var = instr->variables[0]->var;<br> +<br> + if (var->data.mode == nir_var_shader_in ||<br> + var->data.mode == nir_var_shader_out) {<br> if (!try_mask_partial_io(shader, instr->variables[0]))<br> - mark_whole_variable(shader, instr->variables[0]->var);<br> + mark_whole_variable(shader, var);<br> +<br> + /* We need to track which input_reads bits correspond to a<br> + * dvec3/dvec4 input attribute */<br> + if (shader->stage == MESA_SHADER_VERTEX &&<br> + var->data.mode == nir_var_shader_in &&<br> + glsl_type_is_dual_slot(glsl_w<wbr>ithout_array(var->type))) {<br> + for (uint i = 0; i < glsl_count_attribute_slots(var<wbr>->type, false); i++) {<br> + int idx = var->data.location + i;<br> + shader->info->double_inputs_r<wbr>ead |= BITFIELD64_BIT(idx);<br> + }<br> + }<br> }<br> break;<br> + }<br> <br> case nir_intrinsic_load_draw_id:<br> case nir_intrinsic_load_front_face:<br> diff --git a/src/intel/vulkan/genX_pipeli<wbr>ne.c b/src/intel/vulkan/genX_pipeli<wbr>ne.c<br> index 845d020..7b94959 100644<br> --- a/src/intel/vulkan/genX_pipeli<wbr>ne.c<br> +++ b/src/intel/vulkan/genX_pipeli<wbr>ne.c<br> @@ -33,26 +33,33 @@<br> static uint32_t<br> vertex_element_comp_control(e<wbr>num isl_format format, unsigned comp)<br> {<br> - uint8_t bits;<br> switch (comp) {<br> - case 0: bits = isl_format_layouts[format].cha<wbr>nnels.r.bits; break;<br> - case 1: bits = isl_format_layouts[format].cha<wbr>nnels.g.bits; break;<br> - case 2: bits = isl_format_layouts[format].cha<wbr>nnels.b.bits; break;<br> - case 3: bits = isl_format_layouts[format].cha<wbr>nnels.a.bits; break;<br> - default: unreachable("Invalid component");<br> - }<br> -<br> - if (bits) {<br> - return VFCOMP_STORE_SRC;<br> - } else if (comp < 3) {<br> - return VFCOMP_STORE_0;<br> - } else if (isl_format_layouts[format].ch<wbr>annels.r.type == ISL_UINT ||<br> - isl_format_layouts[format].cha<wbr>nnels.r.type == ISL_SINT) {<br> - assert(comp == 3);<br> - return VFCOMP_STORE_1_INT;<br> - } else {<br> - assert(comp == 3);<br> - return VFCOMP_STORE_1_FP;<br> + case 0:<br> + return isl_format_layouts[format].cha<wbr>nnels.r.bits ?<br> + VFCOMP_STORE_SRC : VFCOMP_STORE_0;<br> + case 1:<br> + return isl_format_layouts[format].cha<wbr>nnels.g.bits ?<br> + VFCOMP_STORE_SRC : VFCOMP_STORE_0;<br> + case 2:<br> + return isl_format_layouts[format].cha<wbr>nnels.b.bits ?<br> + VFCOMP_STORE_SRC : ((isl_format_layouts[format].c<wbr>hannels.r.type == ISL_RAW) ?<br> + VFCOMP_NOSTORE : VFCOMP_STORE_0);<br></blockquote></div></div></div><div dir="auto"><br></div><div dir="auto">Given all the line wrapping, I think it would be clearer to just have an if ladder here</div><div dir="auto"><br></div><div dir="auto"><div class="gmail_extra"><div class="gmail_quote"><blockquote class="m_-3362162493489463209m_8367481713989651032m_6457694654742999625quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> + case 3:<br> + if (isl_format_layouts[format].ch<wbr>annels.a.bits)<br> + return VFCOMP_STORE_SRC;<br> + else<br></blockquote></div></div></div><div dir="auto"><br></div><div dir="auto">Please use braces when one side of the if/else is multiple lines.</div><div dir="auto"><br></div><div dir="auto"><div class="gmail_extra"><div class="gmail_quote"><blockquote class="m_-3362162493489463209m_8367481713989651032m_6457694654742999625quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> + switch (isl_format_layouts[format].ch<wbr>annels.r.type) {<br> + case ISL_RAW:<br> + return isl_format_layouts[format].cha<wbr>nnels.b.bits ?<br> + VFCOMP_STORE_0 : VFCOMP_NOSTORE;<br></blockquote></div></div></div><div dir="auto"><br></div><div dir="auto">This seems a bit odd. Mind explaining what's going on here?</div><div dir="auto"><br></div></div></div></blockquote><div><br></div></div><div>Yes. When emitting 64-bit components, we either write 128 or 256 bits, using VFCOMP_STORE_0 to pad output properly. We write 256 bits if we need to emit Blue and/or Alpha components.</div><div><br></div><div>If we only need to write 128bits then we use VFCOMP_NOSTORE for the 3rd and 4th components.</div><div><br></div><div>In above code, we already know we are not emitting Alpha (the first condition in "case 3" is false). If we neither need to emit Blue component, then we only need to write 128 bits, so we return NOSTORE.</div><div><br></div><div>But if Blue is required, then we need to write 256bits, so we return VFCOMP_STORE_0 to pad the output.</div></div></blockquote></div></div></div><div dir="auto"><br></div><div dir="auto">Maybe this would be clearer: leave it structured the way it was with the "bits" temporary and add a new case to the if ladder right after the first one that is</div><div dir="auto"><br></div><div dir="auto">} else if (comp >= 2 && !<span style="font-family:sans-serif">isl_format_layouts[format].c</span><span style="font-family:sans-serif">h<wbr>annels.b.bits &&</span></div><div dir="auto"><span style="font-family:sans-serif"> isl_format_layouts[format].c</span><span style="font-family:sans-serif">ha<wbr>nnels.r.type == ISL_RAW)</span><span style="font-family:sans-serif"> {</span></div><div dir="auto"><span style="font-family:sans-serif"> /* comment about writing 128-bit chunks */</span></div><div dir="auto"><span style="font-family:sans-serif"> return VFCOMP_NOSTORE;</span></div><div dir="auto"><br></div></div></blockquote><div><br></div></div><div dir="auto">Sounds good for me. This also requires to change the next branch to</div><div dir="auto"><br></div><div dir="auto">} else if(comp < 3 || (comp == 3 &&</div><div dir="auto"> <span style="font-family:sans-serif">isl_format_layouts[format].c</span><span style="font-family:sans-serif">ha<wbr>nnels.r.type == ISL_RAW))</span><span style="font-family:sans-serif"> {</span></div><div dir="auto"><span style="font-family:sans-serif"> return VFCOMP_STORE_0;</span></div><div dir="auto"><br></div><div dir="auto"><br></div><div dir="auto">Otherwise, it would return the wrong value for 4th component when format is R64G64B64.</div></blockquote></div></div><div dir="auto"><br></div><div dir="auto">Why are we defaulting the 4th component to zero for 64-bit inputs? I guess the format wouldn't register as integer or float so neither of the other two makes sense. What does the spec day the 4th component should be defaulted to? I'm a bit concerned we may find ourselves needing to inspect the shader to get this 100% right. :-(</div></div> </blockquote><div><br></div><div>Because in the example above, we need do not have a value for the 4th component, but we wrote 3*64 = 192 bits. And as we need to write 256, we pad the remaining 64 bits with 0, as required by spec.</div><div><br></div><div>From Broadwell spec, command reference structures, page 586:</div><div><br></div><div>"When SourceElementFormat is set to one of the *64*_PASSTHRU formats, 64-bit components are stored</div><div>in the URB without any conversion. In this case, vertex elements must be written as 128 or 256 bits, with</div><div>VFCOMP_STORE_0 being used to pad the output as required. E.g., if R64_PASSTHRU is used to copy a</div><div>64-bit Red component into the URB, Component 1 must be specified as VFCOMP_STORE_0 (with</div><div>Components 2,3 set to VFCOMP_NOSTORE) in order to output a 128-bit vertex element, or Components</div><div>1-3 must be specified as VFCOMP_STORE_0 in order to output a 256-bit vertex element. Likewise, use of</div><div>R64G64B64_PASSTHRU requires Component 3 to be specified as VFCOMP_STORE_0 in order to output a</div><div>256-bit vertex element."</div><div><br></div><div><br></div><div>Note this is not specific for the 4th component. For instance, if the input element is a 64-bit single-component value (R64), we would be writing 128bits: the first component is the entire Red value (VFCOMP_STORE_SRC), which would be</div><div>using the first 64bits; and for the second component we would use 0 (VCOMP_STORE_0), to pad the remaining 64bits. In this case, as we do not have neither 3rd nor 4th components, we would</div><div>use VFCOMP_NOSTORE, so only 128bits are written.</div><div><br></div><div><br></div><div>I also tried to test what happens when using other value rathen than 0, and it doesn't work.</div><div><br></div><div><br></div><div> J.A.</div><div><br></div></body></html>