On 17 December 2011 11:50, Eric Anholt <span dir="ltr"><<a href="mailto:eric@anholt.net">eric@anholt.net</a>></span> wrote:<br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="im">On Tue, 13 Dec 2011 15:35:18 -0800, Paul Berry <<a href="mailto:stereotype441@gmail.com">stereotype441@gmail.com</a>> wrote:<br>
> diff --git a/src/mesa/drivers/dri/i965/brw_gs.c b/src/mesa/drivers/dri/i965/brw_gs.c<br>
> index f5d5898..c323a73 100644<br>
> --- a/src/mesa/drivers/dri/i965/brw_gs.c<br>
> +++ b/src/mesa/drivers/dri/i965/brw_gs.c<br>
> @@ -183,7 +183,31 @@ static void populate_key( struct brw_context *brw,<br>
> } else if (intel->gen == 6) {<br>
> /* On Gen6, GS is used for transform feedback. */<br>
> /* _NEW_TRANSFORM_FEEDBACK */<br>
> - key->need_gs_prog = ctx->TransformFeedback.CurrentObject->Active;<br>
> + if (ctx->TransformFeedback.CurrentObject->Active) {<br>
> + const struct gl_shader_program *shaderprog =<br>
> + ctx->Shader.CurrentVertexProgram;<br>
> + const struct gl_transform_feedback_info *linked_xfb_info =<br>
> + &shaderprog->LinkedTransformFeedback;<br>
> + int i;<br>
> +<br>
> + /* Make sure that the VUE slots won't overflow the unsigned chars in<br>
> + * key->transform_feedback_bindings[].<br>
> + */<br>
> + assert (BRW_VERT_RESULT_MAX <= 256);<br>
<br>
</div>We generally don't put a space between assert and the opening paren.<br></blockquote><div><br>I prefer to put a space before the opening paren to highlight whenever the thing I am doing is not a function call, as a reminder that flow control or argument evaluation may not be as expected--we follow this convention fairly consistently in i965 for if, for, switch, and while. I think we should do it for assert as well, since assert has both flow control implications (it may abort) and argument evaluation implications (it won't evaluate its argument under non-debug builds).<br>
</div><blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
Also, this assert is really weird to me. Neither of the two fields in<br>
the key are related to 256 -- there's a 7-bit field, and a<br>
BRW_MAX_SOL_BINDINGS field. (Also, for compile-time checks like this,<br>
we now have STATIC_ASSERT. Hooray!)<br></blockquote><div><br>The "BRW_MAX_SOL_BINDINGS field" (key->transform_feedback_bindings[]) is related to 256 because it's an array of unsigned chars. Each entry in the array of unsigned chars stores a gl_vert_result, so the assertion is to make sure that all possible gl_vert_result values can fit in an unsigned char without overflow. In retrospect, the assertion should read (VERT_RESULT_MAX <= 256), since the brw-specific extensions to gl_vert_result will never appear in key->transform_feedback_bindings[]. I'll fix that and change the assertion to a STATIC_ASSERT.<br>
</div><blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div><div class="h5"><br>
> @@ -107,12 +119,27 @@ static void brw_gs_overwrite_header_dw2(struct brw_gs_compile *c,<br>
> * of DWORD 2. URB_WRITE messages need the primitive type in bits 6:2 of<br>
> * DWORD 2. So this function extracts the primitive type field, bitshifts it<br>
> * appropriately, and stores it in c->reg.header.<br>
> + *<br>
> + * Also, if num_verts is 3, it converts primitive type<br>
> + * _3DPRIM_TRISTRIP_REVERSE to _3DPRIM_TRISTRIP, so that odd numbered<br>
> + * triangles in a triangle strip will render correctly.<br>
> */<br>
> -static void brw_gs_overwrite_header_dw2_from_r0(struct brw_gs_compile *c)<br>
> +static void brw_gs_overwrite_header_dw2_from_r0(struct brw_gs_compile *c,<br>
> + unsigned num_verts)<br>
> {<br>
> struct brw_compile *p = &c->func;<br>
> brw_AND(p, get_element_ud(c->reg.header, 2), get_element_ud(c->reg.R0, 2),<br>
> brw_imm_ud(0x1f));<br>
> + if (num_verts == 3) {<br>
> + brw_CMP(p, vec1(brw_null_reg()), BRW_CONDITIONAL_EQ,<br>
> + get_element_ud(c->reg.header, 2),<br>
> + brw_imm_ud(_3DPRIM_TRISTRIP_REVERSE));<br>
> + {<br>
> + brw_MOV(p, get_element_ud(c->reg.header, 2),<br>
> + brw_imm_ud(_3DPRIM_TRISTRIP));<br>
> + brw_set_predicate_control(p, BRW_PREDICATE_NONE);<br>
> + }<br>
> + }<br>
<br>
</div></div>This doesn't mess up normal rendering of tristrips in any way? It looks<br>
like it could be a separate commit to me.<br></blockquote><div><br>It doesn't mess up normal rendering of tristrips because we also set the GEN6_GS_REORDER bit of gen6_gs_state.c--this causes alternate triangles in a tristrip to be reordered so that they all have the same orientation, so in the geometry shader we need to map _3DPRIM_TRISTRIP_REVERSE to _3DPRIM_TRISTRIP to prevent future pipeline stages from trying to undo the reordering.<br>
<br>Having said that, though, I'm going to revert this change, because the reordering performed by GEN6_GS_REORDER is not the reordering we need for transform feedback. For a tristrip with vertices A, B, C, D, and E (in that order), the hardware normally generates triangles in the order ABC, BCD, CDE (which is wrong because BCD is improperly oriented). With reordering it generates ABC, BDC, CDE. However, to ensure that the transform feedback output will render properly when flatshading, we need the order to be ABC, CBD, CDE.<br>
<br>So consider this part of the patch NAK'd. I want to wait until we are implementing transform feedback on Gen7 to fix ordering issues anyhow (since on Gen7 we have less control over the ordering, and there's not much point in fixing obscure Gen6 bugs that will be impossible to fix on Gen7). Once we know what Gen7 does, I'll patch Gen6 as necessary to make sure it performs at least as well.<br>
</div><blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div class="im"><br>
> diff --git a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c<br>
> index f9b0b71..4ec9ef5 100644<br>
> --- a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c<br>
> +++ b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c<br>
<br>
</div><div class="im">> +/**<br>
> + * Set up a binding table entry for use by stream output logic (transform<br>
> + * feedback).<br>
> + *<br>
> + * buffer_size_minus_1 must me less than BRW_MAX_NUM_BUFFER_ENTRIES.<br>
> + */<br>
> +static void<br>
> +brw_update_sol_surface(struct brw_context *brw, drm_intel_bo *bo,<br>
> + uint32_t *out_offset, unsigned num_vector_components,<br>
> + unsigned stride_dwords, unsigned offset_dwords,<br>
> + uint32_t buffer_size_minus_1)<br>
> +{<br>
> + uint32_t *surf = brw_state_batch(brw, AUB_TRACE_SURFACE_STATE, 6 * 4, 32,<br>
> + out_offset);<br>
> + uint32_t width = buffer_size_minus_1 & 0x7f;<br>
> + uint32_t height = (buffer_size_minus_1 & 0xfff80) >> 7;<br>
> + uint32_t depth = (buffer_size_minus_1 & 0x7f00000) >> 20;<br>
> + uint32_t pitch_minus_1 = 4*stride_dwords - 1;<br>
> + uint32_t surface_format;<br>
> + uint32_t offset_bytes = 4 * offset_dwords;<br>
<br>
</div>Could I get just a little more whitespace in your code? Between the<br>
declaration and the switch, after the switch, etc.?<br></blockquote><div><br>Ok, done.<br> <br></div><blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div class="im"><br>
> + surf[1] = bo->offset + offset_bytes; /* reloc */<br>
> + surf[2] = (width << BRW_SURFACE_WIDTH_SHIFT |<br>
> + height << BRW_SURFACE_HEIGHT_SHIFT);<br>
> + surf[3] = (depth << BRW_SURFACE_DEPTH_SHIFT |<br>
> + pitch_minus_1 << BRW_SURFACE_PITCH_SHIFT);<br>
<br>
</div>This looks to me like the hardware width/heigth/depth fields are ending<br>
up programmed as 1 larger than you intend. Compare to<br>
brw_create_constant_surface().<br></blockquote><div><br>Actually brw_create_constant_surface is in error. According to the Sandy Bridge PRM volume 4 part 1, pp. 77-81 ("SURFACE_STATE for most messages"), when the surface type is SURFTYPE_BUFFER:<br>
- height contains bits [19:7] of the number of entries in the buffer - 1<br>- width contains bits [6:0] of the number of entries in the buffer - 1<br>- depth contains bits [26:20] of the number of entries in the buffer - 1<br>
<br>Unfortunately, this has an ambiguous interpretation: it cound mean either "... contains bits [...] of (the number of entries in the buffer - 1)" or "... contains (bits [...] of the number of entries in the buffer) - 1". Based on reading simulator code, and experimenting with the hardware, Ken and I were able to determine that the former interpretation is correct.<br>
<br>The code in brw_create_constant_surface sets:<br>- height to (bits [19:7] of (the number of entries in the buffer - 1)) - 1<br>- width to (bits [6:0] of (the number of entries in the buffer - 1)) - 1<br>- depth to (bits [26:20] of (the number of entries in the buffer - 1)) - 1<br>
<br>Which doesn't match either interpretation.<br><br>The reason we never noticed this bug before is that brw_create_constant_surface is only used for uniforms, and there are always a sufficiently small number of uniforms that depth gets set to all 1's. As a result, the surface is always created with the ability to address at least 0x7f00001 buffer entries.<br>
<br>There are also some numeric overflow problems in brw_create_constant_surface which cause it to set some bits which are supposed to be zero, but fortunately those bits don't seem to have any effect.<br><br>I plan to submit a patch fixing brw_create_constant_surface once transform feedback is finished.<br>
</div><blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div class="im"><br>
> + /* Emit relocation to surface contents. Section 5.1.1 of the gen4<br>
> + * bspec ("Data Cache") says that the data cache does not exist as<br>
> + * a separate cache and is just the sampler cache.<br>
> + */<br>
> + drm_intel_bo_emit_reloc(brw-><a href="http://intel.batch.bo" target="_blank">intel.batch.bo</a>,<br>
> + *out_offset + 4,<br>
> + bo, offset_bytes,<br>
> + I915_GEM_DOMAIN_RENDER, I915_GEM_DOMAIN_RENDER);<br>
> +}<br>
<br>
</div>If it's the sampler cache, that would be (I915_GEM_DOMAIN_SAMPLER, 0).<br>
But it obviously isn't, since we're reading and writing.<br></blockquote><div><br>Whoops, that comment is a copy-and-paste bug. I'll remove it.<br> </div><blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div><div class="h5"><br>
> diff --git a/src/mesa/drivers/dri/i965/gen6_sol.c b/src/mesa/drivers/dri/i965/gen6_sol.c<br>
> new file mode 100644<br>
> index 0000000..af372c1<br>
> --- /dev/null<br>
> +++ b/src/mesa/drivers/dri/i965/gen6_sol.c<br>
> @@ -0,0 +1,102 @@<br>
> +/*<br>
> + * Copyright © 2011 Intel Corporation<br>
> + *<br>
> + * Permission is hereby granted, free of charge, to any person obtaining a<br>
> + * copy of this software and associated documentation files (the "Software"),<br>
> + * to deal in the Software without restriction, including without limitation<br>
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,<br>
> + * and/or sell copies of the Software, and to permit persons to whom the<br>
> + * Software is furnished to do so, subject to the following conditions:<br>
> + *<br>
> + * The above copyright notice and this permission notice (including the next<br>
> + * paragraph) shall be included in all copies or substantial portions of the<br>
> + * Software.<br>
> + *<br>
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR<br>
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,<br>
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL<br>
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER<br>
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING<br>
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS<br>
> + * IN THE SOFTWARE.<br>
> + */<br>
> +<br>
> +/** \file gen6_sol.c<br>
> + *<br>
> + * Code to initialize the binding table entries used by transform feedback.<br>
> + */<br>
> +<br>
> +#include "brw_context.h"<br>
> +#include "intel_buffer_objects.h"<br>
> +#include "brw_defines.h"<br>
> +<br>
> +static void<br>
> +brw_update_sol_surfaces(struct brw_context *brw)<br>
<br>
</div></div>If it's gen6-only, let's call it gen6_update_sol_surfaces.<br></blockquote><div><br>Ok.<br> </div><blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div class="im"><br>
> +{<br>
> + unsigned buffer_offsets[BRW_MAX_SOL_BUFFERS];<br>
> + struct gl_context *ctx = &brw->intel.ctx;<br>
> + struct intel_context *intel = &brw->intel;<br>
> + /* _NEW_TRANSFORM_FEEDBACK */<br>
> + struct gl_transform_feedback_object *xfb_obj =<br>
> + ctx->TransformFeedback.CurrentObject;<br>
> + /* BRW_NEW_VERTEX_PROGRAM */<br>
> + const struct gl_shader_program *shaderprog =<br>
> + ctx->Shader.CurrentVertexProgram;<br>
> + const struct gl_transform_feedback_info *linked_xfb_info =<br>
> + &shaderprog->LinkedTransformFeedback;<br>
> + int i;<br>
> + for (i = 0; i < BRW_MAX_SOL_BUFFERS; ++i)<br>
> + buffer_offsets[i] = xfb_obj->Offset[i] / 4;<br>
> + for (i = 0; i < BRW_MAX_SOL_BINDINGS; ++i) {<br>
> + const int surf_index = SURF_INDEX_SOL_BINDING(i);<br>
<br>
</div>If you're going to mess with indentation to get your indented code to<br>
fit, that's often a good indication that a separate function is in<br>
order.<br>
<br>
The shadowed loop index made me twitch, but it's actually valid.<br></blockquote><div><br>Perhaps you misread this as a nested loop? It's two separate loops, so there's no messing with indentation or loop index shadowing going on.<br>
<br>It's a moot point anyhow, since the first loop (to initialize buffer_offsets) went away with Marek's suggestion to compute buffer offsets in the linker and store them in gl_transform_feedback_info. I'll post a revised patch so you can see what the code looks like now.<br>
</div><blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div><div class="h5"><br>
> + if (xfb_obj->Active && i < linked_xfb_info->NumOutputs) {<br>
> + unsigned buffer = linked_xfb_info->Outputs[i].OutputBuffer;<br>
> + struct gl_buffer_object *buffer_obj = xfb_obj->Buffers[buffer];<br>
> + drm_intel_bo *bo = intel_buffer_object(buffer_obj)->buffer;<br>
> + size_t size_dwords = buffer_obj->Size / 4;<br>
> + unsigned num_components = linked_xfb_info->Outputs[i].NumComponents;<br>
> + unsigned stride = linked_xfb_info->BufferStride[buffer];<br>
> + uint32_t buffer_size_minus_1;<br>
> + /* FIXME: can we rely on core Mesa to ensure that the buffer isn't<br>
> + * too big to map using a single binding table entry?<br>
> + */<br>
> + assert ((size_dwords - buffer_offsets[buffer]) / stride<br>
> + <= BRW_MAX_NUM_BUFFER_ENTRIES);<br>
> + if (size_dwords > buffer_offsets[buffer] + num_components) {<br>
> + /* There is room for at least 1 transform feedback output in the<br>
> + * buffer. Compute the number of additional transform feedback<br>
> + * outputs the buffer has room for.<br>
> + */<br>
> + buffer_size_minus_1 =<br>
> + (size_dwords - buffer_offsets[buffer] - num_components)<br>
> + / stride;<br>
> + } else {<br>
> + /* There isn't even room for a single transform feedback output in<br>
> + * the buffer. We can't configure the binding table entry to<br>
> + * prevent output entirely; we'll have to rely on the geometry<br>
> + * shader to detect overflow. But to minimize the damage in case<br>
> + * of a bug, set up the binding table entry to just allow a single<br>
> + * output.<br>
> + */<br>
> + buffer_size_minus_1 = 0;<br>
> + }<br>
> + intel->vtbl.update_sol_surface(<br>
> + brw, bo, &brw->bind.surf_offset[surf_index],<br>
> + num_components, stride, buffer_offsets[buffer],<br>
> + buffer_size_minus_1);<br>
> + buffer_offsets[buffer] += num_components;<br>
> + } else {<br>
> + brw->bind.surf_offset[surf_index] = 0;<br>
> + }<br>
> + }<br>
> +}<br>
<br>
</div></div>brw_update_sol_surface() looks like it would rather live here than in<br>
brw_wm_surface_state.c, and it shouldn't be in a vtable.<br></blockquote><div><br>I am ok with removing brw_update_sol_surface from the vtable.<br><br>As for moving it to gen6_sol.c, I'm reluctant to do that because it is very similar to the other functions that output SURFACE_STATE structures (brw_update_texture_surface, brw_create_constant_surface, brw_update_null_renderbuffer_surface, and brw_update_renderbuffer_surface), and those functions all live in brw_wm_surface_state.c. Granted, it's kind of weird to have non-WM code in a file with "wm" in the name, but that was the case before this patch (brw_create_constant_surface and brw_update_texture_surface are used for both VS and WM). Would you object if I moved all five of these functions into their own new file (say, brw_surface_state.c)?<br>
</div><blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
I think moving most of the logic in this loop into the<br>
brw_update_sol_surface() function would make me happier.<br>
</blockquote></div><br>Ok.<br><br>I'll send out a revised patch.<br>