<div dir="ltr">On 2 December 2013 11:39, Francisco Jerez <span dir="ltr"><<a href="mailto:currojerez@riseup.net" target="_blank">currojerez@riseup.net</a>></span> wrote:<br><div class="gmail_extra"><div class="gmail_quote"> <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">diff --git a/src/mesa/drivers/dri/i965/brw_surface_visitor.cpp b/src/mesa/drivers/dri/i965/brw_surface_visitor.cpp<br> new file mode 100644<br> index 0000000..07511b5<br> --- /dev/null<br> +++ b/src/mesa/drivers/dri/i965/brw_surface_visitor.cpp<br> @@ -0,0 +1,1208 @@<br> +/*<br> + * Copyright 2013 Intel Corporation<br> + *<br> + * Permission is hereby granted, free of charge, to any person obtaining a<br> + * copy of this software and associated documentation files (the "Software"),<br> + * to deal in the Software without restriction, including without limitation<br> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,<br> + * and/or sell copies of the Software, and to permit persons to whom the<br> + * Software is furnished to do so, subject to the following conditions:<br> + *<br> + * The above copyright notice and this permission notice (including the next<br> + * paragraph) shall be included in all copies or substantial portions of the<br> + * Software.<br> + *<br> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR<br> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,<br> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL<br> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER<br> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING<br> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS<br> + * IN THE SOFTWARE.<br> + *<br> + * Authors:<br> + * Francisco Jerez <<a href="mailto:currojerez@riseup.net" target="_blank">currojerez@riseup.net</a>><br> + */<br> +<br> +#include "brw_surface_visitor.h"<br> +#include "brw_context.h"<br> +<br> +brw_surface_visitor::brw_surface_visitor(backend_visitor *v) :<br> + v(v)<br> +{<br> +}<br> +<br> +void<br> +brw_surface_visitor::visit_atomic_counter_intrinsic(ir_call *ir) const<br> +{<br> + const char *callee = ir->callee->function_name();<br> + ir_dereference *deref = static_cast<ir_dereference *>(<br> + ir->actual_parameters.get_head());<br> + const backend_reg offset = v->visit_result(deref);<br> + const backend_reg surface =<br> + brw_imm_ud(v->stage_prog_data->binding_table.abo_start +<br> + deref->variable_referenced()->atomic.buffer_index);<br> + backend_reg tmp;<br> +<br> + if (!strcmp("__intrinsic_atomic_read", callee)) {<br> + tmp = emit_untyped_read(backend_reg(), surface, offset, 1, 1);<br> +<br> + } else if (!strcmp("__intrinsic_atomic_increment", callee)) {<br> + tmp = emit_untyped_atomic(backend_reg(), surface, offset,<br> + backend_reg(), backend_reg(),<br> + 1, BRW_AOP_INC);<br> +<br> + } else if (!strcmp("__intrinsic_atomic_predecrement", callee)) {<br> + tmp = emit_untyped_atomic(backend_reg(), surface, offset,<br> + backend_reg(), backend_reg(),<br> + 1, BRW_AOP_PREDEC);<br> + }<br></blockquote><div><br></div><div>Are these the only calls to emit_untyped_atomic()? If so, why do the flag, src0, and src1 parameters to this function exist, if we always populate them with backend_reg() (causing the logic pertaining to them to be skipped)?<br> </div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> +<br> + if (ir->return_deref) {<br> + backend_reg dst = v->visit_result(ir->return_deref);<br> + emit_assign_vector(dst, tmp, 1);<br> + }<br> +}<br> +<br> +namespace {<br> + /**<br> + * Process the parameters passed to an image intrinsic call.<br> + */<br></blockquote><div><br></div><div>This sounds like a description of a function, not a structure. How about just "The parameters passed to an image intrinsic call"?<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> + struct image_intrinsic_parameters {<br> + image_intrinsic_parameters(backend_visitor *v, ir_call *ir)<br> + {<br> + exec_list_iterator it = ir->actual_parameters.iterator();<br> +<br> + image_var = static_cast<ir_dereference *>(it.get())-><br> + variable_referenced();<br> +<br> + image = visit_next(v, it);<br> + addr = visit_next(v, it);<br> +<br> + if (image_var->type->fields.image.dimension == GLSL_IMAGE_DIM_MS)<br> + sample = visit_next(v, it);<br> +<br> + for (int i = 0; it.has_next(); ++i)<br> + src[i] = visit_next(v, it);<br></blockquote><div><br></div><div>A bug in the front end could cause us to overflow the src array. To catch that, let's add:<br><br></div><div>assert(i < Elements(src));<br> <br></div><div>to the body of the loop.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> +<br> + if (ir->return_deref)<br> + dst = v->visit_result(ir->return_deref);<br> + }<br> +<br> + ir_variable *image_var;<br> +<br> + backend_reg image;<br> + backend_reg addr;<br> + backend_reg sample;<br> + backend_reg src[2];<br></blockquote><div><br></div><div>It looks like there are not too many possibilities for what will be contained in the "src" array:<br><br></div><div>- For imageSize and imageLoad, src[0] and src[1] are unused.<br> </div><div>- For imageAtomicCompSwap, src[0] is the "compare" parameter, and src[1] is the "data" parameter.<br></div><div>- For all other functions, src[0] is the "data" parameter, and src[1] is unused.<br> <br></div><div>At the very least we should have a comment above the declaration of "src" with this information.<br></div><div><br></div><div>However, it seems like all the other functions that have to deal with these src values would be easier to read if we replaced the "src" array with a pair of fields called "data" and "compare". Then we could use those names uniformly elsewhere (e.g. in emit_typed_atomic(), emit_image_atomic(), and emit_image_store()).<br> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> + backend_reg dst;<br> +<br> + private:<br> + backend_reg<br> + visit_next(backend_visitor *v, exec_list_iterator &it) const<br> + {<br> + ir_dereference *deref = static_cast<ir_dereference *>(it.get());<br></blockquote><div><br></div><div>To help catch bugs, it would be nice to add:<br><br>assert(it.has_next());<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> + it.next();<br> + return v->visit_result(deref);<br> + }<br> + };<br> +<br> + /**<br> + * Get the appropriate atomic op for an image atomic intrinsic.<br> + */<br> + unsigned<br> + get_image_atomic_op(const char *callee, ir_variable *image)<br></blockquote><div><br></div><div>"callee" implies that there's a caller/callee relationship that's important to consider. How about renaming this to something like "func_name"?<br> </div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> + {<br> + const glsl_base_type base_type = image->type->fields.image.type;<br> +<br> + if (!strcmp("__intrinsic_image_atomic_add", callee))<br> + return BRW_AOP_ADD;<br> +<br> + else if (!strcmp("__intrinsic_image_atomic_min", callee))<br> + return (base_type == GLSL_TYPE_UINT ? BRW_AOP_UMIN : BRW_AOP_IMIN);<br> +<br> + else if (!strcmp("__intrinsic_image_atomic_max", callee))<br> + return (base_type == GLSL_TYPE_UINT ? BRW_AOP_UMAX : BRW_AOP_IMAX);<br> +<br> + else if (!strcmp("__intrinsic_image_atomic_and", callee))<br> + return BRW_AOP_AND;<br> +<br> + else if (!strcmp("__intrinsic_image_atomic_or", callee))<br> + return BRW_AOP_OR;<br> +<br> + else if (!strcmp("__intrinsic_image_atomic_xor", callee))<br> + return BRW_AOP_XOR;<br> +<br> + else if (!strcmp("__intrinsic_image_atomic_exchange", callee))<br> + return BRW_AOP_MOV;<br> +<br> + else if (!strcmp("__intrinsic_image_atomic_comp_swap", callee))<br> + return BRW_AOP_CMPWR;<br> +<br> + else<br> + unreachable();<br></blockquote><div><br></div><div>Change to an assert.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> + }<br> +}<br> +<br> +void<br> +brw_surface_visitor::visit_image_intrinsic(ir_call *ir) const<br> +{<br> + image_intrinsic_parameters p(v, ir);<br> + const char *callee = ir->callee->function_name();<br> + const unsigned dims = p.image_var->type->coordinate_components();<br> + const GLenum format = (p.image_var->image.write_only ? GL_NONE :<br> + p.image_var->image.format);<br></blockquote><div><br></div><div>Can you add a comment explaining that GL_NONE is a magic value meaning "no format conversion needs to be applied, since the hadware natively supports all formats for write-only access"? Without that piece of context, it's hard to guess what's going on here.<br> </div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> + backend_reg tmp;<br> +<br> + if (!strcmp("__intrinsic_image_load", callee))<br> + tmp = emit_image_load(p.image, p.addr, format, dims);<br> +<br> + else if (!strcmp("__intrinsic_image_store", callee))<br> + emit_image_store(p.image, p.addr, p.src[0], format, dims);<br> +<br> + else<br> + tmp = emit_image_atomic(p.image, p.addr, p.src[0], p.src[1],<br> + format, get_image_atomic_op(callee, p.image_var),<br> + dims);<br> +<br> + if (ir->return_deref) {<br> + const unsigned size = (ir->return_deref->variable_referenced()-><br> + type->components());<br> + emit_assign_vector(p.dst, tmp, size);<br> + }<br> +}<br> +<br> +void<br> +brw_surface_visitor::visit_barrier_intrinsic(ir_call *ir) const<br> +{<br> + emit_memory_fence();<br> +}<br> +<br> +backend_reg<br> +brw_surface_visitor::emit_image_load(backend_reg image, backend_reg addr,<br> + GLenum format, unsigned dims) const<br></blockquote><div><br></div><div>I have several thoughts about this function.<br></div><div><br>a. How effectively is this function piglit tested? There are a lot of formats, and each one has special case behaviours for out-of-bound accesses, so it seems like it would be good to have a piglit test that verifies each format for both in-bound and out-of-bound accesses.<br> </div><div><br></div><div>b. The bspec page "Graphics BSpec: 3D-Media-GPGPU Engine > Shared Functions > Shared Functions – Data Port [Pre-SKL] > Messages > TypedUntyped Surface ReadWrite and TypedUntyped Atomic Operation" includes this erratum: "The <i>Typed Surface Read</i> message returns 0 in all channels for out-of-bounds accesses." This would seem to imply that the following formats need to do emit_coordinate_check() on hsw: GL_RGBA16UI, GL_RGBA16I, GL_RGBA8UI, and GL_RGBA8I.<br> <br></div><div>c. Why do the formats GL_R16UI and GL_R8UI need to use emit_pack_homogeneous() and emit_pad() for IVB but not for HSW? Since the hardware surface format matches the GLSL format for both IVB and HSW, it seems like in both cases we ought to be able to use the HSW code path. If the difference is due to an IVB-specific erratum, that should be documented in a comment in the code.<br> <br></div>d. I'm troubled by the number of cases in this function and the fact that they need to be kept carefully synchronized with the cases in patch 3's get_image_format() function. In order to verify that everything was correct, I had to cross reference with get_image_format(), think about the properties of each GL type, and then figure out how to do the appropriate conversion. It seems to me that we would be better off having emit_image_load() generate the conversion algorithmically based on the properties of the hardware surface format and the GLSL format. The logic I have in mind would look something like this:<br> <div> <br></div><div><div>1. Use the functions in src/mesa/main/formats.c to interrogate the GLSL format. Figure out how many channels and bits/channel it has, as well as its datatype (signed/unsigned normalized, signed/unsigned int, or float), and whether it has a homogeneous number of bits/channel.<br></div><br>2. Call get_image_format() to find out the hardware surface format, and then figure out how many channels and bits/channel it has. Consider RAW to be 32 bits/channel and N/32 channels, where N is the total number of bits in the GLSL format.<br> </div><br>3. If the GLSL format and the hardware surface format match exactly, this is a special case: all we have to do is call emit_typed_read() and retype the result to the appropriate type. (Note: depending on what you say about item "c" above, we may need to disable this special case on IVB for GL_R16UI and GL_R8UI). Otherwise, go on to steps 4-10.<br> <div><br>4. If necessary, call emit_coordinate_check(). This is necessary if the hardware surface format is RAW, the GLSL format has 4 channels, or the hardware surface format has 4 channels and the GLSL format doesn't. (Note: these criteria may be affected by what you say about item "c" above).<br> </div><div><br>5. If the hardware surface format is RAW, call emit_coordinate_address_calculation() and emit_untyped_read() to read the data. Otherwise, use emit_typed_read().<br></div><div><br>6. If the GLSL format is homogeneous, and it's a different number of bits/channel than the hardware surface format, call emit_unpack_homogeneous() or emit_pack_homogeneous() to convert it.<br> </div><div><br>7. If the GLSL format is homogeneous, and it's the same number of bits/channel as the hardware surface format, sign extend the data if necessary by calling emit_unpack_homogeneous().<br></div><div><br>8. If the GLSL format isn't homogeneous, call emit_unpack_generic() to unpack it. There are only two possibilities: 10/10/10/2 and 11/11/10.<br> </div><div><br>9. If the GLSL format is signed/unsigned normalized or float, convert the data by calling emit_convert_from_float() or emit_convert_from_scaled(). Exception: for 32 bits/channel formats, convert_from_float() isn't needed--just retype to BRW_REGISTER_TYPE_F.<br> </div><div><br>10. If necessary, call emit_pad(). This is necessary if emit_coordinate_check() was used or if the GLSL format has <4 channels.<br><br></div><div>I think this would be a lot less code than what you've written, and it would be a lot easier to have confidence in, since most of the code paths would be exercised fairly frequently, as opposed to having totally independent code paths for every single format. Also, it would have the advantage of having a lot fewer Ivy Bridge vs Haswell differences.<br> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> +void<br> +brw_surface_visitor::emit_image_store(backend_reg image, backend_reg addr,<br> + backend_reg src,<br> + GLenum format, unsigned dims) const<br></blockquote><div><br></div><div>Again, several thoughts:<br><br>a. It would be nice to have a comment above this function reiterating the special meaning of passing GL_NONE for the format.<br> <br>b. Why is it necessary to call emit_coordinate_check() in the handlers for GL_RGB10_A2UI, GL_RGBA8UI (IVB), GL_RGBA8I (IVB), GL_RGB10_A2, GL_RGBA8 (IVB), and GL_RGBA8_SNORM (IVB)? It seems to me that for writes, it should only be necessary to call emit_coordinate_check() when the hardware surface format is RAW.<br> <br></div><div>c. It would be nice to have some comments to help people understand when it's necessary to call emit_convert_to_integer() and when it's not. It looks like from your code like the GL spec requires out-of-range values to be clamped, and the hardware does the necessary clamping--is this correct? If so, then it seems like the handler for GL_RGBA16I (HSW) needs to call emit_pack_homogeneous(); otherwise, negative values will get clamped by the hardware to 0xffff. There's a similar situation for GL_RGBA8I (HSW), GL_RG16I (HSW), GL_RG8I (HSW), GL_R16I, GL_R8I.<br> <br></div><div>d. As with the previous function, I think it would be better to generate the code algorithmically rather than case by case. I think the algorithm would look something like this:<br><br></div><div>1. As in emit_image_load().<br> <br></div><div>2. As in emit_image_load().<br><br>3. If the GLSL format and the hardware surface format match exactly, this is a special case: all we have to do is emit_typed_write(). Otherwise, go on to steps 4-9.<br><br> 4. If the GLSL format is signed/unsigned normalized or float, convert the data by calling emit_convert_to_float() or emit_convert_to_scaled(). Exception: for 32 bits/channel formats, emit_convert_to_float() isn't needed--just retype the register.<br> <br>5. If the GLSL format is signed/unsigned integer, clamp the data by calling emit_convert_to_integer(). Exception: not needed if the GLSL format is 32 bits/channel.<br><br>6. If the GLSL format is homogeneous, and it's a different number of bits/channel than the hardware surface format, call emit_unpack_homogeneous() or emit_pack_homogeneous() to convert it.<br> <br>7. If the GLSL format is homogeneous, it's the same number of bits/channel as the hardware surface format, and it's a signed integer format with <32 bits/channel, call emit_pack_homogeneous() to make sure that sign extended bits don't cause the hardware to clamp (see "c" above).<br> <br>8. If the GLSL format isn't homogeneous, call emit_pack_generic() to pack it. There are only two possibilities: 10/10/10/2 and 11/11/10.<br><br>9. If the hardware surface format is RAW, call emit_coordinate_check(), emit_coordinate_address_calculation(), and emit_untyped_write() to write the data. Otherwise, use emit_typed_write().<br> <br></div><div>I'll send more comments in a separate email.<br></div></div></div></div>