<div dir="ltr">On 2 December 2013 11:39, Francisco Jerez <span dir="ltr"><<a href="mailto:currojerez@riseup.net" target="_blank">currojerez@riseup.net</a>></span> wrote:<br><div class="gmail_extra"><div class="gmail_quote"> <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">+ fs_inst &<br> + exec_predicated(backend_reg flag, fs_inst &inst)<br> + {<br> + if (flag.file != BAD_FILE) {<br> + inst.predicate = BRW_PREDICATE_NORMAL;<br> + inst.flag_subreg = flag.fixed_hw_reg.subnr / 2;<br> + }<br></blockquote><div><br></div><div>Under what circumstances would flag.file == BAD_FILE? Should we assert that flag.file != BAD_FILE?<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> +/**<br> + * Copy one of the halves of a SIMD16 vector to a SIMD8 vector.<br> + */<br> +void<br> +brw_fs_surface_visitor::emit_pack_vector_half(<br> + fs_reg dst, fs_reg src,<br> + unsigned i, unsigned size) const<br></blockquote><div><br></div><div>The meaning of the "i" parameter isn't immediately clear. How about if we rename it to "which_half"?<br></div><div> </div> <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> +/**<br> + * Copy a SIMD8 vector to one of the halves of a SIMD16 vector.<br> + */<br> +void<br> +brw_fs_surface_visitor::emit_unpack_vector_half(<br> + fs_reg dst, fs_reg src,<br> + unsigned i, unsigned size) const<br></blockquote><div><br></div><div>Same comment applies here.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> +/**<br> + * Initialize the header present in some surface access messages.<br> + */<br> +void<br> +brw_fs_surface_visitor::emit_surface_header(struct fs_reg dst) const <br></blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> +{<br> + assert(dst.file == MRF);<br> + exec_all(exec_half(0, emit(BRW_OPCODE_MOV, dst, 0)));<br> + exec_all(emit(BRW_OPCODE_MOV, brw_uvec_mrf(1, dst.reg, 7),<br> + get_sample_mask(v)));<br> +}<br></blockquote><div><br><div>Since different messages have different header data, it would be helpful for the comment to say which messages this function is expected to be used for, or to include a bspec reference that shows the header format for the messages you care about.<br><br></div><div>Also, why is it necessary to call brw_uvec_mrf()? Isn't dst already an mrf?<br> <br></div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">+backend_reg<br> +brw_fs_surface_visitor::emit_coordinate_address_calculation(<br> + backend_reg image, backend_reg addr, unsigned dims) const<br></blockquote><div><br></div><div>It would be nice to have a comment above this function clarifying that it converts from surface coordinates to a raw memory offset, so therefore it is only needed when accessing the hardware surface in RAW mode.<br> </div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> +{<br> + fs_reg x = retype(offset(addr, 0), BRW_REGISTER_TYPE_UD);<br> + fs_reg y = retype(offset(addr, 1), BRW_REGISTER_TYPE_UD);<br> + fs_reg z = retype(offset(addr, 2), BRW_REGISTER_TYPE_UD);<br> + fs_reg offset_x = offset(image, BRW_IMAGE_PARAM_OFFSET_OFFSET + 0);<br> + fs_reg offset_y = offset(image, BRW_IMAGE_PARAM_OFFSET_OFFSET + 1);<br> + fs_reg stride_x = offset(image, BRW_IMAGE_PARAM_STRIDE_OFFSET + 0);<br> + fs_reg stride_y = offset(image, BRW_IMAGE_PARAM_STRIDE_OFFSET + 1);<br> + fs_reg stride_z = offset(image, BRW_IMAGE_PARAM_STRIDE_OFFSET + 2);<br> + fs_reg stride_w = offset(image, BRW_IMAGE_PARAM_STRIDE_OFFSET + 3);<br></blockquote><div><br></div><div>It would be easier to understand if stride_z and stride_w were named horiz_stride_z and vert_stride_z (or something similar) to reflect the terminology used in the documentation of struct brw_image_param.<br> </div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> + fs_reg tile_x = offset(image, BRW_IMAGE_PARAM_TILING_OFFSET + 0);<br> + fs_reg tile_y = offset(image, BRW_IMAGE_PARAM_TILING_OFFSET + 1);<br> + fs_reg tile_z = offset(image, BRW_IMAGE_PARAM_TILING_OFFSET + 2);<br> + fs_reg swz_x = offset(image, BRW_IMAGE_PARAM_SWIZZLING_OFFSET + 0);<br> + fs_reg swz_y = offset(image, BRW_IMAGE_PARAM_SWIZZLING_OFFSET + 1);<br> + fs_reg high_x = make_grf(BRW_REGISTER_TYPE_UD, 1);<br> + fs_reg high_y = make_grf(BRW_REGISTER_TYPE_UD, 1);<br> + fs_reg high_z = make_grf(BRW_REGISTER_TYPE_UD, 1);<br> + fs_reg dst = make_grf(BRW_REGISTER_TYPE_UD, 1);<br> + fs_reg zero = make_grf(BRW_REGISTER_TYPE_UD, 1)<br> + .apply_stride(0);<br></blockquote><div><br></div><div>It would be helpful to have a comment above this block of code explaining that these values come from the brw_image_param struct, so people know where to look to find more information about what they mean.<br> </div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> +<br> + exec_all(emit(BRW_OPCODE_MOV, zero, 0));<br> +<br> + /* Shift the coordinates by the fixed surface offset. */<br> + emit(BRW_OPCODE_ADD, x, x, offset_x);<br></blockquote><div><br></div><div>Is it safe to write to x, y, and z in this code? What if addr is a variable that gets re-used later in the shader? My preference would be to make temporary copies for x, y, and z, and rely on register coalescing to get rid of the copies when they aren't needed.<br> </div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> + if (dims > 1)<br> + emit(BRW_OPCODE_ADD, y, y, offset_y);<br> +<br> + if (dims > 2) {<br> + /* Decompose z into a major and a minor index. */<br> + emit(BRW_OPCODE_SHR, high_z, z, tile_z);<br> + emit(BRW_OPCODE_BFE, z, tile_z, zero, z);<br> +<br> + /* Calculate the vertical slice offset. */<br> + emit(BRW_OPCODE_MUL, high_z, stride_w, high_z);<br> + emit(BRW_OPCODE_ADD, y, y, high_z);<br> +<br> + /* Calculate the horizontal slice offset. */<br> + emit(BRW_OPCODE_MUL, z, stride_z, z);<br> + emit(BRW_OPCODE_ADD, x, x, z);<br> + }<br> +<br> + if (dims > 1) {<br></blockquote><div><br></div><div>This assumes that we don't tile 1D surfaces. AFAICT, we do. That seems stupid, though.<br><br>I think we need to either:<br><br></div><div>(a) precede this patch with a patch that ensures that all 1D surfaces are untiled, and then add a comment here explaining that we don't need to do tile computations for 1D surfaces, or<br> <br></div><div>(b) when dims == 1, set y=0 and then go ahead with the tiling computation.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> + /* Decompose x and y into major and minor indices. */<br> + emit(BRW_OPCODE_SHR, high_x, x, tile_x);<br> + emit(BRW_OPCODE_SHR, high_y, y, tile_y);<br> +<br> + emit(BRW_OPCODE_BFE, x, tile_x, zero, x);<br> + emit(BRW_OPCODE_BFE, y, tile_y, zero, y);<br> +<br> + /* Calculate the pixel index from the start of the tile row.<br> + * Equivalent to:<br> + * dst = (high_x << tile_y << tile_x) + (low_y << tile_x) + low_x<br> + */<br> + emit(BRW_OPCODE_SHL, high_x, high_x, tile_y);<br> + emit(BRW_OPCODE_ADD, dst, high_x, y);<br> + emit(BRW_OPCODE_SHL, dst, dst, tile_x);<br> + emit(BRW_OPCODE_ADD, dst, dst, x);<br> +<br> + /* Multiply by the Bpp value. */<br> + emit(BRW_OPCODE_MUL, dst, dst, stride_x);<br> +<br> + /* Add it to the start offset of the tile row. */<br> + emit(BRW_OPCODE_MUL, high_y, stride_y, high_y);<br> + emit(BRW_OPCODE_SHL, high_y, high_y, tile_y);<br> + emit(BRW_OPCODE_ADD, dst, dst, high_y);<br> +<br> + if (v->brw->has_swizzling) {<br> + fs_reg bit_x = make_grf(BRW_REGISTER_TYPE_UD, 1);<br> + fs_reg bit_y = make_grf(BRW_REGISTER_TYPE_UD, 1);<br> +<br> + /* Take into account the two dynamically specified shifts. */<br> + emit(BRW_OPCODE_SHR, bit_x, dst, swz_x);<br> + emit(BRW_OPCODE_SHR, bit_y, dst, swz_y);<br></blockquote><div><br></div><div>It would be nice to have a comment here explaining that if swz_x or swz_y is 0xff, this will shift dst to the right by 31 bits, which is guaranteed to produce a 0 (since no image occupies more than 2G of memory).<br> </div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> +<br> + /* XOR bit_x and bit_y with bit 6 of the memory address. */<br> + emit(BRW_OPCODE_XOR, bit_x, bit_x, bit_y);<br> + emit(BRW_OPCODE_AND, bit_x, bit_x, 1 << 6);<br> + emit(BRW_OPCODE_XOR, dst, dst, bit_x);<br> + }<br> +<br> + } else {<br> + /* Multiply by the Bpp value. */<br> + emit(BRW_OPCODE_MUL, dst, x, stride_x);<br> + }<br> +<br> + return dst;<br> +}<br> +<br> +backend_reg<br> +brw_fs_surface_visitor::emit_typed_read(<br> + backend_reg flag, backend_reg surface, backend_reg addr,<br> + unsigned dims, unsigned size) const<br> +{<br> + fs_reg dst = make_grf(BRW_REGISTER_TYPE_UD, size);<br> + const unsigned w = v->dispatch_width / 8;<br> + <br></blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> + for (unsigned i = 0; i < w; ++i) {<br></blockquote><div><br><div>It would be helpful to have a short comment just above the for loop explaining that for SIMD16, we're splitting the read into two SIMD8 reads because SIMD16 typed read is not supported. A similar comment applies to the emit_typed_write() and emit_typed_atomic() functions.<br> </div><br> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> +backend_reg<br> +brw_fs_surface_visitor::emit_pad(<br> + backend_reg flag, backend_reg src, unsigned size) const<br></blockquote><div><br></div><div>This function needs a comment explaining what it does and why it's necessary.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> +backend_reg<br> +brw_fs_surface_visitor::emit_pack_generic(<br> + backend_reg src,<br> + unsigned shift_r, unsigned width_r,<br> + unsigned shift_g, unsigned width_g,<br> + unsigned shift_b, unsigned width_b,<br> + unsigned shift_a, unsigned width_a) const<br></blockquote><div><br></div><div>This function could also use a comment explaining what it does. In particular, it should be clear from looking at the comments above this function and emit_pack_homogeneous() how the two functions differ.<br> </div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> +{<br> + const unsigned shift[] = { shift_r, shift_g, shift_b, shift_a };<br> + const unsigned width[] = { width_r, width_g, width_b, width_a };<br> + const unsigned bits = width_r + width_g + width_b + width_a;<br> + fs_reg dst = make_grf(BRW_REGISTER_TYPE_UD, bits / 32);<br> + bool seen[4] = {};<br></blockquote><div><br></div><div>If bits > 128, access to seen[] will overflow. How about if we add:<br><br></div><div>assert(bits <= 128);<br><br></div><div>Also, "dword_written" would be a more descriptive name than "seen", and would be consistent with the terminology you use in comments below. A similar comment applies to emit_pack_homogeneous().<br> </div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> +<br> + for (unsigned i = 0; i < Elements(width); ++i) {<br> + if (width[i]) {<br> + const unsigned j = shift[i] / 32;<br> + const unsigned k = shift[i] % 32;<br> + const unsigned m = (1ull << width[i]) - 1;<br></blockquote><div><br></div><div>The single-letter variable names make this code really hard to follow. How about "dw_offset", "bit_offset", and "bit_mask"? A similar comment applies to emit_pack_homogeneous().<br> </div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> + fs_reg tmp = make_grf(BRW_REGISTER_TYPE_UD, 1);<br> +<br> + if (seen[j]) {<br> + /* Insert the source value into the bit field if we have<br> + * already written to this dword.<br> + */<br> + emit(BRW_OPCODE_MOV, tmp, m << k);<br> + emit(BRW_OPCODE_BFI2, offset(dst, j),<br> + tmp, offset(src, i), offset(dst, j));<br> +<br> + } else {<br> + /* Otherwise just mask and copy the value over. */<br> + emit(BRW_OPCODE_AND, offset(dst, j),<br> + offset(src, i), m);<br> +<br> + if (k)<br> + emit(BRW_OPCODE_SHL, offset(dst, j),<br> + offset(dst, j), k);<br> +<br> + seen[j] = true;<br> + }<br> + }<br> + }<br> +<br> + return dst;<br> +}<br> +<br> +backend_reg<br> +brw_fs_surface_visitor::emit_unpack_generic(<br> + backend_reg src,<br> + unsigned shift_r, unsigned width_r,<br> + unsigned shift_g, unsigned width_g,<br> + unsigned shift_b, unsigned width_b,<br> + unsigned shift_a, unsigned width_a) const<br></blockquote><div><br></div><div>This function also needs a comment explaining what it does.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> +{<br> + const unsigned shift[] = { shift_r, shift_g, shift_b, shift_a };<br> + const unsigned width[] = { width_r, width_g, width_b, width_a };<br> + const unsigned n = !!width_r + !!width_g + !!width_b + !!width_a;<br></blockquote><div><br></div><div>How about "num_components" instead of "n"?<br></div> <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> + fs_reg dst = make_grf(src.type, n);<br> +<br> + for (unsigned i = 0; i < Elements(width); ++i) {<br> + if (width[i]) {<br></blockquote><div><br><div>I'm also concerned because certain combinations of widths will cause this function to emit code that overflows the output register. For example, if width_r == width_g == width_b == 0 and width_a is nonzero, then n will be 1, but the "a" component will get unpacked to offset(dst, 3). How about if we add the assertion:<br><br></div><div>assert(i < n);<br></div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> + /* Discard the most significant bits. */<br> + emit(BRW_OPCODE_SHL, offset(dst, i),<br> + offset(src, shift[i] / 32),<br> + 32 - shift[i] % 32 - width[i]);<br> +<br> + /* Shift it back to the least significant bits using an<br> + * arithmetic shift to get sign extension on signed types.<br> + */<br> + emit(BRW_OPCODE_ASR, offset(dst, i),<br> + offset(dst, i), 32 - width[i]);<br> + }<br> + }<br> +<br> + return dst;<br> +}<br> +<br> +namespace {<br> + unsigned<br> + type_for_width(unsigned width)<br> + {<br> + switch (width) {<br> + case 8:<br> + return BRW_REGISTER_TYPE_UB;<br> + case 16:<br> + return BRW_REGISTER_TYPE_UW;<br> + case 32:<br> + return BRW_REGISTER_TYPE_UD;<br> + default:<br> + unreachable();<br></blockquote><div><br></div><div>Change to an assert.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> + }<br> + }<br> +}<br> +<br> +backend_reg<br> +brw_fs_surface_visitor::emit_pack_homogeneous(<br> + backend_reg src,<br> + unsigned shift_r, unsigned width_r,<br> + unsigned shift_g, unsigned width_g,<br> + unsigned shift_b, unsigned width_b,<br> + unsigned shift_a, unsigned width_a) const<br></blockquote><div><br></div><div>This function also needs a comment explaining what it does. In particular, is it equivalent functionality to emit_pack_generic() but optimized based on certain assumptions about the shifts and widths? Or does it intended to do something different? emit_unpack_homogeneous() needs a similar comment.<br> </div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> +{<br> + const unsigned shift[] = { shift_r, shift_g, shift_b, shift_a };<br> + const unsigned width[] = { width_r, width_g, width_b, width_a };<br> + const unsigned type = type_for_width(width[0]);<br> + fs_reg dst = make_grf(BRW_REGISTER_TYPE_UD, type_sz(type));<br> + fs_reg csrc = retype(fs_reg(src), type).apply_stride(4 / type_sz(type));<br> + fs_reg cdst = retype(dst, type).apply_stride(4 / type_sz(type)); <br></blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> + bool seen[4] = {};<br> +<br> + for (unsigned i = 0; i < 4; ++i) {<br> + if (width[i]) {<br></blockquote><div><br></div><div>It looks like this code relies on the widths and shifts satisfying a few invariants. Can we add these assertions to clarify that?<br><br>assert(width[i] == width[0]);<br> assert(shift[i] % 8 == 0);<br><br></div><div>Similar assertions should be added to emit_unpack_homogeneous().<br></div><div> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> + const unsigned j = shift[i] / 32;<br> + const unsigned k = shift[i] % 32;<br> +<br> + if (seen[j]) {<br> + /* Insert the source value into the bit field if we have<br> + * already written to this dword.<br> + */<br> + emit(BRW_OPCODE_MOV, offset(byte_offset(cdst, k / 8), j),<br> + offset(csrc, i));<br> +<br> + } else {<br> + /* Otherwise overwrite the whole dword to make sure that<br> + * unused fields are initialized to zero.<br> + */<br> + emit(BRW_OPCODE_SHL, offset(dst, j),<br> + offset(csrc, i), k);<br> +<br> + seen[j] = true;<br> + }<br> + }<br> + }<br> +<br> + return dst;<br> +}<br> +<br> +backend_reg<br> +brw_fs_surface_visitor::emit_unpack_homogeneous(<br> + backend_reg src,<br> + unsigned shift_r, unsigned width_r,<br> + unsigned shift_g, unsigned width_g,<br> + unsigned shift_b, unsigned width_b,<br> + unsigned shift_a, unsigned width_a) const<br> +{<br> + const unsigned shift[] = { shift_r, shift_g, shift_b, shift_a };<br> + const unsigned width[] = { width_r, width_g, width_b, width_a };<br> + const unsigned type = type_for_width(width[0]);<br> + fs_reg tmp = retype(fs_reg(src), type).apply_stride(4 / type_sz(type));<br> + fs_reg dst = make_grf(src.type, 4);<br></blockquote><div><br></div><div>Why does this use a size of 4 rather than the number of nonzero widths, as emit_unpack_generic() does?<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> +<br> + for (unsigned i = 0; i < 4; ++i) {<br> + if (width[i])<br> + emit(BRW_OPCODE_MOV,<br> + offset(dst, i),<br> + offset(byte_offset(tmp, shift[i] % 32 / 8), shift[i] / 32));<br></blockquote><div><br></div><div>Won't this do the wrong thing for signed types? It seems like you would need type_for_width to return a signed type to get the proper sign extension.<br> </div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> + }<br> +<br> + return dst;<br> +}<br> ++backend_reg<br> +brw_fs_surface_visitor::emit_convert_from_scaled(<br> + backend_reg src,<br> + unsigned mask0, float scale0,<br> + unsigned mask1, float scale1) const<br> +{<br> + const unsigned mask[] = { mask0, mask1 };<br> + const float scale[] = { scale0, scale1 };<br> + fs_reg dst = retype(src, BRW_REGISTER_TYPE_F);<br> +<br> + for (unsigned i = 0; i < Elements(mask); ++i) {<br> + for (unsigned j = 0; j < 4; ++j) {<br> + if (mask[i] & (1 << j)) {<br> + /* Convert to float and divide by the normalization<br> + * constant.<br> + */<br> + emit(BRW_OPCODE_MOV, offset(dst, j), offset(src, j));<br> + emit(BRW_OPCODE_MUL, offset(dst, j), offset(dst, j),<br> + fs_reg(1.0f / scale[i]));<br> +<br> + /* Clamp to the minimum value. */<br> + if (type_is_signed(src.type))<br> + emit(BRW_OPCODE_SEL, offset(dst, j),<br> + offset(dst, j), -1.0f)<br> + .conditional_mod = BRW_CONDITIONAL_G;<br></blockquote><div><br></div><div>It would be nice to have a short comment explaining why it's not necessary to clamp to +1.0.<br> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> + }<br> + }<br> + }<br> +<br> + return dst;<br> +}<br> +<br> +backend_reg<br> +brw_fs_surface_visitor::emit_convert_to_scaled(<br> + backend_reg src, unsigned type,<br> + unsigned mask0, float scale0,<br> + unsigned mask1, float scale1) const<br> +{<br> + const unsigned mask[] = { mask0, mask1 };<br> + const float scale[] = { scale0, scale1 };<br> + fs_reg dst = retype(src, type);<br> +<br> + for (unsigned i = 0; i < Elements(mask); ++i) {<br> + for (unsigned j = 0; j < 4; ++j) {<br> + if (mask[i] & (1 << j)) {<br> + /* Clamp to the minimum value. */<br> + if (type_is_signed(type))<br> + emit(BRW_OPCODE_SEL, offset(src, j),<br> + offset(src, j), -1.0f)<br> + .conditional_mod = BRW_CONDITIONAL_G;<br></blockquote><div><br></div><div>I'm not comfortable with the fact that a lot of these conversion functions write to their source registers. It's safe now, since you only call the conversion functions from contexts where the source is never going to be used again, but if someone comes back and modifies the code in the future, it's not going to be obvious that they have to preserve that invariant. I'd rather we used temporary registers. I think register coalescing and register allocation are smart enough that there won't be a penalty on the final optimizead code.<br> </div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> +<br> + /* Clamp to the maximum value. */<br> + emit(BRW_OPCODE_SEL, offset(src, j),<br> + offset(src, j), 1.0f)<br> + .conditional_mod = BRW_CONDITIONAL_L; <br></blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> +<br> + /* Multiply by the normalization constant and convert to<br> + * integer.<br> + */<br> + emit(BRW_OPCODE_MUL, offset(src, j), offset(src, j),<br> + scale[i]);<br> + emit(BRW_OPCODE_MOV, offset(dst, j), offset(src, j));<br> + }<br> + }<br> + }<br> +<br> + return dst;<br> +}<br> +<br> +backend_reg<br> +brw_fs_surface_visitor::emit_convert_from_float(<br> + backend_reg src,<br> + unsigned mask0, unsigned width0,<br> + unsigned mask1, unsigned width1) const<br> +{<br> + const unsigned mask[] = { mask0, mask1 };<br> + const unsigned width[] = { width0, width1 };<br> + fs_reg dst = retype(src, BRW_REGISTER_TYPE_F);<br> +<br> + for (unsigned i = 0; i < Elements(mask); ++i) {<br> + for (unsigned j = 0; j < 4; ++j) {<br> + if (mask[i] & (1 << j)) {<br> + /* Extend 10-bit and 11-bit floating point numbers to 15<br> + * bits. This works because they have a 5-bit exponent<br> + * just like the 16-bit floating point format, and they<br> + * have no sign bit.<br> + */<br> + if (width[i] < 16)<br> + emit(BRW_OPCODE_SHL, offset(src, j),<br> + offset(src, j), 15 - width[i]);<br> +<br> + /* Convert to a 32-bit float. */<br> + emit(BRW_OPCODE_F16TO32, offset(dst, j), offset(src, j));<br> + }<br> + }<br> + }<br> +<br> + return dst;<br> +}<br> +<br> +backend_reg<br> +brw_fs_surface_visitor::emit_convert_to_float(<br> + backend_reg src,<br> + unsigned mask0, unsigned width0,<br> + unsigned mask1, unsigned width1) const<br> +{<br> + const unsigned mask[] = { mask0, mask1 };<br> + const unsigned width[] = { width0, width1 };<br> + fs_reg dst = retype(src, BRW_REGISTER_TYPE_UD);<br> +<br> + for (unsigned i = 0; i < Elements(mask); ++i) {<br> + for (unsigned j = 0; j < 4; ++j) {<br> + if (mask[i] & (1 << j)) {<br> + /* Clamp to the minimum value. */<br> + if (width[i] < 16)<br> + emit(BRW_OPCODE_SEL, offset(src, j),<br> + offset(src, j), 0.0f)<br> + .conditional_mod = BRW_CONDITIONAL_G;<br> +<br> + /* Convert to a 16-bit float. */<br> + emit(BRW_OPCODE_F32TO16, offset(dst, j), offset(src, j));<br> +<br> + /* Discard the least significant bits to get a floating<br> + * point number of the requested width. This works<br> + * because the 10-bit and 11-bit floating point formats<br> + * have a 5-bit exponent just like the 16-bit format, and<br> + * they have no sign bit.<br> + */<br> + if (width[i] < 16)<br> + v->emit(BRW_OPCODE_SHR, offset(dst, j),<br> + offset(dst, j), 15 - width[i]);<br></blockquote><div><br></div><div>There's a corner case with NaN, and I can't decide whether it's important enough to worry about. Technically, any value with an exponent of 0b11111 and a nonzero mantissa is NaN. If the mantissa is too small, truncating its lower bits will change it to Inf. Obviously, dealing with this corner case would require emitting more instructions, so I don't know if it's worth it.<br> <br></div><div>Also, I wonder if we should try to implement proper round-to-nearest-even behaviour here. The best way I can think of to implement it is 4 instructions:<br><br></div><div>MOV dst.F dst.UD<br></div><div>MUL dst.F dst.F (1.0/(1 << (15 - width[i])))<br> </div><div>RNDE dst.F dst.F<br></div><div>MOV dst.UD dst.F<br><br></div><div>Note, however, that this has a different NaN corner case problem: if the mantissa is too large, the RNDE will overflow.<br></div> <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> + }<br> + }<br> + }<br> +<br> + return dst;<br> +}<br></blockquote><div><br></div><div>I'll send more comments in a separate email.<br></div></div></div></div>