<div dir="ltr"><div class="gmail_quote"><div dir="ltr">On Wed, Aug 29, 2018 at 12:49 AM Kenneth Graunke <<a href="mailto:kenneth@whitecape.org" target="_blank">kenneth@whitecape.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On Friday, August 17, 2018 1:06:20 PM PDT Jason Ekstrand wrote:<br>
[snip]<br>
> +# Intel-specific query for loading from the brw_image_param struct passed<br>
> +# into the shader as a uniform. The variable is a deref to the image<br>
> +# variable. The const index specifies which of the six parameters to load.<br>
<br>
This might be our first driver-specific intrinsics. Some people make<br>
big extensibility systems for that, where drivers can extend with their<br>
own concepts. But given that we all live in the same project, I think<br>
this makes a lot of sense - just have everybody add their own here,<br>
suffixed with a name they own (i.e. intel, amd, radv, ir3, whatever).<br></blockquote><div><br></div><div>We almost added an ir3 intrinsic some time ago. I don't remember what it was or why it didn't happen.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
It's certainly nice and simple.<br>
<br>
> +intrinsic("image_deref_load_param_intel", src_comp=[1], dest_comp=0,<br>
> + indices=[BASE], flags=[CAN_ELIMINATE, CAN_REORDER])<br>
> +intrinsic("image_deref_load_raw_intel", src_comp=[1, 1], dest_comp=0,<br>
> + flags=[CAN_ELIMINATE, CAN_REORDER])<br>
> +intrinsic("image_deref_store_raw_intel", src_comp=[1, 1, 0])<br>
<br>
I don't think you want CAN_REORDER for the new load intrinsic...at<br>
least, image_deref_load only has CAN_ELIMINATE. It probably makes<br>
sense for the two to match...<br></blockquote><div><br></div><div>Right... Thanks!<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
[snip]<br>
> +static nir_ssa_def *<br>
> +image_address(nir_builder *b, const struct gen_device_info *devinfo,<br>
> + nir_deref_instr *deref, nir_ssa_def *coord)<br>
> +{<br>
> + coord = sanitize_image_coord(b, deref, coord);<br>
> +<br>
> + nir_ssa_def *offset = load_image_param(b, deref, OFFSET);<br>
> + nir_ssa_def *tiling = load_image_param(b, deref, TILING);<br>
> + nir_ssa_def *stride = load_image_param(b, deref, STRIDE);<br>
> +<br>
> + /* Shift the coordinates by the fixed surface offset. It may be non-zero<br>
> + * if the image is a single slice of a higher-dimensional surface, or if a<br>
> + * non-zero mipmap level of the surface is bound to the pipeline. The<br>
> + * offset needs to be applied here rather than at surface state set-up time<br>
> + * because the desired slice-level may start mid-tile, so simply shifting<br>
> + * the surface base address wouldn't give a well-formed tiled surface in<br>
> + * the general case.<br>
> + */<br>
> + nir_ssa_def *xypos = (coord->num_components == 1) ?<br>
> + nir_vec2(b, coord, nir_imm_int(b, 0)) :<br>
> + nir_channels(b, coord, 0x3);<br>
> + xypos = nir_iadd(b, xypos, offset);<br>
> +<br>
> + /* The layout of 3-D textures in memory is sort-of like a tiling<br>
> + * format. At each miplevel, the slices are arranged in rows of<br>
> + * 2^level slices per row. The slice row is stored in tmp.y and<br>
> + * the slice within the row is stored in tmp.x.<br>
> + *<br>
> + * The layout of 2-D array textures and cubemaps is much simpler:<br>
> + * Depending on whether the ARYSPC_LOD0 layout is in use it will be<br>
> + * stored in memory as an array of slices, each one being a 2-D<br>
> + * arrangement of miplevels, or as a 2D arrangement of miplevels,<br>
> + * each one being an array of slices. In either case the separation<br>
> + * between slices of the same LOD is equal to the qpitch value<br>
> + * provided as stride.w.<br>
> + *<br>
> + * This code can be made to handle either 2D arrays and 3D textures<br>
> + * by passing in the miplevel as tile.z for 3-D textures and 0 in<br>
> + * tile.z for 2-D array textures.<br>
> + *<br>
> + * See Volume 1 Part 1 of the Gen7 PRM, sections 6.18.4.7 "Surface<br>
> + * Arrays" and 6.18.6 "3D Surfaces" for a more extensive discussion<br>
> + * of the hardware 3D texture and 2D array layouts.<br>
> + */<br>
> + if (coord->num_components > 2) {<br>
> + /* Decompose z into a major (tmp.y) and a minor (tmp.x)<br>
> + * index.<br>
> + */<br>
> + nir_ssa_def *z = nir_channel(b, coord, 2);<br>
> + nir_ssa_def *z_x = nir_ubfe(b, z, nir_imm_int(b, 0),<br>
> + nir_channel(b, tiling, 2));<br>
> + nir_ssa_def *z_y = nir_ushr(b, z, nir_channel(b, tiling, 2));<br>
> +<br>
> + /* Take into account the horizontal (tmp.x) and vertical (tmp.y)<br>
> + * slice offset.<br>
> + */<br>
> + xypos = nir_iadd(b, xypos, nir_imul(b, nir_vec2(b, z_x, z_y),<br>
> + nir_channels(b, stride, 0xc)));<br>
> + }<br>
> +<br>
> + nir_ssa_def *addr;<br>
> + if (coord->num_components > 1) {<br>
> + /* Calculate the major/minor x and y indices. In order to<br>
> + * accommodate both X and Y tiling, the Y-major tiling format is<br>
> + * treated as being a bunch of narrow X-tiles placed next to each<br>
> + * other. This means that the tile width for Y-tiling is actually<br>
> + * the width of one sub-column of the Y-major tile where each 4K<br>
> + * tile has 8 512B sub-columns.<br>
> + *<br>
> + * The major Y value is the row of tiles in which the pixel lives.<br>
> + * The major X value is the tile sub-column in which the pixel<br>
> + * lives; for X tiling, this is the same as the tile column, for Y<br>
> + * tiling, each tile has 8 sub-columns. The minor X and Y indices<br>
> + * are the position within the sub-column.<br>
> + */<br>
> +<br>
> + /* Calculate the minor x and y indices. */<br>
> + nir_ssa_def *minor = nir_ubfe(b, xypos, nir_imm_int(b, 0),<br>
> + nir_channels(b, tiling, 0x3));<br>
> + nir_ssa_def *major = nir_ushr(b, xypos, nir_channels(b, tiling, 0x3));<br>
> +<br>
> + /* Calculate the texel index from the start of the tile row and the<br>
> + * vertical coordinate of the row.<br>
> + * Equivalent to:<br>
> + * tmp.x = (major.x << tile.y << tile.x) +<br>
> + * (minor.y << tile.x) + minor.x<br>
> + * tmp.y = major.y << tile.y<br>
> + */<br>
> + nir_ssa_def *idx_x, *idx_y;<br>
> + idx_x = nir_ishl(b, nir_channel(b, major, 0), nir_channel(b, tiling, 1));<br>
> + idx_x = nir_iadd(b, idx_x, nir_channel(b, minor, 1));<br>
> + idx_x = nir_ishl(b, idx_x, nir_channel(b, tiling, 0));<br>
> + idx_x = nir_iadd(b, idx_x, nir_channel(b, minor, 0));<br>
> + idx_y = nir_ishl(b, nir_channel(b, major, 1), nir_channel(b, tiling, 1));<br>
> +<br>
> + /* Add it to the start of the tile row. */<br>
> + nir_ssa_def *idx;<br>
> + idx = nir_imul(b, idx_y, nir_channel(b, stride, 1));<br>
> + idx = nir_iadd(b, idx, idx_x);<br>
> +<br>
<br>
Maybe preserve the /* Multiply by Bpp value */ comment here?<br></blockquote><div><br></div><div>Done<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
> + addr = nir_imul(b, idx, nir_channel(b, stride, 0));<br>
> +<br>
> + if (devinfo->gen < 8 && !devinfo->is_baytrail) {<br>
> + /* Take into account the two dynamically specified shifts. Both need<br>
> + * are used to implement swizzling of X-tiled surfaces. For Y-tiled<br>
<br>
May as well fix the "Both need are used" wording here -> "Both are used"<br></blockquote><div><br></div><div>Done<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
[snip]<br>
> +static bool<br>
> +lower_image_load_instr(nir_builder *b,<br>
> + const struct gen_device_info *devinfo,<br>
> + nir_intrinsic_instr *intrin)<br>
> +{<br>
> + nir_deref_instr *deref = nir_src_as_deref(intrin->src[0]);<br>
> + nir_variable *var = nir_deref_instr_get_variable(deref);<br>
> + const enum isl_format image_fmt =<br>
> + isl_format_for_gl_format(var->data.image.format);<br>
> +<br>
> + if (isl_has_matching_typed_storage_image_format(devinfo, image_fmt)) {<br>
> + const enum isl_format lower_fmt =<br>
> + isl_lower_storage_image_format(devinfo, image_fmt);<br>
> + const unsigned dest_components = intrin->num_components;<br>
> +<br>
> + /* Use an undef to hold the uses of the load while we do the color<br>
> + * conversion.<br>
> + */<br>
> + nir_ssa_def *placeholder = nir_ssa_undef(b, 4, 32);<br>
> + nir_ssa_def_rewrite_uses(&intrin->dest.ssa, nir_src_for_ssa(placeholder));<br>
> +<br>
> + intrin->num_components = isl_format_get_num_channels(lower_fmt);<br>
> + intrin->dest.ssa.num_components = intrin->num_components;<br>
> +<br>
> + nir_ssa_def *value;<br>
> + if (devinfo->gen == 7 && !devinfo->is_haswell) {<br>
> + /* Check the first component of the size field to find out if the<br>
> + * image is bound. Necessary on IVB because it don't seem to respect<br>
> + * null surfaces and will hang when no image is bound.<br>
> + */<br>
> + b->cursor = nir_instr_remove(&intrin->instr);<br>
> + nir_ssa_def *size = load_image_param(b, deref, SIZE);<br>
> + nir_push_if(b, nir_ine(b, nir_channel(b, size, 0), nir_imm_int(b, 0)));<br>
> + nir_builder_instr_insert(b, &intrin->instr);<br>
> + nir_push_else(b, NULL);<br>
> + nir_ssa_def *zero = nir_zero_vec(b, intrin->num_components);<br>
> + nir_pop_if(b, NULL);<br>
> + value = nir_if_phi(b, &intrin->dest.ssa, zero);<br>
> + } else {<br>
> + b->cursor = nir_after_instr(&intrin->instr);<br>
> + value = &intrin->dest.ssa;<br>
> + }<br>
<br>
I don't think this Gen7 code is necessary...at least, I don't see the<br>
old code checking this for typed loads. (Atomics, yes...loads, no...)<br></blockquote><div><br></div><div>You're right. It was an extra check on untyped loads and stores which checked that bpp > 4. Typed was left alone. I've reworked things and have kicked it off to Jenkins. I'll send a v2 when it comes back.<br></div><div><br></div><div>--Jason<br></div></div></div>