On Sep 9, 2016 12:05 PM, "Chad Versace" <<a href="mailto:chadversary@chromium.org">chadversary@chromium.org</a>> wrote: > > On Fri 09 Sep 2016, Jason Ekstrand wrote: > > Normally, using a non-linear tiling format helps improve cache locality by > > ensuring that neighboring pixels are usually close-by in memory. For RGB > > formats, this still sort-of holds, but it can also lead to rather terrible > > memory access patterns where a single RGB pixel value crosses a tile > > boundary and gets split into two pieces in different 4K pages. It also > > makes for some rather awkward calculations because your tile size is no > > longer an even multiple of surface element size. For these reasons, we > > chose to simply never create tiled RGB images in the Vulkan driver. > > > > The GL driver, however, is not so kind so we need to support it somehow. I > > briefly toyed with a couple of different schemes but this is the best one I > > could come up with. The fundamental problem is that a tile no longer > > contains an integer number of surface elements. I briefly considered a > > couple other options but found them wanting: > > > > 1) Using floats for the logical tile size. This leads to potential > > rounding error problems. > > Let's stay far away from floats With floats, there is just too much risk > for weird bugs. > > Floats are for representing ℝ. Our calculations are in ℚ, so let's > restrict ourselves to ints. (Does GMail display the blackboard bold > UTF-8 correctly?) It does! Even on my phone. > > 2) When presented with a RGB format, just make the tile 3-times as wide. > > This isn't so nice because now our tiles are no longer power-of-two > > size. Also, it can force the row_pitch to be larger than needed which, > > while not strictly a problem for ISL, causes incompatibility problems > > with the way the GL driver chooses surface pitches. > > > > The chosen method requires that you pay attention and not just assume that > > your tile_info is in the units you think it is. However, it's nice because > > it provides a nice "these are the units" declaration in isl_tile_info > > itself. Previously, the tile_info wasn't usable as a stand-alone structure > > because you had to also know the format. It also forces figuring out how > > to deal with inconsistencies between tiling and format back to the caller > > which is good because the two different consumers of isl_tile_info really > > want to deal with it differently: Computation of the surface size wants > > the fewest number of horizontal tiles possible while get_intratile_offset > > is far more concerned with things aligning nicely. > > > > Signed-off-by: Jason Ekstrand <<a href="mailto:jason@jlekstrand.net">jason@jlekstrand.net</a>> > > Cc: Chad Versace <<a href="mailto:chad@kiwitree.net">chad@kiwitree.net</a>> > > --- > > src/intel/isl/isl.c | 49 ++++++++++++++++++++++++++++++++++++------------- > > src/intel/isl/isl.h | 10 +++++++++- > > 2 files changed, 45 insertions(+), 14 deletions(-) > > > > diff --git a/src/intel/isl/isl.c b/src/intel/isl/isl.c > > index 34245f4..8c72186 100644 > > --- a/src/intel/isl/isl.c > > +++ b/src/intel/isl/isl.c > > @@ -113,7 +113,16 @@ isl_tiling_get_info(const struct isl_device *dev, > > const uint32_t bs = format_bpb / 8; > > struct isl_extent2d logical_el, phys_B; > > > > - assert(tiling == ISL_TILING_LINEAR || isl_is_pow2(format_bpb)); > > + if (tiling != ISL_TILING_LINEAR && ! isl_is_pow2(format_bpb)) { > -----------------------------------------^^^ > 1. Weirdo space after '!' > > > + /* It is possible to have non-power-of-two formats in a tiled buffer. > > + * The easiest way to handle this is to treat the tile as if it is three > > + * times as wide. This way no pixel will ever cross a tile boundary. > > + * This really only works on legacy X and Y tiling formats. > > + */ > > + assert(tiling == ISL_TILING_X || tiling == ISL_TILING_Y0); > > + assert(bs % 3 == 0 && isl_is_pow2(format_bpb / 3)); > > + return isl_tiling_get_info(dev, tiling, format_bpb / 3, tile_info); > > + } > > > > switch (tiling) { > > case ISL_TILING_LINEAR: > > @@ -209,6 +218,7 @@ isl_tiling_get_info(const struct isl_device *dev, > > > > *tile_info = (struct isl_tile_info) { > > .tiling = tiling, > > + .format_bpb = format_bpb, > > .logical_extent_el = logical_el, > > .phys_extent_B = phys_B, > > }; > > @@ -1215,14 +1225,20 @@ isl_surf_init_s(const struct isl_device *dev, > > } > > base_alignment = isl_round_up_to_power_of_two(base_alignment); > > } else { > > + assert(fmtl->bpb % tile_info.format_bpb == 0); > > + const uint32_t tile_el_scale = fmtl->bpb / tile_info.format_bpb; > > > + assert(tile_el_scale == 1 || tile_el_scale == 3); > > 2. Without the context of this patch, the assertion is very mysterious. > Future people will be afraid to touch it, as there's no clue to the > magic 3's justification. Please add a comment. Yes, it can be dropped. At one point I was trying harder to be clever but the code below doesn't care how big tile_el_scale is. > > + > > assert(phys_slice0_sa.w % fmtl->bw == 0); > > const uint32_t total_w_el = phys_slice0_sa.width / fmtl->bw; > > const uint32_t total_w_tl = > > - isl_align_div(total_w_el, tile_info.logical_extent_el.width); > > + isl_align_div(total_w_el * tile_el_scale, > > + tile_info.logical_extent_el.width); > > > > row_pitch = total_w_tl * tile_info.phys_extent_B.width; > > if (row_pitch < info->min_pitch) { > > - row_pitch = isl_align(info->min_pitch, tile_info.phys_extent_B.width); > > + row_pitch = isl_align_npot(info->min_pitch, > > + tile_info.phys_extent_B.width); > > } > > > > total_h_el += isl_align_div_npot(pad_bytes, row_pitch); > > @@ -1678,14 +1694,6 @@ isl_tiling_get_intratile_offset_el(const struct isl_device *dev, > > uint32_t *x_offset_el, > > uint32_t *y_offset_el) > > { > > - /* This function only really works for power-of-two surfaces. In > > - * theory, we could make it work for non-power-of-two surfaces by going > > - * to the left until we find a block that is bs-aligned. The Vulkan > > - * driver doesn't use non-power-of-two tiled surfaces so we'll leave > > - * this unimplemented for now. > > - */ > > - assert(tiling == ISL_TILING_LINEAR || isl_is_pow2(bs)); > > - > > if (tiling == ISL_TILING_LINEAR) { > > *base_address_offset = total_y_offset_el * row_pitch + > > total_x_offset_el * bs; > > @@ -1694,8 +1702,24 @@ isl_tiling_get_intratile_offset_el(const struct isl_device *dev, > > return; > > } > > > > + const uint32_t bpb = bs * 8; > > + > > struct isl_tile_info tile_info; > > - isl_tiling_get_info(dev, tiling, bs * 8, &tile_info); > > + isl_tiling_get_info(dev, tiling, bpb, &tile_info); > > + > > + assert(row_pitch % tile_info.phys_extent_B.width == 0); > > + > > + /* For non-power-of-two formats, we need the address to be both tile and > > + * element-aligned. The easiest way to achieve this is to work with a tile > > + * that is three times as wide as the regular tile. > > + * > > + * The tile info returned by get_tile_info has a logical size that is an > > + * integer number of tile_info.format_bpb size elements. To scale the > > + * tile, we scale up the physical width and then treat the logical tile > > + * size as if it has bpb size elements. > > + */ > > + const uint32_t tile_el_scale = bpb / tile_info.format_bpb; > > + tile_info.phys_extent_B.width *= tile_el_scale; > > > > /* Compute the offset into the tile */ > > *x_offset_el = total_x_offset_el % tile_info.logical_extent_el.w; > > @@ -1705,7 +1729,6 @@ isl_tiling_get_intratile_offset_el(const struct isl_device *dev, > > uint32_t x_offset_tl = total_x_offset_el / tile_info.logical_extent_el.w; > > uint32_t y_offset_tl = total_y_offset_el / tile_info.logical_extent_el.h; > > > > - assert(row_pitch % tile_info.phys_extent_B.width == 0); > > *base_address_offset = > > y_offset_tl * tile_info.phys_extent_B.h * row_pitch + > > x_offset_tl * tile_info.phys_extent_B.h * tile_info.phys_extent_B.w; > > diff --git a/src/intel/isl/isl.h b/src/intel/isl/isl.h > > index 36ef81c..92a694a 100644 > > --- a/src/intel/isl/isl.h > > +++ b/src/intel/isl/isl.h > > @@ -728,7 +728,15 @@ struct isl_format_layout { > > struct isl_tile_info { > > enum isl_tiling tiling; > > > > - /** The logical size of the tile in units of surface elements > > + /* The size (in bits per block) of a single surface element > > + * > > + * For surfaces with power-of-two formats, this is the same as > > + * isl_format_layout::bpb. For non-power-of-two formats it may be smaller. > > + * The logical_extent_el field is in terms of elements of this size. > > 3. It would be very nice to insert an example here for > ISL_FORMAT_R32G32B32_FLOAT, to help the future people that will > inevitably be confused by this surprising resizing. Here's a suggestion: > > For example, consider ISL_FORMAT_R32G32B32_FLOAT, for which > isl_format_layout::bpb is 96 (a non-power-of-two). When calculating > surface layout, isl wants to work only with power-of-two bpb. So isl > splits the pixel into 3 surface elements, one for each channel, and > isl_tile_info::format_bpb is 32. I'm not sure I like exactly what you said there but I will definitely cook up a comment. More comments rarely hurt. > > + */ > > + uint32_t format_bpb; > > + > > + /** The logical size of the tile in units of format_bpb size elements > > * > > * This field determines how a given surface is cut up into tiles. It is > > * used to compute the size of a surface in tiles and can be used to > > Address feedback #1 and #2, and this gets > Acked-by: Chad Versace <<a href="mailto:chadversary@chromium.org">chadversary@chromium.org</a>> > > Just an ack, because I didn't comb through all of isl to verify that > this was safe. I did comb through enough to know that the two places I modified ate the only two places that we actually use the tie width. They're is one other place that uses the tile size only uses the height. > Address feedback #3 and you get bonus points.