<p dir="ltr"></p>
<p dir="ltr">On Sep 9, 2016 12:05 PM, "Chad Versace" <<a href="mailto:chadversary@chromium.org">chadversary@chromium.org</a>> wrote:<br>
><br>
> On Fri 09 Sep 2016, Jason Ekstrand wrote:<br>
> > Normally, using a non-linear tiling format helps improve cache locality by<br>
> > ensuring that neighboring pixels are usually close-by in memory. For RGB<br>
> > formats, this still sort-of holds, but it can also lead to rather terrible<br>
> > memory access patterns where a single RGB pixel value crosses a tile<br>
> > boundary and gets split into two pieces in different 4K pages. It also<br>
> > makes for some rather awkward calculations because your tile size is no<br>
> > longer an even multiple of surface element size. For these reasons, we<br>
> > chose to simply never create tiled RGB images in the Vulkan driver.<br>
> ><br>
> > The GL driver, however, is not so kind so we need to support it somehow. I<br>
> > briefly toyed with a couple of different schemes but this is the best one I<br>
> > could come up with. The fundamental problem is that a tile no longer<br>
> > contains an integer number of surface elements. I briefly considered a<br>
> > couple other options but found them wanting:<br>
> ><br>
> > 1) Using floats for the logical tile size. This leads to potential<br>
> > rounding error problems.<br>
><br>
> Let's stay far away from floats With floats, there is just too much risk<br>
> for weird bugs.<br>
><br>
> Floats are for representing ℝ. Our calculations are in ℚ, so let's<br>
> restrict ourselves to ints. (Does GMail display the blackboard bold<br>
> UTF-8 correctly?)</p>
<p dir="ltr">It does! Even on my phone.</p>
<p dir="ltr">> > 2) When presented with a RGB format, just make the tile 3-times as wide.<br>
> > This isn't so nice because now our tiles are no longer power-of-two<br>
> > size. Also, it can force the row_pitch to be larger than needed which,<br>
> > while not strictly a problem for ISL, causes incompatibility problems<br>
> > with the way the GL driver chooses surface pitches.<br>
> ><br>
> > The chosen method requires that you pay attention and not just assume that<br>
> > your tile_info is in the units you think it is. However, it's nice because<br>
> > it provides a nice "these are the units" declaration in isl_tile_info<br>
> > itself. Previously, the tile_info wasn't usable as a stand-alone structure<br>
> > because you had to also know the format. It also forces figuring out how<br>
> > to deal with inconsistencies between tiling and format back to the caller<br>
> > which is good because the two different consumers of isl_tile_info really<br>
> > want to deal with it differently: Computation of the surface size wants<br>
> > the fewest number of horizontal tiles possible while get_intratile_offset<br>
> > is far more concerned with things aligning nicely.<br>
> ><br>
> > Signed-off-by: Jason Ekstrand <<a href="mailto:jason@jlekstrand.net">jason@jlekstrand.net</a>><br>
> > Cc: Chad Versace <<a href="mailto:chad@kiwitree.net">chad@kiwitree.net</a>><br>
> > ---<br>
> > src/intel/isl/isl.c | 49 ++++++++++++++++++++++++++++++++++++-------------<br>
> > src/intel/isl/isl.h | 10 +++++++++-<br>
> > 2 files changed, 45 insertions(+), 14 deletions(-)<br>
> ><br>
> > diff --git a/src/intel/isl/isl.c b/src/intel/isl/isl.c<br>
> > index 34245f4..8c72186 100644<br>
> > --- a/src/intel/isl/isl.c<br>
> > +++ b/src/intel/isl/isl.c<br>
> > @@ -113,7 +113,16 @@ isl_tiling_get_info(const struct isl_device *dev,<br>
> > const uint32_t bs = format_bpb / 8;<br>
> > struct isl_extent2d logical_el, phys_B;<br>
> ><br>
> > - assert(tiling == ISL_TILING_LINEAR || isl_is_pow2(format_bpb));<br>
> > + if (tiling != ISL_TILING_LINEAR && ! isl_is_pow2(format_bpb)) {<br>
> -----------------------------------------^^^<br>
> 1. Weirdo space after '!'<br>
><br>
> > + /* It is possible to have non-power-of-two formats in a tiled buffer.<br>
> > + * The easiest way to handle this is to treat the tile as if it is three<br>
> > + * times as wide. This way no pixel will ever cross a tile boundary.<br>
> > + * This really only works on legacy X and Y tiling formats.<br>
> > + */<br>
> > + assert(tiling == ISL_TILING_X || tiling == ISL_TILING_Y0);<br>
> > + assert(bs % 3 == 0 && isl_is_pow2(format_bpb / 3));<br>
> > + return isl_tiling_get_info(dev, tiling, format_bpb / 3, tile_info);<br>
> > + }<br>
> ><br>
> > switch (tiling) {<br>
> > case ISL_TILING_LINEAR:<br>
> > @@ -209,6 +218,7 @@ isl_tiling_get_info(const struct isl_device *dev,<br>
> ><br>
> > *tile_info = (struct isl_tile_info) {<br>
> > .tiling = tiling,<br>
> > + .format_bpb = format_bpb,<br>
> > .logical_extent_el = logical_el,<br>
> > .phys_extent_B = phys_B,<br>
> > };<br>
> > @@ -1215,14 +1225,20 @@ isl_surf_init_s(const struct isl_device *dev,<br>
> > }<br>
> > base_alignment = isl_round_up_to_power_of_two(base_alignment);<br>
> > } else {<br>
> > + assert(fmtl->bpb % tile_info.format_bpb == 0);<br>
> > + const uint32_t tile_el_scale = fmtl->bpb / tile_info.format_bpb;<br>
><br>
> > + assert(tile_el_scale == 1 || tile_el_scale == 3);<br>
><br>
> 2. Without the context of this patch, the assertion is very mysterious.<br>
> Future people will be afraid to touch it, as there's no clue to the<br>
> magic 3's justification. Please add a comment.</p>
<p dir="ltr">Yes, it can be dropped. At one point I was trying harder to be clever but the code below doesn't care how big tile_el_scale is.</p>
<p dir="ltr">> > +<br>
> > assert(phys_slice0_sa.w % fmtl->bw == 0);<br>
> > const uint32_t total_w_el = phys_slice0_sa.width / fmtl->bw;<br>
> > const uint32_t total_w_tl =<br>
> > - isl_align_div(total_w_el, tile_info.logical_extent_el.width);<br>
> > + isl_align_div(total_w_el * tile_el_scale,<br>
> > + tile_info.logical_extent_el.width);<br>
> ><br>
> > row_pitch = total_w_tl * tile_info.phys_extent_B.width;<br>
> > if (row_pitch < info->min_pitch) {<br>
> > - row_pitch = isl_align(info->min_pitch, tile_info.phys_extent_B.width);<br>
> > + row_pitch = isl_align_npot(info->min_pitch,<br>
> > + tile_info.phys_extent_B.width);<br>
> > }<br>
> ><br>
> > total_h_el += isl_align_div_npot(pad_bytes, row_pitch);<br>
> > @@ -1678,14 +1694,6 @@ isl_tiling_get_intratile_offset_el(const struct isl_device *dev,<br>
> > uint32_t *x_offset_el,<br>
> > uint32_t *y_offset_el)<br>
> > {<br>
> > - /* This function only really works for power-of-two surfaces. In<br>
> > - * theory, we could make it work for non-power-of-two surfaces by going<br>
> > - * to the left until we find a block that is bs-aligned. The Vulkan<br>
> > - * driver doesn't use non-power-of-two tiled surfaces so we'll leave<br>
> > - * this unimplemented for now.<br>
> > - */<br>
> > - assert(tiling == ISL_TILING_LINEAR || isl_is_pow2(bs));<br>
> > -<br>
> > if (tiling == ISL_TILING_LINEAR) {<br>
> > *base_address_offset = total_y_offset_el * row_pitch +<br>
> > total_x_offset_el * bs;<br>
> > @@ -1694,8 +1702,24 @@ isl_tiling_get_intratile_offset_el(const struct isl_device *dev,<br>
> > return;<br>
> > }<br>
> ><br>
> > + const uint32_t bpb = bs * 8;<br>
> > +<br>
> > struct isl_tile_info tile_info;<br>
> > - isl_tiling_get_info(dev, tiling, bs * 8, &tile_info);<br>
> > + isl_tiling_get_info(dev, tiling, bpb, &tile_info);<br>
> > +<br>
> > + assert(row_pitch % tile_info.phys_extent_B.width == 0);<br>
> > +<br>
> > + /* For non-power-of-two formats, we need the address to be both tile and<br>
> > + * element-aligned. The easiest way to achieve this is to work with a tile<br>
> > + * that is three times as wide as the regular tile.<br>
> > + *<br>
> > + * The tile info returned by get_tile_info has a logical size that is an<br>
> > + * integer number of tile_info.format_bpb size elements. To scale the<br>
> > + * tile, we scale up the physical width and then treat the logical tile<br>
> > + * size as if it has bpb size elements.<br>
> > + */<br>
> > + const uint32_t tile_el_scale = bpb / tile_info.format_bpb;<br>
> > + tile_info.phys_extent_B.width *= tile_el_scale;<br>
> ><br>
> > /* Compute the offset into the tile */<br>
> > *x_offset_el = total_x_offset_el % tile_info.logical_extent_el.w;<br>
> > @@ -1705,7 +1729,6 @@ isl_tiling_get_intratile_offset_el(const struct isl_device *dev,<br>
> > uint32_t x_offset_tl = total_x_offset_el / tile_info.logical_extent_el.w;<br>
> > uint32_t y_offset_tl = total_y_offset_el / tile_info.logical_extent_el.h;<br>
> ><br>
> > - assert(row_pitch % tile_info.phys_extent_B.width == 0);<br>
> > *base_address_offset =<br>
> > y_offset_tl * tile_info.phys_extent_B.h * row_pitch +<br>
> > x_offset_tl * tile_info.phys_extent_B.h * tile_info.phys_extent_B.w;<br>
> > diff --git a/src/intel/isl/isl.h b/src/intel/isl/isl.h<br>
> > index 36ef81c..92a694a 100644<br>
> > --- a/src/intel/isl/isl.h<br>
> > +++ b/src/intel/isl/isl.h<br>
> > @@ -728,7 +728,15 @@ struct isl_format_layout {<br>
> > struct isl_tile_info {<br>
> > enum isl_tiling tiling;<br>
> ><br>
> > - /** The logical size of the tile in units of surface elements<br>
> > + /* The size (in bits per block) of a single surface element<br>
> > + *<br>
> > + * For surfaces with power-of-two formats, this is the same as<br>
> > + * isl_format_layout::bpb. For non-power-of-two formats it may be smaller.<br>
> > + * The logical_extent_el field is in terms of elements of this size.<br>
><br>
> 3. It would be very nice to insert an example here for<br>
> ISL_FORMAT_R32G32B32_FLOAT, to help the future people that will<br>
> inevitably be confused by this surprising resizing. Here's a suggestion:<br>
><br>
> For example, consider ISL_FORMAT_R32G32B32_FLOAT, for which<br>
> isl_format_layout::bpb is 96 (a non-power-of-two). When calculating<br>
> surface layout, isl wants to work only with power-of-two bpb. So isl<br>
> splits the pixel into 3 surface elements, one for each channel, and<br>
> isl_tile_info::format_bpb is 32.</p>
<p dir="ltr">I'm not sure I like exactly what you said there but I will definitely cook up a comment. More comments rarely hurt.</p>
<p dir="ltr">> > + */<br>
> > + uint32_t format_bpb;<br>
> > +<br>
> > + /** The logical size of the tile in units of format_bpb size elements<br>
> > *<br>
> > * This field determines how a given surface is cut up into tiles. It is<br>
> > * used to compute the size of a surface in tiles and can be used to<br>
><br>
> Address feedback #1 and #2, and this gets<br>
> Acked-by: Chad Versace <<a href="mailto:chadversary@chromium.org">chadversary@chromium.org</a>><br>
><br>
> Just an ack, because I didn't comb through all of isl to verify that<br>
> this was safe.</p>
<p dir="ltr">I did comb through enough to know that the two places I modified ate the only two places that we actually use the tie width. They're is one other place that uses the tile size only uses the height.</p>
<p dir="ltr">> Address feedback #3 and you get bonus points.<br></p>