<p dir="ltr"></p>
<p dir="ltr">On Nov 4, 2016 2:30 AM, "Pohjolainen, Topi" <<a href="mailto:topi.pohjolainen@gmail.com">topi.pohjolainen@gmail.com</a>> wrote:<br>
><br>
> On Fri, Nov 04, 2016 at 11:17:13AM +0200, Pohjolainen, Topi wrote:<br>
> > On Fri, Oct 28, 2016 at 02:17:12AM -0700, Jason Ekstrand wrote:<br>
> > > Previously, blorp copy operations were CCS-unaware so you had to perform<br>
> > > resolves on the source and destination before performing the copy. This<br>
> > > commit makes blorp_copy capable of handling CCS-compressed images without<br>
> > > any resolves.<br>
> > ><br>
> > > Signed-off-by: Jason Ekstrand <<a href="mailto:jason@jlekstrand.net">jason@jlekstrand.net</a>><br>
> > > ---<br>
> > > src/intel/blorp/blorp_blit.c | 173 ++++++++++++++++++++++++++++++++++++++++++-<br>
> > > src/intel/blorp/blorp_priv.h | 6 ++<br>
> > > 2 files changed, 177 insertions(+), 2 deletions(-)<br>
> > ><br>
> > > diff --git a/src/intel/blorp/blorp_blit.c b/src/intel/blorp/blorp_blit.c<br>
> > > index 07bb181..598b7c9 100644<br>
> > > --- a/src/intel/blorp/blorp_blit.c<br>
> > > +++ b/src/intel/blorp/blorp_blit.c<br>
> > > @@ -851,6 +851,66 @@ blorp_nir_manual_blend_bilinear(nir_builder *b, nir_ssa_def *pos,<br>
> > > frac_y);<br>
> > > }<br>
> > ><br>
> > > +/** Perform a color bit-cast operation<br>
> > > + *<br>
> > > + * For copy operations involving CCS, we may need to use different formats for<br>
> > > + * the source and destination surfaces. The two formats must both be UINT<br>
> > > + * formats and must have the same size but may have different bit layouts.<br>
> > > + * For instance, we may be copying from R8G8B8A8_UINT to R32_UINT or R32_UINT<br>
> > > + * to R16G16_UINT. This function generates code to shuffle bits around to get<br>
> > > + * us from one to the other.<br>
> > > + */<br>
> > > +static nir_ssa_def *<br>
> > > +bit_cast_color(struct nir_builder *b, nir_ssa_def *color,<br>
> > > + const struct brw_blorp_blit_prog_key *key)<br>
> > > +{<br>
> > > + assert(key->texture_data_type == nir_type_uint);<br>
> > > +<br>
> > > + if (key->dst_bpc > key->src_bpc) {<br>
> > > + nir_ssa_def *u = nir_ssa_undef(b, 1, 32);<br>
> > > + nir_ssa_def *dst_chan[2] = { u, u };<br>
> > > + unsigned shift = 0;<br>
> > > + unsigned dst_idx = 0;<br>
> > > + for (unsigned i = 0; i < 4; i++) {<br>
> > > + nir_ssa_def *shifted = nir_ishl(b, nir_channel(b, color, i),<br>
> > > + nir_imm_int(b, shift));<br>
> > > + if (shift == 0) {<br>
> > > + dst_chan[dst_idx] = shifted;<br>
> > > + } else {<br>
> > > + dst_chan[dst_idx] = nir_ior(b, dst_chan[dst_idx], shifted);<br>
> > > + }<br>
> > > +<br>
> > > + shift += key->src_bpc;<br>
> > > + if (shift >= key->dst_bpc) {<br>
> > > + dst_idx++;<br>
> > > + shift = 0;<br>
> > > + }<br>
> > > + }<br>
> ><br>
> > This iterates unconditionally over 4 source components regardless how many<br>
> > there are in source, right? Lets assume we have dst == R32_UINT and<br>
> > src == R16G16_UINT, don't we get the following where iterations with i = 2<br>
> > and i = 3 try to access non-existing z and w components:<br>
> ><br>
> > i = 0:<br>
> > nir_ssa_def *shifted = nir_ishl(b, nir_channel(b, color, 0),<br>
> > nir_imm_int(b, 0));<br>
> > dst_chan[0] = shifted;<br>
> ><br>
> > i = 1:<br>
> > nir_ssa_def *shifted = nir_ishl(b, nir_channel(b, color, 1),<br>
> > nir_imm_int(b, 16));<br>
> > dst_chan[0] = nir_ior(b, dst_chan[0], shifted);<br>
> ><br>
> > i = 2:<br>
> > nir_ssa_def *shifted = nir_ishl(b, nir_channel(b, color, 2),<br>
> > nir_imm_int(b, 0));<br>
> > dst_chan[1] = shifted;<br>
> ><br>
> > i = 3:<br>
> > nir_ssa_def *shifted = nir_ishl(b, nir_channel(b, color, 3),<br>
> > nir_imm_int(b, 16));<br>
> > dst_chan[1] = nir_ior(b, dst_chan[1], shifted);<br>
><br>
> If this is intentional and the idea is to rely on optimization passes to<br>
> remove unnecessary writes to "dst_chan[1]" then a comment would be nice.</p>
<p dir="ltr">It is intentional but instead of relying on any compiler magic, we're just relying on the fact that the texturing instruction always returns a vec4 and the render target write always takes a vec4. I could have included the number of components in the key and used that to not generate pointless conversion code in a few places but that would have ended up making it more complicated and we would end up generating more distinct shader programs for only marginal benefit.<br>
</p>