[Mesa-dev] [PATCH 27/59] i965/fs: add a stride helper

Iago Toral itoral at igalia.com
Thu May 5 07:15:59 UTC 2016


On Wed, 2016-05-04 at 13:59 -0700, Francisco Jerez wrote:
> Iago Toral <itoral at igalia.com> writes:
> 
> > On Wed, 2016-05-04 at 01:15 -0700, Francisco Jerez wrote:
> >> Iago Toral <itoral at igalia.com> writes:
> >> 
> >> > On Mon, 2016-05-02 at 18:48 -0700, Francisco Jerez wrote:
> >> >> Samuel Iglesias Gonsálvez <siglesias at igalia.com> writes:
> >> >> 
> >> >> > From: Connor Abbott <connor.w.abbott at intel.com>
> >> >> >
> >> >> > Similar to retype() and offset().
> >> >> > ---
> >> >> >  src/mesa/drivers/dri/i965/brw_ir_fs.h | 8 ++++++++
> >> >> >  1 file changed, 8 insertions(+)
> >> >> >
> >> >> > diff --git a/src/mesa/drivers/dri/i965/brw_ir_fs.h b/src/mesa/drivers/dri/i965/brw_ir_fs.h
> >> >> > index e4f20f4..abda2c3 100644
> >> >> > --- a/src/mesa/drivers/dri/i965/brw_ir_fs.h
> >> >> > +++ b/src/mesa/drivers/dri/i965/brw_ir_fs.h
> >> >> > @@ -78,6 +78,14 @@ retype(fs_reg reg, enum brw_reg_type type)
> >> >> >  }
> >> >> >  
> >> >> >  static inline fs_reg
> >> >> > +stride(fs_reg reg, unsigned stride)
> >> >> > +{
> >> >> > +   if (reg.stride != 0)
> >> >> > +      reg.stride = stride;
> >> >> > +   return reg;
> >> >> > +}
> >> >> > +
> >> >> 
> >> >> This only works if reg.stride == 0 or 1, we need to honour the stride of
> >> >> the original register (e.g. by doing reg.stride *= stride) or you'll end
> >> >> up taking components not part of the region given as argument.  It gets
> >> >> messy with ARF and HW registers because they represent the stride
> >> >> differently...  I suggest that instead of fixing this you take the
> >> >> following patch from my SIMD32 branch that introduces a somewhat easier
> >> >> to use helper: Instead of doing 'stride(horiz_offset(retype(reg, t), i),
> >> >> type_sz(reg.type) / type_sz(t))' to take the i-th subcomponent of type t
> >> >> of the original register you would just do 'subscript(reg, t, i)'. 
> >> >
> >> > Actually, I think this is not what we want in a number of places.
> >> >
> >> > Sometimes we just want to apply a stride to a register, no type changes
> >> > or anything. That is, we just want to do something like stride(reg, 2).
> >> > Of course, we can code that with subscript as subscript(reg, retype(reg,
> >> > type_that is_half_the_size), 0) but that is certainly forced. For
> >> > example, we have code such as:
> >> >
> >> > bld.MOV(tmp, stride(component_i, 2));
> >> > bld.MOV(horiz_offset(tmp, 8 * multiplier),
> >> >         stride(horiz_offset(component_i, 1), 2));
> >> >
> >> Oh no, this is precisely the use-case I had in mind: component_i is a
> >> vector of 64-bit channels laid out contiguously, and you want to extract
> >> each 32-bit field from each channel separately.  You happen to have
> >> declared component_i as a 32-bit type but that's kind of a lie because
> >> each component of "component_i" only represents half a channel, it would
> >> definitely make more sense to make it a DF because that's what it
> >> contains...
> >
> > I think that whether component_i is really 64-bit or 32-bit is not
> > relevant, in the sense that what matters is how the algorithm
> > needs/wants to interpret the data it has to manipulate. The way I
> > thought about this is that we have our 32-bit channels mixed up and we
> > need to re-arrange them, so from my perspective, this is all about
> > 32-bit data that we need to move around in a particular fashion.
> 
> Keep in mind that channel masking is applied to component_i in multiples
> of 64 bits regardless of the type you declare it with, so if you forget
> the fact that component_i is a vector of elements 64 bits apart from
> each other and pretend it's a vector of 32 bit channels, you'll end up
> crossing channel boundaries illegally and miscompiling the program (or
> working it around with ->force_writemask_all hacks like you do in the
> SHUFFLE functions...). 

Yes, that is true.

>  subscript() guarantees that you never have to
> break channel boundaries not even temporarily -- If the register you
> give it as argument was channel-aligned the result is guaranteed
> channel-aligned.  stride() OTOH gives you a channel-misaligned result
> you need to compensate for with a matching retype(), so pretty much
> every use of the result from stride() other than as argument for
> retype() indicates a bug, that's why it makes sense to do both things in
> the same operation.  Otherwise you need to deal with a temporary value
> which is basically meaningless, and you need to manually make sure that
> the value provided as stride multiplier matches the ratio of the type
> sizes exactly (the snippet you paste below from PACK lowering
> illustrates the problem since it would miscompile the program silently
> if the instruction didn't have the exact number of sources you
> expected).

Ok, I see your point, that sounds like a good argument :)

> > That's why I find the stride() helper a lot more natural to use, where
> > I just tell the access type and the stride explicitly, while with
> > subscript() I have to think about a register type and an access type
> > instead and the stride is a consequence of that relation. I admit this
> > is very subjective though.
> >
> Sorry, but this wasn't just a codestyle suggestion, most of the uses of
> stride() and horiz_offset() you have in both series will be broken for a
> certain input of fully well-formed IR due to at least one out of three
> different reasons (horiz_offset() not being intended (or able except by
> luck) to move you out of the original SIMD vector, the argument of
> stride() not being guaranteed to match the intended ratio of types
> exactly and your stride() implementation being broken for all input
> strides except 0 or 1).
> 
> > In any case, let's change it, as you say we should be able to use
> > subscript with some minor changes so let's do that. I just wanted to
> > explain why I personally think that stride() might be more
> > flexible/natural in some scenarios, at least for some people like me :)
> >
> I agree that stride() is more flexible -- Almost too flexible.
> subscript() is precisely about distilling off the subset of
> functionality from stride() that you can use safely without breaking
> channel boundaries -- Not saying that stride() couldn't conceivably be
> useful for some more exotic purpose, but I've grepped your whole branch
> and I haven't found a single instance which wasn't basically an
> open-coded and arguably buggy version of subscript().  My SIMD32 branch
> makes more intensive use of subscript() (even for things which are not
> per-channel as you can guess from the ARF/HW REG handling), and I
> haven't yet found a case where the extra flexibility of stride() would
> be necessary or useful.

There is the case where we convert from DF to F which I think is a bit
special. There we have this code:

      fs_reg temp = ibld.vgrf(inst->dst.type, 2);
      ibld.MOV(stride(temp, 2), inst->src[0]);
      ibld.MOV(dst, stride(temp, 2));

Basically, what is going on here is that the first MOV does the
conversion but it writes elements of 64-bit anyway where the hi 32-bit
of each element is trash so we need the second MOV to re-combine the low
32-bit regions to produce a proper 32-bit result.

With subscript we need to play around with the types we use everywhere
to achieve the same thing. I think the resulting code is a bit more
confusing:

      fs_reg temp = ibld.vgrf(inst->src[0].type, 1);
      fs_reg strided_temp =
         retype(subscript(temp, inst->dst.type, 0), inst->dst.type);
      ibld.MOV(strided_temp, inst->src[0]);
      ibld.MOV(dst, strided_temp);

The issue here is that we need to do the allocation of temp with the
larger type so we can then subscript into it with the smaller type, but
then we really need to write to and read from temp using the smaller
type, so we need to retype anyway.

I don't think this alone justifies that we keep the stride helper
though, just pointing out a case where using subscript is a bit more
forced IMHO.

> > For the record, Ken suggested we land things as they are now (with
> > horiz_offset/stride) and fix these things later to use subscript()
> > instead, so we will do that.
> >
> >> > where component_i is of type F. Of course, we could retype component_i
> >> > to DF when we pass it to subscript, and use type F for the type
> >> > parameter, so that subscript does what we want, but it is certainly not
> >> > very natural. Subscript is all about accessing data with a different
> >> > type and we have some places where we want to do that and subscript does
> >> > what we want, but we also want to use a "normal" stride sometimes.
> >> >
> >> > Another example, in lowering of PACK, we do something like this:
> >> >
> >> > ibld.MOV(stride(horiz_offset(retype(dst, inst->src[i].type), i), 
> >> >                 inst->sources),
> >> >          inst->src[i]);
> >> >
> >> Same here.  If inst->sources hadn't matched the ratio between the two
> >> types you'd have ended up reading cross channel boundaries and
> >> miscompiling the program.  Using subscript() instead would guarantee
> >> that this is never the case even if e.g. one of the sources happens to
> >> be missing.
> >> 
> >> >
> >> > Here the stride we want is not based on any relation between types (even
> >> > if I think in this case it would work because inst->sources will be 2
> >> > and the type of inst->src[i] will be half of the type of the dst).
> >> >
> >> > I think that even if subscript can be useful in some of our cases, we
> >> > still want the stride helper for a bunch of other cases.
> >> >
> >> >>  I
> >> >> think I've looked through all the uses of stride() in your branch and
> >> >> they are all sort of an open-coded version of subscript().  This will
> >> >> also address the issue Ken pointed out in patch 29 related to the use of
> >> >> horiz_offset() on uniforms (or anything with stride other than one,
> >> >> really) -- See the attached patch (yes, correct ARF/HW reg handling will
> >> >> be required for SIMD32...).
> >> >> 
> >> >> > +static inline fs_reg
> >> >> >  byte_offset(fs_reg reg, unsigned delta)
> >> >> >  {
> >> >> >     switch (reg.file) {
> >> >> > -- 
> >> >> > 2.5.0
> >> >> >
> >> >> > _______________________________________________
> >> >> > mesa-dev mailing list
> >> >> > mesa-dev at lists.freedesktop.org
> >> >> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> >> >> 
> >> >> _______________________________________________
> >> >> mesa-dev mailing list
> >> >> mesa-dev at lists.freedesktop.org
> >> >> https://lists.freedesktop.org/mailman/listinfo/mesa-dev




More information about the mesa-dev mailing list