[Mesa-dev] [PATCH 27/59] i965/fs: add a stride helper

Francisco Jerez currojerez at riseup.net
Wed May 4 20:59:33 UTC 2016


Iago Toral <itoral at igalia.com> writes:

> On Wed, 2016-05-04 at 01:15 -0700, Francisco Jerez wrote:
>> Iago Toral <itoral at igalia.com> writes:
>> 
>> > On Mon, 2016-05-02 at 18:48 -0700, Francisco Jerez wrote:
>> >> Samuel Iglesias Gonsálvez <siglesias at igalia.com> writes:
>> >> 
>> >> > From: Connor Abbott <connor.w.abbott at intel.com>
>> >> >
>> >> > Similar to retype() and offset().
>> >> > ---
>> >> >  src/mesa/drivers/dri/i965/brw_ir_fs.h | 8 ++++++++
>> >> >  1 file changed, 8 insertions(+)
>> >> >
>> >> > diff --git a/src/mesa/drivers/dri/i965/brw_ir_fs.h b/src/mesa/drivers/dri/i965/brw_ir_fs.h
>> >> > index e4f20f4..abda2c3 100644
>> >> > --- a/src/mesa/drivers/dri/i965/brw_ir_fs.h
>> >> > +++ b/src/mesa/drivers/dri/i965/brw_ir_fs.h
>> >> > @@ -78,6 +78,14 @@ retype(fs_reg reg, enum brw_reg_type type)
>> >> >  }
>> >> >  
>> >> >  static inline fs_reg
>> >> > +stride(fs_reg reg, unsigned stride)
>> >> > +{
>> >> > +   if (reg.stride != 0)
>> >> > +      reg.stride = stride;
>> >> > +   return reg;
>> >> > +}
>> >> > +
>> >> 
>> >> This only works if reg.stride == 0 or 1, we need to honour the stride of
>> >> the original register (e.g. by doing reg.stride *= stride) or you'll end
>> >> up taking components not part of the region given as argument.  It gets
>> >> messy with ARF and HW registers because they represent the stride
>> >> differently...  I suggest that instead of fixing this you take the
>> >> following patch from my SIMD32 branch that introduces a somewhat easier
>> >> to use helper: Instead of doing 'stride(horiz_offset(retype(reg, t), i),
>> >> type_sz(reg.type) / type_sz(t))' to take the i-th subcomponent of type t
>> >> of the original register you would just do 'subscript(reg, t, i)'. 
>> >
>> > Actually, I think this is not what we want in a number of places.
>> >
>> > Sometimes we just want to apply a stride to a register, no type changes
>> > or anything. That is, we just want to do something like stride(reg, 2).
>> > Of course, we can code that with subscript as subscript(reg, retype(reg,
>> > type_that is_half_the_size), 0) but that is certainly forced. For
>> > example, we have code such as:
>> >
>> > bld.MOV(tmp, stride(component_i, 2));
>> > bld.MOV(horiz_offset(tmp, 8 * multiplier),
>> >         stride(horiz_offset(component_i, 1), 2));
>> >
>> Oh no, this is precisely the use-case I had in mind: component_i is a
>> vector of 64-bit channels laid out contiguously, and you want to extract
>> each 32-bit field from each channel separately.  You happen to have
>> declared component_i as a 32-bit type but that's kind of a lie because
>> each component of "component_i" only represents half a channel, it would
>> definitely make more sense to make it a DF because that's what it
>> contains...
>
> I think that whether component_i is really 64-bit or 32-bit is not
> relevant, in the sense that what matters is how the algorithm
> needs/wants to interpret the data it has to manipulate. The way I
> thought about this is that we have our 32-bit channels mixed up and we
> need to re-arrange them, so from my perspective, this is all about
> 32-bit data that we need to move around in a particular fashion.

Keep in mind that channel masking is applied to component_i in multiples
of 64 bits regardless of the type you declare it with, so if you forget
the fact that component_i is a vector of elements 64 bits apart from
each other and pretend it's a vector of 32 bit channels, you'll end up
crossing channel boundaries illegally and miscompiling the program (or
working it around with ->force_writemask_all hacks like you do in the
SHUFFLE functions...).  subscript() guarantees that you never have to
break channel boundaries not even temporarily -- If the register you
give it as argument was channel-aligned the result is guaranteed
channel-aligned.  stride() OTOH gives you a channel-misaligned result
you need to compensate for with a matching retype(), so pretty much
every use of the result from stride() other than as argument for
retype() indicates a bug, that's why it makes sense to do both things in
the same operation.  Otherwise you need to deal with a temporary value
which is basically meaningless, and you need to manually make sure that
the value provided as stride multiplier matches the ratio of the type
sizes exactly (the snippet you paste below from PACK lowering
illustrates the problem since it would miscompile the program silently
if the instruction didn't have the exact number of sources you
expected).

> That's why I find the stride() helper a lot more natural to use, where
> I just tell the access type and the stride explicitly, while with
> subscript() I have to think about a register type and an access type
> instead and the stride is a consequence of that relation. I admit this
> is very subjective though.
>
Sorry, but this wasn't just a codestyle suggestion, most of the uses of
stride() and horiz_offset() you have in both series will be broken for a
certain input of fully well-formed IR due to at least one out of three
different reasons (horiz_offset() not being intended (or able except by
luck) to move you out of the original SIMD vector, the argument of
stride() not being guaranteed to match the intended ratio of types
exactly and your stride() implementation being broken for all input
strides except 0 or 1).

> In any case, let's change it, as you say we should be able to use
> subscript with some minor changes so let's do that. I just wanted to
> explain why I personally think that stride() might be more
> flexible/natural in some scenarios, at least for some people like me :)
>
I agree that stride() is more flexible -- Almost too flexible.
subscript() is precisely about distilling off the subset of
functionality from stride() that you can use safely without breaking
channel boundaries -- Not saying that stride() couldn't conceivably be
useful for some more exotic purpose, but I've grepped your whole branch
and I haven't found a single instance which wasn't basically an
open-coded and arguably buggy version of subscript().  My SIMD32 branch
makes more intensive use of subscript() (even for things which are not
per-channel as you can guess from the ARF/HW REG handling), and I
haven't yet found a case where the extra flexibility of stride() would
be necessary or useful.

> For the record, Ken suggested we land things as they are now (with
> horiz_offset/stride) and fix these things later to use subscript()
> instead, so we will do that.
>
>> > where component_i is of type F. Of course, we could retype component_i
>> > to DF when we pass it to subscript, and use type F for the type
>> > parameter, so that subscript does what we want, but it is certainly not
>> > very natural. Subscript is all about accessing data with a different
>> > type and we have some places where we want to do that and subscript does
>> > what we want, but we also want to use a "normal" stride sometimes.
>> >
>> > Another example, in lowering of PACK, we do something like this:
>> >
>> > ibld.MOV(stride(horiz_offset(retype(dst, inst->src[i].type), i), 
>> >                 inst->sources),
>> >          inst->src[i]);
>> >
>> Same here.  If inst->sources hadn't matched the ratio between the two
>> types you'd have ended up reading cross channel boundaries and
>> miscompiling the program.  Using subscript() instead would guarantee
>> that this is never the case even if e.g. one of the sources happens to
>> be missing.
>> 
>> >
>> > Here the stride we want is not based on any relation between types (even
>> > if I think in this case it would work because inst->sources will be 2
>> > and the type of inst->src[i] will be half of the type of the dst).
>> >
>> > I think that even if subscript can be useful in some of our cases, we
>> > still want the stride helper for a bunch of other cases.
>> >
>> >>  I
>> >> think I've looked through all the uses of stride() in your branch and
>> >> they are all sort of an open-coded version of subscript().  This will
>> >> also address the issue Ken pointed out in patch 29 related to the use of
>> >> horiz_offset() on uniforms (or anything with stride other than one,
>> >> really) -- See the attached patch (yes, correct ARF/HW reg handling will
>> >> be required for SIMD32...).
>> >> 
>> >> > +static inline fs_reg
>> >> >  byte_offset(fs_reg reg, unsigned delta)
>> >> >  {
>> >> >     switch (reg.file) {
>> >> > -- 
>> >> > 2.5.0
>> >> >
>> >> > _______________________________________________
>> >> > mesa-dev mailing list
>> >> > mesa-dev at lists.freedesktop.org
>> >> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>> >> 
>> >> _______________________________________________
>> >> mesa-dev mailing list
>> >> mesa-dev at lists.freedesktop.org
>> >> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 212 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20160504/4c981bef/attachment.sig>


More information about the mesa-dev mailing list