[Mesa-dev] [PATCH 48/95] i965/vec4: add a force_vstride0 flag to src_reg

Francisco Jerez currojerez at riseup.net
Wed Sep 14 05:12:33 UTC 2016


Iago Toral <itoral at igalia.com> writes:

> On Mon, 2016-09-12 at 14:05 -0700, Francisco Jerez wrote:
>> Iago Toral Quiroga <itoral at igalia.com> writes:
>> 
>> > 
>> > We will use this in cases where we want to force the vstride of a
>> > src_reg
>> > to 0 to exploit a particular behavior of the hardware. It will come
>> > in
>> > handy to implement access to components Z/W.
>> > ---
>> >  src/mesa/drivers/dri/i965/brw_ir_vec4.h | 1 +
>> >  src/mesa/drivers/dri/i965/brw_vec4.cpp  | 2 ++
>> >  2 files changed, 3 insertions(+)
>> > 
>> > diff --git a/src/mesa/drivers/dri/i965/brw_ir_vec4.h
>> > b/src/mesa/drivers/dri/i965/brw_ir_vec4.h
>> > index f66c093..f3cce4b 100644
>> > --- a/src/mesa/drivers/dri/i965/brw_ir_vec4.h
>> > +++ b/src/mesa/drivers/dri/i965/brw_ir_vec4.h
>> > @@ -51,6 +51,7 @@ public:
>> >     explicit src_reg(const dst_reg &reg);
>> >  
>> >     src_reg *reladdr;
>> > +   bool force_vstride0;
>> I was wondering whether it would make more sense to unify this with
>> the
>> FS back-end's fs_reg::stride (a numeric stride field is also likely
>> more
>> convenient to do arithmetic on than a boolean) and promote it to
>> backend_reg?  It could be defined as the number of components to jump
>> over for each logical channel of the register, which is just the
>> vstride
>> in single-precision SIMD4x2 and the hstride in scalar mode.
>
> We could do that, but I thought it would be a good idea to make it
> clear that here we are using the vstride=0 with a very specific
> intention and we don't expect the hardware to do what it would be
> expected (we are trying to exploit a hardware bug after all). If we
> were to use a normal stride field for this I think we would make this
> intention much less obvious and other people reading the code would
> have a much harder time understanding what is really going on. Since we
> are being tricky here I think the extra field to signal that we are
> trying to do something "special" might be worth it: people can track
> where we read and write that field and see exactly where it is being
> used for the purpose of exploiting this particular hardware behavior.
>

Yes, I agree that the hardware's behavior on Gen7 with non-identity
vstride is tricky and special -- Special enough that *none* of the VEC4
optimization passes and IR-handling code need to be aware of it, because
the field is only going to be used as internal book-keeping data
structure in convert_to_hw_regs() and immediately discarded.  IOW you're
storing an internal data structure of convert_to_hw_regs() as part of
the shared IR data structure, with no well-defined semantics and which
no back-end code (not even convert_to_hw_regs()) is going to be able to
honor.

So if your argument for making the representation of vstride
unnecessarily non-orthogonal is that you want to discourage people from
using it at the IR level (which is fair because it won't work at all!),
I would argue that it doesn't belong in the IR data structures in the
first place, because you could just keep convert_to_hw_regs' internal
data structures internal to convert_to_hw_regs.  (I don't actually think
you need the data structure, neither internal nor external, but more on
that later)

>> But thinking about it some more, I wonder if it's really necessary to
>> expose vertical strides at the IR level?  Aren't you planing to use
>> this
>> during the conversion to HW registers exclusively?  Why don't you set
>> the vstride field directly in that case?
>
> Yes, this is used exclusively at that time. The conversion to hardware
> registers in convert_to_hw_regs() happens in two stages now:
>
> We call our 'expand_64bit_swizzle_to_32bit()' helper first. This one
> takes care of checking the regioning on DF instructions, translate
> swizzles and set force_vstrid0 to true when needed (which is also the
> only place that would set this to true). Then the rest of the code in
> convert_to_hw_regs() just operates as usual, only that it will check
> the force_vstride0 setting to decide the vstride to use for DF regions.
>
> I did it like this because it allows us to keep the DF swizzle
> translation and regioning checking logic separated from the conversion
> to hardware registers, but this separation means that we need to tell
> the latter when it has to set the vstride to 0, thus the addition of
> the forcE_vstride0 field. I think having these two things separated
> makes sense and makes the code easier to read. We can keep both things
> separate and still avoid the force_vstride0 field by using a stride
> field as you suggest above, but as I said, I think we might be doing a
> rather tricky thing a bit less obvious than it should to other people.
>

Keeping these two tasks logically separate from each other sounds fine
to me, but you don't need to extend the IR for them to exchange data.
AFAICT expand_64bit_swizzle_to_32bit() is doing two things:

 - Calculate the hardware swizzle, which potentially involves an
   adjustment of the subregister offset -- These two are uniquely
   determined by the original src_reg swizzle, type and offset.

 - Calculate the hardware vstride -- This is also uniquely determined by
   the original src_reg swizzle.

IOW, you could implement this easily (and locally) as a plain function
you could call from convert_to_hw_regs(), e.g.:

| static brw_reg
| apply_logical_swizzle(brw_reg reg, unsigned swz);

that would take a hardware register and a lowered logical swizzle
(i.e. a component-based one), and give you the same register as result
after performing any adjustments on 'reg' needed to rearrange the
logical components of 'reg' as specified by 'swz'.

This would make the IR changes and the whole translation of
force_vstride0 back to vstride currently done in convert_to_hw_regs
unnecessary.

>> > 
>> >  };
>> >  
>> >  static inline src_reg
>> > diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp
>> > b/src/mesa/drivers/dri/i965/brw_vec4.cpp
>> > index a20b2fd..bfbbd96 100644
>> > --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
>> > +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
>> > @@ -70,6 +70,7 @@ src_reg::src_reg(struct ::brw_reg reg) :
>> >  {
>> >     this->reg_offset = 0;
>> >     this->reladdr = NULL;
>> > +   this->force_vstride0 = false;
>> >  }
>> >  
>> >  src_reg::src_reg(const dst_reg &reg) :
>> > @@ -77,6 +78,7 @@ src_reg::src_reg(const dst_reg &reg) :
>> >  {
>> >     this->reladdr = reg.reladdr;
>> >     this->swizzle = brw_swizzle_for_mask(reg.writemask);
>> > +   this->force_vstride0 = false;
>> >  }
>> >  
>> >  void
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 212 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20160913/b541ecb8/attachment.sig>


More information about the mesa-dev mailing list