[Mesa-dev] [PATCH 48/95] i965/vec4: add a force_vstride0 flag to src_reg

Iago Toral itoral at igalia.com
Wed Sep 14 06:19:30 UTC 2016


On Tue, 2016-09-13 at 22:12 -0700, Francisco Jerez wrote:
> Iago Toral <itoral at igalia.com> writes:
> 
> > 
> > On Mon, 2016-09-12 at 14:05 -0700, Francisco Jerez wrote:
> > > 
> > > Iago Toral Quiroga <itoral at igalia.com> writes:
> > > 
> > > > 
> > > > 
> > > > We will use this in cases where we want to force the vstride of
> > > > a
> > > > src_reg
> > > > to 0 to exploit a particular behavior of the hardware. It will
> > > > come
> > > > in
> > > > handy to implement access to components Z/W.
> > > > ---
> > > >  src/mesa/drivers/dri/i965/brw_ir_vec4.h | 1 +
> > > >  src/mesa/drivers/dri/i965/brw_vec4.cpp  | 2 ++
> > > >  2 files changed, 3 insertions(+)
> > > > 
> > > > diff --git a/src/mesa/drivers/dri/i965/brw_ir_vec4.h
> > > > b/src/mesa/drivers/dri/i965/brw_ir_vec4.h
> > > > index f66c093..f3cce4b 100644
> > > > --- a/src/mesa/drivers/dri/i965/brw_ir_vec4.h
> > > > +++ b/src/mesa/drivers/dri/i965/brw_ir_vec4.h
> > > > @@ -51,6 +51,7 @@ public:
> > > >     explicit src_reg(const dst_reg &reg);
> > > >  
> > > >     src_reg *reladdr;
> > > > +   bool force_vstride0;
> > > I was wondering whether it would make more sense to unify this
> > > with
> > > the
> > > FS back-end's fs_reg::stride (a numeric stride field is also
> > > likely
> > > more
> > > convenient to do arithmetic on than a boolean) and promote it to
> > > backend_reg?  It could be defined as the number of components to
> > > jump
> > > over for each logical channel of the register, which is just the
> > > vstride
> > > in single-precision SIMD4x2 and the hstride in scalar mode.
> > We could do that, but I thought it would be a good idea to make it
> > clear that here we are using the vstride=0 with a very specific
> > intention and we don't expect the hardware to do what it would be
> > expected (we are trying to exploit a hardware bug after all). If we
> > were to use a normal stride field for this I think we would make
> > this
> > intention much less obvious and other people reading the code would
> > have a much harder time understanding what is really going on.
> > Since we
> > are being tricky here I think the extra field to signal that we are
> > trying to do something "special" might be worth it: people can
> > track
> > where we read and write that field and see exactly where it is
> > being
> > used for the purpose of exploiting this particular hardware
> > behavior.
> > 
> Yes, I agree that the hardware's behavior on Gen7 with non-identity
> vstride is tricky and special -- Special enough that *none* of the
> VEC4
> optimization passes and IR-handling code need to be aware of it,
> because
> the field is only going to be used as internal book-keeping data
> structure in convert_to_hw_regs() and immediately discarded.  IOW
> you're
> storing an internal data structure of convert_to_hw_regs() as part of
> the shared IR data structure, with no well-defined semantics and
> which
> no back-end code (not even convert_to_hw_regs()) is going to be able
> to
> honor.
> 
> So if your argument for making the representation of vstride
> unnecessarily non-orthogonal is that you want to discourage people
> from
> using it at the IR level (which is fair because it won't work at
> all!),
> I would argue that it doesn't belong in the IR data structures in the
> first place, because you could just keep convert_to_hw_regs' internal
> data structures internal to convert_to_hw_regs.  (I don't actually
> think
> you need the data structure, neither internal nor external, but more
> on
> that later)

Yes, that makes sense.

> > 
> > > 
> > > But thinking about it some more, I wonder if it's really
> > > necessary to
> > > expose vertical strides at the IR level?  Aren't you planing to
> > > use
> > > this
> > > during the conversion to HW registers exclusively?  Why don't you
> > > set
> > > the vstride field directly in that case?
> > Yes, this is used exclusively at that time. The conversion to
> > hardware
> > registers in convert_to_hw_regs() happens in two stages now:
> > 
> > We call our 'expand_64bit_swizzle_to_32bit()' helper first. This
> > one
> > takes care of checking the regioning on DF instructions, translate
> > swizzles and set force_vstrid0 to true when needed (which is also
> > the
> > only place that would set this to true). Then the rest of the code
> > in
> > convert_to_hw_regs() just operates as usual, only that it will
> > check
> > the force_vstride0 setting to decide the vstride to use for DF
> > regions.
> > 
> > I did it like this because it allows us to keep the DF swizzle
> > translation and regioning checking logic separated from the
> > conversion
> > to hardware registers, but this separation means that we need to
> > tell
> > the latter when it has to set the vstride to 0, thus the addition
> > of
> > the forcE_vstride0 field. I think having these two things separated
> > makes sense and makes the code easier to read. We can keep both
> > things
> > separate and still avoid the force_vstride0 field by using a stride
> > field as you suggest above, but as I said, I think we might be
> > doing a
> > rather tricky thing a bit less obvious than it should to other
> > people.
> > 
> Keeping these two tasks logically separate from each other sounds
> fine
> to me, but you don't need to extend the IR for them to exchange data.
> AFAICT expand_64bit_swizzle_to_32bit() is doing two things:
> 
>  - Calculate the hardware swizzle, which potentially involves an
>    adjustment of the subregister offset -- These two are uniquely
>    determined by the original src_reg swizzle, type and offset.
> 
>  - Calculate the hardware vstride -- This is also uniquely determined
> by
>    the original src_reg swizzle.
> 
> IOW, you could implement this easily (and locally) as a plain
> function
> you could call from convert_to_hw_regs(), e.g.:
> 
> > 
> > static brw_reg
> > apply_logical_swizzle(brw_reg reg, unsigned swz);
> that would take a hardware register and a lowered logical swizzle
> (i.e. a component-based one), and give you the same register as
> result
> after performing any adjustments on 'reg' needed to rearrange the
> logical components of 'reg' as specified by 'swz'.
> 
> This would make the IR changes and the whole translation of
> force_vstride0 back to vstride currently done in convert_to_hw_regs
> unnecessary.

Ok, so your suggestion is to do the translation to a hardware register
first and then fix the swizzle by calling a helper function. Yes, this
sounds like a better idea, thanks!

> > 
> > > 
> > > > 
> > > > 
> > > >  };
> > > >  
> > > >  static inline src_reg
> > > > diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp
> > > > b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> > > > index a20b2fd..bfbbd96 100644
> > > > --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
> > > > +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> > > > @@ -70,6 +70,7 @@ src_reg::src_reg(struct ::brw_reg reg) :
> > > >  {
> > > >     this->reg_offset = 0;
> > > >     this->reladdr = NULL;
> > > > +   this->force_vstride0 = false;
> > > >  }
> > > >  
> > > >  src_reg::src_reg(const dst_reg &reg) :
> > > > @@ -77,6 +78,7 @@ src_reg::src_reg(const dst_reg &reg) :
> > > >  {
> > > >     this->reladdr = reg.reladdr;
> > > >     this->swizzle = brw_swizzle_for_mask(reg.writemask);
> > > > +   this->force_vstride0 = false;
> > > >  }
> > > >  
> > > >  void


More information about the mesa-dev mailing list