[Mesa-dev] [PATCH 13/23] i965/fs: Take into account reg_offset consistently for MRF regs.

Tue Jan 21 19:47:10 PST 2014

On 15 January 2014 14:01, Francisco Jerez <currojerez at riseup.net> wrote:

> Paul Berry <stereotype441 at gmail.com> writes:
>
> > On 2 December 2013 11:31, Francisco Jerez <currojerez at riseup.net> wrote:
> >
> >> Until now it was only being taken into account in the VEC4 back-end
> >> but not in the FS back-end.  Do it in both cases.
> >> ---
> >>  src/mesa/drivers/dri/i965/brw_fs.h             |  2 +-
> >>  src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 10 ++++++----
> >>  src/mesa/drivers/dri/i965/brw_shader.h         |  7 ++++---
> >>  3 files changed, 11 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/src/mesa/drivers/dri/i965/brw_fs.h
> >> b/src/mesa/drivers/dri/i965/brw_fs.h
> >> index 2c36d9f..f918f7e 100644
> >> --- a/src/mesa/drivers/dri/i965/brw_fs.h
> >> +++ b/src/mesa/drivers/dri/i965/brw_fs.h
> >> @@ -615,4 +615,4 @@ bool brw_do_channel_expressions(struct exec_list
> >> *instructions);
> >>  bool brw_do_vector_splitting(struct exec_list *instructions);
> >>  bool brw_fs_precompile(struct gl_context *ctx, struct gl_shader_program
> >> *prog);
> >>
> >> -struct brw_reg brw_reg_from_fs_reg(fs_reg *reg);
> >> +struct brw_reg brw_reg_from_fs_reg(fs_reg *reg, unsigned
> dispatch_width);
> >> diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> >> b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> >> index 8d310a1..1de59eb 100644
> >> --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> >> +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> >> @@ -981,8 +981,9 @@ static uint32_t brw_file_from_reg(fs_reg *reg)
> >>  }
> >>
> >>  struct brw_reg
> >> -brw_reg_from_fs_reg(fs_reg *reg)
> >> +brw_reg_from_fs_reg(fs_reg *reg, unsigned dispatch_width)
> >>  {
> >> +   const int reg_size = 4 * dispatch_width;
> >>
> >
> > What happens when reg.type is UW and dispatch_width is 16?  In that case,
> > we would compute reg_size == 64, but the correct value seems like it's
> > actually 32 in this case.
> >
> > Are we perhaps relying on reg.type being a 32-bit type?  If so, maybe we
> > should add an assertion:
> >
> >    assert(type_sz(reg.type) == 4);
> >
>
> Nope, "reg_size" is supposed to be the size in bytes of a ::reg_offset
> unit, i.e. one hardware register in SIMD8 mode and two hardware
> registers in SIMD16 mode as the comment at the definition of
> ::reg_offset explains.  The fixed factor of four is intentional and
> correct no matter what the register type is.
>
> Thanks.
>

Ok, I see.  It appears that both this function *and* the comment above
reg_offset are assuming that the data type is 32 bit.  The comment above
reg_offset says:

    * For pre-register-allocation GRFs and MRFs, this is in units of a
    * float per pixel (1 hardware register for SIMD8 or SIMD4x2 mode,
    * or 2 registers for SIMD16 mode).  For uniforms, this is in units
    * of 1 float.

but when we get around to adding support for double-precision floats (a
feature of GL 4.0), this will no longer work; for double precision types
we'll need reg_offset to be measured in units of at least 4 hardware
registers in SIMD16 mode to avoid overlap.

Similarly, for types that are 16 bits, if we consider reg_offset to be
measured in units of 2 hardware registers in SIMD16 mode, we're actually
wasting registers, since all 16 values actually fit in a single hardware
register.  That's not really a big deal right now, since we use 16-bit
types so rarely in the FS back-end.  In fact, I bet we never use a nonzero
reg_offset on a 16-bit type.

It still seems to me that it would be worth putting an assertion here to
help alert us to what needs to change when we add double precision support
(or if someday we have hardware that supports half float computation).  I'm
not 100% sure what the assertion could be.  "assert(type_sz(reg->type) ==
4);" was an optimistic guess.  We might have to do
"assert(type_sz(reg->type) == 4 || reg->reg_offset == 0);" in order to
avoid tripping on the rare cases where we currently use 16-bit types in
fragment shaders.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20140121/6bdabadc/attachment.html>