[Mesa-dev] [PATCH 13/23] i965/fs: Take into account reg_offset consistently for MRF regs.

Wed Feb 19 11:12:25 PST 2014

Paul Berry <stereotype441 at gmail.com> writes:

> On 15 January 2014 14:01, Francisco Jerez <currojerez at riseup.net> wrote:
>
>> Paul Berry <stereotype441 at gmail.com> writes:
>>
>> > On 2 December 2013 11:31, Francisco Jerez <currojerez at riseup.net> wrote:
>> >
>> >> Until now it was only being taken into account in the VEC4 back-end
>> >> but not in the FS back-end.  Do it in both cases.
>> >> ---
>> >>  src/mesa/drivers/dri/i965/brw_fs.h             |  2 +-
>> >>  src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 10 ++++++----
>> >>  src/mesa/drivers/dri/i965/brw_shader.h         |  7 ++++---
>> >>  3 files changed, 11 insertions(+), 8 deletions(-)
>> >>
>> >> diff --git a/src/mesa/drivers/dri/i965/brw_fs.h
>> >> b/src/mesa/drivers/dri/i965/brw_fs.h
>> >> index 2c36d9f..f918f7e 100644
>> >> --- a/src/mesa/drivers/dri/i965/brw_fs.h
>> >> +++ b/src/mesa/drivers/dri/i965/brw_fs.h
>> >> @@ -615,4 +615,4 @@ bool brw_do_channel_expressions(struct exec_list
>> >> *instructions);
>> >>  bool brw_do_vector_splitting(struct exec_list *instructions);
>> >>  bool brw_fs_precompile(struct gl_context *ctx, struct gl_shader_program
>> >> *prog);
>> >>
>> >> -struct brw_reg brw_reg_from_fs_reg(fs_reg *reg);
>> >> +struct brw_reg brw_reg_from_fs_reg(fs_reg *reg, unsigned
>> dispatch_width);
>> >> diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
>> >> b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
>> >> index 8d310a1..1de59eb 100644
>> >> --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
>> >> +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
>> >> @@ -981,8 +981,9 @@ static uint32_t brw_file_from_reg(fs_reg *reg)
>> >>  }
>> >>
>> >>  struct brw_reg
>> >> -brw_reg_from_fs_reg(fs_reg *reg)
>> >> +brw_reg_from_fs_reg(fs_reg *reg, unsigned dispatch_width)
>> >>  {
>> >> +   const int reg_size = 4 * dispatch_width;
>> >>
>> >
>> > What happens when reg.type is UW and dispatch_width is 16?  In that case,
>> > we would compute reg_size == 64, but the correct value seems like it's
>> > actually 32 in this case.
>> >
>> > Are we perhaps relying on reg.type being a 32-bit type?  If so, maybe we
>> > should add an assertion:
>> >
>> >    assert(type_sz(reg.type) == 4);
>> >
>>
>> Nope, "reg_size" is supposed to be the size in bytes of a ::reg_offset
>> unit, i.e. one hardware register in SIMD8 mode and two hardware
>> registers in SIMD16 mode as the comment at the definition of
>> ::reg_offset explains.  The fixed factor of four is intentional and
>> correct no matter what the register type is.
>>
>> Thanks.
>>
>
> Ok, I see.  It appears that both this function *and* the comment above
> reg_offset are assuming that the data type is 32 bit.  The comment above
> reg_offset says:
>
>     * For pre-register-allocation GRFs and MRFs, this is in units of a
>     * float per pixel (1 hardware register for SIMD8 or SIMD4x2 mode,
>     * or 2 registers for SIMD16 mode).  For uniforms, this is in units
>     * of 1 float.
>
> but when we get around to adding support for double-precision floats (a
> feature of GL 4.0), this will no longer work; for double precision types
> we'll need reg_offset to be measured in units of at least 4 hardware
> registers in SIMD16 mode to avoid overlap.
>
> Similarly, for types that are 16 bits, if we consider reg_offset to be
> measured in units of 2 hardware registers in SIMD16 mode, we're actually
> wasting registers, since all 16 values actually fit in a single hardware
> register.  That's not really a big deal right now, since we use 16-bit
> types so rarely in the FS back-end.

I see what you mean, but it seems rather problematic to me to have the
unit of reg_offset depend on the register data type.  E.g. bit-casting
the contents of a register to a type of different size would involve
non-trivial algebra on reg_offset.  IMHO the ideal solution would be to
settle on a fixed unit (e.g. bytes, and remove the subreg_offset field
completely) and use a helper function to get the array indexing that
seems to be the main use case of reg_offset (e.g. 'index(base_reg, i)'
that would take into account the type size of 'base_reg' to calculate
the byte offset of element 'i').

> In fact, I bet we never use a nonzero reg_offset on a 16-bit type.
>
Not sure if we do already, but my surface packing/unpacking code might
in some situations.

> It still seems to me that it would be worth putting an assertion here to
> help alert us to what needs to change when we add double precision support
> (or if someday we have hardware that supports half float computation).  I'm
> not 100% sure what the assertion could be.  "assert(type_sz(reg->type) ==
> 4);" was an optimistic guess.  We might have to do
> "assert(type_sz(reg->type) == 4 || reg->reg_offset == 0);" in order to
> avoid tripping on the rare cases where we currently use 16-bit types in
> fragment shaders.

I think that such an assertion would break my homogeneous
packing/unpacking code if it ever gets an argument with reg_offset != 0,
because it casts its argument into a smaller type (either 8 or 16 bits)
and then uses 'subreg_offset' to select the individual components of the
packed vector.

P.S.: Sorry for the late reply.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 229 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20140219/a649594c/attachment.pgp>