[Mesa-dev] [PATCH 1/6] i965/fs: add a helper function to create double immediates

Thu Jul 7 07:33:30 UTC 2016

On 06/07/16 22:31, Kenneth Graunke wrote:
> On Wednesday, July 6, 2016 12:09:57 PM PDT Samuel Iglesias Gonsálvez wrote:
>> From: Iago Toral Quiroga <itoral at igalia.com>
>>
>> Gen7 hardware does not support double immediates so these need
>> to be moved in 32-bit chunks to a regular vgrf instead. Instead
>> of doing this every time we need to create a DF immediate,
>> create a helper function that does the right thing depending
>> on the hardware generation.
>> ---
>>  src/mesa/drivers/dri/i965/brw_fs.h       |  2 ++
>>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 43 ++++++++++++++++++++++++++++++++
>>  2 files changed, 45 insertions(+)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h
>> index 4237197..dd7ce7d 100644
>> --- a/src/mesa/drivers/dri/i965/brw_fs.h
>> +++ b/src/mesa/drivers/dri/i965/brw_fs.h
>> @@ -167,6 +167,8 @@ public:
>>     bool lower_simd_width();
>>     bool opt_combine_constants();
>>  
>> +   fs_reg setup_imm_df(double v);
>> +
>>     void emit_dummy_fs();
>>     void emit_repclear_shader();
>>     fs_reg *emit_fragcoord_interpolation();
>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> index b3f5dfd..268c847 100644
>> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> @@ -616,6 +616,49 @@ fs_visitor::optimize_frontfacing_ternary(nir_alu_instr *instr,
>>     return true;
>>  }
>>  
>> +fs_reg
>> +fs_visitor::setup_imm_df(double v)
>> +{
>> +   assert(devinfo->gen >= 7);
>> +
>> +   if (devinfo->gen >= 8)
>> +      return brw_imm_df(v);
> 
> You could probably use the DIM instruction on Haswell (only).
> 

I didn't know this instruction. I am going to try it.

>> +
>> +   /* gen7 does not support DF immediates, so we generate a 64-bit constant by
>> +    * writing the low 32-bit of the constant to suboffset 0 of a VGRF and
>> +    * the high 32-bit to suboffset 4 and then applying a stride of 0.
>> +    *
>> +    * Alternatively, we could also produce a normal VGRF (without stride 0)
>> +    * by writing to all the channels in the VGRF, however, that would hit the
>> +    * gen7 bug where we have to split writes that span more than 1 register
>> +    * into instructions with a width of 4 (otherwise the write to the second
>> +    * register written runs into an execmask hardware bug) which isn't very
>> +    * nice.
>> +    */
>> +   union {
>> +      double d;
>> +      struct {
>> +         uint32_t i1;
>> +         uint32_t i2;
>> +      };
>> +   } di;
>> +
>> +   di.d = v;
>> +
>> +   fs_reg tmp = vgrf(glsl_type::uint_type);
>> +   fs_inst *inst = bld.MOV(tmp, brw_imm_ud(di.i1));
>> +   inst->force_writemask_all = true;
>> +   inst->exec_size = 1;
>> +   inst->regs_written = 1;
>> +
>> +   inst = bld.MOV(horiz_offset(tmp, 1), brw_imm_ud(di.i2));
>> +   inst->force_writemask_all = true;
>> +   inst->exec_size = 1;
>> +   inst->regs_written = 1;
>> +
>> +   return component(retype(tmp, BRW_REGISTER_TYPE_DF), 0);
>> +}
>> +
>>  void
>>  fs_visitor::nir_emit_alu(const fs_builder &bld, nir_alu_instr *instr)
>>  {
>>
> 
> Otherwise this looks reasonable to me.  I was wondering whether we
> should use subscript() rather than horiz_offset(), but given that
> everything's exec_size 1 and NoMask, I don't think it matters.
> 
> Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>
> 

OK, thanks!

Sam

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: OpenPGP digital signature
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20160707/8b0cc612/attachment.sig>