[Mesa-dev] [PATCH v2 08/42] i965/hsw: Enable L3 atomics.

Samuel Iglesias Gonsálvez siglesias at igalia.com
Thu Nov 26 07:14:25 PST 2015


On 26/11/15 15:44, Francisco Jerez wrote:
> Samuel Iglesias Gonsálvez <siglesias at igalia.com> writes:
> 
>> On 18/11/15 06:54, Jordan Justen wrote:
>>> From: Francisco Jerez <currojerez at riseup.net>
>>>
>>> Improves performance of the arb_shader_image_load_store-atomicity
>>> piglit test by over 25x (which isn't a real benchmark it's just heavy
>>> on atomics -- the improvement in a microbenchmark I wrote a while ago
>>> seemed to be even greater).  The drawback is one needs to be
>>> extra-careful not to hang the GPU (in fact the whole system).  A DC
>>> partition must have been allocated on L3, the "convert L3 cycle for DC
>>> to UC" bit may not be set, the MOCS L3 cacheability bit must be set
>>> for all surfaces accessed using DC atomics, and the SCRATCH1 and
>>> ROW_CHICKEN3 bits must be kept in sync.
>>>
>>> A fairly recent kernel is required for the command parser to allow
>>> writes to these registers.
>>> ---
>>>  src/mesa/drivers/dri/i965/gen7_l3_state.c | 14 ++++++++++++++
>>>  1 file changed, 14 insertions(+)
>>>
>>> diff --git a/src/mesa/drivers/dri/i965/gen7_l3_state.c b/src/mesa/drivers/dri/i965/gen7_l3_state.c
>>> index 48bca29..c863b7f 100644
>>> --- a/src/mesa/drivers/dri/i965/gen7_l3_state.c
>>> +++ b/src/mesa/drivers/dri/i965/gen7_l3_state.c
>>> @@ -254,5 +254,19 @@ setup_l3_config(struct brw_context *brw, const struct brw_l3_config *cfg)
>>>                  SET_FIELD(cfg->n[L3P_T], GEN7_L3CNTLREG3_T_ALLOC));
>>>  
>>>        ADVANCE_BATCH();
>>> +
>>> +      if (brw->is_haswell && brw->intelScreen->cmd_parser_version >= 4) {
>>> +         /* Enable L3 atomics on HSW if we have a DC partition, otherwise keep
>>> +          * them disabled to avoid crashing the system hard.
>>> +          */
>>> +         BEGIN_BATCH(5);
>>> +         OUT_BATCH(MI_LOAD_REGISTER_IMM | (5 - 2));
>>> +         OUT_BATCH(HSW_SCRATCH1);
>>> +         OUT_BATCH(has_dc ? 0 : HSW_SCRATCH1_L3_ATOMIC_DISABLE);
>>> +         OUT_BATCH(HSW_ROW_CHICKEN3);
>>> +         OUT_BATCH(HSW_ROW_CHICKEN3_L3_ATOMIC_DISABLE << 16 |
>>> +                   (has_dc ? 0 : HSW_ROW_CHICKEN3_L3_ATOMIC_DISABLE));
>>
>>
>> I have not found references to ROW_CHICKEN3 nor register with 0xe49c
>> address offset in HSW PRMs, so these could be stupid questions:
>>
> 
> Yeah, these chicken registers are not part of the public PRMs.
> 
>> Why you need to set the L3 atomic disable flag in two different places
>> in ROW_CHICKEN3 register? Also, why the first flag is set
>> unconditionally while the second one only if we don't have a DC
>> partition? This is what you want?
> 
> It's a masked register as many other MMIO registers on Gen hardware: the
> upper 16 bits control which of the lower 16 bits are modified in the
> register.
> 
>>
>>
>> Also, if the "HSW_ROW_CHICKEN3_L3_ATOMIC_DISABLE << 16" is really
>> needed, it could be defined as a constant in the first patch of the series.
>>
> If you think that would be more readable I guess we could define a
> REG_MASK() macro and use it wherever we write a masked register.  See
> attachment (applies on top of this patch).
> 

OK, thanks for the explanation. Please add my R-b to this patch and to
the attached one:

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias at igalia.com>

Sam

>> Sam
>>
>>> +         ADVANCE_BATCH();
>>> +      }
>>>     }
>>>  }
>>>
> 


More information about the mesa-dev mailing list