[Mesa-dev] [PATCH v2 08/42] i965/hsw: Enable L3 atomics.

Samuel Iglesias Gonsálvez siglesias at igalia.com
Thu Nov 26 07:16:42 PST 2015



On 26/11/15 16:14, Samuel Iglesias Gonsálvez wrote:
> On 26/11/15 15:44, Francisco Jerez wrote:
>> Samuel Iglesias Gonsálvez <siglesias at igalia.com> writes:
>>
>>> On 18/11/15 06:54, Jordan Justen wrote:
>>>> From: Francisco Jerez <currojerez at riseup.net>
>>>>
>>>> Improves performance of the arb_shader_image_load_store-atomicity
>>>> piglit test by over 25x (which isn't a real benchmark it's just heavy
>>>> on atomics -- the improvement in a microbenchmark I wrote a while ago
>>>> seemed to be even greater).  The drawback is one needs to be
>>>> extra-careful not to hang the GPU (in fact the whole system).  A DC
>>>> partition must have been allocated on L3, the "convert L3 cycle for DC
>>>> to UC" bit may not be set, the MOCS L3 cacheability bit must be set
>>>> for all surfaces accessed using DC atomics, and the SCRATCH1 and
>>>> ROW_CHICKEN3 bits must be kept in sync.
>>>>
>>>> A fairly recent kernel is required for the command parser to allow
>>>> writes to these registers.
>>>> ---
>>>>  src/mesa/drivers/dri/i965/gen7_l3_state.c | 14 ++++++++++++++
>>>>  1 file changed, 14 insertions(+)
>>>>
>>>> diff --git a/src/mesa/drivers/dri/i965/gen7_l3_state.c b/src/mesa/drivers/dri/i965/gen7_l3_state.c
>>>> index 48bca29..c863b7f 100644
>>>> --- a/src/mesa/drivers/dri/i965/gen7_l3_state.c
>>>> +++ b/src/mesa/drivers/dri/i965/gen7_l3_state.c
>>>> @@ -254,5 +254,19 @@ setup_l3_config(struct brw_context *brw, const struct brw_l3_config *cfg)
>>>>                  SET_FIELD(cfg->n[L3P_T], GEN7_L3CNTLREG3_T_ALLOC));
>>>>  
>>>>        ADVANCE_BATCH();
>>>> +
>>>> +      if (brw->is_haswell && brw->intelScreen->cmd_parser_version >= 4) {
>>>> +         /* Enable L3 atomics on HSW if we have a DC partition, otherwise keep
>>>> +          * them disabled to avoid crashing the system hard.
>>>> +          */
>>>> +         BEGIN_BATCH(5);
>>>> +         OUT_BATCH(MI_LOAD_REGISTER_IMM | (5 - 2));
>>>> +         OUT_BATCH(HSW_SCRATCH1);
>>>> +         OUT_BATCH(has_dc ? 0 : HSW_SCRATCH1_L3_ATOMIC_DISABLE);
>>>> +         OUT_BATCH(HSW_ROW_CHICKEN3);
>>>> +         OUT_BATCH(HSW_ROW_CHICKEN3_L3_ATOMIC_DISABLE << 16 |
>>>> +                   (has_dc ? 0 : HSW_ROW_CHICKEN3_L3_ATOMIC_DISABLE));
>>>
>>>
>>> I have not found references to ROW_CHICKEN3 nor register with 0xe49c
>>> address offset in HSW PRMs, so these could be stupid questions:
>>>
>>
>> Yeah, these chicken registers are not part of the public PRMs.
>>
>>> Why you need to set the L3 atomic disable flag in two different places
>>> in ROW_CHICKEN3 register? Also, why the first flag is set
>>> unconditionally while the second one only if we don't have a DC
>>> partition? This is what you want?
>>
>> It's a masked register as many other MMIO registers on Gen hardware: the
>> upper 16 bits control which of the lower 16 bits are modified in the
>> register.
>>
>>>
>>>
>>> Also, if the "HSW_ROW_CHICKEN3_L3_ATOMIC_DISABLE << 16" is really
>>> needed, it could be defined as a constant in the first patch of the series.
>>>
>> If you think that would be more readable I guess we could define a
>> REG_MASK() macro and use it wherever we write a masked register.  See
>> attachment (applies on top of this patch).
>>
> 

Forgot to say my R-b to this patch is with the REG_MASK() macro change.

> OK, thanks for the explanation. Please add my R-b to this patch and to
> the attached one:
> 
> Reviewed-by: Samuel Iglesias Gonsálvez <siglesias at igalia.com>
> 
> Sam
> 
>>> Sam
>>>
>>>> +         ADVANCE_BATCH();
>>>> +      }
>>>>     }
>>>>  }
>>>>
>>


More information about the mesa-dev mailing list