[Intel-gfx] [PATCH] drm/i915/gen12: Add aux table invalidate for all engines

Mika Kuoppala mika.kuoppala at linux.intel.com
Tue May 12 14:22:37 UTC 2020


Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com> writes:

> On 5/7/20 7:20 AM, Mika Kuoppala wrote:
>> All engines, exception being blitter as it does not
>> care about the form, can access compressed surfaces.
>> 
>> So we need to add forced aux table invalidates
>> for those engines.
>> 
>> v2: virtual instance masking (Chris)
>> v3: bug on if not found (Chris)
>> 
>> References: d248b371f747 ("drm/i915/gen12: Invalidate aux table entries forcibly")
>> References bspec#43904, hsdes#1809175790
>> Cc: Chris Wilson <chris at chris-wilson.co.uk>
>> Cc: Chuansheng Liu <chuansheng.liu at intel.com>
>> Cc: Rafael Antognolli <rafael.antognolli at intel.com>
>> Signed-off-by: Mika Kuoppala <mika.kuoppala at linux.intel.com>
>> ---
>>   drivers/gpu/drm/i915/gt/intel_lrc.c | 86 +++++++++++++++++++++++++++--
>>   drivers/gpu/drm/i915/i915_reg.h     |  6 ++
>>   2 files changed, 87 insertions(+), 5 deletions(-)
>> 
>> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
>> index bbdb0e2a4571..e12916e2799b 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
>> @@ -4539,11 +4539,36 @@ static u32 preparser_disable(bool state)
>>   	return MI_ARB_CHECK | 1 << 8 | state;
>>   }
>>   
>> +static i915_reg_t aux_inv_reg(const struct intel_engine_cs *engine)
>> +{
>> +	static const i915_reg_t vd[] = {
>> +		GEN12_VD0_AUX_NV,
>> +		GEN12_VD1_AUX_NV,
>> +		GEN12_VD2_AUX_NV,
>> +		GEN12_VD3_AUX_NV,
>> +	};
>> +
>> +	static const i915_reg_t ve[] = {
>> +		GEN12_VE0_AUX_NV,
>> +		GEN12_VE1_AUX_NV,
>> +	};
>> +
>> +	if (engine->class == VIDEO_DECODE_CLASS)
>> +		return vd[engine->instance];
>> +
>> +	if (engine->class == VIDEO_ENHANCEMENT_CLASS)
>> +		return ve[engine->instance];
>> +
>> +	GEM_BUG_ON("unknown aux_inv_reg\n");
>> +
>> +	return INVALID_MMIO_REG;
>> +}
>> +
>>   static u32 *
>> -gen12_emit_aux_table_inv(struct i915_request *rq, u32 *cs)
>> +gen12_emit_aux_table_inv(const i915_reg_t inv_reg, u32 *cs)
>>   {
>>   	*cs++ = MI_LOAD_REGISTER_IMM(1);
>> -	*cs++ = i915_mmio_reg_offset(GEN12_GFX_CCS_AUX_NV);
>> +	*cs++ = i915_mmio_reg_offset(inv_reg);
>>   	*cs++ = AUX_INV;
>>   	*cs++ = MI_NOOP;
>>   
>> @@ -4612,7 +4637,7 @@ static int gen12_emit_flush_render(struct i915_request *request,
>>   		cs = gen8_emit_pipe_control(cs, flags, LRC_PPHWSP_SCRATCH_ADDR);
>>   
>>   		/* hsdes: 1809175790 */
>> -		cs = gen12_emit_aux_table_inv(request, cs);
>> +		cs = gen12_emit_aux_table_inv(GEN12_GFX_CCS_AUX_NV, cs);
>>   
>>   		*cs++ = preparser_disable(false);
>>   		intel_ring_advance(request, cs);
>> @@ -4621,6 +4646,56 @@ static int gen12_emit_flush_render(struct i915_request *request,
>>   	return 0;
>>   }
>>   
>> +static int gen12_emit_flush(struct i915_request *request, u32 mode)
>> +{
>> +	intel_engine_mask_t aux_inv = 0;
>> +	u32 cmd, *cs;
>> +
>> +	if (mode & EMIT_INVALIDATE)
>> +		aux_inv = request->engine->mask & ~BIT(BCS0);
>> +
>> +	cs = intel_ring_begin(request,
>> +			      4 + (aux_inv ? 2 * hweight8(aux_inv) + 2 : 0));
>> +	if (IS_ERR(cs))
>> +		return PTR_ERR(cs);
>> +
>> +	cmd = MI_FLUSH_DW + 1;
>> +
>> +	/* We always require a command barrier so that subsequent
>> +	 * commands, such as breadcrumb interrupts, are strictly ordered
>> +	 * wrt the contents of the write cache being flushed to memory
>> +	 * (and thus being coherent from the CPU).
>> +	 */
>> +	cmd |= MI_FLUSH_DW_STORE_INDEX | MI_FLUSH_DW_OP_STOREDW;
>> +
>> +	if (mode & EMIT_INVALIDATE) {
>> +		cmd |= MI_INVALIDATE_TLB;
>> +		if (request->engine->class == VIDEO_DECODE_CLASS)
>> +			cmd |= MI_INVALIDATE_BSD;
>> +	}
>> +
>> +	*cs++ = cmd;
>> +	*cs++ = LRC_PPHWSP_SCRATCH_ADDR;
>> +	*cs++ = 0; /* upper addr */
>> +	*cs++ = 0; /* value */
>> +
>> +	if (aux_inv) { /* hsdes: 1809175790 */
>> +		struct intel_engine_cs *engine;
>> +		unsigned int tmp;
>> +
>> +		*cs++ = MI_LOAD_REGISTER_IMM(hweight8(aux_inv));
>> +		for_each_engine_masked(engine, request->engine->gt,
>> +				       aux_inv, tmp) {
>> +			*cs++ = i915_mmio_reg_offset(aux_inv_reg(engine));
>> +			*cs++ = AUX_INV;
>
> Why do we loop through all engines? AFAICS the WA just says to 
> invalidate the current one. If it is because we don't know what we're 
> running on, can't we just use the automatic mmio remap on the CS? That 
> was added on purpose for per-engine registers that are not relative to 
> the mmio base (see bspec 45606)
>
> Daniele

I looked at the auto remap feature and it should be doable
with that feature.

Now it will inv with all engine backing up a virtual one.

So there is a difference. But we do not have a bug on other
engines to require inv. So we play it safe.
We would need a media pipeline expert to ack/nack actually.

Thanks for the remap io pointer,
-Mika








More information about the Intel-gfx mailing list