[Intel-gfx] [intel-gfx] [PATCH] drm/i915: avoid concurrent writes to aux_inv (rev3)
Yang, Fei
fei.yang at intel.com
Thu Mar 3 04:00:08 UTC 2022
Hi Chris, for some reason I didn't receive the review email, so I copied your comments from patchwork and faked this email.
>> static void execlists_dequeue(struct intel_engine_cs *engine)
>> {
>> struct intel_engine_execlists * const execlists = &engine->execlists;
>> @@ -1538,6 +1566,16 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>> }
>>
>> if (__i915_request_submit(rq)) {
>> + /* hsdes: 1809175790 */
>> + if ((GRAPHICS_VER(engine->i915) == 12) &&
>> + rq->vd_ve_aux_inv &&
>> + (engine->class == VIDEO_DECODE_CLASS ||
>> + engine->class == VIDEO_ENHANCEMENT_CLASS)) {
> We do not need the extra checks here; we just do as we are told. We only
> tell ourselves to apply the fixup when required.
Without checking GRAPHICS_VER, I'm seeing a lot of regressions on older platforms in the CI result.
This workaround was only implemented for gen12 (gen12_emit_flush_xcs).
Without checking engine->class, I'm seeing boot issues due to GEM_BUG_ON() in aux_inv_reg().
Any rq will go through this code regardless of engine class and gen version, so the checking seems to be
necessary.
>> + *rq->vd_ve_aux_inv = i915_mmio_reg_offset
> Likewise, vd_ve is overspecific, aux_inv_fixup or aux_inv_wa (or
> wa_aux_iv, fixup_aux_inv).
I wanted to be specific because the workaround was only implemented for vd/ve engines.
But I'm ok with your proposal.
>> + (aux_inv_reg(engine));
>> + rq->vd_ve_aux_inv = NULL;
> Move this to i915_request initialisation so that we only set aux_inv
> when required, which probably explains the extra defence.
The pointer is currently initialized with 0x5a5a. I set it to NULL in gen12_emit_flush_xcs, otherwise the rq will
enter that if-statement with an invalid pointer.
I'm not familiar with the code, there seems to be multiple functions allocating the structure. I agree it's better
to set it to NULL at initialization, but need some guidance on where is the best place to do so.
>> + rq->execution_mask = engine->mask;
>> + }
>> if (!merge) {
>> *port++ = i915_request_get(last);
>> last = NULL;
>> diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
>> index 28b1f9db5487..69de32e5e15d 100644
>> --- a/drivers/gpu/drm/i915/i915_request.h
>> +++ b/drivers/gpu/drm/i915/i915_request.h
>> @@ -350,6 +350,8 @@ struct i915_request {
>> struct list_head link;
>> unsigned long delay;
>> } mock;)
>> +
>> + u32 *vd_ve_aux_inv;
> Not at the end of the struct; that's where we put things in the dungeon.
> The selftest struct should be last; I do hope no one has been putting
> things at random places in the struct without considering the layout and
> semantics. From the flow, this is akin to batch, capture_list; before
> emitted_jiffies would be a good spot.
Got it, will change. I thought adding at the end would be safer, thanks for the explanation.
> -Chris
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/intel-gfx/attachments/20220303/bb6c6a87/attachment.htm>
More information about the Intel-gfx
mailing list