[Intel-gfx] [PATCH] drm/i915: Special handling for bonded requests

Wed May 27 09:36:31 UTC 2020

On 27/05/2020 10:20, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2020-05-27 09:53:22)
>> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>
>> Our generic mechanism for specifying aligned request start,
>> I915_EXEC_FENCE_SUBMIT, has a bit too relaxed rules when it comes to the
>> actual use case of media scalability on Tigerlake.
>>
>> While the submit fence can provide the aligned start, the  actual media
>> pipeline expects that execution remains in lockstep from then onwards.
>> Otherwise the hardware cannot handle well one of the pair getting
>> preempted leading to GPU hangs and corrupted media playback.
>>
>> This puts us in a difficult position where the only visible solution,
>> which does not involve adding new uapi, is to give more meaning to the
>> generic submit fence. At least when used between the video engines.
>>
>> The special semantics this patch applies in that case are two fold. First
>> is that both of the aligned pairs will be marked as non-preemptable and
>> second is ensuring both are not sharing ELSP ports with any other context.
>>
>> Non-preemptable property will ensure that once the start has been aligned
>> the requests remain executing until completion.
>>
>> Single port ELSP property will ensure there are no possible inversions
>> between different independent pairs of aligned requests.
> 
> Nak.
> 
>> Going forward we can think of introducing new uapi to request this
>> behaviour as a separate, more explicit flag, and at that point retire this
>> semi-generic special handling.
> 
> As CI will say, such behaviour will already need a new flag. Forcing
> no-preemption should be a privileged flag, so I would expect some
> acknowledge that this is a HW problem that we have to workaround.

I don't know how hw exactly works so from my side it's only empirical.

I agree new flag is needed but we also need a fix ASAP for TGL. And I 
don't think we can add new uapi in a reasonable time frame here. We 
would need the ctx set_parallel extension with a dont-preempt flag and 
multi-batch execbuf. And a lot of work in media-driver which will not be 
ready for TGL. I don't think a flag at the I915_EXEC_FENCE_SUBMIT level 
is a solution.

>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>> Suggested-by: Xiaogang Li <xiaogang.li at intel.com>
>> Cc: Chris Wilson <chris at chris-wilson.co.uk>
>> ---
>>   drivers/gpu/drm/i915/gt/intel_context.h |  2 +-
>>   drivers/gpu/drm/i915/gt/intel_lrc.c     | 46 +++++++++++++++++++++----
>>   2 files changed, 41 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h
>> index 07be021882cc..576d11c0cdd9 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_context.h
>> +++ b/drivers/gpu/drm/i915/gt/intel_context.h
>> @@ -212,7 +212,7 @@ intel_context_force_single_submission(const struct intel_context *ce)
>>   static inline void
>>   intel_context_set_single_submission(struct intel_context *ce)
>>   {
>> -       __set_bit(CONTEXT_FORCE_SINGLE_SUBMISSION, &ce->flags);
>> +       set_bit(CONTEXT_FORCE_SINGLE_SUBMISSION, &ce->flags);
>>   }
>>   
>>   static inline bool
>> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
>> index 3214a4ecc31a..88cf20fd92c8 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
>> @@ -1734,8 +1734,7 @@ static void execlists_submit_ports(struct intel_engine_cs *engine)
>>   
>>   static bool ctx_single_port_submission(const struct intel_context *ce)
>>   {
>> -       return (IS_ENABLED(CONFIG_DRM_I915_GVT) &&
>> -               intel_context_force_single_submission(ce));
>> +       return intel_context_force_single_submission(ce);
>>   }
>>   
>>   static bool can_merge_ctx(const struct intel_context *prev,
>> @@ -4929,9 +4928,41 @@ static void execlists_park(struct intel_engine_cs *engine)
>>          cancel_timer(&engine->execlists.preempt);
>>   }
>>   
>> +static void
>> +mark_bonded_pair(struct i915_request *rq, struct i915_request *signal)
>> +{
>> +       /*
>> +        * Give (temporary) special meaning to a pair requests with requested
>> +        * aligned start along the video engines.
>> +        *
>> +        * They should be non-preemptable and have all ELSP ports to themselves
>> +        * to avoid any deadlocks caused by inversions.
> 
> This sentence needs expanding, because you are implying that this is an
> issue we need to address in the code. If there is a deadlock resulting
> from incorrect submission ordering, that is a bug in the code.

If we add no-preempt I think we have to have single elsp because there 
is no ordering between unrelated pairs and then elsp inversion certainly 
sounds like a deadlock.

ctx-A: vcs0 + vcs1
ctx-B: vcs0 + vcs1

There is no ordering between A and B but we'd have to pick same port for 
both pairs of A and B respectively. I don't see that we can do that 
since all four submissions are async.

Regards,

Tvrtko