[Intel-gfx] [PATCH] drm/i915: Skip context_barrier emission for unused contexts

Thu Jun 6 11:52:59 UTC 2019

On 06/06/2019 12:37, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2019-06-06 12:28:23)
>>
>> On 06/06/2019 12:06, Chris Wilson wrote:
>>> Quoting Tvrtko Ursulin (2019-06-06 10:19:10)
>>>>
>>>> On 06/06/2019 10:09, Chris Wilson wrote:
>>>>> Quoting Tvrtko Ursulin (2019-06-05 11:40:27)
>>>>>>
>>>>>> On 04/06/2019 16:24, Chris Wilson wrote:
>>>>>>> The intent was to skip unused HW contexts by checking ce->state.
>>>>>>> However, this only works for execlists where the ppGTT pointers is
>>>>>>> stored inside the HW context. For gen7, the ppGTT is alongside the
>>>>>>> logical state and must be updated on all active engines but, crucially,
>>>>>>> only on active engines. As we need different checks, and to keep
>>>>>>> context_barrier_task() agnostic, pass in the predicate.
>>>>>>>
>>>>>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110836
>>>>>>> Fixes: 62c8e423450d ("drm/i915: Skip unused contexts for context_barrier_task()")
>>>>>>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>>>>>>> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>>>>>> ---
>>>>>>>      drivers/gpu/drm/i915/gem/i915_gem_context.c   | 15 ++++++++++++++-
>>>>>>>      .../drm/i915/gem/selftests/i915_gem_context.c | 19 +++++++++++++++----
>>>>>>>      2 files changed, 29 insertions(+), 5 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
>>>>>>> index 08721ef62e4e..6819b598d226 100644
>>>>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
>>>>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
>>>>>>> @@ -902,6 +902,7 @@ static void cb_retire(struct i915_active *base)
>>>>>>>      I915_SELFTEST_DECLARE(static intel_engine_mask_t context_barrier_inject_fault);
>>>>>>>      static int context_barrier_task(struct i915_gem_context *ctx,
>>>>>>>                                  intel_engine_mask_t engines,
>>>>>>> +                             bool (*skip)(struct intel_context *ce, void *data),
>>>>>>>                                  int (*emit)(struct i915_request *rq, void *data),
>>>>>>>                                  void (*task)(void *data),
>>>>>>>                                  void *data)
>>>>>>> @@ -931,7 +932,10 @@ static int context_barrier_task(struct i915_gem_context *ctx,
>>>>>>>                          break;
>>>>>>>                  }
>>>>>>>      
>>>>>>> -             if (!(ce->engine->mask & engines) || !ce->state)
>>>>>>> +             if (!(ce->engine->mask & engines))
>>>>>>> +                     continue;
>>>>>>> +
>>>>>>> +             if (skip && skip(ce, data))
>>>>>>>                          continue;
>>>>>>>      
>>>>>>>                  rq = intel_context_create_request(ce);
>>>>>>> @@ -1058,6 +1062,14 @@ static int emit_ppgtt_update(struct i915_request *rq, void *data)
>>>>>>>          return 0;
>>>>>>>      }
>>>>>>>      
>>>>>>> +static bool skip_ppgtt_update(struct intel_context *ce, void *data)
>>>>>>> +{
>>>>>>> +     if (HAS_LOGICAL_RING_CONTEXTS(ce->engine->i915))
>>>>>>> +             return !ce->state;
>>>>>>> +     else
>>>>>>> +             return !atomic_read(&ce->pin_count);
>>>>>>
>>>>>> Would "return !atomic_read(&ce->pin_count) || !ce->state;" work and
>>>>>> avoid the somewhat irky HAS_LOGICAL_RING_CONTEXTS check?
>>>>>
>>>>> No, because we need the barrier on gen7 !rcs which doesn't have
>>>>> ce->state (but does need to switch mm).
>>>>
>>>> That's not the path which would be covered by !atomic_read(&ce->pin_count) ?
>>>
>>> And when pin_count > 0 it would then skip due to !ce->state, leading us
>>> back to the current problem.
>>
>> Brain fart.. Okay.. but pin_count check itself is not sufficient for
>> both platforms? Can't we skip pin_count == 0 && ce->state != NULL on
>> execlists?
> 
> Alas not. It is really is a divergence in HW behaviour.
> 
> For execlists, we need to modifying an existing context image, but may
> skip if there is no image at all (as it will be constructed with the
> right registers).
> 
> For gen7, we need to take care of the supplementary pinning of
> ctx->ppgtt. That is done for us on the first context_pin, but when
> pin_count > 0, we must fiddle. Fiddly fiddling. There's probably a way
> to resolve it without having to pull so much lower level detail into
> i915_gem_context.c, I am not seeing it atm (and honestly not looking too
> hard ;).

Okay, I don't fancy looking too hard either. It was only a discussion 
about a potential optimisation to begin with. So without much further ado:

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>

Regards,

Tvrtko