[Intel-gfx] [PATCH v8 01/69] drm/i915: Do not share hwsp across contexts any more, v7.

Mon Mar 15 12:08:01 UTC 2021

Op 2021-03-11 om 22:22 schreef Jason Ekstrand:
> First off, I'm just here asking questions right now trying to start
> getting my head around some of this stuff.  Feel free to ignore me or
> tell me to go away if I'm being annoying. :-)
>
> On Thu, Mar 11, 2021 at 7:49 AM Maarten Lankhorst
> <maarten.lankhorst at linux.intel.com> wrote:
>> Instead of sharing pages with breadcrumbs, give each timeline a
>> single page. This allows unrelated timelines not to share locks
>> any more during command submission.
>>
>> As an additional benefit, seqno wraparound no longer requires
>> i915_vma_pin, which means we no longer need to worry about a
>> potential -EDEADLK at a point where we are ready to submit.
>>
>> Changes since v1:
>> - Fix erroneous i915_vma_acquire that should be a i915_vma_release (ickle).
>> - Extra check for completion in intel_read_hwsp().
>> Changes since v2:
>> - Fix inconsistent indent in hwsp_alloc() (kbuild)
>> - memset entire cacheline to 0.
>> Changes since v3:
>> - Do same in intel_timeline_reset_seqno(), and clflush for good measure.
>> Changes since v4:
>> - Use refcounting on timeline, instead of relying on i915_active.
>> - Fix waiting on kernel requests.
>> Changes since v5:
>> - Bump amount of slots to maximum (256), for best wraparounds.
>> - Add hwsp_offset to i915_request to fix potential wraparound hang.
>> - Ensure timeline wrap test works with the changes.
>> - Assign hwsp in intel_timeline_read_hwsp() within the rcu lock to
>>   fix a hang.
>> Changes since v6:
>> - Rename i915_request_active_offset to i915_request_active_seqno(),
>>   and elaborate the function. (tvrtko)
>>
>> Signed-off-by: Maarten Lankhorst <maarten.lankhorst at linux.intel.com>
>> Reviewed-by: Thomas Hellström <thomas.hellstrom at intel.com> #v1
>> Reported-by: kernel test robot <lkp at intel.com>
>> ---
>>  drivers/gpu/drm/i915/gt/gen2_engine_cs.c      |   2 +-
>>  drivers/gpu/drm/i915/gt/gen6_engine_cs.c      |   8 +-
>>  drivers/gpu/drm/i915/gt/gen8_engine_cs.c      |  13 +-
>>  drivers/gpu/drm/i915/gt/intel_engine_cs.c     |   1 +
>>  drivers/gpu/drm/i915/gt/intel_gt_types.h      |   4 -
>>  drivers/gpu/drm/i915/gt/intel_timeline.c      | 422 ++++--------------
>>  .../gpu/drm/i915/gt/intel_timeline_types.h    |  17 +-
>>  drivers/gpu/drm/i915/gt/selftest_engine_cs.c  |   5 +-
>>  drivers/gpu/drm/i915/gt/selftest_timeline.c   |  83 ++--
>>  drivers/gpu/drm/i915/i915_request.c           |   4 -
>>  drivers/gpu/drm/i915/i915_request.h           |  31 +-
>>  11 files changed, 175 insertions(+), 415 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/gen2_engine_cs.c b/drivers/gpu/drm/i915/gt/gen2_engine_cs.c
>> index b491a64919c8..9646200d2792 100644
>> --- a/drivers/gpu/drm/i915/gt/gen2_engine_cs.c
>> +++ b/drivers/gpu/drm/i915/gt/gen2_engine_cs.c
>> @@ -143,7 +143,7 @@ static u32 *__gen2_emit_breadcrumb(struct i915_request *rq, u32 *cs,
>>                                    int flush, int post)
>>  {
>>         GEM_BUG_ON(i915_request_active_timeline(rq)->hwsp_ggtt != rq->engine->status_page.vma);
>> -       GEM_BUG_ON(offset_in_page(i915_request_active_timeline(rq)->hwsp_offset) != I915_GEM_HWS_SEQNO_ADDR);
>> +       GEM_BUG_ON(offset_in_page(rq->hwsp_seqno) != I915_GEM_HWS_SEQNO_ADDR);
>>
>>         *cs++ = MI_FLUSH;
>>
>> diff --git a/drivers/gpu/drm/i915/gt/gen6_engine_cs.c b/drivers/gpu/drm/i915/gt/gen6_engine_cs.c
>> index ce38d1bcaba3..b388ceeeb1c9 100644
>> --- a/drivers/gpu/drm/i915/gt/gen6_engine_cs.c
>> +++ b/drivers/gpu/drm/i915/gt/gen6_engine_cs.c
>> @@ -161,7 +161,7 @@ u32 *gen6_emit_breadcrumb_rcs(struct i915_request *rq, u32 *cs)
>>                  PIPE_CONTROL_DC_FLUSH_ENABLE |
>>                  PIPE_CONTROL_QW_WRITE |
>>                  PIPE_CONTROL_CS_STALL);
>> -       *cs++ = i915_request_active_timeline(rq)->hwsp_offset |
>> +       *cs++ = i915_request_active_seqno(rq) |
>>                 PIPE_CONTROL_GLOBAL_GTT;
>>         *cs++ = rq->fence.seqno;
>>
>> @@ -359,7 +359,7 @@ u32 *gen7_emit_breadcrumb_rcs(struct i915_request *rq, u32 *cs)
>>                  PIPE_CONTROL_QW_WRITE |
>>                  PIPE_CONTROL_GLOBAL_GTT_IVB |
>>                  PIPE_CONTROL_CS_STALL);
>> -       *cs++ = i915_request_active_timeline(rq)->hwsp_offset;
>> +       *cs++ = i915_request_active_seqno(rq);
>>         *cs++ = rq->fence.seqno;
>>
>>         *cs++ = MI_USER_INTERRUPT;
>> @@ -374,7 +374,7 @@ u32 *gen7_emit_breadcrumb_rcs(struct i915_request *rq, u32 *cs)
>>  u32 *gen6_emit_breadcrumb_xcs(struct i915_request *rq, u32 *cs)
>>  {
>>         GEM_BUG_ON(i915_request_active_timeline(rq)->hwsp_ggtt != rq->engine->status_page.vma);
>> -       GEM_BUG_ON(offset_in_page(i915_request_active_timeline(rq)->hwsp_offset) != I915_GEM_HWS_SEQNO_ADDR);
>> +       GEM_BUG_ON(offset_in_page(rq->hwsp_seqno) != I915_GEM_HWS_SEQNO_ADDR);
>>
>>         *cs++ = MI_FLUSH_DW | MI_FLUSH_DW_OP_STOREDW | MI_FLUSH_DW_STORE_INDEX;
>>         *cs++ = I915_GEM_HWS_SEQNO_ADDR | MI_FLUSH_DW_USE_GTT;
>> @@ -394,7 +394,7 @@ u32 *gen7_emit_breadcrumb_xcs(struct i915_request *rq, u32 *cs)
>>         int i;
>>
>>         GEM_BUG_ON(i915_request_active_timeline(rq)->hwsp_ggtt != rq->engine->status_page.vma);
>> -       GEM_BUG_ON(offset_in_page(i915_request_active_timeline(rq)->hwsp_offset) != I915_GEM_HWS_SEQNO_ADDR);
>> +       GEM_BUG_ON(offset_in_page(rq->hwsp_seqno) != I915_GEM_HWS_SEQNO_ADDR);
>>
>>         *cs++ = MI_FLUSH_DW | MI_INVALIDATE_TLB |
>>                 MI_FLUSH_DW_OP_STOREDW | MI_FLUSH_DW_STORE_INDEX;
>> diff --git a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
>> index cac80af7ad1c..6b9c34d3ac8d 100644
>> --- a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
>> +++ b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
>> @@ -338,15 +338,14 @@ static u32 preempt_address(struct intel_engine_cs *engine)
>>
>>  static u32 hwsp_offset(const struct i915_request *rq)
>>  {
>> -       const struct intel_timeline_cacheline *cl;
>> +       const struct intel_timeline *tl;
>>
>> -       /* Before the request is executed, the timeline/cachline is fixed */
>> +       /* Before the request is executed, the timeline is fixed */
>> +       tl = rcu_dereference_protected(rq->timeline,
>> +                                      !i915_request_signaled(rq));
> Why is Gen8+ different from Gen2/6 here?  In particular, why not use
> i915_request_active_timeline(rq) or, better yet,
> i915_request_active_seqno()?  The primary difference I see is that the
> guard on the RCU is different but it's not immediately obvious to me
> why this should be different between hardware generations.  Also,
> i915_request_active_seqno() returns a u32, but that could be fixed.

The legacy rings different locking, and it was splatting on PROVE_RCU.

I didn't want to weaken it, so I just put in a different condition for gen8 that's slightly weaker, but still better than the '1'.