[Intel-gfx] [PATCH v8 01/69] drm/i915: Do not share hwsp across contexts any more, v7.
Maarten Lankhorst
maarten.lankhorst at linux.intel.com
Mon Mar 15 12:08:01 UTC 2021
Op 2021-03-11 om 22:22 schreef Jason Ekstrand:
> First off, I'm just here asking questions right now trying to start
> getting my head around some of this stuff. Feel free to ignore me or
> tell me to go away if I'm being annoying. :-)
>
> On Thu, Mar 11, 2021 at 7:49 AM Maarten Lankhorst
> <maarten.lankhorst at linux.intel.com> wrote:
>> Instead of sharing pages with breadcrumbs, give each timeline a
>> single page. This allows unrelated timelines not to share locks
>> any more during command submission.
>>
>> As an additional benefit, seqno wraparound no longer requires
>> i915_vma_pin, which means we no longer need to worry about a
>> potential -EDEADLK at a point where we are ready to submit.
>>
>> Changes since v1:
>> - Fix erroneous i915_vma_acquire that should be a i915_vma_release (ickle).
>> - Extra check for completion in intel_read_hwsp().
>> Changes since v2:
>> - Fix inconsistent indent in hwsp_alloc() (kbuild)
>> - memset entire cacheline to 0.
>> Changes since v3:
>> - Do same in intel_timeline_reset_seqno(), and clflush for good measure.
>> Changes since v4:
>> - Use refcounting on timeline, instead of relying on i915_active.
>> - Fix waiting on kernel requests.
>> Changes since v5:
>> - Bump amount of slots to maximum (256), for best wraparounds.
>> - Add hwsp_offset to i915_request to fix potential wraparound hang.
>> - Ensure timeline wrap test works with the changes.
>> - Assign hwsp in intel_timeline_read_hwsp() within the rcu lock to
>> fix a hang.
>> Changes since v6:
>> - Rename i915_request_active_offset to i915_request_active_seqno(),
>> and elaborate the function. (tvrtko)
>>
>> Signed-off-by: Maarten Lankhorst <maarten.lankhorst at linux.intel.com>
>> Reviewed-by: Thomas Hellström <thomas.hellstrom at intel.com> #v1
>> Reported-by: kernel test robot <lkp at intel.com>
>> ---
>> drivers/gpu/drm/i915/gt/gen2_engine_cs.c | 2 +-
>> drivers/gpu/drm/i915/gt/gen6_engine_cs.c | 8 +-
>> drivers/gpu/drm/i915/gt/gen8_engine_cs.c | 13 +-
>> drivers/gpu/drm/i915/gt/intel_engine_cs.c | 1 +
>> drivers/gpu/drm/i915/gt/intel_gt_types.h | 4 -
>> drivers/gpu/drm/i915/gt/intel_timeline.c | 422 ++++--------------
>> .../gpu/drm/i915/gt/intel_timeline_types.h | 17 +-
>> drivers/gpu/drm/i915/gt/selftest_engine_cs.c | 5 +-
>> drivers/gpu/drm/i915/gt/selftest_timeline.c | 83 ++--
>> drivers/gpu/drm/i915/i915_request.c | 4 -
>> drivers/gpu/drm/i915/i915_request.h | 31 +-
>> 11 files changed, 175 insertions(+), 415 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/gen2_engine_cs.c b/drivers/gpu/drm/i915/gt/gen2_engine_cs.c
>> index b491a64919c8..9646200d2792 100644
>> --- a/drivers/gpu/drm/i915/gt/gen2_engine_cs.c
>> +++ b/drivers/gpu/drm/i915/gt/gen2_engine_cs.c
>> @@ -143,7 +143,7 @@ static u32 *__gen2_emit_breadcrumb(struct i915_request *rq, u32 *cs,
>> int flush, int post)
>> {
>> GEM_BUG_ON(i915_request_active_timeline(rq)->hwsp_ggtt != rq->engine->status_page.vma);
>> - GEM_BUG_ON(offset_in_page(i915_request_active_timeline(rq)->hwsp_offset) != I915_GEM_HWS_SEQNO_ADDR);
>> + GEM_BUG_ON(offset_in_page(rq->hwsp_seqno) != I915_GEM_HWS_SEQNO_ADDR);
>>
>> *cs++ = MI_FLUSH;
>>
>> diff --git a/drivers/gpu/drm/i915/gt/gen6_engine_cs.c b/drivers/gpu/drm/i915/gt/gen6_engine_cs.c
>> index ce38d1bcaba3..b388ceeeb1c9 100644
>> --- a/drivers/gpu/drm/i915/gt/gen6_engine_cs.c
>> +++ b/drivers/gpu/drm/i915/gt/gen6_engine_cs.c
>> @@ -161,7 +161,7 @@ u32 *gen6_emit_breadcrumb_rcs(struct i915_request *rq, u32 *cs)
>> PIPE_CONTROL_DC_FLUSH_ENABLE |
>> PIPE_CONTROL_QW_WRITE |
>> PIPE_CONTROL_CS_STALL);
>> - *cs++ = i915_request_active_timeline(rq)->hwsp_offset |
>> + *cs++ = i915_request_active_seqno(rq) |
>> PIPE_CONTROL_GLOBAL_GTT;
>> *cs++ = rq->fence.seqno;
>>
>> @@ -359,7 +359,7 @@ u32 *gen7_emit_breadcrumb_rcs(struct i915_request *rq, u32 *cs)
>> PIPE_CONTROL_QW_WRITE |
>> PIPE_CONTROL_GLOBAL_GTT_IVB |
>> PIPE_CONTROL_CS_STALL);
>> - *cs++ = i915_request_active_timeline(rq)->hwsp_offset;
>> + *cs++ = i915_request_active_seqno(rq);
>> *cs++ = rq->fence.seqno;
>>
>> *cs++ = MI_USER_INTERRUPT;
>> @@ -374,7 +374,7 @@ u32 *gen7_emit_breadcrumb_rcs(struct i915_request *rq, u32 *cs)
>> u32 *gen6_emit_breadcrumb_xcs(struct i915_request *rq, u32 *cs)
>> {
>> GEM_BUG_ON(i915_request_active_timeline(rq)->hwsp_ggtt != rq->engine->status_page.vma);
>> - GEM_BUG_ON(offset_in_page(i915_request_active_timeline(rq)->hwsp_offset) != I915_GEM_HWS_SEQNO_ADDR);
>> + GEM_BUG_ON(offset_in_page(rq->hwsp_seqno) != I915_GEM_HWS_SEQNO_ADDR);
>>
>> *cs++ = MI_FLUSH_DW | MI_FLUSH_DW_OP_STOREDW | MI_FLUSH_DW_STORE_INDEX;
>> *cs++ = I915_GEM_HWS_SEQNO_ADDR | MI_FLUSH_DW_USE_GTT;
>> @@ -394,7 +394,7 @@ u32 *gen7_emit_breadcrumb_xcs(struct i915_request *rq, u32 *cs)
>> int i;
>>
>> GEM_BUG_ON(i915_request_active_timeline(rq)->hwsp_ggtt != rq->engine->status_page.vma);
>> - GEM_BUG_ON(offset_in_page(i915_request_active_timeline(rq)->hwsp_offset) != I915_GEM_HWS_SEQNO_ADDR);
>> + GEM_BUG_ON(offset_in_page(rq->hwsp_seqno) != I915_GEM_HWS_SEQNO_ADDR);
>>
>> *cs++ = MI_FLUSH_DW | MI_INVALIDATE_TLB |
>> MI_FLUSH_DW_OP_STOREDW | MI_FLUSH_DW_STORE_INDEX;
>> diff --git a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
>> index cac80af7ad1c..6b9c34d3ac8d 100644
>> --- a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
>> +++ b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
>> @@ -338,15 +338,14 @@ static u32 preempt_address(struct intel_engine_cs *engine)
>>
>> static u32 hwsp_offset(const struct i915_request *rq)
>> {
>> - const struct intel_timeline_cacheline *cl;
>> + const struct intel_timeline *tl;
>>
>> - /* Before the request is executed, the timeline/cachline is fixed */
>> + /* Before the request is executed, the timeline is fixed */
>> + tl = rcu_dereference_protected(rq->timeline,
>> + !i915_request_signaled(rq));
> Why is Gen8+ different from Gen2/6 here? In particular, why not use
> i915_request_active_timeline(rq) or, better yet,
> i915_request_active_seqno()? The primary difference I see is that the
> guard on the RCU is different but it's not immediately obvious to me
> why this should be different between hardware generations. Also,
> i915_request_active_seqno() returns a u32, but that could be fixed.
The legacy rings different locking, and it was splatting on PROVE_RCU.
I didn't want to weaken it, so I just put in a different condition for gen8 that's slightly weaker, but still better than the '1'.
More information about the Intel-gfx
mailing list