[Intel-gfx] [PATCH v13 4/7] drm/i915: No TLB invalidation on suspended GT
Cavitt, Jonathan
jonathan.cavitt at intel.com
Fri Oct 13 14:42:49 UTC 2023
-----Original Message-----
From: Harrison, John C <john.c.harrison at intel.com>
Sent: Thursday, October 12, 2023 6:08 PM
To: Cavitt, Jonathan <jonathan.cavitt at intel.com>; intel-gfx at lists.freedesktop.org
Cc: Gupta, saurabhg <saurabhg.gupta at intel.com>; chris.p.wilson at linux.intel.com; Iddamsetty, Aravind <aravind.iddamsetty at intel.com>; Yang, Fei <fei.yang at intel.com>; Shyti, Andi <andi.shyti at intel.com>; Das, Nirmoy <nirmoy.das at intel.com>; Krzysztofik, Janusz <janusz.krzysztofik at intel.com>; Roper, Matthew D <matthew.d.roper at intel.com>; tvrtko.ursulin at linux.intel.com; jani.nikula at linux.intel.com
Subject: Re: [PATCH v13 4/7] drm/i915: No TLB invalidation on suspended GT
>
> On 10/12/2023 15:38, Jonathan Cavitt wrote:
> > If the GT is suspended, don't allow submission of new TLB invalidation
> > requests and cancel all pending requests. The TLB entries will be
> > invalidated either during GuC reload or on system resume.
> >
> > Signed-off-by: Fei Yang <fei.yang at intel.com>
> > Signed-off-by: Jonathan Cavitt <jonathan.cavitt at intel.com>
> > CC: John Harrison <john.c.harrison at intel.com>
> > Reviewed-by: Andi Shyti <andi.shyti at linux.intel.com>
> > Acked-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> > Acked-by: Nirmoy Das <nirmoy.das at intel.com>
> > ---
> > drivers/gpu/drm/i915/gt/uc/intel_guc.h | 1 +
> > .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 22 ++++++++++++-------
> > drivers/gpu/drm/i915/gt/uc/intel_uc.c | 7 ++++++
> > 3 files changed, 22 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > index 0949628d69f8b..2b6dfe62c8f2a 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > @@ -537,4 +537,5 @@ int intel_guc_invalidate_tlb_engines(struct intel_guc *guc);
> > int intel_guc_invalidate_tlb_guc(struct intel_guc *guc);
> > int intel_guc_tlb_invalidation_done(struct intel_guc *guc,
> > const u32 *payload, u32 len);
> > +void wake_up_all_tlb_invalidate(struct intel_guc *guc);
> > #endif
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > index 1377398afcdfa..3a0d20064878a 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > @@ -1796,13 +1796,24 @@ static void __guc_reset_context(struct intel_context *ce, intel_engine_mask_t st
> > intel_context_put(parent);
> > }
> >
> > -void intel_guc_submission_reset(struct intel_guc *guc, intel_engine_mask_t stalled)
> > +void wake_up_all_tlb_invalidate(struct intel_guc *guc)
> > {
> > struct intel_guc_tlb_wait *wait;
> > + unsigned long i;
> > +
> > + if (HAS_GUC_TLB_INVALIDATION(guc_to_gt(guc)->i915)) {
> Why the change from 'if(!is_available) return' to 'if(HAS_) {doStuff}'?
I feel like this question has two parts, so I'll answer them separately:
1. Why HAS_GUC_TLB_INVALIDATION and not intel_guc_tlb_invalidation_is_available?
wake_up_all_tlb_invalidate is called on the suspend/resume path, specifically in the
middle of suspend. It has to be called there to clean up any invalidations left in the
queue during suspend/resume, because they are no longer valid requests.
However, the suspend/resume path also resets GuC, so intel_guc_is_ready returns false
and the is_available check fails with it.
In short, using intel_guc_tlb_invalidation_is_available was causing us to skip this code
section incorrectly, resulting in spurious GuC TLB invalidation timeout errors during gt reset.
2. Why use a positive check to perform and not a negative check to skip?
In patch 3, wake_up_all_tlb_invalidate was originally called unconditionally on all platforms
during intel_guc_submission_reset, which is incorrect and not how it was reimplemented here.
I discovered this and retroactively corrected it, as seen below.
Because of how intel_guc_submission_reset is structured, a negative check to skip wouldn't
make much sense there, so I used a positive check to perform instead. This is a holdover from
that implementation, and was kept to maintain consistency between patches 3 and 4. It's
probably not as big of a deal as I'm imagining, but I think it would be awkward if the initial
implementation in intel_guc_submission_reset and the reimplementation in
wake_up_all_tlb_invalidate weren't superficially the same, even if they were functionally
equivalent otherwise.
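
To make the two points above concrete, here is a minimal standalone sketch. It is
not driver code: struct model_guc, its fields and every function below are
hypothetical stand-ins, and the "available" helper is assumed to be capability AND
readiness, which is what the behaviour described in point 1 implies. Releasing a
waiter is modelled as clearing a flag rather than calling wake_up().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_WAITERS 8

/* Hypothetical stand-in for struct intel_guc; not the real driver type. */
struct model_guc {
	bool has_tlb_invalidation;  /* platform capability, fixed at probe time */
	bool ready;                 /* firmware state; false while GuC is reset */
	bool waiting[MAX_WAITERS];  /* true = a waiter is still blocked */
};

/* Assumed shape of the "is available" helper: capability AND readiness. */
static bool model_tlb_inval_is_available(const struct model_guc *g)
{
	return g->has_tlb_invalidation && g->ready;
}

/* Release every blocked waiter and report how many were released. */
static size_t model_release_waiters(struct model_guc *g)
{
	size_t i, woken = 0;

	for (i = 0; i < MAX_WAITERS; i++) {
		if (g->waiting[i]) {
			g->waiting[i] = false;
			woken++;
		}
	}
	return woken;
}

/* Point 1, the problem case: gated on availability, so it is skipped
 * whenever GuC is not ready, e.g. in the middle of suspend. */
static size_t wake_gated_on_available(struct model_guc *g)
{
	assert(g != NULL);
	if (!model_tlb_inval_is_available(g))
		return 0;
	return model_release_waiters(g);
}

/* Point 1, the fix, in the positive-check shape used by the patch. */
static size_t wake_gated_on_capability(struct model_guc *g)
{
	size_t woken = 0;

	assert(g != NULL);
	if (g->has_tlb_invalidation)
		woken = model_release_waiters(g);
	return woken;
}

/* Point 2: the same capability gate written as a negative early return. */
static size_t wake_gated_on_capability_early_return(struct model_guc *g)
{
	assert(g != NULL);
	if (!g->has_tlb_invalidation)
		return 0;
	return model_release_waiters(g);
}
```

With has_tlb_invalidation set and ready cleared, the first variant leaves every
waiter blocked, while the other two release them and behave identically to each
other. In this model the choice between the last two is purely stylistic.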
-Jonathan Cavitt
>
> John.
>
> > + xa_lock_irq(&guc->tlb_lookup);
> > + xa_for_each(&guc->tlb_lookup, i, wait)
> > + wake_up(&wait->wq);
> > + xa_unlock_irq(&guc->tlb_lookup);
> > + }
> > +}
> > +
> > +void intel_guc_submission_reset(struct intel_guc *guc, intel_engine_mask_t stalled)
> > +{
> > struct intel_context *ce;
> > unsigned long index;
> > unsigned long flags;
> > - unsigned long i;
> >
> > if (unlikely(!guc_submission_initialized(guc))) {
> > /* Reset called during driver load? GuC not yet initialised! */
> > @@ -1833,12 +1844,7 @@ void intel_guc_submission_reset(struct intel_guc *guc, intel_engine_mask_t stall
> > * The full GT reset will have cleared the TLB caches and flushed the
> > * G2H message queue; we can release all the blocked waiters.
> > */
> > - if (HAS_GUC_TLB_INVALIDATION(guc_to_gt(guc)->i915)) {
> > - xa_lock_irq(&guc->tlb_lookup);
> > - xa_for_each(&guc->tlb_lookup, i, wait)
> > - wake_up(&wait->wq);
> > - xa_unlock_irq(&guc->tlb_lookup);
> > - }
> > + wake_up_all_tlb_invalidate(guc);
> > }
> >
> > static void guc_cancel_context_requests(struct intel_context *ce)
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> > index 98b103375b7ab..27f6561dd7319 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> > @@ -688,6 +688,8 @@ void intel_uc_suspend(struct intel_uc *uc)
> > /* flush the GSC worker */
> > intel_gsc_uc_flush_work(&uc->gsc);
> >
> > + wake_up_all_tlb_invalidate(guc);
> > +
> > if (!intel_guc_is_ready(guc)) {
> > guc->interrupts.enabled = false;
> > return;
> > @@ -736,6 +738,11 @@ static int __uc_resume(struct intel_uc *uc, bool enable_communication)
> >
> > intel_gsc_uc_resume(&uc->gsc);
> >
> > + if (intel_guc_tlb_invalidation_is_available(guc)) {
> > + intel_guc_invalidate_tlb_engines(guc);
> > + intel_guc_invalidate_tlb_guc(guc);
> > + }
> > +
> > return 0;
> > }
> >
>
>