[Intel-gfx] [PATCH v13 4/7] drm/i915: No TLB invalidation on suspended GT

Cavitt, Jonathan jonathan.cavitt at intel.com
Fri Oct 13 14:42:49 UTC 2023


-----Original Message-----
From: Harrison, John C <john.c.harrison at intel.com> 
Sent: Thursday, October 12, 2023 6:08 PM
To: Cavitt, Jonathan <jonathan.cavitt at intel.com>; intel-gfx at lists.freedesktop.org
Cc: Gupta, saurabhg <saurabhg.gupta at intel.com>; chris.p.wilson at linux.intel.com; Iddamsetty, Aravind <aravind.iddamsetty at intel.com>; Yang, Fei <fei.yang at intel.com>; Shyti, Andi <andi.shyti at intel.com>; Das, Nirmoy <nirmoy.das at intel.com>; Krzysztofik, Janusz <janusz.krzysztofik at intel.com>; Roper, Matthew D <matthew.d.roper at intel.com>; tvrtko.ursulin at linux.intel.com; jani.nikula at linux.intel.com
Subject: Re: [PATCH v13 4/7] drm/i915: No TLB invalidation on suspended GT
> 
> On 10/12/2023 15:38, Jonathan Cavitt wrote:
> > In case of GT is suspended, don't allow submission of new TLB invalidation
> > request and cancel all pending requests. The TLB entries will be
> > invalidated either during GuC reload or on system resume.
> >
> > Signed-off-by: Fei Yang <fei.yang at intel.com>
> > Signed-off-by: Jonathan Cavitt <jonathan.cavitt at intel.com>
> > CC: John Harrison <john.c.harrison at intel.com>
> > Reviewed-by: Andi Shyti <andi.shyti at linux.intel.com>
> > Acked-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> > Acked-by: Nirmoy Das <nirmoy.das at intel.com>
> > ---
> >   drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  1 +
> >   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 22 ++++++++++++-------
> >   drivers/gpu/drm/i915/gt/uc/intel_uc.c         |  7 ++++++
> >   3 files changed, 22 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > index 0949628d69f8b..2b6dfe62c8f2a 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > @@ -537,4 +537,5 @@ int intel_guc_invalidate_tlb_engines(struct intel_guc *guc);
> >   int intel_guc_invalidate_tlb_guc(struct intel_guc *guc);
> >   int intel_guc_tlb_invalidation_done(struct intel_guc *guc,
> >   				    const u32 *payload, u32 len);
> > +void wake_up_all_tlb_invalidate(struct intel_guc *guc);
> >   #endif
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > index 1377398afcdfa..3a0d20064878a 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > @@ -1796,13 +1796,24 @@ static void __guc_reset_context(struct intel_context *ce, intel_engine_mask_t st
> >   	intel_context_put(parent);
> >   }
> >   
> > -void intel_guc_submission_reset(struct intel_guc *guc, intel_engine_mask_t stalled)
> > +void wake_up_all_tlb_invalidate(struct intel_guc *guc)
> >   {
> >   	struct intel_guc_tlb_wait *wait;
> > +	unsigned long i;
> > +
> > +	if (HAS_GUC_TLB_INVALIDATION(guc_to_gt(guc)->i915)) {
> Why the change from 'if(!is_available) return' to 'if(HAS_) {doStuff}'?


I feel like this question has two parts, so I'll answer them separately:

1. Why HAS_GUC_TLB_INVALIDATION and not intel_guc_tlb_invalidation_is_available?

Wake_up_all_tlb_invalidate is called during the suspend/resume path, specifically in the
middle of suspend.  It's required for it to be called here to clean up any invalidations left
in the queue during the suspend/resume phase because they are no longer valid requests.
However, the suspend/resume phase also resets GuC, so intel_guc_is_ready returns false.
In short, using intel_guc_invalidation_is_available was causing us to skip this code section
incorrectly, resulting in spurious GuC TLB invalidation timeout errors during gt reset.


2. Why use a positive check to perform and not a negative check to skip?

In patch 3, wake_up_all_tlb_invalidate was originally called universally on all platforms
during intel_guc_submission_reset, which is incorrect and not how was reimplemented here.
I discovered this was the case and retroactively corrected it, as seen below.
Because of how intel_guc_submission_reset is structured, a negative check to skip wouldn't
make much sense there, so I used a positive check to perform instead.  This is a holdover from
that implementation, and was kept to maintain consistency between patches 3 and 4.  It's
probably not as big of a deal as I'm imagining, but I think it would be awkward if the initial
implementation in intel_guc_submission_reset and the reimplementation in
wake_up_all_tlb_invalidate weren't superficially the same, even if they were functionally
equivalent otherwise.


-Jonathan Cavitt


> 
> John.
> 
> > +		xa_lock_irq(&guc->tlb_lookup);
> > +		xa_for_each(&guc->tlb_lookup, i, wait)
> > +			wake_up(&wait->wq);
> > +		xa_unlock_irq(&guc->tlb_lookup);
> > +	}
> > +}
> > +
> > +void intel_guc_submission_reset(struct intel_guc *guc, intel_engine_mask_t stalled)
> > +{
> >   	struct intel_context *ce;
> >   	unsigned long index;
> >   	unsigned long flags;
> > -	unsigned long i;
> >   
> >   	if (unlikely(!guc_submission_initialized(guc))) {
> >   		/* Reset called during driver load? GuC not yet initialised! */
> > @@ -1833,12 +1844,7 @@ void intel_guc_submission_reset(struct intel_guc *guc, intel_engine_mask_t stall
> >   	 * The full GT reset will have cleared the TLB caches and flushed the
> >   	 * G2H message queue; we can release all the blocked waiters.
> >   	 */
> > -	if (HAS_GUC_TLB_INVALIDATION(guc_to_gt(guc)->i915)) {
> > -		xa_lock_irq(&guc->tlb_lookup);
> > -		xa_for_each(&guc->tlb_lookup, i, wait)
> > -			wake_up(&wait->wq);
> > -		xa_unlock_irq(&guc->tlb_lookup);
> > -	}
> > +	wake_up_all_tlb_invalidate(guc);
> >   }
> >   
> >   static void guc_cancel_context_requests(struct intel_context *ce)
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> > index 98b103375b7ab..27f6561dd7319 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> > @@ -688,6 +688,8 @@ void intel_uc_suspend(struct intel_uc *uc)
> >   	/* flush the GSC worker */
> >   	intel_gsc_uc_flush_work(&uc->gsc);
> >   
> > +	wake_up_all_tlb_invalidate(guc);
> > +
> >   	if (!intel_guc_is_ready(guc)) {
> >   		guc->interrupts.enabled = false;
> >   		return;
> > @@ -736,6 +738,11 @@ static int __uc_resume(struct intel_uc *uc, bool enable_communication)
> >   
> >   	intel_gsc_uc_resume(&uc->gsc);
> >   
> > +	if (intel_guc_tlb_invalidation_is_available(guc)) {
> > +		intel_guc_invalidate_tlb_engines(guc);
> > +		intel_guc_invalidate_tlb_guc(guc);
> > +	}
> > +
> >   	return 0;
> >   }
> >   
> 
> 


More information about the Intel-gfx mailing list