[Intel-gfx] [PATCH] drm/i915/guc: Fix detection of GuC submission in use

Janusz Krzysztofik janusz.krzysztofik at linux.intel.com
Thu Sep 5 12:34:59 UTC 2019


Hi Michał,

On Thursday, September 5, 2019 2:08:12 PM CEST Michal Wajdeczko wrote:
> On Thu, 05 Sep 2019 13:16:31 +0200, Janusz Krzysztofik  
> <janusz.krzysztofik at linux.intel.com> wrote:
> 
> > The driver always assumes active GuC submission mode if it is
> > supported.  That's not true if GuC initialization fails for some
> > reason.  That may lead to kernel panics, caused e.g. by execlists
> > fallback submission mode incorrectly detecting GuC submission in use.
> >
> > Fix it by also checking for GuC enabled status.
> >
> > Fixes: 356c484822e6 ("drm/i915/uc: Add explicit DISABLED state for  
> > firmware")
> > Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik at linux.intel.com>
> > ---
> >  drivers/gpu/drm/i915/gt/uc/intel_uc.h | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.h  
> > b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
> > index 527995c21196..b28bab64a280 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.h
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
> > @@ -51,7 +51,8 @@ static inline bool  
> > intel_uc_supports_guc_submission(struct intel_uc *uc)
> > static inline bool intel_uc_uses_guc_submission(struct intel_uc *uc)
> >  {
> > -	return intel_guc_is_submission_supported(&uc->guc);
> > +	return intel_guc_is_enabled(&uc->guc) &&
> > +	       intel_guc_is_submission_supported(&uc->guc);
> 
> This wont fix your original problem (that btw is not possible to
> repro on drm-tip)

I'm not sure how you force GuC initialization to fail, mine just didn't have 
new firmware available.  On module load, the driver was starting up in 
execlists submission mode and BUG_ON( was raised from process_csb().  Running 
on a simulator, I was using current internal tree, based on current drm-tip.

> as after any GuC initialization failure we still
> treat GuC as "enabled":

My bad, I initially used intel_guc_is_running() but that interfered badly with 
module unload so I switched to intel_guc_is_enabled() and apparently didn't 
re-test if this still fixes the original issue.

> intel_guc_is_supported => H/W support (static)
> intel_guc_is_enabled => aka not disabled by the user (config)
> intel_guc_is_running => no major fw failure (runtime)
> 
> Note that we even s/intel_guc_is_enabled/intel_guc_is_running
> won't help as GuC may be running but we may fail to correctly
> initialize GuC submission.
> 
> Correct fix to original problem must be aligned with new GuC
> submission model (coming soon) and it may look as this:
> 
> +static inline bool intel_guc_is_submission_active(struct intel_guc *guc)
> +{
> +	GEM_BUG_ON(guc->submission_active && !intel_guc_is_running(guc));
> +	return guc->submission_active;
> +}
> 
> and then
> 
>   static inline bool intel_uc_uses_guc_submission(struct intel_uc *uc)
>   {
> -	return intel_guc_is_submission_supported(&uc->guc);
> +	return intel_guc_is_submission_active(&uc->guc);
>   }
> 
> We may need to revisit all uses/supports/ macros to better
> reflect configuration vs runtime differences.

Definitely, or we may get in troubles like the one I experienced on module 
unload.  And that can be done in advance, I believe.

As long as the unload issue is resolved by not using 
intel_uc_uses_guc_submission() where it occurred inappropriate, using 
(intel_guc_is_running() && intel_guc_is_submission_supported()) seems a valid 
fix to me, easy to migrate to intel_guc_is_submission_active() as soon as 
available.  I'll revert back to intel_guc_is_running(), fix the module unload 
issue and resubmit to trybot, maybe it can discover more issues with that.

Thanks,
Janusz

> 
> Thanks,
> Michal
> 






More information about the Intel-gfx mailing list