[Intel-xe] [PATCH] drm/xe: don't auto fall back to execlist mode if guc failed to init

Chang, Yu bruce yu.bruce.chang at intel.com
Wed Mar 29 20:16:27 UTC 2023



> -----Original Message-----
> From: Vivi, Rodrigo <rodrigo.vivi at intel.com>
> Sent: Wednesday, March 29, 2023 1:12 PM
> To: Chang, Yu bruce <yu.bruce.chang at intel.com>
> Cc: intel-xe at lists.freedesktop.org
> Subject: Re: [Intel-xe] [PATCH] drm/xe: don't auto fall back to execlist mode
> if guc failed to init
> 
> On Wed, Mar 29, 2023 at 08:00:35PM +0000, Chang, Yu bruce wrote:
> >
> >
> > > -----Original Message-----
> > > From: Vivi, Rodrigo <rodrigo.vivi at intel.com>
> > > Sent: Wednesday, March 29, 2023 12:14 PM
> > > To: Chang, Yu bruce <yu.bruce.chang at intel.com>
> > > Cc: intel-xe at lists.freedesktop.org
> > > Subject: Re: [Intel-xe] [PATCH] drm/xe: don't auto fall back to
> > > execlist mode if guc failed to init
> > >
> > > On Fri, Mar 24, 2023 at 12:15:27PM -0400, Rodrigo Vivi wrote:
> > > > On Thu, Mar 23, 2023 at 11:08:58PM +0000, Chang, Yu bruce wrote:
> > > > >
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Brost, Matthew <matthew.brost at intel.com>
> > > > > > Sent: Thursday, March 23, 2023 3:53 PM
> > > > > > To: Chang, Yu bruce <yu.bruce.chang at intel.com>
> > > > > > Cc: intel-xe at lists.freedesktop.org
> > > > > > Subject: Re: [Intel-xe] [PATCH] drm/xe: don't auto fall back
> > > > > > to execlist mode if guc failed to init
> > > > > >
> > > > > > On Thu, Mar 23, 2023 at 08:23:13PM +0000, Chang, Bruce wrote:
> > > > > > > > In general, this is due to a FW load failure; the driver should
> > > > > > > > just report the error and fail the probe so that the user can
> > > > > > > > easily retry.
> > > > > > >
> > > > > > > Cc: Matt Roper <matthew.d.roper at intel.com>
> > > > > > > Signed-off-by: Bruce Chang <yu.bruce.chang at intel.com>
> > > > > >
> > > > > > I have not tested this but assuming you did:
> > > > > > Reviewed-by: Matthew Brost <matthew.brost at intel.com>
> > > > > >
> > > > > > Yes, I tested on PVC and it used to fall back to execlist mode
> > > > > > and constantly print out EXECLIST_STATUS. Now all those are not
> > > > > > showing after this change.
> > > >
> > > > But now the entire execlist code is bogus.
> > > > We should remove it entirely or at least add a parameter that
> > > > allows that to be selected.
> > >
> > > :( My comment was entirely ignored and the patch was pushed.
> > > Well, I complained about dead code...
> > >
> > [BC] I thought you were discussing removing execlist completely.
> > It is hard to make a decision in the code review.
> >
> > There is a "xe opens" document, I will start a conversation there, and
> > will @you as well.
> >
> > > After removing the fallback we need to either add a parameter or
> > > kill execlists entirely. My preference is for killing that entirely.
> > >
> > [BC] there is a module parameter "enable_guc" to go back to execlist
> > mode
> >
> > > But I also just noticed that this patch actually only does half of
> > > disabling the fallback.
> > >
> > > When fw is not found we still have the fallback in place:
> > >
> > > @xe_uc_init:
> > > err:
> > >         /* If any uC firmwares not found, fall back to execlists */
> > >         xe_device_guc_submission_disable(uc_to_xe(uc));
> > >
> > [BC] the xe_device_guc_submission_disable(uc_to_xe(uc)); and comment
> > should be removed, can you please double check the latest from Xe?
> 
> Please accept my apologies. I was in the middle of a rebase+build when I
> checked.
> I clearly need to set up some workdir here on my xe dev environment.
> 
> The only dead code now is the xe_device_guc_submission_disable() itself...
> 
[BC] Correct, there was a moment I wanted to remove it. ;) Yes, this is not
used anymore.

> Sorry,
> Rodrigo.
> 
> >
> > Thanks,
> > Bruce
> >
> > > >
> > > > >
> > > > > > There are still other unrelated issues during
> > > > > > __pfx_ggtt_fini_noalloc that need to be fixed, as shown below.
> > > > >
> > > > > [  223.839894] BUG: KASAN: null-ptr-deref in
> > > > > ttm_resource_free+0xe4/0x140 [ttm] [  223.847211] Read of size 8
> > > > > at addr 0000000000000018 by task systemd-udevd/566
> > > > >
> > > > > [  223.856141] CPU: 0 PID: 566 Comm: systemd-udevd Not tainted
> > > > > 6.2.0-xe+ #4 [  223.864921] Hardware name: Intel Corporation
> > > > > WilsonCity/WilsonCity, BIOS WLYDCRB1.SYS.0020.P84.2103030140
> > > 03/03/2021 [  223.877365] Call Trace:
> > > > > [  223.881707]  <TASK>
> > > > > [  223.885658]  dump_stack_lvl+0x5b/0x85 [  223.891200]
> > > > > print_report+0x499/0x4aa [  223.896690]  ?
> > > > > ttm_resource_free+0xe4/0x140 [ttm] [  223.903268]
> > > > > kasan_report+0x99/0x1a0 [  223.908683]  ?
> > > > > ttm_resource_free+0xe4/0x140 [ttm] [  223.915210]
> > > > > ttm_resource_free+0xe4/0x140 [ttm] [  223.921621]
> > > > > ttm_bo_release+0x3e5/0x550 [ttm] [  223.927811]  ?
> > > > > __pfx_ttm_bo_release+0x10/0x10 [ttm] [  223.934530]  ?
> > > > > ttm_bo_kunmap+0x11f/0x160 [ttm] [  223.940775]  ?
> > > > > __pfx_ggtt_fini_noalloc+0x10/0x10 [xe]
> > > > >
> > > > > > > ---
> > > > > > >  drivers/gpu/drm/xe/xe_gt.c | 4 ++--
> > > > > > > drivers/gpu/drm/xe/xe_uc.c
> > > > > > > | 3 ---
> > > > > > >  2 files changed, 2 insertions(+), 5 deletions(-)
> > > > > > >
> > > > > > > > diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
> > > > > > > > index daa433d0f2f5..8a436c95591e 100644
> > > > > > > --- a/drivers/gpu/drm/xe/xe_gt.c
> > > > > > > +++ b/drivers/gpu/drm/xe/xe_gt.c
> > > > > > > @@ -455,9 +455,9 @@ static int gt_fw_domain_init(struct xe_gt
> *gt)
> > > > > > >  			goto err_force_wake;
> > > > > > >  	}
> > > > > > >
> > > > > > > -	/* Allow driver to load if uC init fails (likely missing firmware)
> > > */
> > > > > > >  	err = xe_uc_init(&gt->uc);
> > > > > > > -	XE_WARN_ON(err);
> > > > > > > +	if (err)
> > > > > > > +		goto err_force_wake;
> > > > > > >
> > > > > > >  	err = xe_uc_init_hwconfig(&gt->uc);
> > > > > > >  	if (err)
> > > > > > > > diff --git a/drivers/gpu/drm/xe/xe_uc.c b/drivers/gpu/drm/xe/xe_uc.c
> > > > > > > > index 4ccf2b3435e1..70eabf567156 100644
> > > > > > > --- a/drivers/gpu/drm/xe/xe_uc.c
> > > > > > > +++ b/drivers/gpu/drm/xe/xe_uc.c
> > > > > > > @@ -54,9 +54,6 @@ int xe_uc_init(struct xe_uc *uc)
> > > > > > >  	return 0;
> > > > > > >
> > > > > > >  err:
> > > > > > > -	/* If any uC firmwares not found, fall back to execlists */
> > > > > > > -	xe_device_guc_submission_disable(uc_to_xe(uc));
> > > > > > > -
> > > > > > >  	return ret;
> > > > > > >  }
> > > > > > >
> > > > > > > --
> > > > > > > 2.25.1
> > > > > > >
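
For readers skimming the thread, the control-flow change in the quoted patch
can be illustrated with a small userspace sketch. This is not driver code:
struct demo_gt, demo_uc_init() and demo_gt_fw_domain_init() are hypothetical
stand-ins, and the force-wake handling is an assumption reduced to a boolean
flag. It only shows the shape of "propagate the uC init error and unwind"
as opposed to "warn and keep going in a fallback mode".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device state; not the real struct xe_gt. */
struct demo_gt {
	bool fw_present;	/* pretend the firmware blob was found */
	bool force_wake_held;	/* resource acquired before uC init */
};

/* Stand-in for a uC init step: fails when firmware is missing. */
static int demo_uc_init(const struct demo_gt *gt)
{
	return gt->fw_present ? 0 : -ENOENT;
}

/* Same shape as the patched path: any uC error aborts the init. */
static int demo_gt_fw_domain_init(struct demo_gt *gt)
{
	int err;

	gt->force_wake_held = true;

	err = demo_uc_init(gt);
	if (err)
		goto err_force_wake;	/* no silent fallback */

	gt->force_wake_held = false;	/* normal release after init */
	return 0;

err_force_wake:
	gt->force_wake_held = false;	/* unwind before reporting failure */
	return err;
}

/* Small self-check exercising both the success and failure paths. */
static void demo_selfcheck(void)
{
	struct demo_gt ok = { .fw_present = true };
	struct demo_gt bad = { .fw_present = false };

	assert(demo_gt_fw_domain_init(&ok) == 0);
	assert(demo_gt_fw_domain_init(&bad) == -ENOENT);
	assert(!bad.force_wake_held);
}
```

With this shape the caller sees the error code directly, so a missing
firmware fails the probe immediately and can be retried, instead of leaving
the device running in a mode nobody asked for.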

