[RESEND PATCH] drm/i915/gvt: avoid dispatching workloads when host is resetting chip

Fri Feb 17 05:50:06 UTC 2017

On 2017.02.17 13:28:52 +0800, Du, Changbin wrote:
> On Fri, Feb 17, 2017 at 11:59:58AM +0800, Gao, Ping A wrote:
> > Do we really need this? i915 holds the struct_mutex to do the reset
> > and releases it after the reset completes, so this mutex also holds off
> > the workload dispatch in gvt.
> >
> Not in all cases, e.g. we can fail somewhere in i915_alloc_request().
>

I agree that we should handle the case where a GPU reset is in progress. My
only concern is where to handle it gracefully: before choosing a workload, as
this patch does, or later at request alloc time, by checking in the fail path
and retrying if a reset is in progress but the GPU is not wedged. Even with
this change, something else could still cause a GPU reset after the workload
has been picked, and the workload would still be discarded, which makes this
change not much help...
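
To make the second option concrete, here is a minimal user-space sketch of
"retry in the fail path". It is not i915 or gvt code: struct fake_gpu,
alloc_request(), wait_for_reset_tick() and dispatch_with_retry() are all
hypothetical stand-ins, and the error codes are only chosen for illustration.
The point is just the control flow: retry while a reset is in progress, give
up at once when wedged.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver's reset state; not a real i915 struct. */
struct fake_gpu {
	int reset_ticks_left;	/* > 0 means a reset is still in progress */
	bool wedged;		/* terminally wedged, retrying cannot help */
};

static bool reset_in_progress(const struct fake_gpu *gpu)
{
	return gpu->reset_ticks_left > 0;
}

/* Hypothetical stand-in for a bounded wait: one call is one tick of progress. */
static void wait_for_reset_tick(struct fake_gpu *gpu)
{
	if (gpu->reset_ticks_left > 0)
		gpu->reset_ticks_left--;
}

/* Hypothetical request allocation: fails while wedged or while resetting. */
static int alloc_request(const struct fake_gpu *gpu)
{
	if (gpu->wedged)
		return -EIO;
	if (reset_in_progress(gpu))
		return -EAGAIN;
	return 0;
}

/*
 * Dispatch with the reset check in the fail path: only -EAGAIN (reset in
 * progress, not wedged) is retried, and only up to max_retries times.
 */
static int dispatch_with_retry(struct fake_gpu *gpu, int max_retries)
{
	int tries = 0;
	int ret;

	for (;;) {
		ret = alloc_request(gpu);
		if (ret != -EAGAIN)
			return ret;	/* success, or a failure we must not retry */
		if (tries++ >= max_retries)
			return ret;	/* still resetting, give up with -EAGAIN */
		wait_for_reset_tick(gpu);
	}
}
```

With this shape a reset that starts after the workload has been picked is
covered by the same path, which is the case the check before
pick_next_workload() cannot see.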

> > On 2017/2/9 10:15, changbin.du at intel.com wrote:
> > > From: Changbin Du <changbin.du at intel.com>
> > >
> > > It is meaningless to dispatch workload requests to i915 while i915 is
> > > resetting the chip, especially when the hang was not caused by the guest
> > > that owns the workload. This patch can reduce possible guest vGPU hangs
> > > when the host or another VM causes a chip reset.
> > >
> > > Signed-off-by: Changbin Du <changbin.du at intel.com>
> > > ---
> > >  drivers/gpu/drm/i915/gvt/scheduler.c | 16 ++++++++++++++++
> > >  1 file changed, 16 insertions(+)
> > >
> > > diff --git a/drivers/gpu/drm/i915/gvt/scheduler.c b/drivers/gpu/drm/i915/gvt/scheduler.c
> > > index 7ea68a7..ca29fb4 100644
> > > --- a/drivers/gpu/drm/i915/gvt/scheduler.c
> > > +++ b/drivers/gpu/drm/i915/gvt/scheduler.c
> > > @@ -406,8 +406,24 @@ static int workload_thread(void *priv)
> > >  	gvt_dbg_core("workload thread for ring %d started\n", ring_id);
> > >  
> > >  	while (!kthread_should_stop()) {
> > > +		workload = NULL;
> > >  		add_wait_queue(&scheduler->waitq[ring_id], &wait);
> > >  		do {
> > > +			/**
> > > +			 * we only wait for reset, but do not wait for wedged.
> > > +			 * For wedged, just let the workloads fail when dispatch
> > > +			 * them.
> > > +			 */
> > > +			if (i915_reset_in_progress(&gvt->dev_priv->gpu_error)) {
> > > +				gvt_dbg_core("wait i915 finish chip reset\n");
> > > +				ret = wait_on_bit_timeout(
> > > +						&gvt->dev_priv->gpu_error.flags,
> > > +						I915_RESET_IN_PROGRESS,
> > > +						TASK_UNINTERRUPTIBLE,
> > > +						HZ);
> > > +				if (ret)
> > > +					continue;
> > > +			}
> > >  			workload = pick_next_workload(gvt, ring_id);
> > >  			if (workload)
> > >  				break;
> > 
> > _______________________________________________
> > intel-gvt-dev mailing list
> > intel-gvt-dev at lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/intel-gvt-dev
> 
> -- 
> Thanks,
> Changbin Du


-- 
Open Source Technology Center, Intel ltd.

$gpg --keyserver wwwkeys.pgp.net --recv-keys 4D781827