[PATCH] drm/i915/gvt: handle workload lifecycle properly
Dong, Chuanxiao
chuanxiao.dong at intel.com
Wed Mar 1 10:20:52 UTC 2017
> -----Original Message-----
> From: Zhenyu Wang [mailto:zhenyuw at linux.intel.com]
> Sent: Wednesday, March 1, 2017 6:04 PM
> To: Dong, Chuanxiao <chuanxiao.dong at intel.com>
> Cc: intel-gvt-dev at lists.freedesktop.org
> Subject: Re: [PATCH] drm/i915/gvt: handle workload lifecycle properly
>
> On 2017.02.27 17:34:51 +0800, Chuanxiao Dong wrote:
> > Currently i915 has a request replay mechanism which can make sure a
> > request is replayed after a GPU reset. With this mechanism, gvt
> > should wait until the GVT request seqno has passed before completing
> > the current workload, so that a context switch interrupt comes
> > before gvt frees the workload. In this way, the workload lifecycle
> > matches the i915 request lifecycle: the workload can only be
> > freed after the request is completed.
> >
> > Signed-off-by: Chuanxiao Dong <chuanxiao.dong at intel.com>
> > ---
> > drivers/gpu/drm/i915/gvt/scheduler.c | 49 ++++++++++++++++++++++++------------
> > 1 file changed, 33 insertions(+), 16 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/gvt/scheduler.c b/drivers/gpu/drm/i915/gvt/scheduler.c
> > index e355a82..0967ca9 100644
> > --- a/drivers/gpu/drm/i915/gvt/scheduler.c
> > +++ b/drivers/gpu/drm/i915/gvt/scheduler.c
> > @@ -151,6 +151,15 @@ static int shadow_context_status_change(struct notifier_block *nb,
> > case INTEL_CONTEXT_SCHEDULE_OUT:
> > intel_gvt_restore_render_mmio(workload->vgpu,
> > workload->ring_id);
> > + /* If the status is -EINPROGRESS, this workload did not hit
> > + * any issue during dispatch, so clear the status to zero on
> > + * SCHEDULE_OUT. If the status is NOT -EINPROGRESS, something
> > + * went wrong during dispatch and the status must not be
> > + * cleared.
> > + */
> > + if (workload->status == -EINPROGRESS)
> > + workload->status = 0;
> > atomic_set(&workload->shadow_ctx_active, 0);
> > break;
> > default:
> > @@ -362,15 +371,23 @@ static void complete_current_workload(struct intel_gvt *gvt, int ring_id)
> > workload = scheduler->current_workload[ring_id];
> > vgpu = workload->vgpu;
> >
> > - if (!workload->status && !vgpu->resetting) {
> > + /* For a workload with a request, wait for the context
> > + * switch to make sure the request is completed.
> > + * For a workload without a request, complete it directly.
> > + */
> > + if (workload->req) {
> > wait_event(workload->shadow_ctx_status_wq,
> > !atomic_read(&workload->shadow_ctx_active));
> >
> > - update_guest_context(workload);
> > + i915_gem_request_put(fetch_and_zero(&workload->req));
> >
> > - for_each_set_bit(event, workload->pending_events,
> > - INTEL_GVT_EVENT_MAX)
> > - intel_vgpu_trigger_virtual_event(vgpu, event);
> > + if (!workload->status && !vgpu->resetting) {
> > + update_guest_context(workload);
> > +
> > + for_each_set_bit(event, workload->pending_events,
> > + INTEL_GVT_EVENT_MAX)
> > + intel_vgpu_trigger_virtual_event(vgpu, event);
> > + }
> > }
> >
> > gvt_dbg_sched("ring id %d complete workload %p status %d\n", @@
> > -400,7 +417,6 @@ static int workload_thread(void *priv)
> > int ring_id = p->ring_id;
> > struct intel_gvt_workload_scheduler *scheduler = &gvt->scheduler;
> > struct intel_vgpu_workload *workload = NULL;
> > - long lret;
> > int ret;
> > bool need_force_wake = IS_SKYLAKE(gvt->dev_priv);
> > DEFINE_WAIT_FUNC(wait, woken_wake_function);
> > @@ -449,23 +465,24 @@ static int workload_thread(void *priv)
> >
> > gvt_dbg_sched("ring id %d wait workload %p\n",
> > workload->ring_id, workload);
> > -
> > - lret = i915_wait_request(workload->req,
> > +retry:
> > + i915_wait_request(workload->req,
> > 0, MAX_SCHEDULE_TIMEOUT);
> > - if (lret < 0) {
> > - workload->status = lret;
> > - gvt_err("fail to wait workload, skip\n");
> > - } else {
> > - workload->status = 0;
> > + /* i915 has a replay mechanism: a request will be replayed
> > + * after an i915 reset, so its seqno will be updated eventually.
> > + * If the seqno is still not updated after the wait, the replay
> > + * may still be in progress and we can wait again.
> > + */
> > + if (!i915_gem_request_completed(workload->req)) {
> > + gvt_err("workload %p not completed, wait again\n",
> > + workload);
>
> should this be an error?
If the request is not completed but gvt has finished waiting, it means a GPU reset occurred and a replay is in progress. During the replay, i915 will skip the guilty ring buffer, so the request will eventually succeed; that is why this is not treated as an error. If the workload status were marked as failed, then according to the workload completion flow, gvt would not update the guest context or inject the virtual events.
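
To illustrate the point, here is a rough sketch (not the exact patch code; the helper name is only for illustration) of how workload->status drives the completion flow:

	/* Rough sketch: a non-zero workload->status (or a vGPU reset in
	 * progress) skips the guest context write-back and the virtual
	 * event injection.
	 */
	static void handle_completed_workload(struct intel_vgpu_workload *workload)
	{
		struct intel_vgpu *vgpu = workload->vgpu;
		unsigned int event;

		if (workload->status || vgpu->resetting)
			return;

		/* the request completed successfully: write the shadow
		 * context back to the guest and inject the pending
		 * virtual events
		 */
		update_guest_context(workload);

		for_each_set_bit(event, workload->pending_events,
				 INTEL_GVT_EVENT_MAX)
			intel_vgpu_trigger_virtual_event(vgpu, event);
	}
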
>
> > + goto retry;
> > }
>
> So it looks like we actually need to wait for the context switch change
> notify to happen with the correct workload status? And we need to check
> the i915_wait_request() return value, as in the normal case we shouldn't
> need to check request completion again.
>
> What if a bad guest workload does cause a GPU hang? The gvt context is
> currently not bannable, but I think we should care about that case too.
i915_wait_request()'s return value cannot indicate whether the request caused a GPU hang. If a bad guest workload causes a GPU hang, i915_wait_request() can still return a valid timeout value after being woken up by the i915 reset flow, and that value does not tell us whether a GPU hang occurred. So we cannot rely on the return value of i915_wait_request() to decide whether the workload finished with a good status.
Based on this, after returning from i915_wait_request(), check the seqno to know whether the request has completed. If it has not, a GPU hang occurred and a replay is in progress, so we can wait again. If it has completed, go ahead and wait for the context schedule-out.
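
In sketch form (simplified; the actual patch uses a retry label inside workload_thread() rather than a separate helper like this):

	/* Sketch of the wait-and-recheck idea: the return value of
	 * i915_wait_request() alone cannot tell whether the request
	 * really finished, so re-check the seqno and wait again while
	 * a reset/replay is still in flight.
	 */
	static void gvt_wait_for_request(struct intel_vgpu_workload *workload)
	{
		do {
			/* may be woken by the i915 reset flow before the
			 * replayed request has actually completed
			 */
			i915_wait_request(workload->req, 0, MAX_SCHEDULE_TIMEOUT);
		} while (!i915_gem_request_completed(workload->req));

		/* the seqno has passed; now the workload thread can go on
		 * to wait for the context schedule-out notification
		 */
	}
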
>
> >
> > complete:
> > gvt_dbg_sched("will complete workload %p, status: %d\n",
> > workload, workload->status);
> >
> > - if (workload->req)
> > - i915_gem_request_put(fetch_and_zero(&workload->req));
> > -
> > complete_current_workload(gvt, ring_id);
> >
> > if (need_force_wake)
> > --
> > 2.7.4
> > _______________________________________________
> > intel-gvt-dev mailing list
> > intel-gvt-dev at lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/intel-gvt-dev
>
> --
> Open Source Technology Center, Intel ltd.
>
> $gpg --keyserver wwwkeys.pgp.net --recv-keys 4D781827