[PATCH 1/2] drm/i915/gvt: add RING_INSTDONE and SC_INSTDONE mmio handler in GVT-g

Wed May 17 06:07:57 UTC 2017

> -----Original Message-----
> From: Zhenyu Wang [mailto:zhenyuw at linux.intel.com]
> Sent: Wednesday, May 17, 2017 1:51 PM
> To: Du, Changbin <changbin.du at intel.com>
> Cc: Li, Weinan Z <weinan.z.li at intel.com>; intel-gvt-dev at lists.freedesktop.org
> Subject: Re: [PATCH 1/2] drm/i915/gvt: add RING_INSTDONE and
> SC_INSTDONE mmio handler in GVT-g
> 
> On 2017.05.17 13:49:27 +0800, Du, Changbin wrote:
> > On Wed, May 17, 2017 at 01:40:56PM +0800, Zhenyu Wang wrote:
> > > On 2017.05.17 11:22:52 +0800, Weinan Li wrote:
> > > > > kernel hangcheck needs to check RING_INSTDONE and SC_INSTDONE registers'
> > > > state to know if hardware is still running. In GVT-g environment,
> > > > we need to emulate these registers change for all the vgpus,
> > > > otherwise if one workload runs for a long time with no ACTHD and
> > > > INSTDONE change will cause hangcheck failed then trigger gfx
> > > > reset, especially in multi-vgpus environment, one vgpu has been
> > > > scheduled out for a long time, it will try to check is there
> > > > INSTDONE registers change to know if hardware is still running.
> > > >
> > > > here we return the physical state for all the vgpus, let them know
> > > > the hardware's running state, avoid unnecessary gfx reset from vgpu.
> > > >
> > > > Signed-off-by: Weinan Li <weinan.z.li at intel.com>
> > > > ---
> > > >  drivers/gpu/drm/i915/gvt/handlers.c | 23 +++++++++++++++++++++++
> > > >  drivers/gpu/drm/i915/gvt/mmio.c     |  7 -------
> > > >  2 files changed, 23 insertions(+), 7 deletions(-)
> > > >
> > > > > diff --git a/drivers/gpu/drm/i915/gvt/handlers.c b/drivers/gpu/drm/i915/gvt/handlers.c
> > > > index c995e54..a70892e 100644
> > > > --- a/drivers/gpu/drm/i915/gvt/handlers.c
> > > > +++ b/drivers/gpu/drm/i915/gvt/handlers.c
> > > > @@ -1409,6 +1409,23 @@ static int ring_timestamp_mmio_read(struct
> intel_vgpu *vgpu,
> > > >  	return intel_vgpu_default_mmio_read(vgpu, offset, p_data,
> > > > bytes);  }
> > > >
> > > > +static int instdone_mmio_read(struct intel_vgpu *vgpu,
> > > > +		unsigned int offset, void *p_data, unsigned int bytes) {
> > > > +	struct drm_i915_private *dev_priv = vgpu->gvt->dev_priv;
> > > > +
> > > > +	if (offset == 0x206c) {
> > > > +		gvt_vgpu_err("------------------------------------------\n");
> > > > +		gvt_vgpu_err(" likely triggers a gfx reset or scheduled out for a
> long time.\n");
> > > > +		gvt_vgpu_err("------------------------------------------\n");
> > > > +		vgpu->mmio.disable_warn_untrack = true;
> > > > +	}
> > >
> > > So shouldn't remove this message completely here but might have some
> > > debug info when guest really issue reset by GDRST write?
> > >
> > Virtual GT reset log has printed in GDRST handler gdrst_mmio_write.
> > Just change the log level from debug to info if need.
> >
> 
> yeah, so I mean this is not good to show guest reset info here for INSTDONE
> handler, as it's normal to read it, can't just tell it appears to hang then.
For a Windows guest, a TDR may occur, and the OS usually reads INSTDONE (0x206C)
while collecting debug information. For a Linux guest, the read usually happens
when the vGPU has been scheduled out for a long time, and there may be no TDR.
Even when it is not a TDR, it would be better to keep one prompt message here so
that we know the vGPU has either been scheduled out for a long time or is
running a long workload.
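
To make the idea concrete, here is a rough sketch in plain userspace C, not the
real handler. It assumes the message is printed only once per vGPU, which is an
assumption of this sketch and not something the patch does today. The names
demo_vgpu, demo_hw_read and demo_instdone_read are made up for illustration and
are not GVT-g APIs; only the 0x206c offset comes from the patch.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Offset checked in the quoted patch. */
#define DEMO_INSTDONE_OFFSET 0x206cu

/* Hypothetical stand-in for per-vGPU state; not the real struct intel_vgpu. */
struct demo_vgpu {
	int id;
	bool instdone_notice_shown;	/* assumed flag: notice already printed */
	unsigned int notices;		/* number of notices printed so far */
};

/* Hypothetical stand-in for a physical register read; returns a fake value. */
static uint32_t demo_hw_read(uint32_t offset)
{
	return 0xffff0000u | (offset & 0xffffu);
}

/*
 * Return the (fake) physical register value to the guest. On the first read
 * of the INSTDONE offset, print a single informational line instead of a
 * reset warning, since reading INSTDONE is normal guest behaviour.
 */
static uint32_t demo_instdone_read(struct demo_vgpu *vgpu, uint32_t offset)
{
	if (offset == DEMO_INSTDONE_OFFSET && !vgpu->instdone_notice_shown) {
		fprintf(stderr,
			"vgpu %d: INSTDONE read, guest may have been scheduled out or be running a long workload\n",
			vgpu->id);
		vgpu->instdone_notice_shown = true;
		vgpu->notices++;
	}
	return demo_hw_read(offset);
}
```

With this shape, repeated reads of 0x206c from the same vGPU print the line once
and still return the hardware value every time; reads of other offsets print
nothing.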
> 