[PATCH] drm/i915/gvt: Skip writing 0 to HWSP during D3 resume
Zhenyu Wang
zhenyuw at linux.intel.com
Fri Sep 11 02:44:35 UTC 2020
On 2020.09.11 10:26:21 +0800, Colin Xu wrote:
> Maybe we still need this change or a similar one. The HWSP zero out failure
> will result in to mmio write fail and guest then enters failsafe mode. From
> guest perspective, it will fail to resume and enter reset cycle.
>
>
> >>> Trigger system_wakeup in qemu monitor
>
> [ 753.165063] gvt: vgpu 2: write invalid HWSP address, reg:0x2080,
> value:0x0
> [ 753.165065] gvt: vgpu 2: fail to emulate MMIO write 00002080 len 4
> [ 753.165206] gvt: vgpu 2: write invalid HWSP address, reg:0x12080,
> value:0x0
> [ 753.165207] gvt: vgpu 2: fail to emulate MMIO write 00012080 len 4
> [ 753.165247] gvt: vgpu 2: write invalid HWSP address, reg:0x22080,
> value:0x0
> [ 753.165248] gvt: vgpu 2: fail to emulate MMIO write 00022080 len 4
> [ 753.165295] gvt: vgpu 2: write invalid HWSP address, reg:0x1a080,
> value:0x0
> [ 753.165297] gvt: vgpu 2: fail to emulate MMIO write 0001a080 len 4
> [ 753.165645] Detected your guest driver doesn't support GVT-g.
> [ 753.165646] Now vgpu 2 will enter failsafe mode.
>
Maybe we just take that 0 address as special case and not return
failure, as it's a valid setting from device driver point of view.
>
> On 2020-09-09 11:05, Colin Xu wrote:
> >
> > On 2020-09-09 10:41, Zhenyu Wang wrote:
> > > On 2020.09.09 10:51:04 +0800, Colin Xu wrote:
> > > > On 2020-09-09 10:04, Zhenyu Wang wrote:
> > > > > On 2020.09.09 08:37:53 +0800, Colin Xu wrote:
> > > > > > On 2020-09-08 17:15, Zhenyu Wang wrote:
> > > > > > > On 2020.08.19 09:09:53 +0800, Colin Xu wrote:
> > > > > > > > Guest driver may reset HWSP to 0 as init value during D3->D0:
> > > > > > > > The full sequence is:
> > > > > > > > - Boot ->D0
> > > > > > > > - Update HWSP
> > > > > > > > - D0->D3
> > > > > > > > - ...In D3 state...
> > > > > > > > - D3->D0
> > > > > > > > - DMLR reset.
> > > > > > > > - Set engine HWSP to 0.
> > > > > > > > - Set engine ring mode to 0.
> > > > > > > > - Set engine HWSP to correct value.
> > > > > > > > - Set engine ring mode to correct value.
> > > > > > > > Ring mode is masked register so set 0 won't take effect.
> > > > > > > > However HWPS addr 0 is considered as invalid
> > > > > > > > GGTT address which will
> > > > > > > > report error like:
> > > > > > > > gvt: vgpu 1: write invalid HWSP address, reg:0x2080, value:0x0
> > > > > > > >
> > > > > > > So looks this is to handle that wrong error message but function is
> > > > > > > still same, right?
> > > > > > Yes that's right. Current logic can guarantee that only
> > > > > > correct HWSP write
> > > > > > from guest are accepted by GVT. Invalid HWSP address
> > > > > > will be dropped and
> > > > > > won't be submitted to HW. During resume, correct HWSP will be updated
> > > > > > eventually updated. If pv_notified is set, the
> > > > > > intermediate HWSP zero-out
> > > > > > can be ignored until hit next reset.
> > > > > Might just move that error into debug message instead of putting
> > > > > effort on tracking guest state without much gain..
> > > > Sometimes such invalid HWSP addr write msg may still be helpful.
> > > > If put them
> > > > into gvt_dbg_mmio without considering different case, may miss the
> > > > information. So this change put real error into gvt_vgpu_err, but put
> > > > expected zero out to gvt_dbg_mmio.
> > > Could you elaborate what would be helpful? The current error message
> > > doesn't really help end user I think, but only for developer, so it
> > > makes more sense to turn into debug.
> >
> > For developer or user consideration, yes distinguish the msg is useless
> > for end user. Developer can tell the state and find out the HWSP addr is
> > really valid or just zero-out during resuming.
> >
> > If so I think we can move this msg to gvt_dbg in next cleanup.
> >
> > > > > > > > During vGPU in D3, per-engine HWSP gpa remains valid so we can skip
> > > > > > > > update HWSP in this case.
> > > > > > > > Check both pv_notified and previous engine HWSP gpa, if pv already
> > > > > > > > notified and previous HWSP gpa is valid, we skip this HWSP init and
> > > > > > > > let later HWSP write update the correct value.
> > > > > > > > We also need zero out
> > > > > > > > per-engine HWSP gpa on engine reset to make sure hws_pga is valid.
> > > > > > > >
> > > > > > > > Signed-off-by: Colin Xu <colin.xu at intel.com>
> > > > > > > > ---
> > > > > > > > drivers/gpu/drm/i915/gvt/handlers.c | 30
> > > > > > > > ++++++++++++++++++++---------
> > > > > > > > drivers/gpu/drm/i915/gvt/vgpu.c | 7 +++++++
> > > > > > > > 2 files changed, 28 insertions(+), 9 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/drivers/gpu/drm/i915/gvt/handlers.c
> > > > > > > > b/drivers/gpu/drm/i915/gvt/handlers.c
> > > > > > > > index 840572add2d4..72860aaf1656 100644
> > > > > > > > --- a/drivers/gpu/drm/i915/gvt/handlers.c
> > > > > > > > +++ b/drivers/gpu/drm/i915/gvt/handlers.c
> > > > > > > > @@ -1489,12 +1489,6 @@ static int
> > > > > > > > hws_pga_write(struct intel_vgpu *vgpu, unsigned
> > > > > > > > int offset,
> > > > > > > > const struct intel_engine_cs *engine =
> > > > > > > > intel_gvt_render_mmio_to_engine(vgpu->gvt, offset);
> > > > > > > > - if (!intel_gvt_ggtt_validate_range(vgpu,
> > > > > > > > value, I915_GTT_PAGE_SIZE)) {
> > > > > > > > - gvt_vgpu_err("write invalid HWSP
> > > > > > > > address, reg:0x%x, value:0x%x\n",
> > > > > > > > - offset, value);
> > > > > > > > - return -EINVAL;
> > > > > > > > - }
> > > > > > > > -
> > > > > > > > /*
> > > > > > > > * Need to emulate all the HWSP register
> > > > > > > > write to ensure host can
> > > > > > > > * update the VM CSB status correctly.
> > > > > > > > Here listed registers can
> > > > > > > > @@ -1505,9 +1499,27 @@ static int
> > > > > > > > hws_pga_write(struct intel_vgpu *vgpu, unsigned
> > > > > > > > int offset,
> > > > > > > > offset);
> > > > > > > > return -EINVAL;
> > > > > > > > }
> > > > > > > > - vgpu->hws_pga[engine->id] = value;
> > > > > > > > - gvt_dbg_mmio("VM(%d) write: 0x%x to HWSP: 0x%x\n",
> > > > > > > > - vgpu->id, value, offset);
> > > > > > > > +
> > > > > > > > + if (!intel_gvt_ggtt_validate_range(vgpu,
> > > > > > > > value, I915_GTT_PAGE_SIZE)) {
> > > > > > > > + u32 old = vgpu->hws_pga[engine->id];
> > > > > > > > +
> > > > > > > > + /* Skip zero out RING_HWS_PGA during D3 resume */
> > > > > > > > + if (vgpu->pv_notified && value == 0 &&
> > > > > > > > + intel_gvt_ggtt_validate_range(vgpu, old,
> > > > > > > > + I915_GTT_PAGE_SIZE)) {
> > > > > > > > + gvt_dbg_mmio("Skip zero out HWSP address, reg:0x%x, "
> > > > > > > > + "value:0x%x\n", offset, value);
> > > > > > > > +
> > > > > > > > + } else {
> > > > > > > > + gvt_vgpu_err("write invalid HWSP address, reg:0x%x, "
> > > > > > > > + "value:0x%x\n", offset, value);
> > > > > > > > + return -EINVAL;
> > > > > > > > + }
> > > > > > > > + } else {
> > > > > > > > + vgpu->hws_pga[engine->id] = value;
> > > > > > > > + gvt_dbg_mmio("VM(%d) write: 0x%x to HWSP: 0x%x\n",
> > > > > > > > + vgpu->id, value, offset);
> > > > > > > > + }
> > > > > > > > return
> > > > > > > > intel_vgpu_default_mmio_write(vgpu, offset,
> > > > > > > > &value, bytes);
> > > > > > > > }
> > > > > > > > diff --git a/drivers/gpu/drm/i915/gvt/vgpu.c
> > > > > > > > b/drivers/gpu/drm/i915/gvt/vgpu.c
> > > > > > > > index 8fa9b31a2484..e0e073045d83 100644
> > > > > > > > --- a/drivers/gpu/drm/i915/gvt/vgpu.c
> > > > > > > > +++ b/drivers/gpu/drm/i915/gvt/vgpu.c
> > > > > > > > @@ -558,6 +558,9 @@ void
> > > > > > > > intel_gvt_reset_vgpu_locked(struct intel_vgpu
> > > > > > > > *vgpu, bool dmlr,
> > > > > > > > intel_vgpu_reset_submission(vgpu, resetting_eng);
> > > > > > > > /* full GPU reset or device model level reset */
> > > > > > > > if (engine_mask == ALL_ENGINES || dmlr) {
> > > > > > > > + struct intel_engine_cs *engine;
> > > > > > > > + intel_engine_mask_t tmp;
> > > > > > > > +
> > > > > > > > intel_vgpu_select_submission_ops(vgpu, ALL_ENGINES, 0);
> > > > > > > > if (engine_mask == ALL_ENGINES)
> > > > > > > > intel_vgpu_invalidate_ppgtt(vgpu);
> > > > > > > > @@ -588,6 +591,10 @@ void
> > > > > > > > intel_gvt_reset_vgpu_locked(struct intel_vgpu
> > > > > > > > *vgpu, bool dmlr,
> > > > > > > > else
> > > > > > > > vgpu->pv_notified = false;
> > > > > > > > }
> > > > > > > > +
> > > > > > > > + for_each_engine_masked(engine, gvt->gt,
> > > > > > > > engine_mask, tmp) {
> > > > > > > > + vgpu->hws_pga[engine->id] = 0;
> > > > > > > > + }
> > > > > > > > }
> > > > > > > > vgpu->resetting_eng = 0;
> > > > > > > > --
> > > > > > > > 2.28.0
> > > > > > > >
> > > > > > > > _______________________________________________
> > > > > > > > intel-gvt-dev mailing list
> > > > > > > > intel-gvt-dev at lists.freedesktop.org
> > > > > > > > https://lists.freedesktop.org/mailman/listinfo/intel-gvt-dev
> > > > > > --
> > > > > > Best Regards,
> > > > > > Colin Xu
> > > > > >
> > > > --
> > > > Best Regards,
> > > > Colin Xu
> > > >
> --
> Best Regards,
> Colin Xu
>
--
$gpg --keyserver wwwkeys.pgp.net --recv-keys 4D781827
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 195 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/intel-gvt-dev/attachments/20200911/2d000983/attachment-0001.sig>
More information about the intel-gvt-dev
mailing list