[PATCH] drm/i915/gvt: Skip writing 0 to HWSP during D3 resume

Colin Xu Colin.Xu at intel.com
Fri Sep 11 03:41:31 UTC 2020


On 2020-09-11 10:44, Zhenyu Wang wrote:
> On 2020.09.11 10:26:21 +0800, Colin Xu wrote:
>> Maybe we still need this change or a similar one. The HWSP zero out failure
>> will result in to mmio write fail and guest then enters failsafe mode. From
>> guest perspective, it will fail to resume and enter reset cycle.
>>
>>
>>>>> Trigger system_wakeup in qemu monitor
>> [  753.165063] gvt: vgpu 2: write invalid HWSP address, reg:0x2080,
>> value:0x0
>> [  753.165065] gvt: vgpu 2: fail to emulate MMIO write 00002080 len 4
>> [  753.165206] gvt: vgpu 2: write invalid HWSP address, reg:0x12080,
>> value:0x0
>> [  753.165207] gvt: vgpu 2: fail to emulate MMIO write 00012080 len 4
>> [  753.165247] gvt: vgpu 2: write invalid HWSP address, reg:0x22080,
>> value:0x0
>> [  753.165248] gvt: vgpu 2: fail to emulate MMIO write 00022080 len 4
>> [  753.165295] gvt: vgpu 2: write invalid HWSP address, reg:0x1a080,
>> value:0x0
>> [  753.165297] gvt: vgpu 2: fail to emulate MMIO write 0001a080 len 4
>> [  753.165645] Detected your guest driver doesn't support GVT-g.
>> [  753.165646] Now vgpu 2 will enter failsafe mode.
>>
> Maybe we just take that 0 address as special case and not return
> failure, as it's a valid setting from device driver point of view.
It makes sense. I'll cook a patch to allow 0 addr, and also move the err 
msg to dbg.
>
>> On 2020-09-09 11:05, Colin Xu wrote:
>>> On 2020-09-09 10:41, Zhenyu Wang wrote:
>>>> On 2020.09.09 10:51:04 +0800, Colin Xu wrote:
>>>>> On 2020-09-09 10:04, Zhenyu Wang wrote:
>>>>>> On 2020.09.09 08:37:53 +0800, Colin Xu wrote:
>>>>>>> On 2020-09-08 17:15, Zhenyu Wang wrote:
>>>>>>>> On 2020.08.19 09:09:53 +0800, Colin Xu wrote:
>>>>>>>>> Guest driver may reset HWSP to 0 as init value during D3->D0:
>>>>>>>>> The full sequence is:
>>>>>>>>> - Boot ->D0
>>>>>>>>> - Update HWSP
>>>>>>>>> - D0->D3
>>>>>>>>> - ...In D3 state...
>>>>>>>>> - D3->D0
>>>>>>>>> - DMLR reset.
>>>>>>>>> - Set engine HWSP to 0.
>>>>>>>>> - Set engine ring mode to 0.
>>>>>>>>> - Set engine HWSP to correct value.
>>>>>>>>> - Set engine ring mode to correct value.
>>>>>>>>> Ring mode is masked register so set 0 won't take effect.
>>>>>>>>> However HWPS addr 0 is considered as invalid
>>>>>>>>> GGTT address which will
>>>>>>>>> report error like:
>>>>>>>>> gvt: vgpu 1: write invalid HWSP address, reg:0x2080, value:0x0
>>>>>>>>>
>>>>>>>> So looks this is to handle that wrong error message but function is
>>>>>>>> still same, right?
>>>>>>> Yes that's right. Current logic can guarantee that only
>>>>>>> correct HWSP write
>>>>>>> from guest are accepted by GVT. Invalid HWSP address
>>>>>>> will be dropped and
>>>>>>> won't be submitted to HW. During resume, correct HWSP will be updated
>>>>>>> eventually updated. If pv_notified is set, the
>>>>>>> intermediate HWSP zero-out
>>>>>>> can be ignored until hit next reset.
>>>>>> Might just move that error into debug message instead of putting
>>>>>> effort on tracking guest state without much gain..
>>>>> Sometimes such invalid HWSP addr write msg may still be helpful.
>>>>> If put them
>>>>> into gvt_dbg_mmio without considering different case, may miss the
>>>>> information. So this change put real error into gvt_vgpu_err, but put
>>>>> expected zero out to gvt_dbg_mmio.
>>>> Could you elaborate what would be helpful? The current error message
>>>> doesn't really help end user I think, but only for developer, so it
>>>> makes more sense to turn into debug.
>>> For developer or user consideration, yes distinguish the msg is useless
>>> for end user. Developer can tell the state and find out the HWSP addr is
>>> really valid or just zero-out during resuming.
>>>
>>> If so I think we can move this msg to gvt_dbg in next cleanup.
>>>
>>>>>>>>> During vGPU in D3, per-engine HWSP gpa remains valid so we can skip
>>>>>>>>> update HWSP in this case.
>>>>>>>>> Check both pv_notified and previous engine HWSP gpa, if pv already
>>>>>>>>> notified and previous HWSP gpa is valid, we skip this HWSP init and
>>>>>>>>> let later HWSP write update the correct value.
>>>>>>>>> We also need zero out
>>>>>>>>> per-engine HWSP gpa on engine reset to make sure hws_pga is valid.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Colin Xu <colin.xu at intel.com>
>>>>>>>>> ---
>>>>>>>>>      drivers/gpu/drm/i915/gvt/handlers.c | 30
>>>>>>>>> ++++++++++++++++++++---------
>>>>>>>>>      drivers/gpu/drm/i915/gvt/vgpu.c     |  7 +++++++
>>>>>>>>>      2 files changed, 28 insertions(+), 9 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/gpu/drm/i915/gvt/handlers.c
>>>>>>>>> b/drivers/gpu/drm/i915/gvt/handlers.c
>>>>>>>>> index 840572add2d4..72860aaf1656 100644
>>>>>>>>> --- a/drivers/gpu/drm/i915/gvt/handlers.c
>>>>>>>>> +++ b/drivers/gpu/drm/i915/gvt/handlers.c
>>>>>>>>> @@ -1489,12 +1489,6 @@ static int
>>>>>>>>> hws_pga_write(struct intel_vgpu *vgpu, unsigned
>>>>>>>>> int offset,
>>>>>>>>>          const struct intel_engine_cs *engine =
>>>>>>>>> intel_gvt_render_mmio_to_engine(vgpu->gvt, offset);
>>>>>>>>> -    if (!intel_gvt_ggtt_validate_range(vgpu,
>>>>>>>>> value, I915_GTT_PAGE_SIZE)) {
>>>>>>>>> -        gvt_vgpu_err("write invalid HWSP
>>>>>>>>> address, reg:0x%x, value:0x%x\n",
>>>>>>>>> -                  offset, value);
>>>>>>>>> -        return -EINVAL;
>>>>>>>>> -    }
>>>>>>>>> -
>>>>>>>>>          /*
>>>>>>>>>           * Need to emulate all the HWSP register
>>>>>>>>> write to ensure host can
>>>>>>>>>           * update the VM CSB status correctly.
>>>>>>>>> Here listed registers can
>>>>>>>>> @@ -1505,9 +1499,27 @@ static int
>>>>>>>>> hws_pga_write(struct intel_vgpu *vgpu, unsigned
>>>>>>>>> int offset,
>>>>>>>>>                       offset);
>>>>>>>>>              return -EINVAL;
>>>>>>>>>          }
>>>>>>>>> -    vgpu->hws_pga[engine->id] = value;
>>>>>>>>> -    gvt_dbg_mmio("VM(%d) write: 0x%x to HWSP: 0x%x\n",
>>>>>>>>> -             vgpu->id, value, offset);
>>>>>>>>> +
>>>>>>>>> +    if (!intel_gvt_ggtt_validate_range(vgpu,
>>>>>>>>> value, I915_GTT_PAGE_SIZE)) {
>>>>>>>>> +        u32 old = vgpu->hws_pga[engine->id];
>>>>>>>>> +
>>>>>>>>> +        /* Skip zero out RING_HWS_PGA during D3 resume */
>>>>>>>>> +        if (vgpu->pv_notified && value == 0 &&
>>>>>>>>> +            intel_gvt_ggtt_validate_range(vgpu, old,
>>>>>>>>> +                          I915_GTT_PAGE_SIZE)) {
>>>>>>>>> +            gvt_dbg_mmio("Skip zero out HWSP address, reg:0x%x, "
>>>>>>>>> +                     "value:0x%x\n", offset, value);
>>>>>>>>> +
>>>>>>>>> +        } else {
>>>>>>>>> +            gvt_vgpu_err("write invalid HWSP address, reg:0x%x, "
>>>>>>>>> +                     "value:0x%x\n", offset, value);
>>>>>>>>> +            return -EINVAL;
>>>>>>>>> +        }
>>>>>>>>> +    } else {
>>>>>>>>> +        vgpu->hws_pga[engine->id] = value;
>>>>>>>>> +        gvt_dbg_mmio("VM(%d) write: 0x%x to HWSP: 0x%x\n",
>>>>>>>>> +                 vgpu->id, value, offset);
>>>>>>>>> +    }
>>>>>>>>>          return
>>>>>>>>> intel_vgpu_default_mmio_write(vgpu, offset,
>>>>>>>>> &value, bytes);
>>>>>>>>>      }
>>>>>>>>> diff --git a/drivers/gpu/drm/i915/gvt/vgpu.c
>>>>>>>>> b/drivers/gpu/drm/i915/gvt/vgpu.c
>>>>>>>>> index 8fa9b31a2484..e0e073045d83 100644
>>>>>>>>> --- a/drivers/gpu/drm/i915/gvt/vgpu.c
>>>>>>>>> +++ b/drivers/gpu/drm/i915/gvt/vgpu.c
>>>>>>>>> @@ -558,6 +558,9 @@ void
>>>>>>>>> intel_gvt_reset_vgpu_locked(struct intel_vgpu
>>>>>>>>> *vgpu, bool dmlr,
>>>>>>>>>          intel_vgpu_reset_submission(vgpu, resetting_eng);
>>>>>>>>>          /* full GPU reset or device model level reset */
>>>>>>>>>          if (engine_mask == ALL_ENGINES || dmlr) {
>>>>>>>>> +        struct intel_engine_cs *engine;
>>>>>>>>> +        intel_engine_mask_t tmp;
>>>>>>>>> +
>>>>>>>>>              intel_vgpu_select_submission_ops(vgpu, ALL_ENGINES, 0);
>>>>>>>>>              if (engine_mask == ALL_ENGINES)
>>>>>>>>>                  intel_vgpu_invalidate_ppgtt(vgpu);
>>>>>>>>> @@ -588,6 +591,10 @@ void
>>>>>>>>> intel_gvt_reset_vgpu_locked(struct intel_vgpu
>>>>>>>>> *vgpu, bool dmlr,
>>>>>>>>>                  else
>>>>>>>>>                      vgpu->pv_notified = false;
>>>>>>>>>              }
>>>>>>>>> +
>>>>>>>>> +        for_each_engine_masked(engine, gvt->gt,
>>>>>>>>> engine_mask, tmp) {
>>>>>>>>> +            vgpu->hws_pga[engine->id] = 0;
>>>>>>>>> +        }
>>>>>>>>>          }
>>>>>>>>>          vgpu->resetting_eng = 0;
>>>>>>>>> -- 
>>>>>>>>> 2.28.0
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> intel-gvt-dev mailing list
>>>>>>>>> intel-gvt-dev at lists.freedesktop.org
>>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/intel-gvt-dev
>>>>>>> -- 
>>>>>>> Best Regards,
>>>>>>> Colin Xu
>>>>>>>
>>>>> -- 
>>>>> Best Regards,
>>>>> Colin Xu
>>>>>
>> -- 
>> Best Regards,
>> Colin Xu
>>
-- 
Best Regards,
Colin Xu



More information about the intel-gvt-dev mailing list