[PATCH] drm/i915/gvt: Skip writing 0 to HWSP during D3 resume

Fri Sep 11 02:26:21 UTC 2020

Maybe we still need this change or a similar one. The HWSP zero out 
failure will result in to mmio write fail and guest then enters failsafe 
mode. From guest perspective, it will fail to resume and enter reset cycle.

 >>> Trigger system_wakeup in qemu monitor

[  753.165063] gvt: vgpu 2: write invalid HWSP address, reg:0x2080, 
value:0x0
[  753.165065] gvt: vgpu 2: fail to emulate MMIO write 00002080 len 4
[  753.165206] gvt: vgpu 2: write invalid HWSP address, reg:0x12080, 
value:0x0
[  753.165207] gvt: vgpu 2: fail to emulate MMIO write 00012080 len 4
[  753.165247] gvt: vgpu 2: write invalid HWSP address, reg:0x22080, 
value:0x0
[  753.165248] gvt: vgpu 2: fail to emulate MMIO write 00022080 len 4
[  753.165295] gvt: vgpu 2: write invalid HWSP address, reg:0x1a080, 
value:0x0
[  753.165297] gvt: vgpu 2: fail to emulate MMIO write 0001a080 len 4
[  753.165645] Detected your guest driver doesn't support GVT-g.
[  753.165646] Now vgpu 2 will enter failsafe mode.

On 2020-09-09 11:05, Colin Xu wrote:
>
> On 2020-09-09 10:41, Zhenyu Wang wrote:
>> On 2020.09.09 10:51:04 +0800, Colin Xu wrote:
>>> On 2020-09-09 10:04, Zhenyu Wang wrote:
>>>> On 2020.09.09 08:37:53 +0800, Colin Xu wrote:
>>>>> On 2020-09-08 17:15, Zhenyu Wang wrote:
>>>>>> On 2020.08.19 09:09:53 +0800, Colin Xu wrote:
>>>>>>> Guest driver may reset HWSP to 0 as init value during D3->D0:
>>>>>>> The full sequence is:
>>>>>>> - Boot ->D0
>>>>>>> - Update HWSP
>>>>>>> - D0->D3
>>>>>>> - ...In D3 state...
>>>>>>> - D3->D0
>>>>>>> - DMLR reset.
>>>>>>> - Set engine HWSP to 0.
>>>>>>> - Set engine ring mode to 0.
>>>>>>> - Set engine HWSP to correct value.
>>>>>>> - Set engine ring mode to correct value.
>>>>>>> Ring mode is masked register so set 0 won't take effect.
>>>>>>> However HWPS addr 0 is considered as invalid GGTT address which 
>>>>>>> will
>>>>>>> report error like:
>>>>>>> gvt: vgpu 1: write invalid HWSP address, reg:0x2080, value:0x0
>>>>>>>
>>>>>> So looks this is to handle that wrong error message but function is
>>>>>> still same, right?
>>>>> Yes that's right. Current logic can guarantee that only correct 
>>>>> HWSP write
>>>>> from guest are accepted by GVT. Invalid HWSP address will be 
>>>>> dropped and
>>>>> won't be submitted to HW. During resume, correct HWSP will be updated
>>>>> eventually updated. If pv_notified is set, the intermediate HWSP 
>>>>> zero-out
>>>>> can be ignored until hit next reset.
>>>> Might just move that error into debug message instead of putting
>>>> effort on tracking guest state without much gain..
>>> Sometimes such invalid HWSP addr write msg may still be helpful. If 
>>> put them
>>> into gvt_dbg_mmio without considering different case, may miss the
>>> information. So this change put real error into gvt_vgpu_err, but put
>>> expected zero out to gvt_dbg_mmio.
>> Could you elaborate what would be helpful? The current error message
>> doesn't really help end user I think, but only for developer, so it
>> makes more sense to turn into debug.
>
> For developer or user consideration, yes distinguish the msg is 
> useless for end user. Developer can tell the state and find out the 
> HWSP addr is really valid or just zero-out during resuming.
>
> If so I think we can move this msg to gvt_dbg in next cleanup.
>
>>>>>>> During vGPU in D3, per-engine HWSP gpa remains valid so we can skip
>>>>>>> update HWSP in this case.
>>>>>>> Check both pv_notified and previous engine HWSP gpa, if pv already
>>>>>>> notified and previous HWSP gpa is valid, we skip this HWSP init and
>>>>>>> let later HWSP write update the correct value. We also need zero 
>>>>>>> out
>>>>>>> per-engine HWSP gpa on engine reset to make sure hws_pga is valid.
>>>>>>>
>>>>>>> Signed-off-by: Colin Xu <colin.xu at intel.com>
>>>>>>> ---
>>>>>>>     drivers/gpu/drm/i915/gvt/handlers.c | 30 
>>>>>>> ++++++++++++++++++++---------
>>>>>>>     drivers/gpu/drm/i915/gvt/vgpu.c     |  7 +++++++
>>>>>>>     2 files changed, 28 insertions(+), 9 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/i915/gvt/handlers.c 
>>>>>>> b/drivers/gpu/drm/i915/gvt/handlers.c
>>>>>>> index 840572add2d4..72860aaf1656 100644
>>>>>>> --- a/drivers/gpu/drm/i915/gvt/handlers.c
>>>>>>> +++ b/drivers/gpu/drm/i915/gvt/handlers.c
>>>>>>> @@ -1489,12 +1489,6 @@ static int hws_pga_write(struct 
>>>>>>> intel_vgpu *vgpu, unsigned int offset,
>>>>>>>         const struct intel_engine_cs *engine =
>>>>>>> intel_gvt_render_mmio_to_engine(vgpu->gvt, offset);
>>>>>>> -    if (!intel_gvt_ggtt_validate_range(vgpu, value, 
>>>>>>> I915_GTT_PAGE_SIZE)) {
>>>>>>> -        gvt_vgpu_err("write invalid HWSP address, reg:0x%x, 
>>>>>>> value:0x%x\n",
>>>>>>> -                  offset, value);
>>>>>>> -        return -EINVAL;
>>>>>>> -    }
>>>>>>> -
>>>>>>>         /*
>>>>>>>          * Need to emulate all the HWSP register write to ensure 
>>>>>>> host can
>>>>>>>          * update the VM CSB status correctly. Here listed 
>>>>>>> registers can
>>>>>>> @@ -1505,9 +1499,27 @@ static int hws_pga_write(struct 
>>>>>>> intel_vgpu *vgpu, unsigned int offset,
>>>>>>>                      offset);
>>>>>>>             return -EINVAL;
>>>>>>>         }
>>>>>>> -    vgpu->hws_pga[engine->id] = value;
>>>>>>> -    gvt_dbg_mmio("VM(%d) write: 0x%x to HWSP: 0x%x\n",
>>>>>>> -             vgpu->id, value, offset);
>>>>>>> +
>>>>>>> +    if (!intel_gvt_ggtt_validate_range(vgpu, value, 
>>>>>>> I915_GTT_PAGE_SIZE)) {
>>>>>>> +        u32 old = vgpu->hws_pga[engine->id];
>>>>>>> +
>>>>>>> +        /* Skip zero out RING_HWS_PGA during D3 resume */
>>>>>>> +        if (vgpu->pv_notified && value == 0 &&
>>>>>>> +            intel_gvt_ggtt_validate_range(vgpu, old,
>>>>>>> +                          I915_GTT_PAGE_SIZE)) {
>>>>>>> +            gvt_dbg_mmio("Skip zero out HWSP address, reg:0x%x, "
>>>>>>> +                     "value:0x%x\n", offset, value);
>>>>>>> +
>>>>>>> +        } else {
>>>>>>> +            gvt_vgpu_err("write invalid HWSP address, reg:0x%x, "
>>>>>>> +                     "value:0x%x\n", offset, value);
>>>>>>> +            return -EINVAL;
>>>>>>> +        }
>>>>>>> +    } else {
>>>>>>> +        vgpu->hws_pga[engine->id] = value;
>>>>>>> +        gvt_dbg_mmio("VM(%d) write: 0x%x to HWSP: 0x%x\n",
>>>>>>> +                 vgpu->id, value, offset);
>>>>>>> +    }
>>>>>>>         return intel_vgpu_default_mmio_write(vgpu, offset, 
>>>>>>> &value, bytes);
>>>>>>>     }
>>>>>>> diff --git a/drivers/gpu/drm/i915/gvt/vgpu.c 
>>>>>>> b/drivers/gpu/drm/i915/gvt/vgpu.c
>>>>>>> index 8fa9b31a2484..e0e073045d83 100644
>>>>>>> --- a/drivers/gpu/drm/i915/gvt/vgpu.c
>>>>>>> +++ b/drivers/gpu/drm/i915/gvt/vgpu.c
>>>>>>> @@ -558,6 +558,9 @@ void intel_gvt_reset_vgpu_locked(struct 
>>>>>>> intel_vgpu *vgpu, bool dmlr,
>>>>>>>         intel_vgpu_reset_submission(vgpu, resetting_eng);
>>>>>>>         /* full GPU reset or device model level reset */
>>>>>>>         if (engine_mask == ALL_ENGINES || dmlr) {
>>>>>>> +        struct intel_engine_cs *engine;
>>>>>>> +        intel_engine_mask_t tmp;
>>>>>>> +
>>>>>>>             intel_vgpu_select_submission_ops(vgpu, ALL_ENGINES, 0);
>>>>>>>             if (engine_mask == ALL_ENGINES)
>>>>>>>                 intel_vgpu_invalidate_ppgtt(vgpu);
>>>>>>> @@ -588,6 +591,10 @@ void intel_gvt_reset_vgpu_locked(struct 
>>>>>>> intel_vgpu *vgpu, bool dmlr,
>>>>>>>                 else
>>>>>>>                     vgpu->pv_notified = false;
>>>>>>>             }
>>>>>>> +
>>>>>>> +        for_each_engine_masked(engine, gvt->gt, engine_mask, 
>>>>>>> tmp) {
>>>>>>> +            vgpu->hws_pga[engine->id] = 0;
>>>>>>> +        }
>>>>>>>         }
>>>>>>>         vgpu->resetting_eng = 0;
>>>>>>> -- 
>>>>>>> 2.28.0
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> intel-gvt-dev mailing list
>>>>>>> intel-gvt-dev at lists.freedesktop.org
>>>>>>> https://lists.freedesktop.org/mailman/listinfo/intel-gvt-dev
>>>>> -- 
>>>>> Best Regards,
>>>>> Colin Xu
>>>>>
>>> -- 
>>> Best Regards,
>>> Colin Xu
>>>
-- 
Best Regards,
Colin Xu