[PATCH 2/2] drm/i915/gvt: Audit and shadow workload during ELSP writing
Zhi Wang
zhi.a.wang at intel.com
Tue Jun 6 01:35:17 UTC 2017
On 06/06/17 09:29, Gao, Ping A wrote:
> On 2017/6/5 16:38, Wang, Zhi A wrote:
>> On 05/24/17 20:49, Ping Gao wrote:
>>> Audit and shadow the workload ahead of vGPU scheduling; this
>>> eliminates GPU idle time and improves performance for multi-VM.
>>>
>>> The performance of Heaven running simultaneously in 3 VMs
>>> improves by 20% with this patch.
>>>
>>> Signed-off-by: Ping Gao <ping.a.gao at intel.com>
>>> ---
>>> drivers/gpu/drm/i915/gvt/execlist.c | 12 ++++++++++++
>>> drivers/gpu/drm/i915/gvt/scheduler.c | 7 +++++++
>>> drivers/gpu/drm/i915/gvt/scheduler.h | 1 +
>>> 3 files changed, 20 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gvt/execlist.c b/drivers/gpu/drm/i915/gvt/execlist.c
>>> index dca989e..a14cf29 100644
>>> --- a/drivers/gpu/drm/i915/gvt/execlist.c
>>> +++ b/drivers/gpu/drm/i915/gvt/execlist.c
>>> @@ -605,6 +605,7 @@ static int submit_context(struct intel_vgpu *vgpu, int ring_id,
>>> struct list_head *q = workload_q_head(vgpu, ring_id);
>>> struct intel_vgpu_workload *last_workload = get_last_workload(q);
>>> struct intel_vgpu_workload *workload = NULL;
>>> + struct drm_i915_private *dev_priv = vgpu->gvt->dev_priv;
>>> u64 ring_context_gpa;
>>> u32 head, tail, start, ctl, ctx_ctl, per_ctx, indirect_ctx;
>>> int ret;
>>> @@ -668,6 +669,7 @@ static int submit_context(struct intel_vgpu *vgpu, int ring_id,
>>> workload->complete = complete_execlist_workload;
>>> workload->status = -EINPROGRESS;
>>> workload->emulate_schedule_in = emulate_schedule_in;
>>> + workload->shadowed = false;
>>>
>>> if (ring_id == RCS) {
>>> intel_gvt_hypervisor_read_gpa(vgpu, ring_context_gpa +
>>> @@ -701,6 +703,16 @@ static int submit_context(struct intel_vgpu *vgpu, int ring_id,
>>> return ret;
>>> }
>>>
>>> + /* Only audit and shadow the first workload in the queue
>>> + * as there is only one pre-allocated buf-obj for shadow.
>>> + */
>>> + if (vgpu->gvt->scheduler.current_vgpu != vgpu &&
>>> + list_empty(workload_q_head(vgpu, ring_id))) {
>>> + mutex_lock(&dev_priv->drm.struct_mutex);
>>> + intel_gvt_audit_and_shadow_workload(workload);
>>> + mutex_unlock(&dev_priv->drm.struct_mutex);
>>> + }
>>> +
>> Is there any reason why pre-shadowing cannot happen when current_vgpu ==
>> vgpu and the workload queue is empty?
> As there is only one pre-allocated buf-obj for shadowing, only the
> first workload in the queue can be shadowed. If the queue is empty
> before enqueueing, the current workload will become the first node
> of the queue.
>
> The current_vgpu != vgpu check is meant to emphasize that the
> performance only improves when pre-shadowing happens under this
> condition; that's why we need pre-shadowing. Logically it can be
> removed, but then the purpose of pre-shadowing would be less clear
> when reading the code.
I got the background in our previous talk :P I'm just curious. From my
point of view, the gap comes from the workload scheduler thread; if we
pre-shadow workloads as much as possible, that would be nicer. Did you
observe any performance drop/gain after removing the current_vgpu !=
vgpu check? :P
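
For illustration, here is a minimal sketch of the relaxed check being
discussed (hypothetical, not part of the posted patch): the
current_vgpu test is dropped, while the empty-queue guard stays
because only one buf-obj is pre-allocated for shadowing.

	/* Hypothetical variant, for comparison only: pre-shadow
	 * whenever this workload will become the queue head, no
	 * matter which vGPU the scheduler is currently running.
	 * The empty-queue guard must stay: there is only one
	 * pre-allocated buf-obj for shadowing.
	 */
	if (list_empty(workload_q_head(vgpu, ring_id))) {
		mutex_lock(&dev_priv->drm.struct_mutex);
		intel_gvt_audit_and_shadow_workload(workload);
		mutex_unlock(&dev_priv->drm.struct_mutex);
	}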
>>> queue_workload(workload);
>>> return 0;
>>> }
>>> diff --git a/drivers/gpu/drm/i915/gvt/scheduler.c b/drivers/gpu/drm/i915/gvt/scheduler.c
>>> index d6bfdfe..4261bd1 100644
>>> --- a/drivers/gpu/drm/i915/gvt/scheduler.c
>>> +++ b/drivers/gpu/drm/i915/gvt/scheduler.c
>>> @@ -184,6 +184,11 @@ int intel_gvt_audit_and_shadow_workload(struct intel_vgpu_workload *workload)
>>> struct intel_ring *ring;
>>> int ret;
>>>
>>> + if (workload->shadowed) {
>>> + ret = workload->status;
>>> + return ret;
>>> + }
>>> +
>>> shadow_ctx->desc_template &= ~(0x3 << GEN8_CTX_ADDRESSING_MODE_SHIFT);
>>> shadow_ctx->desc_template |= workload->ctx_desc.addressing_mode <<
>>> GEN8_CTX_ADDRESSING_MODE_SHIFT;
>>> @@ -230,6 +235,8 @@ int intel_gvt_audit_and_shadow_workload(struct intel_vgpu_workload *workload)
>>> if (ret)
>>> goto out;
>>>
>>> + workload->shadowed = true;
>>> +
>>> out:
>>> workload->status = ret;
>>> return ret;
>>> diff --git a/drivers/gpu/drm/i915/gvt/scheduler.h b/drivers/gpu/drm/i915/gvt/scheduler.h
>>> index 2cd725c..575659e 100644
>>> --- a/drivers/gpu/drm/i915/gvt/scheduler.h
>>> +++ b/drivers/gpu/drm/i915/gvt/scheduler.h
>>> @@ -78,6 +78,7 @@ struct intel_vgpu_workload {
>>> struct drm_i915_gem_request *req;
>>> /* if this workload has been dispatched to i915? */
>>> bool dispatched;
>>> + bool shadowed;
>>> int status;
>>>
>>> struct intel_vgpu_mm *shadow_mm;
>>>
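For context, a minimal sketch of how the new shadowed flag is expected
to interact with the later dispatch path (assumed shape; the actual
dispatch code in scheduler.c may differ):

	/* Sketch, assuming the dispatch path calls the same entry
	 * point: if submit_context() already pre-shadowed this
	 * workload, the shadowed check at the top of
	 * intel_gvt_audit_and_shadow_workload() makes the second
	 * call a cheap no-op that returns the recorded status.
	 */
	static int dispatch_workload(struct intel_vgpu_workload *workload)
	{
		struct drm_i915_private *dev_priv =
			workload->vgpu->gvt->dev_priv;
		int ret;

		mutex_lock(&dev_priv->drm.struct_mutex);
		ret = intel_gvt_audit_and_shadow_workload(workload);
		mutex_unlock(&dev_priv->drm.struct_mutex);
		if (ret)
			return ret; /* audit/shadow failed, do not submit */

		/* ... submit the shadowed request to the i915 backend ... */
		return 0;
	}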