[PATCH 2/2] drm/i915/gvt: Audit and shadow workload during ELSP writing
Gao, Ping A
ping.a.gao at intel.com
Tue Jun 6 02:38:56 UTC 2017
On 2017/6/6 9:35, Wang, Zhi A wrote:
> On 06/06/17 09:29, Gao, Ping A wrote:
>> On 2017/6/5 16:38, Wang, Zhi A wrote:
>>> On 05/24/17 20:49, Ping Gao wrote:
>>>> Audit and shadow the workload ahead of vGPU scheduling; this
>>>> eliminates GPU idle time and improves performance for multi-VM.
>>>>
>>>> The performance of Heaven running simultaneously in 3 VMs
>>>> improved by 20% with this patch.
>>>>
>>>> Signed-off-by: Ping Gao <ping.a.gao at intel.com>
>>>> ---
>>>> drivers/gpu/drm/i915/gvt/execlist.c | 12 ++++++++++++
>>>> drivers/gpu/drm/i915/gvt/scheduler.c | 7 +++++++
>>>> drivers/gpu/drm/i915/gvt/scheduler.h | 1 +
>>>> 3 files changed, 20 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/gvt/execlist.c b/drivers/gpu/drm/i915/gvt/execlist.c
>>>> index dca989e..a14cf29 100644
>>>> --- a/drivers/gpu/drm/i915/gvt/execlist.c
>>>> +++ b/drivers/gpu/drm/i915/gvt/execlist.c
>>>> @@ -605,6 +605,7 @@ static int submit_context(struct intel_vgpu *vgpu, int ring_id,
>>>> struct list_head *q = workload_q_head(vgpu, ring_id);
>>>> struct intel_vgpu_workload *last_workload = get_last_workload(q);
>>>> struct intel_vgpu_workload *workload = NULL;
>>>> + struct drm_i915_private *dev_priv = vgpu->gvt->dev_priv;
>>>> u64 ring_context_gpa;
>>>> u32 head, tail, start, ctl, ctx_ctl, per_ctx, indirect_ctx;
>>>> int ret;
>>>> @@ -668,6 +669,7 @@ static int submit_context(struct intel_vgpu *vgpu, int ring_id,
>>>> workload->complete = complete_execlist_workload;
>>>> workload->status = -EINPROGRESS;
>>>> workload->emulate_schedule_in = emulate_schedule_in;
>>>> + workload->shadowed = false;
>>>>
>>>> if (ring_id == RCS) {
>>>> intel_gvt_hypervisor_read_gpa(vgpu, ring_context_gpa +
>>>> @@ -701,6 +703,16 @@ static int submit_context(struct intel_vgpu *vgpu, int ring_id,
>>>> return ret;
>>>> }
>>>>
>>>> + /* Only audit and shadow the first workload in the queue
>>>> + * as there is only one pre-allocated buf-obj for shadow.
>>>> + */
>>>> + if (vgpu->gvt->scheduler.current_vgpu != vgpu &&
>>>> + list_empty(workload_q_head(vgpu, ring_id))) {
>>>> + mutex_lock(&dev_priv->drm.struct_mutex);
>>>> + intel_gvt_audit_and_shadow_workload(workload);
>>>> + mutex_unlock(&dev_priv->drm.struct_mutex);
>>>> + }
>>>> +
>>> Is there any reason why a pre-shadow cannot happen when current_vgpu ==
>>> vgpu and workload q is empty?
>> As there is only one pre-allocated buf-obj for shadowing, only the
>> first workload in the queue can be shadowed. The queue being empty
>> before enqueueing means the current workload will become the first
>> node of the queue.
>>
>> The current_vgpu != vgpu check tries to emphasize that performance
>> only improves when pre-shadow happens under this condition; that is
>> why we need pre-shadow. Logically it can be removed, but then the
>> purpose of pre-shadow would be less clear when reading the code.
> I got the background in our previous talk :P I'm just curious. From my
> point of view, the gap comes from the workload scheduler thread. If we
> pre-shadow workloads as much as possible, that would be nicer. Did you
> observe any performance drop/gain after removing current_vgpu != vgpu? :P
There is no benefit to doing pre-shadow when current_vgpu == vgpu, so
removing the current_vgpu != vgpu check should have no performance
impact, I think.
I just noticed that this condition should be removed anyway: the
scheduler can change current_vgpu at any moment, so keeping the check
would make the current vGPU miss some chances to do pre-shadow. I will
remove it in the next version :)
>>>> queue_workload(workload);
>>>> return 0;
>>>> }
>>>> diff --git a/drivers/gpu/drm/i915/gvt/scheduler.c b/drivers/gpu/drm/i915/gvt/scheduler.c
>>>> index d6bfdfe..4261bd1 100644
>>>> --- a/drivers/gpu/drm/i915/gvt/scheduler.c
>>>> +++ b/drivers/gpu/drm/i915/gvt/scheduler.c
>>>> @@ -184,6 +184,11 @@ int intel_gvt_audit_and_shadow_workload(struct intel_vgpu_workload *workload)
>>>> struct intel_ring *ring;
>>>> int ret;
>>>>
>>>> + if (workload->shadowed) {
>>>> + ret = workload->status;
>>>> + return ret;
>>>> + }
>>>> +
>>>> shadow_ctx->desc_template &= ~(0x3 << GEN8_CTX_ADDRESSING_MODE_SHIFT);
>>>> shadow_ctx->desc_template |= workload->ctx_desc.addressing_mode <<
>>>> GEN8_CTX_ADDRESSING_MODE_SHIFT;
>>>> @@ -230,6 +235,8 @@ int intel_gvt_audit_and_shadow_workload(struct intel_vgpu_workload *workload)
>>>> if (ret)
>>>> goto out;
>>>>
>>>> + workload->shadowed = true;
>>>> +
>>>> out:
>>>> workload->status = ret;
>>>> return ret;
>>>> diff --git a/drivers/gpu/drm/i915/gvt/scheduler.h b/drivers/gpu/drm/i915/gvt/scheduler.h
>>>> index 2cd725c..575659e 100644
>>>> --- a/drivers/gpu/drm/i915/gvt/scheduler.h
>>>> +++ b/drivers/gpu/drm/i915/gvt/scheduler.h
>>>> @@ -78,6 +78,7 @@ struct intel_vgpu_workload {
>>>> struct drm_i915_gem_request *req;
>>>> /* if this workload has been dispatched to i915? */
>>>> bool dispatched;
>>>> + bool shadowed;
>>>> int status;
>>>>
>>>> struct intel_vgpu_mm *shadow_mm;
>>>>