[Intel-gfx] [PATCH v7] drm/i915: Emit to ringbuffer directly

Thu Feb 9 13:35:11 UTC 2017

On Thu, Feb 09, 2017 at 09:02:06AM +0000, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> 
> This removes the usage of intel_ring_emit in favour of
> directly writing to the ring buffer.
> 
> intel_ring_emit was preventing the compiler for optimising
> fetch and increment of the current ring buffer pointer and
> therefore generating very verbose code for every write.
> 
> It had no useful purpose since all ringbuffer operations
> are started and ended with intel_ring_begin and
> intel_ring_advance respectively, with no bail out in the
> middle possible, so it is fine to increment the tail in
> intel_ring_begin and let the code manage the pointer
> itself.
> 
> Useless instruction removal amounts to approximately
> two and half kilobytes of saved text on my build.
> 
> Not sure if this has any measurable performance
> implications but executing a ton of useless instructions
> on fast paths cannot be good.
> 
> Patch is not fully polished, but it compiles and runs
> on Gen9 at least.
> 
> v2:
>  * Change return from intel_ring_begin to error pointer by
>    popular demand.
>  * Move tail increment to intel_ring_advance to enable some
>    error checking.
> 
> v3:
>  * Move tail advance back into intel_ring_begin.
>  * Rebase and tidy.
> 
> v4:
>  * Complete rebase after a few months since v3.
> 
> v5:
>  * Remove unecessary cast and fix !debug compile. (Chris Wilson)
> 
> v6:
>  * Make intel_ring_offset take request as well.
>  * Fix recording of request postfix plus a sprinkle of asserts.
>    (Chris Wilson)
> 
> v7:
>  * Use intel_ring_offset to get the postfix. (Chris Wilson)
>  * Convert GVT code as well.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> Cc: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> Cc: Zhi Wang <zhi.a.wang at intel.com>
> ---
> 
> Zhi, could you please check that they way I handled the GVT
> code makes sense.
> 
> I changed copy_gma_to_hva to return error or number of bytes
> copied since that makes it easier for the new flavour of
> intel_ring_begin and intel_ring_advance to work.
> 
> I was only able to compile test it.
> 
> Thanks,
> 
> Tvrtko
> ---
> @@ -2608,36 +2605,33 @@ static int shadow_workload_ring_buffer(struct intel_vgpu_workload *workload)
>  	gma_top = workload->rb_start + guest_rb_size;
>  
>  	/* allocate shadow ring buffer */
> -	ret = intel_ring_begin(workload->req, workload->rb_len / 4);
> -	if (ret)
> -		return ret;
> +	out = intel_ring_begin(workload->req, workload->rb_len / sizeof(u32));
> +	if (IS_ERR(out))
> +		return PTR_ERR(out);
>  
>  	/* get shadow ring buffer va */
> -	workload->shadow_ring_buffer_va = ring->vaddr + ring->tail;
> +	workload->shadow_ring_buffer_va = out;
>  
>  	/* head > tail --> copy head <-> top */
>  	if (gma_head > gma_tail) {
>  		ret = copy_gma_to_hva(vgpu, vgpu->gtt.ggtt_mm,
> -				gma_head, gma_top,
> -				workload->shadow_ring_buffer_va);
> -		if (ret) {
> +				      gma_head, gma_top, out);
> +		if (ret < 0) {
>  			gvt_err("fail to copy guest ring buffer\n");
>  			return ret;
>  		}
> -		copy_len = gma_top - gma_head;
> +		out += ret / sizeof(u32);
>  		gma_head = workload->rb_start;
>  	}
>  
>  	/* copy head or start <-> tail */
> -	ret = copy_gma_to_hva(vgpu, vgpu->gtt.ggtt_mm,
> -			gma_head, gma_tail,
> -			workload->shadow_ring_buffer_va + copy_len);
> -	if (ret) {
> +	ret = copy_gma_to_hva(vgpu, vgpu->gtt.ggtt_mm, gma_head, gma_tail, out);

I would have done
	out = copy_gma_to_have(... out);

Reviewed-by: Chris Wilson <chris at chris-wilson.co.uk>
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre