[Intel-gfx] [PATCH 3/3] drm/i915: Micro-optimise hotpath through intel_ring_begin()

Mika Kuoppala mika.kuoppala at linux.intel.com
Thu May 4 12:59:05 UTC 2017


Chris Wilson <chris at chris-wilson.co.uk> writes:

> On Thu, May 04, 2017 at 03:11:45PM +0300, Mika Kuoppala wrote:
>> Chris Wilson <chris at chris-wilson.co.uk> writes:
>> 
>> > Typically, there is space available within the ring and if not we have
>> > to wait (by definition a slow path). Rearrange the code to reduce the
>> > number of branches and stack size for the hotpath, accomodating a slight
>> > growth for the wait.
>> >
>> > v2: Fix the new assert that packets are not larger than the actual ring.
>> >
>> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>> > ---
>> >  drivers/gpu/drm/i915/intel_ringbuffer.c | 63 +++++++++++++++++----------------
>> >  1 file changed, 33 insertions(+), 30 deletions(-)
>> >
>> > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
>> > index c46e5439d379..53123c1cfcc5 100644
>> > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
>> > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
>> > @@ -1654,7 +1654,7 @@ static int ring_request_alloc(struct drm_i915_gem_request *request)
>> >  	return 0;
>> >  }
>> >  
>> > -static int wait_for_space(struct drm_i915_gem_request *req, int bytes)
>> > +static noinline int wait_for_space(struct drm_i915_gem_request *req, int bytes)
>> >  {
>> >  	struct intel_ring *ring = req->ring;
>> >  	struct drm_i915_gem_request *target;
>> > @@ -1702,49 +1702,52 @@ static int wait_for_space(struct drm_i915_gem_request *req, int bytes)
>> >  u32 *intel_ring_begin(struct drm_i915_gem_request *req, int num_dwords)
>> >  {
>> >  	struct intel_ring *ring = req->ring;
>> > -	int remain_actual = ring->size - ring->emit;
>> > -	int remain_usable = ring->effective_size - ring->emit;
>> > -	int bytes = num_dwords * sizeof(u32);
>> > -	int total_bytes, wait_bytes;
>> > -	bool need_wrap = false;
>> > +	const unsigned int remain_usable = ring->effective_size - ring->emit;
>> > +	const unsigned int bytes = num_dwords * sizeof(u32);
>> > +	unsigned int need_wrap = 0;
>> > +	unsigned int total_bytes;
>> >  	u32 *cs;
>> >  
>> >  	total_bytes = bytes + req->reserved_space;
>> > +	GEM_BUG_ON(total_bytes > ring->effective_size);
>> >  
>> > -	if (unlikely(bytes > remain_usable)) {
>> > -		/*
>> > -		 * Not enough space for the basic request. So need to flush
>> > -		 * out the remainder and then wait for base + reserved.
>> > -		 */
>> > -		wait_bytes = remain_actual + total_bytes;
>> > -		need_wrap = true;
>> > -	} else if (unlikely(total_bytes > remain_usable)) {
>> > -		/*
>> > -		 * The base request will fit but the reserved space
>> > -		 * falls off the end. So we don't need an immediate wrap
>> > -		 * and only need to effectively wait for the reserved
>> > -		 * size space from the start of ringbuffer.
>> > -		 */
>> > -		wait_bytes = remain_actual + req->reserved_space;
>> > -	} else {
>> > -		/* No wrapping required, just waiting. */
>> > -		wait_bytes = total_bytes;
>> > +	if (unlikely(total_bytes > remain_usable)) {
>> > +		const int remain_actual = ring->size - ring->emit;
>> > +
>> > +		if (bytes > remain_usable) {
>> > +			/*
>> > +			 * Not enough space for the basic request. So need to
>> > +			 * flush out the remainder and then wait for
>> > +			 * base + reserved.
>> > +			 */
>> > +			total_bytes += remain_actual;
>> > +			need_wrap = remain_actual | 1;
>> 
>> Your remain_actual should never reach zero. So in here
>> forcing the lowest bit on, and later off, seems superfluous.
>
> Why can't we fill up to the last byte with commands? remain_actual is
> just (size - tail) and we don't force a wrap until emit crosses the
> boundary (and not before). We hit remain_actual == 0 in practice.
> -Chris

My mistake, was thinking postwrap.

num_dwords and second parameter to wait_for_space should be unsigned.

Reviewed-by: Mika Kuoppala <mika.kuoppala at intel.com>


More information about the Intel-gfx mailing list