[Intel-gfx] [PATCH] drm/i915: Write RING_TAIL once per-request

Chris Wilson chris at chris-wilson.co.uk
Tue Sep 10 15:01:20 CEST 2013


On Mon, Aug 26, 2013 at 01:42:12PM -0700, Ben Widawsky wrote:
> On Sat, Aug 10, 2013 at 10:16:32PM +0100, Chris Wilson wrote:
> > Ignoring the legacy DRI1 code, and a couple of special cases (to be
> > discussed later), all access to the ring is mediated through requests.
> > The first write to a ring will grab a seqno and mark the ring as having
> > an outstanding_lazy_request. Either through explicitly adding a request
> > after an execbuffer or through an implicit wait (either by the CPU or by
> > a semaphore), that sequence of writes will be terminated with a request.
> > So we can elide all the intervening writes to the tail register and
> > send the entire command stream to the GPU at once. This will reduce the
> > number of *serialising* writes to the tail register by a factor of 3-5
> > (depending upon architecture and number of workarounds, context
> > switches, etc. involved). This becomes even more noticeable when the
> > register write is overloaded with a number of debugging tools. The
> > astute reader will wonder if it is then possible to overflow the ring
> > with a single command. It is not. When we start a command sequence to
> > the ring, we check for available space and issue a wait if there is
> > not enough. The ring wait will in this case be forced to flush the outstanding
> > register write and then poll the ACTHD for sufficient space to continue.
> > 
> > The exceptions to the rule that everything is inside a request are a few
> > initialisation cases where we may want to write GPU commands via the CS
> > before userspace wakes up and page flips.
> > 
> 
> I'm not a huge fan of having the second intel_ring_advance() that does something
> other than what it sounds like. I'd *much* prefer not to call intel_ring_advance()
> if you are certain more emits will be coming, as in the case you
> mention above. We can add a paranoia check, whenever we're about to
> return to userspace, that tail == RING_TAIL.
> 
> Also, without performance data, it's hard to say this indirection is
> worth it.

Just a sample, UXA on an i5-2520QM, aa10text:
2580000.0/sec -> 2980000.0/sec (about 15.5% faster)
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
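The tail-deferral scheme described in the quoted patch can be modelled in a few lines of C. This is a minimal, self-contained sketch, not the i915 code: struct ring, ring_begin(), ring_emit(), ring_flush_tail(), ring_add_request(), ring_assert_flushed() and the 64-byte RING_SIZE are hypothetical names chosen for illustration, and the "GPU" is assumed to have consumed everything submitted to it by the time its head is polled (so the wait needs only one poll instead of a loop). Emitting commands only advances a software tail; the tail register is written when a request is added, or when ring_begin() runs out of space and must flush the pending writes before polling the head (standing in for ACTHD). ring_assert_flushed() illustrates the review's suggested paranoia check that tail == RING_TAIL before returning to userspace.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a GPU ring buffer; not the i915 data structures. */
#define RING_SIZE    64u /* bytes; must be a power of two */
#define RING_RESERVE 4u  /* one dword kept free so a full ring != an empty one */

struct ring {
	uint32_t tail;        /* software tail: where the next dword is written */
	uint32_t hw_tail;     /* models the RING_TAIL register */
	uint32_t head;        /* models ACTHD: how far the "GPU" has read */
	unsigned tail_writes; /* number of (serialising) tail-register writes */
	uint32_t buf[RING_SIZE / 4];
};

/* Free bytes; callers must not emit more than they reserved via ring_begin(). */
static uint32_t ring_space(const struct ring *r)
{
	uint32_t used = (r->tail - r->head) & (RING_SIZE - 1);
	return RING_SIZE - RING_RESERVE - used;
}

/* The only place the tail register is written; skipped if nothing is pending. */
static void ring_flush_tail(struct ring *r)
{
	if (r->hw_tail != r->tail) {
		r->hw_tail = r->tail;
		r->tail_writes++;
	}
}

/* Stand-in for polling ACTHD: the model GPU has already consumed
 * everything that was submitted through the tail register. */
static void ring_poll_head(struct ring *r)
{
	r->head = r->hw_tail;
}

/* Reserve space for the next command; 0 on success, -1 if it can never fit. */
static int ring_begin(struct ring *r, uint32_t bytes)
{
	if (ring_space(r) >= bytes)
		return 0;
	/* Unsubmitted commands cannot be consumed, so the pending tail
	 * write must be flushed before waiting for the head to advance. */
	ring_flush_tail(r);
	ring_poll_head(r);
	return ring_space(r) >= bytes ? 0 : -1;
}

/* Write one dword and advance only the software tail (no register write). */
static void ring_emit(struct ring *r, uint32_t dword)
{
	r->buf[r->tail / 4] = dword;
	r->tail = (r->tail + 4) & (RING_SIZE - 1);
}

/* Terminate the accumulated commands with a request: one tail write. */
static int ring_add_request(struct ring *r, uint32_t seqno)
{
	if (ring_begin(r, 4) != 0)
		return -1;
	ring_emit(r, seqno); /* stands in for the seqno-write command */
	ring_flush_tail(r);
	return 0;
}

/* Paranoia check in the spirit of the review: before returning to
 * userspace, nothing should be left unsubmitted. */
static void ring_assert_flushed(const struct ring *r)
{
	assert(r->hw_tail == r->tail);
}
```

In this model, emitting three 12-byte commands followed by a request costs one tail-register write instead of the four that writing the tail on every advance would cost. Emitting six such commands into the 60 usable bytes forces one extra flush from the wait path, giving two writes in total, and the ring never overflows because ring_begin() flushes before it waits.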


