[Intel-gfx] [PATCH] drm/i915: Write RING_TAIL once per-request

Tue Sep 10 16:13:53 CEST 2013

On Tue, Sep 10, 2013 at 02:01:20PM +0100, Chris Wilson wrote:
> On Mon, Aug 26, 2013 at 01:42:12PM -0700, Ben Widawsky wrote:
> > On Sat, Aug 10, 2013 at 10:16:32PM +0100, Chris Wilson wrote:
> > > Ignoring the legacy DRI1 code, and a couple of special cases (to be
> > > discussed later), all access to the ring is mediated through requests.
> > > The first write to a ring will grab a seqno and mark the ring as having
> > > an outstanding_lazy_request. Either through explicitly adding a request
> > > after an execbuffer or through an implicit wait (either by the CPU or by
> > > a semaphore), that sequence of writes will be terminated with a request.
> > > So we can elide all the intervening writes to the tail register and
> > > send the entire command stream to the GPU at once. This will reduce the
> > > number of *serialising* writes to the tail register by a factor of 3-5
> > > (depending upon architecture and number of workarounds, context
> > > switches, etc. involved). This becomes even more noticeable when the
> > > register write is overloaded with a number of debugging tools. The
> > > astute reader will wonder if it is then possible to overflow the ring
> > > with a single command. It is not. When we start a command sequence to
> > > the ring, we check for available space and issue a wait if there is not
> > > enough. The ring wait will in this case be forced to flush the outstanding
> > > register write and then poll the ACTHD for sufficient space to continue.
> > > 
> > > The exceptions to the rule that everything is inside a request are a few
> > > initialisation cases where we may want to write GPU commands via the CS
> > > before userspace wakes up and page flips.
> > > 
> > 
> > I'm not a huge fan of having the second intel_ring_advance() that does something
> > other than it sounds like. I'd *much* prefer to not intel_ring_advance()
> > if you are certain more emits will be coming like in the case you
> > mention above. We can add a paranoia check whenever we're about to
> > return to userspace that tail == RING_TAIL.
> > 
> > Also, without performance data, it's hard to say this indirection is
> > worth it.
> 
> Just a sample, UXA on i5-2520qm, aa10text:
> 2580000.0/sec -> 2980000.0/sec

Queued for -next, thanks for the patch.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
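
For readers of the archive, the deferred tail write described in the quoted
commit message can be illustrated with a small user-space model. This is only
a sketch under stated assumptions, not the i915 code: every name below
(toy_ring, toy_begin, toy_emit, toy_add_request, toy_submit) is hypothetical,
the "register" is a plain struct field, and the ring wait is modelled by
letting a pretend GPU instantly consume everything up to the last tail value
it was given, rather than by polling ACTHD.

```c
#include <assert.h>
#include <stdint.h>

#define RING_DWORDS 64u /* power of two, so masking handles wrap-around */

struct toy_ring {
	uint32_t buf[RING_DWORDS];
	uint32_t head;        /* how far the modelled GPU has consumed */
	uint32_t tail;        /* software tail, advanced on every emit */
	uint32_t hw_tail;     /* last value written to the modelled RING_TAIL */
	unsigned tail_writes; /* number of modelled register writes */
};

/* Free dwords; one slot stays unused so that full and empty differ. */
static uint32_t toy_space(const struct toy_ring *r)
{
	return (r->head - r->tail - 1) & (RING_DWORDS - 1);
}

/* The only place the modelled tail register is written. */
static void toy_flush_tail(struct toy_ring *r)
{
	if (r->hw_tail != r->tail) {
		r->hw_tail = r->tail;
		r->tail_writes++;
	}
}

/* Modelled GPU: consumes everything it has been told about. */
static void toy_gpu_run(struct toy_ring *r)
{
	r->head = r->hw_tail;
}

/* Reserve n dwords; if space is short, flush the pending tail and wait. */
static int toy_begin(struct toy_ring *r, uint32_t n)
{
	if (n >= RING_DWORDS)
		return -1;
	if (toy_space(r) < n) {
		toy_flush_tail(r); /* the GPU cannot progress without this */
		toy_gpu_run(r);    /* stands in for polling until space frees */
		if (toy_space(r) < n)
			return -1;
	}
	return 0;
}

/* Write one dword; only the software tail moves. */
static void toy_emit(struct toy_ring *r, uint32_t dw)
{
	assert(toy_space(r) > 0);
	r->buf[r->tail] = dw;
	r->tail = (r->tail + 1) & (RING_DWORDS - 1);
}

/* Terminating the sequence with a request performs the single tail write. */
static void toy_add_request(struct toy_ring *r)
{
	toy_flush_tail(r);
}

/* Emit `packets` command packets of `dwords` each, then one request. */
static int toy_submit(struct toy_ring *r, unsigned packets, uint32_t dwords)
{
	for (unsigned p = 0; p < packets; p++) {
		if (toy_begin(r, dwords))
			return -1;
		for (uint32_t i = 0; i < dwords; i++)
			toy_emit(r, p);
	}
	toy_add_request(r);
	return 0;
}
```

In this model a sequence that fits in the ring costs one tail write no matter
how many packets it contains, while a sequence larger than the ring incurs an
extra write only at the point where toy_begin() has to wait for space.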