[Intel-gfx] [PATCH v3] drm/i915: Optimistically spin for the request completion

Fri Mar 20 08:27:24 PDT 2015

On Fri, Mar 20, 2015 at 03:54:01PM +0100, Daniel Vetter wrote:
> On Thu, Mar 19, 2015 at 03:16:15PM +0000, Chris Wilson wrote:
> > On Thu, Mar 12, 2015 at 11:11:17AM +0000, Chris Wilson wrote:
> > > This provides a nice boost to mesa in swap bound scenarios (as mesa
> > > throttles itself to the previous frame and given the scenario that will
> > > complete shortly). It will also provide a good boost to systems running
> > > with semaphores disabled and so frequently waiting on the GPU as it
> > > switches rings. In the most favourable of microbenchmarks, this can
> > > increase performance by around 15% - though in practice improvements
> > > will be marginal and rarely noticeable.
> > > 
> > > v2: Account for user timeouts
> > > v3: Limit the spinning to a single jiffy (~1ms at HZ=1000) at most. On an
> > > otherwise idle system, there is no scheduler contention and so without a
> > > limit we would spin until the GPU is ready.
> > > 
> > > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > > Cc: Daniel Vetter <daniel.vetter at ffwll.ch>
> > 
> > Just recording ideas for the future. Replace the busy-spin with
> > monitor/mwait. This requires Pentium4+, a cooperating GPU with working
> > cacheline snooping and that we use HWS seqno.
> 
> Just for the record: Did it help with powersaving or was it all in the
> noise?

Unscientifically, I would say mwait(cstate=0) was worse. It gave a
marginally higher peak, but there was clearly worse thermal throttling
than the simple busy-wait. powertop suggests that with the mwait we were
not reaching as low a package cstate as often.
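
For reference, the "simple busy-wait" being compared against can be
sketched in userspace roughly as below. This is only an illustration
under stated assumptions, not the code from the patch: hw_seqno,
seqno_passed(), spin_for_seqno() and SPIN_LIMIT_NS are hypothetical
names, and the 1 ms budget assumes one jiffy at HZ=1000. The caller is
expected to fall back to a normal sleeping wait when the spin gives up.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Assumption: one jiffy is 1 ms (HZ=1000); the real value depends on HZ. */
#define SPIN_LIMIT_NS 1000000LL

/* Hypothetical stand-in for the seqno the GPU writes back on completion. */
static _Atomic uint32_t hw_seqno;

static int64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Wrap-safe test: has 'current' reached or passed 'target'? */
static bool seqno_passed(uint32_t current, uint32_t target)
{
	return (int32_t)(current - target) >= 0;
}

/*
 * Poll the seqno for at most SPIN_LIMIT_NS.
 * Returns true if the request completed while spinning; false means the
 * budget ran out and the caller should sleep instead.
 */
static bool spin_for_seqno(uint32_t target)
{
	const int64_t deadline = now_ns() + SPIN_LIMIT_NS;

	do {
		uint32_t cur = atomic_load_explicit(&hw_seqno,
						    memory_order_acquire);
		if (seqno_passed(cur, target))
			return true;
	} while (now_ns() < deadline);

	return false;
}
```

The deadline is what keeps an otherwise idle system from spinning until
the GPU is ready; without it nothing else would preempt the loop.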
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre