[Intel-gfx] [PATCH 0/5] drm/i915: Grab the vblank evasion lock around the entire evasion.

Chris Wilson chris at chris-wilson.co.uk
Tue Feb 13 10:40:43 UTC 2018


Quoting Maarten Lankhorst (2018-02-13 10:19:42)
> On 12-02-18 at 18:01, Ville Syrjälä wrote:
> > On Fri, Feb 09, 2018 at 06:21:08PM +0100, Maarten Lankhorst wrote:
> >> On 09-02-18 at 11:04, Chris Wilson wrote:
> >>> Quoting Maarten Lankhorst (2018-02-09 09:53:59)
> >>>> Some cleanups to move the uncore.lock around vblank evasion, so run
> >>>> to completion without racing on uncore.lock. Hopefully this will reduce
> >>>> the chance of underruns, and perhaps allows us to decrease 
> >>>> VBLANK_EVASION_TIME_US as well as a followup patch.
> >>>>
> >>>> Tested on KBL and BSW.
> >>> * shivers
> >>>
> >>> uncore.lock is a brutally contested lock. Ville's patches did work on
> >>> splitting the uncore.lock into forcewake and display variants, which
> >>> cuts down on the nasty side effects.
> >>>
> >>> Latency profiling, another item for the CI/QA wishlist.
> >>> -Chris
> >> Yeah, unfortunately this is not different from status quo. We already
> >> require everything inside vblank evasion to run as fast as possible,
> >> and it's down to a list of register writes and a few reads. Those
> >> already need the uncore.lock, so all we do now is being more explicit
> >> about when we take it and eliminate contention when we write out the
> >> register values.
> > Would be nice to have some results for this though. IIRC when I was
> > benchmarking my update optimizations and the de_lock stuff I was
> > simply logging how long the updates take, and staring at histograms
> > of that after running a bunch of igts and whatnot. I'm not sure I
> > have the results anymore, but IIRC I did see some improvement.
> >
> When testing on KBL and BSW with this patch series, most updates complete in <40 us even with
> all debug options set, with the longest being a single update of 93 us on BSW.

To put that into perspective, that's a 4% delay in submission (ok, once
every 16ms so ~.25% amortized). They start to notice and complain at
about a .5% drop in throughput, but fortunately they also don't tend to
run with outputs enabled.

That being said, if we can say this is capped to less than 50us, sure.
Although I reserve the right to complain later when we get response
targets less than 50us.
-Chris
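
For anyone wanting to redo the amortized numbers above, here is a rough
sketch of the arithmetic. It assumes the lock is held once per frame, uses
the 16ms period quoted above, and treats hold time as if it translated
directly into lost throughput, which is a simplification. The function name
and the labels are made up for illustration.

```python
# Frame period taken from "once every 16ms" in the message above.
FRAME_PERIOD_US = 16_000.0


def amortized_fraction(hold_us, period_us=FRAME_PERIOD_US):
    """Fraction of wall time spent holding the lock, assuming one hold per frame."""
    return hold_us / period_us


# Hold times quoted in the thread: typical (<40 us), proposed cap (50 us),
# and the single worst update seen on BSW (93 us).
for label, hold_us in [("typical", 40.0), ("cap", 50.0), ("worst BSW", 93.0)]:
    print(f"{label}: {hold_us:.0f} us -> {amortized_fraction(hold_us) * 100:.3f}% of a frame")
```

Under those assumptions a 40us hold works out to 0.25% of a frame, a 50us
cap stays under the .5% complaint threshold, and the 93us outlier lands
slightly above it.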