[Intel-gfx] Write GFX_FLSH_CNT after updating GGTT entries
Tian, Kevin
kevin.tian at intel.com
Fri Nov 20 01:53:20 PST 2015
> From: Chris Wilson [mailto:chris at chris-wilson.co.uk]
> Sent: Friday, November 20, 2015 5:40 PM
>
> On Fri, Nov 20, 2015 at 09:23:12AM +0000, Tian, Kevin wrote:
> > > From: Ville Syrjälä [mailto:ville.syrjala at linux.intel.com]
> > > Sent: Thursday, November 19, 2015 6:35 PM
> > >
> > > On Thu, Nov 19, 2015 at 06:20:23PM +0800, Zhi Wang wrote:
> > > > Hi Gurus:
> > > > I'm curious about the register GFX_FLSH_CNT (0x101008) in
> > > > i915_gem_gtt.c. Does this register exist in recent generations? After
> > > > digging into b-spec, it looks like only BXT and CHV have this register. Do
> > > > the desktop platforms also have this register, which needs to be written
> > > > after updating GGTT MMIOs?
> > > >
> > > > BTW: It looks like the Windows driver doesn't use this MMIO... So whose
> > > > behavior is the right behavior?
> > >
> > > As I understand it, that register flushes the CPU GTT TLBs, and we need
> > > to do it because of the WC mapping we have for the GTT PTEs. If we used
> > > UC mapping we wouldn't need it since there's supposedly an automagic
> > > TLB flush that happens on PTE writes.
> > >
> > > BSpec is bad at finding some registers via bxml. Using dtsearch and
> > > looking for both 0x<offset> and <offset>h is the method I use to track
> > > such things down.
> > >
> >
> > Curious how much gain is observed by using WC vs. using UC on GTT
> > entries?
>
> Think back yonder when everything goes through the GGTT, and where
> writing the PTEs was slower than allocating a bunch of pages and
> applications would insist on submitting new objects every batch.
>
> It was very easy to have workloads where UC GGTT updates were the
> rate-limiting step. A WC update is ~8x faster, and sufficient to move the
> bottleneck elsewhere.
> -Chris
8x is clearly worth it. Today, with PPGTT, this optimization is
somewhat less important, then. :-)
Thanks
Kevin
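
For readers following the thread, the pattern Ville describes (update the
GTT PTEs through the WC mapping, then write the flush register once) can be
sketched roughly as below. This is a user-space simulation, not the i915
code: struct fake_device, fake_mmio_write(), ggtt_insert_entries() and the
FLUSH_ENABLE_BIT value are hypothetical names and values made up for
illustration. Only the register name and the 0x101008 offset come from the
thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GFX_FLSH_CNT_OFFSET 0x101008u /* offset quoted in the thread */
#define FLUSH_ENABLE_BIT    0x1u      /* assumed value, illustration only */
#define FAKE_GGTT_ENTRIES   8

/* Hypothetical stand-in for the device: a small PTE array plays the role
 * of the WC-mapped GGTT PTE window, and MMIO writes are just recorded. */
struct fake_device {
    uint64_t ggtt_pte[FAKE_GGTT_ENTRIES];
    uint32_t last_mmio_offset;
    uint32_t last_mmio_value;
    unsigned int flush_count;
};

/* Record an MMIO write; count it as a flush if it hits the flush register
 * with the (assumed) enable bit set. */
static void fake_mmio_write(struct fake_device *dev, uint32_t offset,
                            uint32_t value)
{
    assert(dev != NULL);
    dev->last_mmio_offset = offset;
    dev->last_mmio_value = value;
    if (offset == GFX_FLSH_CNT_OFFSET && (value & FLUSH_ENABLE_BIT))
        dev->flush_count++;
}

/* Write a batch of PTEs, then issue a single flush for the whole batch.
 * Returns 0 on success, -1 if the range does not fit in the fake GGTT. */
static int ggtt_insert_entries(struct fake_device *dev, size_t first,
                               const uint64_t *ptes, size_t count)
{
    size_t i;

    assert(dev != NULL && ptes != NULL);
    if (first > FAKE_GGTT_ENTRIES || count > FAKE_GGTT_ENTRIES - first)
        return -1;

    /* With a WC mapping these stores may be buffered and combined, so
     * nothing here triggers the automatic flush a UC write would get. */
    for (i = 0; i < count; i++)
        dev->ggtt_pte[first + i] = ptes[i];

    /* One explicit flush after all PTE writes, not one per entry. */
    fake_mmio_write(dev, GFX_FLSH_CNT_OFFSET, FLUSH_ENABLE_BIT);
    return 0;
}
```

The point of the sketch is the ordering: the flush write comes after the
whole batch of PTE stores. A real driver would also have to make sure the
WC writes have actually reached the hardware before the flush takes
effect, which this simulation does not model.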