[Intel-gfx] [PATCH v2 2/2] drm/i915/bxt: Fix inadvertent CPU snooping due to incorrect MOCS config

Daniel Vetter daniel at ffwll.ch
Thu Apr 28 14:48:37 UTC 2016


On Thu, Apr 28, 2016 at 11:38:55AM +0300, Imre Deak wrote:
> On to, 2016-04-28 at 10:17 +0200, Daniel Vetter wrote:
> > On Tue, Apr 26, 2016 at 07:01:06PM +0300, Imre Deak wrote:
> > > On ti, 2016-04-26 at 15:42 +0100, Chris Wilson wrote:
> > > > On Tue, Apr 26, 2016 at 05:26:43PM +0300, Eero Tamminen wrote:
> > > > > Hi,
> > > > > 
> > > > > On 26.04.2016 16:23, Chris Wilson wrote:
> > > > > > On Tue, Apr 26, 2016 at 04:17:55PM +0300, Imre Deak wrote:
> > > > > > > On ti, 2016-04-26 at 13:57 +0100, Chris Wilson wrote:
> > > > > > > > On Tue, Apr 26, 2016 at 03:44:22PM +0300, Imre Deak wrote:
> > > > > > > > > Setting a write-back cache policy in the MOCS entry
> > > > > > > > > definition also
> > > > > > > > > implies snooping, which has a considerable overhead. This
> > > > > > > > > is
> > > > > > > > > unexpected for a few reasons:
> > > > > > > > 
> > > > > > > > If it is snooping, then I don't see why it is undesirable to
> > > > > > > > have it
> > > > > > > > available in a mocs setting. If it is bogus and the bit is
> > > > > > > > undefined,
> > > > > > > > then by all means remove it.
> > > > > > > 
> > > > > > > None of these entries are used alone for coherent surfaces. For
> > > > > > > that
> > > > > > > the application would have to use entry index#1 or #2 _and_
> > > > > > > call the
> > > > > > > set caching IOCTL to set the corresponding buffer to be cached.
> > > > > > 
> > > > > > No, the application doesn't. There are sufficent interfaces
> > > > > > exposed that
> > > > > > userspace can bypass the kernel if it so desired.
> > > > > > 
> > > > > > > The
> > > > > > > problem is that without setting the buffer to be cacheable the
> > > > > > > expectation is that we won't be snooping and incur the
> > > > > > > corresponding
> > > > > > > overhead. This is what this patch addresses.
> > > > > > 
> > > > > > Not true.
> > > > > > 
> > > > > > > The bit is also bogus, if we wanted snooping via MOCS we'd use
> > > > > > > the
> > > > > > > dedicated HW flag for that.
> > > > > > 
> > > > > > But you keep saying this bit *enables* snooping. So either it
> > > > > > does or it
> > > > > > doesn't.
> > > > > > 
> > > > > > > If we wanted to have a snooping MOCS entry we should add that
> > > > > > > separately (as a forth entry), but we'd need this change as a
> > > > > > > fix for
> > > > > > > current users.
> > > > > > 
> > > > > > The current users who are getting what they request but don't
> > > > > > know what
> > > > > > they were requesting?
> > > > > 
> > > > > What this kernel ABI (index entry #2) has been agreed & documented
> > > > > to provide?
> > > > 
> > > > The ABI is what we agree makes sense between hardware / kernel /
> > > > userspace, and then we stick to it.
> > > > 
> > > > > I thought this entry is supposed to replace the writeback LLC/eLLC
> > > > > cache MOCS setting Mesa is using on (e.g. BDW) to speed up accesses
> > > > > to a memory area which it knows always to be accessed so that it
> > > > > can
> > > > > be cached.
> > > > 
> > > > Which it does... Only it speeds us writeback from the CPU, not the
> > > > GPU. ;)
> > > > 
> > > > The confusion seems to be in mistaking !llc for llc. We have to come
> > > > to
> > > > some agreement on whether it makes sense having multiple entries for
> > > > the
> > > > same follows-PTE mocs or whether it makes more sense to expose the hw
> > > > capabilties.
> > > 
> > > Note that we can't just say to Mesa to use index #0 on BXT instead of
> > > index #2, since index #0 turns off caching in GPU L3, so we'd have to
> > > also redefine that to be L3 cached. And I don't know what turning on L3
> > > caching for index #0 would break, I would rather avoid doing that. Imo
> > > defining the entries the following way would allow us to use the same
> > > indexes on GEN9 regardless of it being SKL or BXT:
> > > #0: uncached (neither in L3 nor in (e)LLC)
> > > #1: PTE passthrough
> > 
> > So our rendercpy in igt does set MOCS entry 1. Or how exactly do all the
> > set_caching vs. rendercpy tests we currently have pass? Just not?
> 
> We don't have tests for coherent surfaces. The current rendercpy test
> just uses uncached buffers, so they are flushed before checking the
> result. I could add a new subtest to rendercpy to test with a
> cached/coherent surface.
> 
> > Also, you're guaranateeing that opencl/libva don't screw this up either?
> 
> If they don't set the given buffer to be cached via the set_caching
> IOCTL (as a consequence making them coherent) they are already screwed
> on CHV. If they call the IOCTL they are fine on BXT too.

We do implicit set_caching when displaying something to something
coherent. To make that work userspace should use the "use PTE" mode by
default, except when they really know what they're doing. That's also the
mode that's supposed to give you the most reasonable performance. But
somehow that mode ended up in MOCS entry 1, so pretty much guaranteed
userspace will get it wrong. Mesa just hit a perf snag, but might as well
have been visual corruption. I think it'd be a lot safe to make "use PTE"
entry 0.
-Daniel

> > It's UABI, and we've botched it. Smells like we need to have a lot more
> > solid story and make sure we get this right this time around. There's also
> > the "how does dma-buf mmap support fit in" question.
> > 
> > Oh and nope, none of the relevant testcases are in BAT at all.
> > 
> > > #2: cached everywhere (L3 + (e)LLC if it exists), non-coherent
> > > #3: cached everywhere, coherent
> > > 
> > > I'm not sure if there is even a user for coherent surfaces atm, so then
> > > we could delay adding #3 until it's really needed.
> > 
> > The problem with the above is that the various access paths on SoC
> > (without LLC) and big core (with llc) are massively different. You need
> > lots of different cases in upload/download paths for max speed anyway.
> 
> The current assumption everywhere in user space is the above three
> entries with their definition above as far as I know. There are no
> users that would use a MOCS entry to get a coherent surface, we could
> add that as a 4th entry once a user arises.
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


More information about the Intel-gfx mailing list