[Intel-gfx] [PATCH v2 2/2] drm/i915/bxt: Fix inadvertent CPU snooping due to incorrect MOCS config

Mon May 2 08:28:50 UTC 2016

On Thu, Apr 28, 2016 at 08:15:24PM +0300, Imre Deak wrote:
> On to, 2016-04-28 at 16:48 +0200, Daniel Vetter wrote:
> > On Thu, Apr 28, 2016 at 11:38:55AM +0300, Imre Deak wrote:
> > > On to, 2016-04-28 at 10:17 +0200, Daniel Vetter wrote:
> > > > Also, you're guaranateeing that opencl/libva don't screw this up
> > > > either?
> > > 
> > > If they don't set the given buffer to be cached via the set_caching
> > > IOCTL (as a consequence making them coherent) they are already
> > > screwed
> > > on CHV. If they call the IOCTL they are fine on BXT too.
> > 
> > We do implicit set_caching when displaying something to something
> > coherent. To make that work userspace should use the "use PTE" mode by
> > default, except when they really know what they're doing.
> > That's also the mode that's supposed to give you the most reasonable
> > performance. But somehow that mode ended up in MOCS entry 1, so pretty much
> > guaranteed userspace will get it wrong. Mesa just hit a perf snag, but
> > might as well have been visual corruption. I think it'd be a lot safe to
> > make "use PTE" entry 0.
> 
> Mesa uses entries 1 and 2. If something else like opencl or libva (or
> even Mesa for that matter) uses index 0 for PTE pass-through that's a
> bug on its own. I don't know if this is the case, but it's a separate
> issue from what I'm trying to fix here.
> 
> This isn't about a case where a PTE pass-through entry needs to be
> provided, but about the case where a cached but non-coherent one is
> needed. Mesa assumes this to be entry 2 and I don't see why we couldn't
> make sure that this assumption holds. Note that this entry on BXT could
> be both a PTE pass-through one as in this patch or LLC-UC.

Yeah my comment about entry #0 was is a different track of discussion.
Should still fix it up while we clarify what entries 1&2 really mean.

When defining entries as "cached" please make triple sure what exactly you
mean by that. Since eLLC, LLC and on-gpu L3$ are all different caches, in
different parts of the coherency fabric. And L3$ has functional relevance
since if that's not set compute features fall apart.

So maybe a better definition would be "L3$ cached (useful for
compute)+performance optimized otherwise for general use+might be
incoherent depending upon platform" for entry 2. That would make sense and
covers it all, but imo yours a bit too simple (assuming my understanding
of cache architecture on gen9 is accurate, they change it all the bloody
time). Note that e.g. on older platforms we could enable L3$ either in the
PTE, or in MOCS settings in the batch itself, which is why the "useful for
compute" only started to become relevant for gen9. And why we needed the
kernel MOCS patch really.

Simililarly we'd need to come up with something accurate for #1 and #0
(and it sounds like they both need to mean PTE default per kernel). Hooray
for wasting one entry ;-)
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch