[Intel-gfx] [PATCH v2 2/2] drm/i915/bxt: Fix inadvertent CPU snooping due to incorrect MOCS config

Mon May 2 11:18:13 UTC 2016

On Mon, May 02, 2016 at 10:28:50AM +0200, Daniel Vetter wrote:
> On Thu, Apr 28, 2016 at 08:15:24PM +0300, Imre Deak wrote:
> > On Thu, 2016-04-28 at 16:48 +0200, Daniel Vetter wrote:
> > > On Thu, Apr 28, 2016 at 11:38:55AM +0300, Imre Deak wrote:
> > > > On Thu, 2016-04-28 at 10:17 +0200, Daniel Vetter wrote:
> > > > > Also, you're guaranteeing that opencl/libva don't screw this up
> > > > > either?
> > > > 
> > > > If they don't set the given buffer to be cached via the set_caching
> > > > IOCTL (as a consequence making it coherent) they are already screwed
> > > > on CHV. If they call the IOCTL they are fine on BXT too.
> > > 
> > > We do an implicit set_caching to something coherent when displaying
> > > something. To make that work userspace should use the "use PTE" mode by
> > > default, except when they really know what they're doing.
> > > That's also the mode that's supposed to give you the most reasonable
> > > performance. But somehow that mode ended up in MOCS entry 1, so pretty much
> > > guaranteed userspace will get it wrong. Mesa just hit a perf snag, but
> > > might as well have been visual corruption. I think it'd be a lot safer to
> > > make "use PTE" entry 0.
> > 
> > Mesa uses entries 1 and 2. If something else like opencl or libva (or
> > even Mesa for that matter) uses index 0 for PTE pass-through that's a
> > bug on its own. I don't know if this is the case, but it's a separate
> > issue from what I'm trying to fix here.
> > 
> > This isn't about a case where a PTE pass-through entry needs to be
> > provided, but about the case where a cached but non-coherent one is
> > needed. Mesa assumes this to be entry 2 and I don't see why we couldn't
> > make sure that this assumption holds. Note that this entry on BXT could
> > be both a PTE pass-through one as in this patch or LLC-UC.
> 
> Yeah my comment about entry #0 was on a different track of discussion.
> Should still fix it up while we clarify what entries 1&2 really mean.
> 
> When defining entries as "cached" please make triple sure what exactly you
> mean by that. Since eLLC, LLC and on-gpu L3$ are all different caches, in
> different parts of the coherency fabric. And L3$ has functional relevance
> since if that's not set compute features fall apart.
> 
> So maybe a better definition would be "L3$ cached (useful for
> compute)+performance optimized otherwise for general use+might be
> incoherent depending upon platform" for entry 2. That would make sense and
> covers it all, but imo yours is a bit too simple (assuming my understanding
> of cache architecture on gen9 is accurate, they change it all the bloody
> time). Note that e.g. on older platforms we could enable L3$ either in the
> PTE, or in MOCS settings in the batch itself, which is why the "useful for
> compute" only started to become relevant for gen9. And why we needed the
> kernel MOCS patch really.

L3 control via PTE died with IVB. Since then it's been MOCS only.

-- 
Ville Syrjälä
Intel OTC