[Intel-gfx] [PATCH v2 2/2] drm/i915/bxt: Fix inadvertent CPU snooping due to incorrect MOCS config

Thu Apr 28 10:48:27 UTC 2016

On Thu, Apr 28, 2016 at 10:13:37AM +0200, Daniel Vetter wrote:
> On Tue, Apr 26, 2016 at 08:57:51PM +0300, Ville Syrjälä wrote:
> > On Tue, Apr 26, 2016 at 04:30:05PM +0200, Daniel Vetter wrote:
> > > On Tue, Apr 26, 2016 at 05:26:43PM +0300, Eero Tamminen wrote:
> > > > Hi,
> > > > 
> > > > On 26.04.2016 16:23, Chris Wilson wrote:
> > > > >On Tue, Apr 26, 2016 at 04:17:55PM +0300, Imre Deak wrote:
> > > > >>On ti, 2016-04-26 at 13:57 +0100, Chris Wilson wrote:
> > > > >>>On Tue, Apr 26, 2016 at 03:44:22PM +0300, Imre Deak wrote:
> > > > >>>>Setting a write-back cache policy in the MOCS entry definition also
> > > > >>>>implies snooping, which has a considerable overhead. This is
> > > > >>>>unexpected for a few reasons:
> > > > >>>
> > > > >>>If it is snooping, then I don't see why it is undesirable to have it
> > > > >>>available in a mocs setting. If it is bogus and the bit is undefined,
> > > > >>>then by all means remove it.
> > > > >>
> > > > >>None of these entries are used alone for coherent surfaces. For that
> > > > >>the application would have to use entry index#1 or #2 _and_ call the
> > > > >>set caching IOCTL to set the corresponding buffer to be cached.
> > > > >
> > > > >No, the application doesn't. There are sufficent interfaces exposed that
> > > > >userspace can bypass the kernel if it so desired.
> > > > >
> > > > >>The
> > > > >>problem is that without setting the buffer to be cacheable the
> > > > >>expectation is that we won't be snooping and incur the corresponding
> > > > >>overhead. This is what this patch addresses.
> > > > >
> > > > >Not true.
> > > > >
> > > > >>The bit is also bogus, if we wanted snooping via MOCS we'd use the
> > > > >>dedicated HW flag for that.
> > > > >
> > > > >But you keep saying this bit *enables* snooping. So either it does or it
> > > > >doesn't.
> > > > >
> > > > >>If we wanted to have a snooping MOCS entry we should add that
> > > > >>separately (as a forth entry), but we'd need this change as a fix for
> > > > >>current users.
> > > > >
> > > > >The current users who are getting what they request but don't know what
> > > > >they were requesting?
> > > > 
> > > > What this kernel ABI (index entry #2) has been agreed & documented to
> > > > provide?
> > > > 
> > > > I thought this entry is supposed to replace the writeback LLC/eLLC cache
> > > > MOCS setting Mesa is using on (e.g. BDW) to speed up accesses to a memory
> > > > area which it knows always to be accessed so that it can be cached.
> > > > 
> > > > If app runs on HW where LLC/eLLC is missing, giving the app extra slowdown
> > > > instead of potential speedup sounds like failed HW abstraction. :-)
> > > 
> > > Well mesa needs to know llc vs. !llc anyway to not totally suck, and
> > > defining entry #2 as "coherent, always" makes sense. I thought entry 0 was
> > > the reaonable default aka pte passthrough and hence managed by kernel?
> > 
> > Nope, we fscked that up somewhat, and entry 1 is the PTE one :( So if
> > userspace forgets to set MOCS on gen9 it won't get the behaviour it
> > would have gotten on previous gens.
> 
> How do we manage to pass the various set_caching vs. rendercpy test then
> on skl? Or do we just not, and everyone is still lalala?

Why wouldn't it pass? It's all coherent anyway. Although apparently the
docs have some confusing comments that things wouldn't be coherent unless
you enable LLC caching, but maybe that's just about coherent L3 or
something?

> 
> Sounds like something to fix either way.
> 
> > > If mesa asks for nonsense, the kernel is happy to oblige.
> > 
> > We never really defined what entry 2 actually means: coherent or sane
> > performance. Mesa has, rightfully IMO, made the assumption that it means
> > the latter since we never set out to define any MOCS entries with
> > coherency in mind. And seeing that it's already out in the wild, I think
> > it's better to respect it. If we change it now we would just make it
> > more painful for people when they get their hands on the hardware.
> > 
> > I think what we should do is define what the MOCS indexes mean in some
> > uapi header. Then there would be no ambiguity.
> 
> Problem with encoding this is that sooner or later we're playing a game of
> whack-a-mole where kernel second-guesses mesa to support old versions, and
> mesa tries to work around kernel assumptions that no longer fit reworked
> code. "It's the easiest solution for existing binaries" is imo very poor
> justification for ABI, especially when it's just about performance.
> Fundamentally mesa still works correctly, just a bit slow.
> 
> I don't want to merge a random hack just because.

I don't see it as any kind of hack. That's the ABI we agreed on pretty
much in my opinion. No one had any concerns about coherency when the
current MOCS entries were allocated IIRC.

> And it sounds like
> someone needs to fix up the MOCS story on gen9 overall anyway.
> -Daniel
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

-- 
Ville Syrjälä
Intel OTC