[Intel-gfx] [PATCH] drm/i915: Set all undefined MOCS entries to follow PTE
Chris Wilson
chris at chris-wilson.co.uk
Wed Jun 28 10:19:21 UTC 2017
Quoting Francisco Jerez (2017-05-04 21:59:44)
> Chris Wilson <chris at chris-wilson.co.uk> writes:
>
> > On Thu, May 04, 2017 at 10:56:54AM -0700, Francisco Jerez wrote:
> >> David Weinehall <david.weinehall at linux.intel.com> writes:
> >>
> >> > On Thu, May 04, 2017 at 10:51:29AM +0100, Chris Wilson wrote:
> >> >> A good default for garbage entries from the user is to follow the
> >> >> default setting of the object (i.e. the PTE). Currently they use the
> >> >> uncached entry, and now the only way to accidentally hit uncached
> >> >> performance is via explicit use of the uncached MOCS or setting the
> >> >> object to uncached. Note that these entries are currently undefined in
> >> >> the ABI and we reserve the right to change them. We originally chose
> >> >> uncached to eliminate any problem with reducing the caching level in
> >> >> future, but the object is a much better definition of the minimum
> >> >> caching level.
> >> >>
> >>
> >> NAK. The reason for the default being UC is that it's the only setting
> >> that guarantees full forwards compatibility with any other entry that
> >> might be added in the future. If you default to PTE on (e)LLC and WB on
> >> L3, userspace will no longer be able to use any newly introduced entry
> >> with stricter coherency guarantees than that (e.g. any L3-uncached
> >> entry) in a backwards-compatible way. Attempting to do so may break
> >> memory coherency assumptions of the application and lead to misrendering
> >> when run on older kernel versions (which to my judgment is a scarier
> >> failure mode than reduced performance).
> >
> > You can't use a weaker coherency model in mocs than that specified for
> > the object as you can't control other uses of the object (even just
> > memory pressure will break your assumptions).
>
> Exactly, but you can use a stronger coherency model than the application
> requested, which is why falling back to UC should generally work for
> unknown entries but falling back to PTE+WB isn't guaranteed to.

Still wrong. GEM will write into the CPU cache, believing the object is
coherent. The GPU will then read from memory, bypassing the CPU cache,
because it follows the UC MOCS. The only safe option is for it to follow
the PTE.
-Chris
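
The failure mode described in the reply above can be sketched with a toy
model. This is not i915 code: the struct, the enum, and both functions are
hypothetical names made up for illustration, and the model assumes a single
word of memory behind a write-back CPU cache. It only shows why a UC read of
an object the CPU side treats as coherent can return stale data, while a read
that follows the object's own (PTE) setting does not.

```c
#include <stdbool.h>

/* Toy model (hypothetical, not i915 code): one word of memory with a
 * write-back CPU cache line in front of it. */
struct toy_object {
	int memory;        /* contents of main memory */
	int cpu_cache;     /* contents of the CPU cache line */
	bool cache_dirty;  /* true if cpu_cache has not been written back */
	bool pte_cached;   /* object-level (PTE) setting: GPU access is coherent */
};

/* Hypothetical stand-ins for the two MOCS behaviours under discussion. */
enum toy_mocs {
	TOY_MOCS_UC,   /* always bypass the CPU cache */
	TOY_MOCS_PTE,  /* follow the object's own caching setting */
};

/* CPU-side write as done for an object believed to be coherent:
 * the data stays in the CPU cache and is not flushed to memory. */
static void cpu_write(struct toy_object *obj, int value)
{
	if (obj->pte_cached) {
		obj->cpu_cache = value;
		obj->cache_dirty = true;
	} else {
		/* Uncached object: the write goes straight to memory. */
		obj->memory = value;
	}
}

/* GPU-side read: only a PTE-following access to a cached object sees
 * the dirty CPU cache line; everything else reads main memory. */
static int gpu_read(const struct toy_object *obj, enum toy_mocs mocs)
{
	bool coherent = (mocs == TOY_MOCS_PTE) && obj->pte_cached;

	if (coherent && obj->cache_dirty)
		return obj->cpu_cache;
	return obj->memory;
}
```

In this model, starting from an object with memory = 0 and pte_cached = true,
cpu_write(&obj, 42) followed by gpu_read(&obj, TOY_MOCS_UC) returns the stale
0, whereas gpu_read(&obj, TOY_MOCS_PTE) returns 42.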