[Intel-gfx] [PATCH] drm/i915: Set all undefined MOCS entries to follow PTE

Wed Jun 28 21:10:40 UTC 2017

Chris Wilson <chris at chris-wilson.co.uk> writes:

> Quoting Francisco Jerez (2017-05-04 21:59:44)
>> Chris Wilson <chris at chris-wilson.co.uk> writes:
>> 
>> > On Thu, May 04, 2017 at 10:56:54AM -0700, Francisco Jerez wrote:
>> >> David Weinehall <david.weinehall at linux.intel.com> writes:
>> >> 
>> >> > On Thu, May 04, 2017 at 10:51:29AM +0100, Chris Wilson wrote:
>> >> >> A good default for garbage entries from the user is to follow the
>> >> >> default setting of the object (i.e. the PTE). Currently they use the
>> >> >> uncached entry, and now the only way to accidentally hit uncached
>> >> >> performance is via explicit use of the uncached MOCS or setting the
>> >> >> object to uncached. Note that these entries are currently undefined in
>> >> >> the ABI and we reserve the right to change them. We originally chose
>> >> >> uncached to eliminate any problem with reducing the caching level in
>> >> >> future, but the object is a much better definition of the minimum
>> >> >> caching level.
>> >> >> 
>> >> 
>> >> NAK.  The reason for the default being UC is that it's the only setting
>> >> that guarantees full forwards compatibility with any other entry that
>> >> might be added in the future.  If you default to PTE on (e)LLC and WB on
>> >> L3, userspace will no longer be able to use any newly introduced entry
>> >> with stricter coherency guarantees than that (e.g. any L3-uncached
>> >> entry) in a backwards-compatible way.  Attempting to do so may break
>> >> memory coherency assumptions of the application and lead to misrendering
>> >> when run on older kernel versions (which to my judgment is a scarier
>> >> failure mode than reduced performance).
>> >
>> > You can't use a weaker coherency model in mocs than that specified for
>> > the object as you can't control other uses of the object (even just
>> > memory pressure will break your assumptions).
>> 
>> Exactly, but you can use a stronger coherency model than the application
>> requested, which is why falling back to UC should generally work for
>> unknown entries but falling back to PTE+WB isn't guaranteed to.
>
> Still wrong. GEM will write into the CPU cache believing the object is
> coherent. The GPU will read from memory bypassing the CPU cache
> following the UC mocs.

I agree that this is a plausible scenario.

> The only safe option is for it to follow PTE.

Except you don't know whether the client reading or writing at the other
end is the CPU, or whether the client at the other end is (set up to be)
LLC-coherent.  There's likely no 100% safe option on the LLC side of
things.

I could probably be convinced that in a number of scenarios PTE on LLC
has a somewhat better chance of success, but on the L3 side of things
this patch enables WB, which as far as I am aware is strictly more
weakly coherent than UC, so it still gets my NAK.
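
To make the two options concrete, here is a minimal sketch in C of the
fallbacks being discussed. It is not the i915 code: the enum values,
the struct layout, the table size and the function name are all
hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cache-control values; not the real i915 encodings. */
enum cache_ctl { CTL_UC, CTL_PTE, CTL_WB };

struct mocs_entry {
	enum cache_ctl llc; /* (e)LLC behaviour */
	enum cache_ctl l3;  /* L3 behaviour */
};

#define MOCS_TABLE_SIZE 64 /* assumed table size, for illustration only */

/* The two fallbacks under discussion for entries the ABI leaves undefined. */
static const struct mocs_entry FALLBACK_UC  = { CTL_UC,  CTL_UC };
static const struct mocs_entry FALLBACK_PTE = { CTL_PTE, CTL_WB };

/*
 * Fill every slot with the fallback, then copy in the defined entries.
 * Any index at or beyond n_defined therefore resolves to the fallback.
 */
static void mocs_table_init(struct mocs_entry *table,
			    const struct mocs_entry *defined,
			    size_t n_defined,
			    struct mocs_entry fallback)
{
	size_t i;

	assert(n_defined <= MOCS_TABLE_SIZE);
	for (i = 0; i < MOCS_TABLE_SIZE; i++)
		table[i] = fallback;
	for (i = 0; i < n_defined; i++)
		table[i] = defined[i];
}
```

With FALLBACK_UC an undefined index ends up uncached at both levels;
with FALLBACK_PTE it follows the PTE on (e)LLC and is WB on L3, which is
the part I am objecting to.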

> -Chris
