[RFC PATCH] drm: disable WC optimization for cache coherent devices on non-x86

Mon Jan 21 18:20:02 UTC 2019

On Mon, 21 Jan 2019 at 19:04, Michel Dänzer <michel at daenzer.net> wrote:
>
> On 2019-01-21 6:59 p.m., Ard Biesheuvel wrote:
> > On Mon, 21 Jan 2019 at 18:55, Michel Dänzer <michel at daenzer.net> wrote:
> >>
> >> On 2019-01-21 5:30 p.m., Ard Biesheuvel wrote:
> >>> On Mon, 21 Jan 2019 at 17:22, Christoph Hellwig <hch at infradead.org> wrote:
> >>>
> >>>> Until that happens we should just change the driver ifdefs to default
> >>>> the hacks to off and only enable them on setups where we 100%
> >>>> positively know that they actually work.  And document that fact
> >>>> in big fat comments.
> >>>
> >>> Well, as I mentioned in my commit log as well, if we default to off
> >>> unless CONFIG_X86, we may break working setups on MIPS and Power where
> >>> the device is in fact non-cache coherent, and relies on this
> >>> 'optimization' to get things working.
> >>
> >> FWIW, the amdgpu driver doesn't rely on non-snooped transfers for
> >> correct basic operation (the scenario Christian brought up is a very
> >> specialized use-case), so that shouldn't be an issue.
> >>
> >
> > The point is that this is only true for x86.
> >
> > On other architectures, the use of non-cached mappings on the CPU side
> > means that you /do/ rely on non-snooped transfers, since if those
> > transfers turn out not to snoop inadvertently, the accesses are
> > incoherent with the CPU's view of memory.
>
> The driver generally only uses non-cached mappings if
> drm_arch/device_can_wc_memory returns true.
>

Indeed. And so we should take care to only return 'true' from that
function if it is guaranteed that non-cached CPU mappings are coherent
with the mappings used by the GPU, either because that is always the
case (like on x86), or because we know that the platform in question
implements NoSnoop correctly throughout the interconnect.

What seems to be complicating matters is that in some cases, the
device is non-cache coherent to begin with, so regardless of whether
the NoSnoop attribute is used or not, those accesses will not snoop in
the caches and be coherent with the non-cached mappings used by the
CPU. So if we restrict this optimization [on non-X86] to platforms
that are known to implement NoSnoop correctly, we may break platforms
that are implicitly NoSnoop all the time.