[RFC PATCH] drm/ttm: force cached mappings for system RAM on ARM

Will Deacon will.deacon at arm.com
Mon Jan 14 19:13:50 UTC 2019


On Mon, Jan 14, 2019 at 07:07:54PM +0000, Koenig, Christian wrote:
> On 14.01.19 at 18:32, Ard Biesheuvel wrote:
>             - The reason remapping the CPU side as cacheable does work (which I
>               did test) is because the GPU's uncacheable accesses (which I assume
>               are made using the NoSnoop PCIe transaction attribute) are actually
>               emitted as cacheable in some cases.
>                . On my AMD Seattle, with or without SMMU (which is stage 2 only), I
>                  must use cacheable accesses from the CPU side or things are broken.
>                  This might be a h/w flaw, though.
>                . On systems with stage 1+2 SMMUs, the driver uses stage 1
>                  translations which always override the memory attributes to cacheable
>                  for DMA coherent devices. This is what is affecting the Cavium
>                  ThunderX2 (although it appears the attributes emitted by the RC may be
>                  incorrect as well.)
> 
>             The latter issue is a shortcoming in the SMMU driver that we have to
>             fix, i.e., it should take care not to modify the incoming attributes
>             of DMA coherent PCIe devices for NoSnoop to be able to work.
> 
>             So in summary, the mismatch appears to be between the CPU accessing
>             the vmap region with non-cacheable attributes and the GPU accessing
>             the same memory with cacheable attributes, resulting in a loss of
>             coherency and lots of visible corruption.
> 
>         Actually it is the other way around. The CPU thinks some data is in the
>         cache and the GPU only updates the system memory version because the
>         snoop flag is not set.
> 
> 
>     That doesn't seem to be what is happening. As far as we can tell from
>     our experiments, all inbound transactions are always cacheable, and so
>     the only way to make things work is to ensure that the CPU uses the
>     same attributes.
> 
> 
> Ok, that doesn't make any sense. Whether inbound transactions are cacheable or
> not is irrelevant when the CPU always uses uncached accesses.
> 
> See on the PCIe side you have the snoop bit in the read/write transactions
> which tells the root hub if the device wants to snoop caches or not.
> 
> When the CPU accesses some memory as cached then devices need to snoop the
> cache for coherent accesses.
> 
> When the CPU accesses some memory as uncached then devices can disable snooping
> to improve performance, but when they don't do this it is mandated by the spec
> that this still works.

Which spec? The Arm architecture (and others including Power afaiu) doesn't
guarantee coherency when memory is accessed using mismatched cacheability
attributes.
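
To illustrate why mismatched attributes can lose coherency in either
direction, here is a toy model in C. It is only a sketch under stated
assumptions: a single memory word, a single write-back cache line, and
non-cacheable accesses that bypass the cache entirely. The names
(toy_system, cpu_read, dev_write) are hypothetical and do not correspond to
any kernel or hardware interface, and real interconnect behaviour is
implementation specific.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model: one word of memory and one write-back cache line shadowing it. */
struct toy_system {
    uint32_t dram;        /* value held in system memory */
    uint32_t cache_val;   /* value held in the cache line */
    bool     cache_valid; /* true if the cache currently holds the line */
};

/* Start with 'v' in memory and nothing cached. */
static void toy_init(struct toy_system *s, uint32_t v)
{
    assert(s != NULL);
    s->dram = v;
    s->cache_val = 0;
    s->cache_valid = false;
}

/*
 * CPU read. A cacheable read hits the cache, or fills it from memory on a
 * miss. A non-cacheable read is ASSUMED to bypass the cache and read memory
 * directly, which is the behaviour this toy model is built around.
 */
static uint32_t cpu_read(struct toy_system *s, bool cacheable)
{
    assert(s != NULL);
    if (!cacheable)
        return s->dram;
    if (!s->cache_valid) {
        s->cache_val = s->dram;
        s->cache_valid = true;
    }
    return s->cache_val;
}

/*
 * Inbound device write. A cacheable write is ASSUMED to allocate in the
 * cache without updating memory (write-back). A non-cacheable write updates
 * memory only and leaves any cached copy untouched.
 */
static void dev_write(struct toy_system *s, uint32_t v, bool cacheable)
{
    assert(s != NULL);
    if (cacheable) {
        s->cache_val = v;
        s->cache_valid = true;
    } else {
        s->dram = v;
    }
}
```

In this model, matching attributes on both sides always return the latest
value. A cacheable CPU read after a non-cacheable device write can return a
stale cached copy, and a non-cacheable CPU read after a cacheable device
write can return stale memory contents.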

Will

