[Intel-gfx] [PATCH v12 1/6] drm: Add arch arm64 for drm_clflush_virt_range
Michael Cheng
michael.cheng at intel.com
Mon Mar 7 16:52:06 UTC 2022
Ah Thanks for the great feedback!
@Lucas or @Matt, could you please chime in?
Michael Cheng
On 2022-03-02 11:10 a.m., Robin Murphy wrote:
> On 2022-03-02 15:55, Michael Cheng wrote:
>> Thanks for the feedback Robin!
>>
>> Sorry my choices of word weren't that great, but what I meant is to
>> understand how ARM flushes a range of dcache for device drivers, and
>> not an equal to x86 clflush.
>>
>> I believe the concern is if the CPU writes an update, that update
>> might only be sitting in the CPU cache and never make it to device
>> memory where the device can see it; there are specific places that we
>> are supposed to flush the CPU caches to make sure our updates are
>> visible to the hardware.
>
> Ah, OK, if it's more about ordering, and it's actually write buffers
> rather than caches that you care about flushing, then we might be a
> lot safer, phew!
>
> For a very simple overview, in a case where the device itself needs to
> observe memory writes in the correct order, e.g.:
>
> data_descriptor.valid = 1;
>
> clflush(&data_descriptor);
>
> command_descriptor.data = &data_descriptor
>
> writel(/* control register to read command to then read data */)
>
> then dma_wmb() between the first two writes should be the right tool
> to ensure that the command does not observe the command update while
> the data update is still sat somewhere in a CPU write buffer.
>
> If you want a slightly stronger notion that, at a given point, all
> prior writes have actually been issued and should now be visible
> (rather than just that they won't become visible in the wrong order
> whenever they do), then wmb() should suffice on arm64.
>
> Note that wioth arm64 memory types, a Non-Cacheable mapping of DRAM
> for a non-coherent DMA mapping, or of VRAM in a prefetchable BAR, can
> still be write-buffered, so barriers still matter even when actual
> cache maintenance ops don't (and as before if you're trying to perform
> cache maintenance outside the DMA API then you've already lost
> anyway). MMIO registers should be mapped as Device memory via
> ioremap(), which is not bufferable, hence the barrier implicit in
> writel() effectively pushes out any prior buffered writes ahead of a
> register write, which is why we don't need to worry about this most of
> the time.
>
> This is only a very rough overview, though, and I'm not familiar
> enough with x86 semantics, your hardware, or the exact use-case to be
> able to say whether barriers alone are anywhere near the right answer
> or not.
>
> Robin.
>
>>
>> +Matt Roper
>>
>> Matt, Lucas, any feed back here?
>>
>> On 2022-03-02 4:49 a.m., Robin Murphy wrote:
>>> On 2022-02-25 19:27, Michael Cheng wrote:
>>>> Hi Robin,
>>>>
>>>> [ +arm64 maintainers for their awareness, which would have been a
>>>> good thing to do from the start ]
>>>>
>>>> * Thanks for adding the arm64 maintainer and sorry I didn't rope
>>>> them
>>>> in sooner.
>>>>
>>>> Why does i915 need to ensure the CPU's instruction cache is
>>>> coherent with its data cache? Is it a self-modifying driver?
>>>>
>>>> * Also thanks for pointing this out. Initially I was using
>>>> dcache_clean_inval_poc, which seem to be the equivalently to what
>>>> x86 is doing for dcache flushing, but it was giving me build
>>>> errors
>>>> since its not on the global list of kernel symbols. And after
>>>> revisiting the documentation for caches_clean_inval_pou, it won't
>>>> fly for what we are trying to do. Moving forward, what would
>>>> you (or
>>>> someone in the ARM community) suggest we do? Could it be
>>>> possible to
>>>> export dcache_clean_inval_poc as a global symbol?
>>>
>>> Unlikely, unless something with a legitimate need for CPU-centric
>>> cache maintenance like kexec or CPU hotplug ever becomes modular.
>>>
>>> In the case of a device driver, it's not even the basic issues of
>>> assuming to find direct equivalents to x86 semantics in other CPU
>>> architectures, or effectively reinventing parts of the DMA API, it's
>>> even bigger than that. Once you move from being integrated in a
>>> single vendor's system architecture to being on a discrete card, you
>>> fundamentally *no longer have any control over cache coherency*.
>>> Whether the host CPU architecture happens to be AArch64, RISC-V, or
>>> whatever doesn't really matter, you're at the mercy of 3rd-party
>>> PCIe and interconnect IP vendors, and SoC integrators. You'll find
>>> yourself in systems where PCIe simply cannot snoop any caches, where
>>> you'd better have the correct DMA API calls in place to have any
>>> hope of even the most basic functionality working properly; you'll
>>> find yourself in systems where even if the PCIe root complex claims
>>> to support No Snoop, your uncached traffic will still end up
>>> snooping stale data that got prefetched back into caches you thought
>>> you'd invalidated; you'll find yourself in systems where your memory
>>> attributes may or may not get forcibly rewritten by an IOMMU
>>> depending on the kernel config and/or command line.
>>>
>>> It's not about simply finding a substitute for clflush, it's that
>>> the reasons you have for using clflush in the first place can no
>>> longer be assumed to be valid.
>>>
>>> Robin.
>>>
>>>> On 2022-02-25 10:24 a.m., Robin Murphy wrote:
>>>>> [ +arm64 maintainers for their awareness, which would have been a
>>>>> good thing to do from the start ]
>>>>>
>>>>> On 2022-02-25 03:24, Michael Cheng wrote:
>>>>>> Add arm64 support for drm_clflush_virt_range. caches_clean_inval_pou
>>>>>> performs a flush by first performing a clean, follow by an
>>>>>> invalidation
>>>>>> operation.
>>>>>>
>>>>>> v2 (Michael Cheng): Use correct macro for cleaning and
>>>>>> invalidation the
>>>>>> dcache. Thanks Tvrtko for the suggestion.
>>>>>>
>>>>>> v3 (Michael Cheng): Replace asm/cacheflush.h with linux/cacheflush.h
>>>>>>
>>>>>> v4 (Michael Cheng): Arm64 does not export dcache_clean_inval_poc
>>>>>> as a
>>>>>> symbol that could be use by other modules, thus use
>>>>>> caches_clean_inval_pou instead. Also this version
>>>>>> removes include for cacheflush, since its already
>>>>>> included base on architecture type.
>>>>>>
>>>>>> Signed-off-by: Michael Cheng <michael.cheng at intel.com>
>>>>>> Reviewed-by: Matt Roper <matthew.d.roper at intel.com>
>>>>>> ---
>>>>>> drivers/gpu/drm/drm_cache.c | 5 +++++
>>>>>> 1 file changed, 5 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/drm_cache.c
>>>>>> b/drivers/gpu/drm/drm_cache.c
>>>>>> index c3e6e615bf09..81c28714f930 100644
>>>>>> --- a/drivers/gpu/drm/drm_cache.c
>>>>>> +++ b/drivers/gpu/drm/drm_cache.c
>>>>>> @@ -174,6 +174,11 @@ drm_clflush_virt_range(void *addr, unsigned
>>>>>> long length)
>>>>>> if (wbinvd_on_all_cpus())
>>>>>> pr_err("Timed out waiting for cache flush\n");
>>>>>> +
>>>>>> +#elif defined(CONFIG_ARM64)
>>>>>> + void *end = addr + length;
>>>>>> + caches_clean_inval_pou((unsigned long)addr, (unsigned
>>>>>> long)end);
>>>>>
>>>>> Why does i915 need to ensure the CPU's instruction cache is
>>>>> coherent with its data cache? Is it a self-modifying driver?
>>>>>
>>>>> Robin.
>>>>>
>>>>> (Note that the above is somewhat of a loaded question, and I do
>>>>> actually have half an idea of what you're trying to do here and
>>>>> why it won't fly, but I'd like to at least assume you've read the
>>>>> documentation of the function you decided was OK to use)
>>>>>
>>>>>> +
>>>>>> #else
>>>>>> WARN_ONCE(1, "Architecture has no drm_cache.c support\n");
>>>>>> #endif
More information about the Intel-gfx
mailing list