[PATCH] drm: gem: add an option for supporting the dma-coherent hardware.

Sui Jingfeng suijingfeng at loongson.cn
Thu Jun 8 03:03:29 UTC 2023


Hi,

On 2023/6/7 20:19, Maxime Ripard wrote:
> On Wed, Jun 07, 2023 at 06:30:01PM +0800, Sui Jingfeng wrote:
>> On 2023/6/7 17:36, Paul Cercueil wrote:
>>> Hi Sui,
>>>
>>> Le mercredi 07 juin 2023 à 13:30 +0800, Sui Jingfeng a écrit :
>>>> The single map_noncoherent member of struct drm_gem_dma_object may
>>>> not
>>>> sufficient for describing the backing memory of the GEM buffer
>>>> object.
>>>>
>>>> Especially on dma-coherent systems, the backing memory is both cached
>>>> coherent for multi-core CPUs and dma-coherent for peripheral device.
>>>> Say architectures like X86-64, LoongArch64, Loongson Mips64, etc.
>>>>
>>>> Whether a peripheral device is dma-coherent or not can be
>>>> implementation-dependent. The single map_noncoherent option is not
>>>> enough
>>>> to reflect real hardware anymore. For example, the Loongson LS3A4000
>>>> CPU
>>>> and LS2K2000/LS2K1000 SoC, peripheral device of such hardware
>>>> platform
>>>> allways snoop CPU's cache. Doing the allocation with
>>>> dma_alloc_coherent
>>>> function is preferred. The return buffer is cached, it should not
>>>> using
>>>> the default write-combine mapping. While with the current implement,
>>>> there
>>>> no way to tell the drm core to reflect this.
>>>>
>>>> This patch adds cached and coherent members to struct
>>>> drm_gem_dma_object.
>>>> which allow driver implements to inform the core. Introducing new
>>>> mappings
>>>> while keeping the original default behavior unchanged.
>>> Did you try to simply set the "dma-coherent" property to the device's
>>> node?
>> But this approach can only be applied for the device driver with DT support.
>>
>> X86-64, Loongson ls3a4000 mips64, Loongson ls3a5000 CPU typically do not
>> have DT support.
>>
>> They using ACPI to pass parameter from the firmware to Linux kernel.
>>
>> You approach will lost the effectiveness on such a case.
> Not really, no. All DT support is doing is setting some generic device
> parameter based on that property, but the infrastructure is very much
> generic and can be used on systems without a DT.
>
>>>   From what I understand if you add that property then Linux will use DMA
>>> coherent memory even though you use dma_alloc_noncoherent() and the
>>> sync_single_for_cpu() / sync_single_for_device() are then NOPs.
>   >
>> Please do not mitigate the problems with confusing method.
> It's not a confusing method, it's one of the two main API to deal with
> DMA buffers. And you might disagree with Paul but there's no need to be
> rude about it.
>
>> This approach not only tend to generate confusion but also
>> implement-dependent and arch-dependent. It's definitely problematic.
>>
>>
>> How does the dma_alloc_coherent/dma_alloc_noncoherent is a ARCH specific
>> thing.
>>
>> Dependent on how does the arch_dma_ops is implemented.
>>
>>
>> The definition of the coherent on different ARCH has different meanings.
>>
>> The definition of the wirte-combine on different ARCH has different
>> meanings.
>>
>>
>> The wirte-combine(uncache acceleration) on mips is non dma-coherent.
> Then MIPS breaks the DMA allocation semantics. A buffer allocated with
> dma_alloc_wc is supposed to be coherent.
>
>> But on arm, It seem that wirte-combine is coherent. (guaranteed by arch
>> implement).
>>
>>
>> I also heard using dma_alloc_coherent  to allocation the buffer for the
>> non-coherent doesn't hurt, but the reverse is not true.
>>
>>
>> But please do not create confusion.
>>
>> software composite is faster because better cacheusing rate and
>>
>> cache is faster to read.
>>
>> It is faster because it is cached, not because it is non-coherent.
>>
>> non-coherent is arch thing and/or driver-side thing,
>>
>> it is a side effect of  using the cached mapping.
> Honestly, it's not clear to me what your point or issue is.
>
> Going back to the description in your commit log, you mention that you
> want to support multiple hardware that might or might not be DMA
> coherent, and thus you want to allocate a buffer with different
> attributes depending on that?
>
> Like, you say that the LS3A4000 has a coherency unit and thus doing the
> allocation with dma_alloc_coherent is preferred. Preferred to what? A WC
> buffer? Why?
>
> A WC buffer is a coherent buffer that is allowed to cache writes.
>
> It doesn't have to, and worst case scenario you're inexactly the same
> case than a dma_alloc_coherent buffer.
>
>> It should left to driver to handle such a side effect. The device
>> driver know their device, so its the device driver's responsibility to
>> maintain the coherency.
> Not really, no. Some driver are used across multiple SoCs and multiple
> arch. It doesn't make any sense to encode this in the driver... which is
> why it's in the DT in the first place, and abstracted away by the DMA
> API. Like, do you really expect the amdgpu driver to know the DMA
> attributes it needs to allocate a buffer from when running from a
> RaspberryPi?
>
>> On loongson platform, we don't need to call
>> drm_fb_dma_sync_non_coherent() function, Its already guaranteed by
>> hardware.
> And mostly guaranteed by dma_alloc_coherent. And if you wanted to call
> it anyway, it would be a nop if the device is declared as coherent
> already.
>
> I think you're thinking about this backward. A buffer has mapping
> attributes, and a device has hardware properties.
>
> The driver (ie, software) will allocate a buffer with some mapping
> attributes, and will assume that they are met in the rest of its code.
> How they are met is an implementation detail of the hardware, and for
> all the driver cares, it doesn't have to match.
>
> You can allocate a WC buffer to use on a non-coherent device and that's
> fine. You can allocate a non-coherent buffer on a coherent device and
> that's fine too. The DMA API will make everything work when it needs to,
> and if the hardware already provides stronger guarantees, then it will
> just skip whatever is redundant.
>
> So you need to write your driver using buffer is the most convenient for
> you, and it's really all that matters at the driver level. But for that
> to work, you need to flag the coherence-ness of your devices properly,
> like Paul suggested.

I seems that you are right, at least at ARM world.

Thanks for you tell me this, I might be wrong, this patch may have small 
problems.

I think I should take more time to investigate this problem.

But I need to do more test before I can reply , thank Paul also.

> Maxime

-- 
Jingfeng



More information about the dri-devel mailing list