[PATCH] drm: gem: add an option for supporting the dma-coherent hardware.

Maxime Ripard mripard at kernel.org
Thu Jun 8 07:39:45 UTC 2023


On Thu, Jun 08, 2023 at 01:18:38AM +0800, Sui Jingfeng wrote:
> Hi,
> 
> On 2023/6/8 00:12, Paul Cercueil wrote:
> > Hi Sui,
> > 
> > Le mercredi 07 juin 2023 à 22:38 +0800, Sui Jingfeng a écrit :
> > > Hi,  welcome to discussion.
> > > 
> > > 
> > > I have limited skills in manipulating English.
> > > 
> > > It may not express what I'm really means in the short time.
> > > 
> > > Part of word in the sentence may not as accurate as your.
> > > 
> > > Well, please don't misunderstand, I'm not doing the rude to you.
> > No problem.
> > 
> > > I will explain it with more details.
> > > 
> > > See below:
> > > 
> > > 
> > > On 2023/6/7 20:09, Paul Cercueil wrote:
> > > > Hi Sui,
> > > > 
> > > > Le mercredi 07 juin 2023 à 18:30 +0800, Sui Jingfeng a écrit :
> > > > > Hi,
> > > > > 
> > > > > 
> > > > > On 2023/6/7 17:36, Paul Cercueil wrote:
> > > > > > Hi Sui,
> > > > > > 
> > > > > > Le mercredi 07 juin 2023 à 13:30 +0800, Sui Jingfeng a écrit :
> > > > > > > The single map_noncoherent member of struct
> > > > > > > drm_gem_dma_object
> > > > > > > may
> > > > > > > not
> > > > > > > sufficient for describing the backing memory of the GEM
> > > > > > > buffer
> > > > > > > object.
> > > > > > > 
> > > > > > > Especially on dma-coherent systems, the backing memory is
> > > > > > > both
> > > > > > > cached
> > > > > > > coherent for multi-core CPUs and dma-coherent for peripheral
> > > > > > > device.
> > > > > > > Say architectures like X86-64, LoongArch64, Loongson Mips64,
> > > > > > > etc.
> > > > > > > 
> > > > > > > Whether a peripheral device is dma-coherent or not can be
> > > > > > > implementation-dependent. The single map_noncoherent option
> > > > > > > is
> > > > > > > not
> > > > > > > enough
> > > > > > > to reflect real hardware anymore. For example, the Loongson
> > > > > > > LS3A4000
> > > > > > > CPU
> > > > > > > and LS2K2000/LS2K1000 SoC, peripheral device of such hardware
> > > > > > > platform
> > > > > > > allways snoop CPU's cache. Doing the allocation with
> > > > > > > dma_alloc_coherent
> > > > > > > function is preferred. The return buffer is cached, it should
> > > > > > > not
> > > > > > > using
> > > > > > > the default write-combine mapping. While with the current
> > > > > > > implement,
> > > > > > > there
> > > > > > > no way to tell the drm core to reflect this.
> > > > > > > 
> > > > > > > This patch adds cached and coherent members to struct
> > > > > > > drm_gem_dma_object.
> > > > > > > which allow driver implements to inform the core. Introducing
> > > > > > > new
> > > > > > > mappings
> > > > > > > while keeping the original default behavior unchanged.
> > > > > > Did you try to simply set the "dma-coherent" property to the
> > > > > > device's
> > > > > > node?
> > > > > But this approach can only be applied for the device driver with
> > > > > DT
> > > > > support.
> > > > > 
> > > > > X86-64, Loongson ls3a4000 mips64, Loongson ls3a5000 CPU typically
> > > > > do
> > > > > not
> > > > > have DT support.
> > > > > 
> > > > > They using ACPI to pass parameter from the firmware to Linux
> > > > > kernel.
> > > > > 
> > > > > You approach will lost the effectiveness on such a case.
> > > > Well, I don't really know how ACPI handles it - but it should just
> > > > be a
> > > > matter of setting dev->dma_coherent. That's basically what the DT
> > > > code
> > > > does.
> > > > 
> > > > Some MIPS boards set it in their setup code for instance.
> > > > 
> > > This is a *strategy*, not a *mechanism*.
> > > 
> > > In this case, DT is just used to describing the hardware.
> > > 
> > > (It is actually a hardware feature describing language, the
> > > granularity
> > > is large)
> > > 
> > > It does not changing the state of the hardware.
> > > 
> > > It's your platform firmware or kernel setting up code who actually do
> > > such a things.
> > > 
> > > 
> > > It's just that it works on *one* platform, it does not guarantee it
> > > will
> > > works on others.
> > If you add the "dma-coherent" property in a device node in DT, you
> > effectively specify that the device is DMA-coherent; so you describe
> > the hardware, which is what DT is for, and you are not changing the
> > state of the hardware.
> > 
> > Note that some MIPS platforms (arch/mips/alchemy/common/setup.c)
> > default to DMA-coherent mapping; I believe you could do something
> > similar with your Loongson LS3A4000 CPU and LS2K2000/LS2K1000 SoC.
> > 
> The preblem is that device driver can have various demand.
> 
> It probably want to create different kind of buffers for different thing
> simultaneously.
> 
> Say, one allocated with dma_alloc_coherent for command buffer or dma
> descriptor
> 
> another one allocated with  dma_alloc_wc for uploading shader etc.
> 
> also has the third one allocated with dma_alloc_noncoherent() for doing some
> else.

And it will work just fine.

struct device dma_coherent, or DT's dma-coherent property define that
the device doesn't need any kind of cache maintenance, ever. If it's
missing, we need to perform cache maintenance to keep coherency.

dma_alloc_* functions provide guarantees to the driver. With
dma_alloc_wc and dma_alloc_coherent, the buffer is coherent, and thus
you don't need to perform cache maintenance operations by hand in the
driver.

With dma_alloc_noncoherent, the buffer is non-coherent and the driver
needs to perform them when relevant.

How those buffers are created is platform specific, but the guarantees
provided *to the driver* are always there.

A buffer allocated with dma_alloc_coherent might be provided by
different means (at the hardware level with a coherency unit, by mapping
it non-cacheable), but as far as the driver is concerned it's always
going to be coherent.

Similarly, a driver using dma_alloc_noncoherent will always require
cache maintenance operations to use the API properly, even if the
hardware provides coherency (in which case, those operations will be
nop).

So, yeah, like I was saying in the other mail, it looks like you're
confusing a bunch of things. dma_alloc_* functions are about the driver
expectations and guarantees. DT's dma-coherent property is about how we
can implement them on a given platform.

They don't have to match, and that's precisely how we can have drivers
that run on any combination of platforms: the driver only cares about
the buffer guarantees, the platform description takes care of how they
are implemented.

Maxime
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 228 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/dri-devel/attachments/20230608/8b499ac2/attachment.sig>


More information about the dri-devel mailing list