[Linaro-mm-sig] [PATCH v5 1/6] dma-buf: Add dma_buf_{begin,end}_access()

Christian König christian.koenig at amd.com
Tue Jan 23 13:28:10 UTC 2024


Am 23.01.24 um 14:02 schrieb Paul Cercueil:
> [SNIP]
>>>> That an exporter has to call extra functions to access his own
>>>> buffers
>>>> is a complete no-go for the design since this forces exporters
>>>> into
>>>> doing extra steps for allowing importers to access their data.
>>> Then what about we add these dma_buf_{begin,end}_access(), with
>>> only
>>> implementations for "dumb" exporters e.g. udmabuf or the dmabuf
>>> heaps?
>>> And only importers (who cache the mapping and actually care about
>>> non-
>>> coherency) would have to call these.
>> No, the problem is still that you would have to change all importers
>> to
>> mandatory use dma_buf_begin/end.
>>
>> But going a step back caching the mapping is irrelevant for
>> coherency.
>> Even if you don't cache the mapping you don't get coherency.
> You actually do - at least with udmabuf, as in that case
> dma_buf_map_attachment() / dma_buf_unmap_attachment() will handle cache
> coherency when the SGs are mapped/unmapped.

Well I just double checked the source in 6.7.1 and I can't see udmabuf 
doing anything for cache coherency in map/unmap.

All it does is calling dma_map_sgtable() and dma_unmap_sgtable() to 
create and destroy the SG table and those are not supposed to sync 
anything to the CPU cache.

In other words drivers usually use DMA_ATTR_SKIP_CPU_SYNC here, it's 
just that this is missing from udmabuf.

> The problem was then that dma_buf_unmap_attachment cannot be called
> before the dma_fence is signaled, and calling it after is already too
> late (because the fence would be signaled before the data is sync'd).

Well what sync are you talking about? CPU sync? In DMA-buf that is 
handled differently.

For importers it's mandatory that they can be coherent with the 
exporter. That usually means they can snoop the CPU cache if the 
exporter can snoop the CPU cache.

For exporters you can implement the begin/end CPU access functions which 
allows you to implement something even if your exporting device can't 
snoop the CPU cache.

> Daniel / Sima suggested then that I cache the mapping and add new
> functions to ensure cache coherency, which is what these patches are
> about.

Yeah, I've now catched up on the latest mail. Sorry I haven't seen that 
before.

>
>> In other words exporters are not require to call sync_to_cpu or
>> sync_to_device when you create a mapping.
>>
>> What exactly is your use case here? And why does coherency matters?
> My use-case is, I create DMABUFs with udmabuf, that I attach to
> USB/functionfs with the interface introduced by this patchset. I attach
> them to IIO with a similar interface (being upstreamed in parallel),
> and transfer data from USB to IIO and vice-versa in a zero-copy
> fashion.
>
> This works perfectly fine as long as the USB and IIO hardware are
> coherent between themselves, which is the case on most of our boards.
> However I do have a board (with a Xilinx Ultrascale SoC) where it is
> not the case, and cache flushes/sync are needed. So I was trying to
> rework these new interfaces to work on that system too.

Yeah, that sounds strongly like one of the use cases we have rejected so 
far.

> If this really is a no-no, then I am fine with the assumption that
> devices sharing a DMABUF must be coherent between themselves; but
> that's something that should probably be enforced rather than assumed.
>
> (and I *think* there is a way to force coherency in the Ultrascale's
> interconnect - we're investigating it)

What you can do is that instead of using udmabuf or dma-heaps is that 
the device which can't provide coherency act as exporters of the buffers.

The exporter is allowed to call sync_for_cpu/sync_for_device on it's own 
buffers and also gets begin/end CPU access notfications. So you can then 
handle coherency between the exporter and the CPU.

If you really don't have coherency between devices then that would be a 
really new use case and we would need much more agreement on how to do this.

Regards,
Christian.

>
> Cheers,
> -Paul
>
>>> At the very least, is there a way to check that "the data can be
>>> accessed coherently by the involved devices"? So that my importer
>>> can
>>> EPERM if there is no coherency vs. a device that's already
>>> attached.
>> Yeah, there is functionality for this in the DMA subsystem. I've once
>> created prototype patches for enforcing the same coherency approach
>> between importer and exporter, but we never got around to upstream
>> them.
>>
>>
>>
>>> Cheers,
>>> -Paul
>>>
>>>> That in turn is pretty much un-testable unless you have every
>>>> possible
>>>> importer around while testing the exporter.
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>>> Regards,
>>>>>> Christian.
>>>>> Cheers,
>>>>> -Paul
>>>>>
>>>>>>> Signed-off-by: Paul Cercueil<paul at crapouillou.net>
>>>>>>>
>>>>>>> ---
>>>>>>> v5: New patch
>>>>>>> ---
>>>>>>>      drivers/dma-buf/dma-buf.c | 66
>>>>>>> +++++++++++++++++++++++++++++++++++++++
>>>>>>>      include/linux/dma-buf.h   | 37 ++++++++++++++++++++++
>>>>>>>      2 files changed, 103 insertions(+)
>>>>>>>
>>>>>>> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-
>>>>>>> buf/dma-
>>>>>>> buf.c
>>>>>>> index 8fe5aa67b167..a8bab6c18fcd 100644
>>>>>>> --- a/drivers/dma-buf/dma-buf.c
>>>>>>> +++ b/drivers/dma-buf/dma-buf.c
>>>>>>> @@ -830,6 +830,8 @@ static struct sg_table *
>>>>>>> __map_dma_buf(struct
>>>>>>> dma_buf_attachment *attach,
>>>>>>>       *     - dma_buf_mmap()
>>>>>>>       *     - dma_buf_begin_cpu_access()
>>>>>>>       *     - dma_buf_end_cpu_access()
>>>>>>> + *     - dma_buf_begin_access()
>>>>>>> + *     - dma_buf_end_access()
>>>>>>>       *     - dma_buf_map_attachment_unlocked()
>>>>>>>       *     - dma_buf_unmap_attachment_unlocked()
>>>>>>>       *     - dma_buf_vmap_unlocked()
>>>>>>> @@ -1602,6 +1604,70 @@ void dma_buf_vunmap_unlocked(struct
>>>>>>> dma_buf
>>>>>>> *dmabuf, struct iosys_map *map)
>>>>>>>      }
>>>>>>>      EXPORT_SYMBOL_NS_GPL(dma_buf_vunmap_unlocked, DMA_BUF);
>>>>>>>      
>>>>>>> +/**
>>>>>>> + * @dma_buf_begin_access - Call before any hardware access
>>>>>>> from/to
>>>>>>> the DMABUF
>>>>>>> + * @attach:	[in]	attachment used for hardware
>>>>>>> access
>>>>>>> + * @sg_table:	[in]	scatterlist used for the DMA
>>>>>>> transfer
>>>>>>> + * @direction:  [in]    direction of DMA transfer
>>>>>>> + */
>>>>>>> +int dma_buf_begin_access(struct dma_buf_attachment
>>>>>>> *attach,
>>>>>>> +			 struct sg_table *sgt, enum
>>>>>>> dma_data_direction dir)
>>>>>>> +{
>>>>>>> +	struct dma_buf *dmabuf;
>>>>>>> +	bool cookie;
>>>>>>> +	int ret;
>>>>>>> +
>>>>>>> +	if (WARN_ON(!attach))
>>>>>>> +		return -EINVAL;
>>>>>>> +
>>>>>>> +	dmabuf = attach->dmabuf;
>>>>>>> +
>>>>>>> +	if (!dmabuf->ops->begin_access)
>>>>>>> +		return 0;
>>>>>>> +
>>>>>>> +	cookie = dma_fence_begin_signalling();
>>>>>>> +	ret = dmabuf->ops->begin_access(attach, sgt, dir);
>>>>>>> +	dma_fence_end_signalling(cookie);
>>>>>>> +
>>>>>>> +	if (WARN_ON_ONCE(ret))
>>>>>>> +		return ret;
>>>>>>> +
>>>>>>> +	return 0;
>>>>>>> +}
>>>>>>> +EXPORT_SYMBOL_NS_GPL(dma_buf_begin_access, DMA_BUF);
>>>>>>> +
>>>>>>> +/**
>>>>>>> + * @dma_buf_end_access - Call after any hardware access
>>>>>>> from/to
>>>>>>> the DMABUF
>>>>>>> + * @attach:	[in]	attachment used for hardware
>>>>>>> access
>>>>>>> + * @sg_table:	[in]	scatterlist used for the DMA
>>>>>>> transfer
>>>>>>> + * @direction:  [in]    direction of DMA transfer
>>>>>>> + */
>>>>>>> +int dma_buf_end_access(struct dma_buf_attachment *attach,
>>>>>>> +		       struct sg_table *sgt, enum
>>>>>>> dma_data_direction dir)
>>>>>>> +{
>>>>>>> +	struct dma_buf *dmabuf;
>>>>>>> +	bool cookie;
>>>>>>> +	int ret;
>>>>>>> +
>>>>>>> +	if (WARN_ON(!attach))
>>>>>>> +		return -EINVAL;
>>>>>>> +
>>>>>>> +	dmabuf = attach->dmabuf;
>>>>>>> +
>>>>>>> +	if (!dmabuf->ops->end_access)
>>>>>>> +		return 0;
>>>>>>> +
>>>>>>> +	cookie = dma_fence_begin_signalling();
>>>>>>> +	ret = dmabuf->ops->end_access(attach, sgt, dir);
>>>>>>> +	dma_fence_end_signalling(cookie);
>>>>>>> +
>>>>>>> +	if (WARN_ON_ONCE(ret))
>>>>>>> +		return ret;
>>>>>>> +
>>>>>>> +	return 0;
>>>>>>> +}
>>>>>>> +EXPORT_SYMBOL_NS_GPL(dma_buf_end_access, DMA_BUF);
>>>>>>> +
>>>>>>>      #ifdef CONFIG_DEBUG_FS
>>>>>>>      static int dma_buf_debug_show(struct seq_file *s, void
>>>>>>> *unused)
>>>>>>>      {
>>>>>>> diff --git a/include/linux/dma-buf.h b/include/linux/dma-
>>>>>>> buf.h
>>>>>>> index 8ff4add71f88..8ba612c7cc16 100644
>>>>>>> --- a/include/linux/dma-buf.h
>>>>>>> +++ b/include/linux/dma-buf.h
>>>>>>> @@ -246,6 +246,38 @@ struct dma_buf_ops {
>>>>>>>      	 */
>>>>>>>      	int (*end_cpu_access)(struct dma_buf *, enum
>>>>>>> dma_data_direction);
>>>>>>>      
>>>>>>> +	/**
>>>>>>> +	 * @begin_access:
>>>>>>> +	 *
>>>>>>> +	 * This is called from dma_buf_begin_access() when
>>>>>>> a
>>>>>>> device driver
>>>>>>> +	 * wants to access the data of the DMABUF. The
>>>>>>> exporter
>>>>>>> can use this
>>>>>>> +	 * to flush/sync the caches if needed.
>>>>>>> +	 *
>>>>>>> +	 * This callback is optional.
>>>>>>> +	 *
>>>>>>> +	 * Returns:
>>>>>>> +	 *
>>>>>>> +	 * 0 on success or a negative error code on
>>>>>>> failure.
>>>>>>> +	 */
>>>>>>> +	int (*begin_access)(struct dma_buf_attachment *,
>>>>>>> struct
>>>>>>> sg_table *,
>>>>>>> +			    enum dma_data_direction);
>>>>>>> +
>>>>>>> +	/**
>>>>>>> +	 * @end_access:
>>>>>>> +	 *
>>>>>>> +	 * This is called from dma_buf_end_access() when a
>>>>>>> device
>>>>>>> driver is
>>>>>>> +	 * done accessing the data of the DMABUF. The
>>>>>>> exporter
>>>>>>> can
>>>>>>> use this
>>>>>>> +	 * to flush/sync the caches if needed.
>>>>>>> +	 *
>>>>>>> +	 * This callback is optional.
>>>>>>> +	 *
>>>>>>> +	 * Returns:
>>>>>>> +	 *
>>>>>>> +	 * 0 on success or a negative error code on
>>>>>>> failure.
>>>>>>> +	 */
>>>>>>> +	int (*end_access)(struct dma_buf_attachment *,
>>>>>>> struct
>>>>>>> sg_table *,
>>>>>>> +			  enum dma_data_direction);
>>>>>>> +
>>>>>>>      	/**
>>>>>>>      	 * @mmap:
>>>>>>>      	 *
>>>>>>> @@ -606,6 +638,11 @@ void dma_buf_detach(struct dma_buf
>>>>>>> *dmabuf,
>>>>>>>      int dma_buf_pin(struct dma_buf_attachment *attach);
>>>>>>>      void dma_buf_unpin(struct dma_buf_attachment *attach);
>>>>>>>      
>>>>>>> +int dma_buf_begin_access(struct dma_buf_attachment
>>>>>>> *attach,
>>>>>>> +			 struct sg_table *sgt, enum
>>>>>>> dma_data_direction dir);
>>>>>>> +int dma_buf_end_access(struct dma_buf_attachment *attach,
>>>>>>> +		       struct sg_table *sgt, enum
>>>>>>> dma_data_direction dir);
>>>>>>> +
>>>>>>>      struct dma_buf *dma_buf_export(const struct
>>>>>>> dma_buf_export_info
>>>>>>> *exp_info);
>>>>>>>      
>>>>>>>      int dma_buf_fd(struct dma_buf *dmabuf, int flags);
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/dri-devel/attachments/20240123/038a3418/attachment-0001.htm>


More information about the dri-devel mailing list