dma-buf non-coherent mmap
Thomas Hellstrom
thellstrom at vmware.com
Mon Nov 4 11:48:38 CET 2013
On 11/04/2013 11:22 AM, Lucas Stach wrote:
> Hi Thomas,
>
> I inlined the patch, to discuss more easily.
>
> On Monday, 2013-11-04 at 08:53 +0100, Thomas Hellstrom wrote:
>> enum dma_buf_data_direction {
>> 	DMA_BUF_BIDIRECTIONAL,
>> 	DMA_BUF_TO_DEVICE,
>> 	DMA_BUF_FROM_DEVICE
>> };
> I don't think the DMA API semantics make much sense here. Doing a sync
> from device is at least a bit complicated, as we would have to track
> down the device in whose domain the buffer currently resides for
> write. This may be doable by bouncing the sync op from the exporter to
> the right attachment.
>
> The other way around is even messier: a DMA-BUF is by definition
> shared between multiple devices, so which of them should you sync to?
> All of them?
>
> I think the API used for userspace access should only be concerned with
> making sure the CPU gets a coherent view of the resource at the point of
> sharing, i.e. sysmem. All DMA-BUF users should make sure that they sync
> their writes to the point of sharing strictly before signaling that they
> are done with the buffer.
> Maybe this is just a naming problem, but IMHO the API should make it
> really clear that the sync only makes sure that the data reaches the
> point of sharing.
Good point. I think the options need to remain, but we could rename
DEVICE to STORAGE, or something similar. In our case,
sync_for_cpu(DMA_BUF_TO_STORAGE) would be a no-op, whereas
sync_for_cpu(DMA_BUF_BIDIRECTIONAL) would trigger a DMA read.
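
Roughly something like this (just a sketch of the renamed direction
values; final naming is of course up for discussion):

enum dma_buf_data_direction {
	DMA_BUF_BIDIRECTIONAL,
	DMA_BUF_TO_STORAGE,
	DMA_BUF_FROM_STORAGE
};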
>
>> /**
>> * struct dma_buf_sync_region_arg - Argument to the region sync ioctls
>> *
>> * @start_x: Upper left X coordinate of area to be synced.
>> * Unit is format-dependent.
>> * @start_y: Upper left Y coordinate of area to be synced.
>> * Unit is format-dependent.
>> * @width: Width of area to be synced. Grows to the right.
>> * Unit is format-dependent.
>> * @height: Height of area to be synced. Grows downwards.
>> * Unit is format-dependent.
> While I see why you would need this, I don't really like the concept of a
> two-dimensional resource pushed into the API. You already need the
> linear flag to make this work with 1D resources, and I don't see what
> happens when you encounter 3D or array resources.
>
> Don't take this as a strong rejection, as I see why this is useful, but I
> don't like how the interface looks right now.
I understand. I don't think a 1D-only syncing option would suffice
performance-wise, as the use-case I'm most afraid of is people trying to
share 2D content between user-space processes.

What do you think about supplying descriptor objects that could describe
either a linear area, a 2D area or whatever else is needed? Whenever (if)
a new descriptor type is added to the interface, we could have a legacy
implementation that implements it in terms of 1D and 2D syncs.
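
Just to sketch what I have in mind (names and layout are purely
illustrative, not a proposal):

/* Hypothetical sync descriptor; @type selects which union member is valid. */
struct dma_buf_sync_desc {
	__u32 type;	/* e.g. DMA_BUF_SYNC_DESC_LINEAR or DMA_BUF_SYNC_DESC_2D */
	__u32 pad;
	union {
		struct {
			__u64 offset;	/* Byte offset into the buffer. */
			__u64 length;	/* Number of bytes to sync. */
		} linear;
		struct {
			__u32 start_x;	/* Upper left X, format-dependent units. */
			__u32 start_y;	/* Upper left Y, format-dependent units. */
			__u32 width;
			__u32 height;
		} area_2d;
	};
};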
>
>> * @direction: Intended transfer direction of data.
>> * @flags: Flags to tune the synchronizing behaviour.
>> */
>>
>> struct dma_buf_sync_region_arg {
>> 	__u32 buf_identifier;
>> 	__u32 start_x;
>> 	__u32 start_y;
>> 	__u32 width;
>> 	__u32 height;
>> 	enum dma_buf_data_direction direction;
>> 	__u32 flags;
>> 	__u32 pad;
>> };
>>
>> /**
>> * Force sync any outstanding rendering of the exporter before returning.
>> * This is strictly a performance hint. The user may omit this flag to
>> * speed up execution of the call, if it is known that previous rendering
>> * affecting (by read or write) the synchronized area has already finished.
>> */
>> #define DMA_BUF_SYNC_FLAG_SYNC (1 << 0)
>>
>> /**
>> * Treat @start_x as a byte offset into the buffer and @width as the byte
>> * synchronization length. The values of @start_y and @height are ignored.
>> * (Separate ioctl?)
>> */
>> #define DMA_BUF_SYNC_FLAG_LINEAR (1 << 1)
>>
>> /**
>> * Allow the implementation to coalesce sync_for_device calls, until either
>> * a) An explicit flush
>> * b) A sync for cpu call with DMA_BUF_BIDIRECTIONAL or DMA_BUF_TO_DEVICE
>> *
>> * Note: An implementation may choose to ignore this flag.
>> */
>> #define DMA_BUF_SYNC_FLAG_COALESCE (1 << 2)
>>
>> /**
>> * IOCTLs
>> *
>> * Kernel waits should, if possible, be performed interruptibly, and the
>> * ioctl may set errno to EINTR if the ioctl needs to be restarted.
>> * To be discussed: A sync operation must not affect areas outside the
>> * region indicated. (Good for vmwgfx, but plays ill with cache line
>> * alignment.)
>> */
>>
>> /**
>> * Sync for CPU.
>> */
>> #define DMA_BUF_SYNC_REGION_FOR_CPU \
>> _IOW(DMA_BUF_BASE, 0, struct dma_buf_sync_region_arg)
>>
>> /**
>> * Sync for device. This is the default state of a dma-buf.
>> */
>> #define DMA_BUF_SYNC_REGION_FOR_DEVICE \
>> _IOW(DMA_BUF_BASE, 1, struct dma_buf_sync_region_arg)
>>
>> /**
>> * Flush any coalesced SYNC_REGION_FOR_DEVICE
>> */
>> #define DMA_BUF_SYNC_REGION_FLUSH \
>> _IOW(DMA_BUF_BASE, 2, __u32)
>>
> What is the use-case for this? Why don't you think that userspace could
> be smart enough to coalesce the flush on its own?
Right. A thinko. We could clearly leave that out.
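
For the record, a rough sketch of how user-space might drive the two
remaining ioctls around a CPU access (assuming the definitions above end
up in a uapi header; the function and variable names are only illustrative):

#include <sys/ioctl.h>
#include <linux/types.h>

/* Assumes the proposed dma-buf sync region definitions are available,
 * e.g. from a future uapi header. */
static int cpu_access_region(int dmabuf_fd, __u32 x, __u32 y,
			     __u32 w, __u32 h)
{
	struct dma_buf_sync_region_arg arg = {
		.start_x = x,
		.start_y = y,
		.width = w,
		.height = h,
		.direction = DMA_BUF_BIDIRECTIONAL,
		.flags = DMA_BUF_SYNC_FLAG_SYNC,
	};

	/* Make the region CPU-coherent before touching it through the mmap. */
	if (ioctl(dmabuf_fd, DMA_BUF_SYNC_REGION_FOR_CPU, &arg))
		return -1;

	/* ... CPU reads/writes of the mapped region go here ... */

	/* Hand the region back to the device before further device access. */
	return ioctl(dmabuf_fd, DMA_BUF_SYNC_REGION_FOR_DEVICE, &arg);
}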
/Thomas