[RFC PATCH v2] dmabuf-sync: Introduce buffer synchronization framework

Inki Dae daeinki at gmail.com
Fri Jun 21 18:36:21 PDT 2013


2013/6/22 Jerome Glisse <j.glisse at gmail.com>:
> On Fri, Jun 21, 2013 at 12:55 PM, Inki Dae <daeinki at gmail.com> wrote:
>> 2013/6/21 Lucas Stach <l.stach at pengutronix.de>:
>>> Hi Inki,
>>>
>>> please refrain from sending HTML Mails, it makes proper quoting without
>>> messing up the layout everywhere pretty hard.
>>>
>>
>> Sorry about that. I should have used text mode.
>>
>>> Am Freitag, den 21.06.2013, 20:01 +0900 schrieb Inki Dae:
>>> [...]
>>>
>>>> Yeah, you'll need some knowledge and understanding about the API
>>>> you are working with to get things right. But I think it's not an
>>>> unreasonable thing to expect the programmer working directly with
>>>> kernel interfaces to read up on how things work.
>>>>
>>>> Second thing: I'd rather have *one* consistent API for every
>>>> subsystem, even if they differ from each other, than having to
>>>> implement this syncpoint thing in every subsystem. Remember: a
>>>> single execbuf in DRM might reference both GEM objects backed by
>>>> dma-buf as well as native SHM or CMA backed objects. The
>>>> dma-buf-mgr proposal already allows you to handle dma-bufs during
>>>> validation in much the same way as native GEM objects.
>>>>
>>>> Actually, at first I had implemented a fence helper framework based on
>>>> reservation and dma fence to provide an easy-to-use interface for device
>>>> drivers. However, that was a wrong implementation: I had not only
>>>> customized the dma fence but also not considered the deadlock issue.
>>>> After that, I reimplemented it as dmabuf sync to solve the deadlock
>>>> issue, and at that time I realized that we first need to concentrate
>>>> on the most basic facts: that CPU and CPU, CPU and DMA, or DMA and
>>>> DMA can access the same buffer; that simple is best; and that we need
>>>> not only kernel-side but also user-side interfaces. After that, I
>>>> collected what is common to all subsystems, and I devised this dmabuf
>>>> sync framework for it. I'm not really a specialist in the desktop
>>>> world, so a question: isn't execbuf used only for the GPU? The GPU has
>>>> dedicated video memory (VRAM), so it needs a migration mechanism
>>>> between system memory and the dedicated video memory, and also has to
>>>> consider ordering issues while buffers are migrated.
>>>>
>>>
>>> Yeah, execbuf is pretty GPU specific, but I don't see how this matters
>>> for this discussion. Also I don't see a big difference between embedded
>>> and desktop GPUs. Buffer migration is more of a detail here. Both take
>>> command streams that potentially reference other buffers, which might be
>>> native GEM or dma-buf backed objects. Both have to make sure the buffers
>>> are in the right domain (caches cleaned and address mappings set up) and
>>> are available for the desired operation, meaning you have to sync with
>>> other DMA engines and maybe also with the CPU.
>>
>> Yeah, right. Then, in the case of a desktop GPU, doesn't it need to do
>> something additional when buffers are migrated from the system memory
>> domain to the video memory domain, or from video memory back to system
>> memory? I guess the members below do a similar thing, and all other DMA
>> devices would not need them:
>>
>>     struct fence {
>>         ...
>>         unsigned int context, seqno;
>>         ...
>>     };
>>
>> And:
>>
>>     struct seqno_fence {
>>         ...
>>         uint32_t seqno_ofs;
>>         ...
>>     };
>>
>>>
>>> The only case where sync isn't clearly defined right now by the current
>>> API entrypoints is when you access memory through the dma-buf fallback
>>> mmap support, which might happen with some software processing element
>>> in a video pipeline or something. I agree that we will need a userspace
>>> interface here, but I think this shouldn't be yet another sync object,
>>> but rather more a prepare/fini_cpu_access ioctl on the dma-buf which
>>> hooks into the existing dma-fence and reservation stuff.
>>
>> I think we don't need additional ioctl commands for that; I am thinking
>> of using existing resources as much as possible. My idea is similar to
>> yours in that it also uses the reservation stuff, because my approach
>> should use the dma-buf resource as well. However, my idea is that a user
>> process that wants buffer synchronization with another sees a sync
>> object as a file descriptor, just like dma-buf does. The below shows my
>> simple idea about it:
>>
>> ioctl(dmabuf_fd, DMA_BUF_IOC_OPEN_SYNC, &sync);
>>
>> flock(sync->fd, LOCK_SH); <- LOCK_SH means a shared lock.
>> CPU access for read
>> flock(sync->fd, LOCK_UN);
>>
>> Or
>>
>> flock(sync->fd, LOCK_EX); <- LOCK_EX means an exclusive lock
>> CPU access for write
>> flock(sync->fd, LOCK_UN);
>>
>> close(sync->fd);
>>
>> As you know, that's similar to the dmabuf export feature.
>>
>> In addition, a simpler idea:
>> flock(dmabuf_fd, LOCK_SH/EX);
>> CPU access for read/write
>> flock(dmabuf_fd, LOCK_UN);
>>
>> However, I'm not sure yet that the above examples would work well and
>> be free of problems: actually, I don't fully understand the flock
>> mechanism, so I'm still looking into it.
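>>
>> To make the simpler idea a bit more concrete, here is a rough, untested
>> userspace sketch. It assumes flock() can be applied directly to the
>> dma-buf fd and that the fallback mmap support is available; that is
>> exactly the part I still need to confirm:
>>
>>     #include <sys/file.h>
>>     #include <sys/mman.h>
>>     #include <string.h>
>>
>>     /* dmabuf_fd: a dma-buf file descriptor imported from some device */
>>     static int cpu_write_pattern(int dmabuf_fd, size_t size, char pattern)
>>     {
>>         void *vaddr;
>>
>>         /* exclusive lock: no other CPU task or DMA engine should
>>          * touch the buffer until we unlock it */
>>         if (flock(dmabuf_fd, LOCK_EX) < 0)
>>             return -1;
>>
>>         vaddr = mmap(NULL, size, PROT_WRITE, MAP_SHARED, dmabuf_fd, 0);
>>         if (vaddr == MAP_FAILED) {
>>             flock(dmabuf_fd, LOCK_UN);
>>             return -1;
>>         }
>>
>>         memset(vaddr, pattern, size);
>>         munmap(vaddr, size);
>>
>>         /* release the buffer so other accessors may proceed */
>>         return flock(dmabuf_fd, LOCK_UN);
>>     }
>>
>> A reader would take LOCK_SH instead of LOCK_EX so that multiple CPU
>> readers could access the buffer concurrently.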
>>
>>>
>>>>
>>>> And to get back to my original point: if you have more than one task
>>>> operating together on a buffer, you absolutely need some kind of real
>>>> IPC to sync them up and do something useful. Both your syncpoints and
>>>> the proposed dma-fences only protect the buffer accesses to make sure
>>>> different tasks don't stomp on each other. There is nothing in there
>>>> to make sure that the output of your pipeline is valid. You have to
>>>> take care of that yourself in userspace. I'll reuse your example to
>>>> make it clear what I mean:
>>>>
>>>> Task A                              Task B
>>>> ------                              ------
>>>> dma_buf_sync_lock(buf1)
>>>> CPU write buf1
>>>> dma_buf_sync_unlock(buf1)
>>>> ---------schedule Task A again-------
>>>> dma_buf_sync_lock(buf1)
>>>> CPU write buf1
>>>> dma_buf_sync_unlock(buf1)
>>>> ---------schedule Task B---------
>>>>                                     qbuf(buf1)
>>>>
>>>>                                     dma_buf_sync_lock(buf1)
>>>>                                     ....
>>>>
>>>> This is what can happen if you don't take care of proper syncing.
>>>> Task A writes something to the buffer in expectation that Task B
>>>> will take care of it, but before Task B even gets scheduled, Task A
>>>> overwrites the buffer again. Not what you wanted, is it?
>>>>
>>>> That's exactly the wrong example. I had already mentioned that: "In
>>>> case the data flow goes from A to B, it needs some kind of IPC between
>>>> the two tasks every time." So again, your example would have no
>>>> problem in the case where *two tasks share the same buffer, both tasks
>>>> access the buffer (buf1) for write, and the data of the buffer (buf1)
>>>> doesn't need to be shared*. They just need to use the buffer as
>>>> *storage*. So all they want is to avoid stomping on the buffer in
>>>> this case.
>>>>
>>> Sorry, but I don't see the point. If no one is interested in the data of
>>> the buffer, why are you sharing it in the first place?
>>>
>>
>> It's just used as storage. I.e., Task A fills the buffer with "AAAAAA"
>> using the CPU, and Task B fills the buffer with "BBBBBB" using DMA. They
>> don't share the data of the buffer, but they share the *memory region* of
>> the buffer. That would be very useful for embedded systems with a very
>> small amount of system memory.
>
> Just so I understand: you want to share backing memory, but you don't
> want to share content, i.e. you want to do memory management in
> userspace. This sounds wrong on so many levels (not even considering
> the security implications).
>
> If Task A needs memory and can then release it for Task B's usage, that
> should be the role of kernel memory management, which of course needs
> synchronization between A and B. But in no case should this be done
> using dma-buf. dma-buf is for sharing content between different
> devices, not sharing resources.
>

Let's simply think of the case where a user process doesn't want anyone
else, another CPU task or a DMA device, to access a shared buffer while it
is accessing that buffer. In this case, it would be useful.
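
To give a concrete picture of that "storage only" case, here is a rough
pseudo-C sketch built on the simple flock idea from earlier in this
thread (untested; whether flock() can really be applied to a dma-buf fd
is still an open question, and the DMA side would go through whatever
queueing interface the driver already provides):

    /* Task A: fill the buffer with "AAAAAA" using the CPU */
    flock(dmabuf_fd, LOCK_EX);      /* block out all other accessors */
    /* mmap the dma-buf and memset it to 'A' */
    flock(dmabuf_fd, LOCK_UN);

    /* Task B: fill the same buffer with "BBBBBB" using DMA */
    flock(dmabuf_fd, LOCK_EX);      /* block out all other accessors */
    /* queue the buffer to the DMA device and wait for completion */
    flock(dmabuf_fd, LOCK_UN);

Neither task reads what the other wrote; the exclusive lock only
guarantees that they never stomp on each other while reusing the same
memory region.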

Thanks,
Inki Dae

>
> Also don't over-complicate the VRAM case; just consider a desktop GPU as
> using system memory directly. They can do it, and they do it. Migration
> to VRAM is orthogonal to all this; it's an optimization, so to speak.
>
> Cheers,
> Jerome.