[PATCH v6 2/5] dma-buf: heaps: Add heap helpers

Wed Jul 24 15:20:31 UTC 2019

On 7/24/19 2:55 AM, Christoph Hellwig wrote:
> On Tue, Jul 23, 2019 at 01:09:55PM -0700, Rob Clark wrote:
>> On Mon, Jul 22, 2019 at 9:09 PM John Stultz <john.stultz at linaro.org> wrote:
>>>
>>> On Thu, Jul 18, 2019 at 3:06 AM Christoph Hellwig <hch at infradead.org> wrote:
>>>>
>>>> Is there any exlusion between mmap / vmap and the device accessing
>>>> the data?  Without that you are going to run into a lot of coherency
>>>> problems.
>>
>> dma_fence is basically the way to handle exclusion between different
>> device access (since device access tends to be asynchronous).  For
>> device<->device access, each driver is expected to take care of any
>> cache(s) that the device might have.  (Ie. device writing to buffer
>> should flush it's caches if needed before signalling fence to let
>> reading device know that it is safe to read, etc.)
>>
>> _begin/end_cpu_access() is intended to be the exclusion for CPU access
>> (which is synchronous)
> 
> What I mean is that we need a clear state machine (preferably including
> ownership tracking ala dma-debug) where a piece of memory has one
> owner at a time that can access it.  Only the owner can access is at
> that time, and at any owner switch we need to flush/invalidate all
> relevant caches.  And with memory that is vmaped and mapped to userspace
> that can get really complicated.
> 
> The above sounds like you have some of that in place, but we'll really
> need clear rules to make sure we don't have holes in the scheme.
> 

Well then, let's think this through. A given buffer can have 3 ownership
states (CPU-owned, Device-owned, and Un-owned). These are based on the
caching state from the CPU perspective.

If a buffer is CPU-owned then we (Linux) can write to the buffer safely
without worry that the data is stale or that it will be accessed by the
device without having been flushed. Device-owned buffers should not be
accessed by the CPU, and inter-device synchronization should be handled
by fencing as Rob points out. A buffer starts out Un-owned, both for
consistency and to avoid unneeded cache operations on unwritten buffers.

We also need to track the mapping state, which has 4 values: CPU-mapped,
Device-mapped, CPU/Device-mapped, and Un-mapped. These should be
self-explanatory: map_dma_buf maps towards the device, mmap/vmap/kmap
towards the CPU. Leaving a buffer mapped by the CPU while device access
takes place is safe as long as ownership is taken before any access. One
more point: we assume reference counting in the discussion below, so
unmap_dma_buf refers only to the last device unmapping and map_dma_buf
only to the first.
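
To make the reference-counting assumption concrete, here is a minimal
Python sketch (purely illustrative, not kernel code). The class name
RefCountedMapping and its get()/put() methods are hypothetical; the only
point is that just the first map and the last unmap are reported as
events that matter for the ownership discussion.

```python
class RefCountedMapping:
    """Hypothetical helper: counts mappings into one space (CPU or device)."""

    def __init__(self):
        self.count = 0

    def get(self):
        """Take a mapping; return True only for the first one."""
        self.count += 1
        return self.count == 1

    def put(self):
        """Drop a mapping; return True only for the last one."""
        if self.count == 0:
            raise RuntimeError("unmap without a matching map")
        self.count -= 1
        return self.count == 0


if __name__ == "__main__":
    dev = RefCountedMapping()
    print("first map:", dev.get())
    print("second map:", dev.get())
    print("first unmap:", dev.put())
    print("last unmap:", dev.put())
```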

This gives 12 combined states. If we assume a buffer is always owned
while someone has it mapped (CPU, device, or both), we can drop 3
states. If a buffer is mapped into only one space, then that space owns
it, which drops 2 cross-owned states. Lastly, if the buffer is not mapped
by either space it becomes un-owned (and the backing memory can be freed
or migrated as needed), which drops 2 more. That leaves us 5 valid states.

* Un-Owned Un-Mapped
* Device-Owned Device-Mapped
* Device-Owned CPU/Device-Mapped
* CPU-Owned CPU-Mapped
* CPU-Owned CPU/Device-Mapped
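
The counting above can be checked mechanically. The following is a small
Python sketch, purely illustrative; the names OWNERS, MAPPINGS and
is_valid are made up here, and the three rules are my restatement of the
paragraph above rather than anything taken from existing code.

```python
from itertools import product

OWNERS = ("Un-Owned", "Device-Owned", "CPU-Owned")
MAPPINGS = ("Un-Mapped", "Device-Mapped", "CPU-Mapped", "CPU/Device-Mapped")


def is_valid(owner, mapping):
    """Apply the three pruning rules to one (owner, mapping) pair."""
    # Rule 1: a buffer that is mapped anywhere always has an owner.
    if mapping != "Un-Mapped" and owner == "Un-Owned":
        return False
    # Rule 2: a buffer mapped into only one space is owned by that space.
    if mapping == "Device-Mapped" and owner != "Device-Owned":
        return False
    if mapping == "CPU-Mapped" and owner != "CPU-Owned":
        return False
    # Rule 3: a buffer mapped by neither space is un-owned.
    if mapping == "Un-Mapped" and owner != "Un-Owned":
        return False
    return True


ALL_STATES = list(product(OWNERS, MAPPINGS))
VALID_STATES = [s for s in ALL_STATES if is_valid(*s)]

if __name__ == "__main__":
    print(len(ALL_STATES), "combined states,", len(VALID_STATES), "valid")
    for owner, mapping in VALID_STATES:
        print("*", owner, mapping)
```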

There are 6 DMA-BUF operations (classes) on a buffer:

* map_dma_buf
* unmap_dma_buf
* begin_cpu_access
* end_cpu_access
* mmap/vmap/kmap
* munmap/vunmap/kunmap

From all this I suggest the following state machine (in DOT language):

Note: Buffers start in "Un-Owned Un-Mapped" and can only be freed from
that state.

Note: Commented-out states/transitions are not valid but are included to
show completeness.

-------------------------------------------------------------------

digraph dma_buf_buffer_states
{
	label = "DMA-BUF Buffer states";

	uo_um [ label="Un-Owned\nUn-Mapped" ];
//	uo_dm [ label="Un-Owned\nDevice-Mapped" ];
//	uo_cm [ label="Un-Owned\nCPU-Mapped" ];
//	uo_cdm [ label="Un-Owned\nCPU/Device-Mapped" ];

//	do_um [ label="Device-Owned\nUn-Mapped" ];
	do_dm [ label="Device-Owned\nDevice-Mapped" ];
//	do_cm [ label="Device-Owned\nCPU-Mapped" ];
	do_cdm [ label="Device-Owned\nCPU/Device-Mapped" ];

//	co_um [ label="CPU-Owned\nUn-Mapped" ];
//	co_dm [ label="CPU-Owned\nDevice-Mapped" ];
	co_cm [ label="CPU-Owned\nCPU-Mapped" ];
	co_cdm [ label="CPU-Owned\nCPU/Device-Mapped" ];

	/* From Un-Owned Un-Mapped */
		uo_um -> do_dm		[ label="map_dma_buf" ];
//		uo_um ->		[ label="unmap_dma_buf" ];
//		uo_um -> 		[ label="begin_cpu_access" ];
//		uo_um ->		[ label="end_cpu_access" ];
		uo_um -> co_cm		[ label="mmap/vmap/kmap" ];
//		uo_um -> 		[ label="munmap/vunmap/kunmap" ];

	/* From Device-Owned Device-Mapped */
		do_dm -> do_dm		[ label="map_dma_buf" ];
		do_dm -> uo_um		[ label="unmap_dma_buf" ];
//		do_dm -> 		[ label="begin_cpu_access" ];
//		do_dm ->		[ label="end_cpu_access" ];
		do_dm -> do_cdm		[ label="mmap/vmap/kmap" ];
//		do_dm -> 		[ label="munmap/vunmap/kunmap" ];

	/* From Device-Owned CPU/Device-Mapped */
		do_cdm -> do_cdm	[ label="map_dma_buf" ];
		do_cdm -> co_cm		[ label="unmap_dma_buf" ];
		do_cdm -> co_cdm	[ label="begin_cpu_access" ];
//		do_cdm ->		[ label="end_cpu_access" ];
		do_cdm -> do_cdm	[ label="mmap/vmap/kmap" ];
		do_cdm -> do_dm		[ label="munmap/vunmap/kunmap" ];

	/* From CPU-Owned CPU-Mapped */
		co_cm -> co_cdm		[ label="map_dma_buf" ];
//		co_cm -> 		[ label="unmap_dma_buf" ];
//		co_cm -> 		[ label="begin_cpu_access" ];
		co_cm -> co_cm		[ label="end_cpu_access" ];
//		co_cm ->		[ label="mmap/vmap/kmap" ];
		co_cm -> uo_um		[ label="munmap/vunmap/kunmap" ];

	/* From CPU-Owned CPU/Device-Mapped */
		co_cdm -> co_cdm	[ label="map_dma_buf" ];
		co_cdm -> co_cm		[ label="unmap_dma_buf" ];
//		co_cdm -> 		[ label="begin_cpu_access" ];
		co_cdm -> do_cdm	[ label="end_cpu_access" ];
		co_cdm -> co_cdm	[ label="mmap/vmap/kmap" ];
//		co_cdm ->		[ label="munmap/vunmap/kunmap" ];

	{
		rank = same;
		co_cm -> do_dm [ style=invis ];
		rankdir = LR;
	}

	{
		rank = same;
		co_cdm -> do_cdm [ style=invis ];
		rankdir = LR;
	}
}

-------------------------------------------------------------------

If we consider this the "official" model, then we can start optimizing
cache operations, and start forbidding some nonsensical operations.
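
As a rough idea of what "forbidding" could look like, here is a Python
sketch of the graph as a lookup table (illustrative only, not a proposal
for the actual kernel implementation). The state keys reuse the DOT node
names; the operation names cpu_map and cpu_unmap are shorthand I made up
for the mmap/vmap/kmap and unmap classes, and step() and run() are
hypothetical helpers. Any operation missing from a state's table is
treated as invalid.

```python
# Only the transitions that are NOT commented out in the DOT graph.
TRANSITIONS = {
    "uo_um": {
        "map_dma_buf": "do_dm",
        "cpu_map": "co_cm",
    },
    "do_dm": {
        "map_dma_buf": "do_dm",
        "unmap_dma_buf": "uo_um",
        "cpu_map": "do_cdm",
    },
    "do_cdm": {
        "map_dma_buf": "do_cdm",
        "unmap_dma_buf": "co_cm",
        "begin_cpu_access": "co_cdm",
        "cpu_map": "do_cdm",
        "cpu_unmap": "do_dm",
    },
    "co_cm": {
        "map_dma_buf": "co_cdm",
        "end_cpu_access": "co_cm",
        "cpu_unmap": "uo_um",
    },
    "co_cdm": {
        "map_dma_buf": "co_cdm",
        "unmap_dma_buf": "co_cm",
        "end_cpu_access": "do_cdm",
        "cpu_map": "co_cdm",
    },
}

INITIAL = "uo_um"  # buffers start (and may only be freed) here


def step(state, op):
    """Return the next state, or raise ValueError for a forbidden operation."""
    try:
        return TRANSITIONS[state][op]
    except KeyError:
        raise ValueError(f"{op} is not valid in state {state}") from None


def run(ops, state=INITIAL):
    """Apply a sequence of operations and return the final state."""
    for op in ops:
        state = step(state, op)
    return state


if __name__ == "__main__":
    ops = ["map_dma_buf", "cpu_map", "begin_cpu_access",
           "end_cpu_access", "cpu_unmap", "unmap_dma_buf"]
    print("final state:", run(ops))
```

A debug build could run a table like this next to the real operations and
warn on the first forbidden transition, in the spirit of dma-debug.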

What do y'all think?

Andrew