[Mesa-dev] [RFC PATCH] mesa: Export BOs in RW mode

Wed Jul 3 14:50:08 UTC 2019

On 03/07/2019 15:33, Boris Brezillon wrote:
> On Wed, 3 Jul 2019 15:13:25 +0100
> Steven Price <steven.price at arm.com> wrote:
> 
>> On 03/07/2019 14:56, Boris Brezillon wrote:
>>> On Wed, 3 Jul 2019 07:45:32 -0600
>>> Rob Herring <robh+dt at kernel.org> wrote:
>>>   
>>>> On Wed, Jul 3, 2019 at 7:34 AM Boris Brezillon
>>>> <boris.brezillon at collabora.com> wrote:  
>>>>>
>>>>> Exported BOs might be imported back, then mmap()-ed to be written
>>>>> too. Most drivers handle that by mmap()-ing the GEM handle after it's
>>>>> been imported, but, according to [1], this is illegal.    
>>>>
>>>> It's not illegal, but is supposed to go thru the dmabuf mmap
>>>> functions.  
>>>
>>> That's basically what I'm proposing here, just didn't post the patch
>>> skipping the GET_OFFSET step and doing the mmap() on the dmabuf FD
>>> instead of the DRM-node one, but I have it working for panfrost.  
>>
>> If we want to we could make the Panfrost kernel driver internally call
>> dma_buf_mmap() so that mapping using the DRM-node "just works". This is
>> indeed what the kbase driver does.
> 
> Well, userspace should at least skip DRM_IOCTL_PANFROST_MMAP_BO (or
> ignore its return code), so calling mmap() on the dmabuf FD instead of
> the DRM-node FD shouldn't be that hard.

What I was suggesting is that user space would still call
DRM_IOCTL_PANFROST_MMAP_BO to get an offset which uses in a call to
mmap(..., drm_node_fd, offset). The kernel could detect that the buffer
is imported and call the exporter for the actual mmap() functionality.

The alternative is that user space 'simply' remembers that a buffer is
imported and keeps the file descriptor around so that it can instead
directly mmap() the dma_buf fd. Which is certainly easiest from the
kernel's perspective (and was what I assumed panfrost was doing - I
should have checked more closely!).

>>>> However, none of the driver I've looked at (etnaviv, msm,
>>>> v3d, vgem) do that. It probably works because it's the same driver
>>>> doing the import and export or both drivers have essentially the same
>>>> implementations.  
>>>
>>> Yes, but maybe that's something we should start fixing if mmap()-ing
>>> the dmabuf is the recommended solution.  
>>
>> I'm open to options here. User space calling mmap() on the dma_buf file
>> descriptor should always be safe (the exporter can do whatever is
>> necessary to make it work). If that happens then the patches I posted
>> close off the DRM node version which could be broken if the exporter
>> needs to do anything to prepare the buffer for CPU access (i.e. cache
>> maintenance).
> 
> Talking about CPU <-> GPU syncs, I was wondering if the
> mmap(gem_handle) step was providing any guarantee that would
> allow us to ignore all the cache maintenance operations that are
> required when mmap()-ing a dmabuf directly. Note that in both cases the
> dmabuf is imported.

In theory the exporter should do whatever is required to ensure that the
CPU is synchronised when a user space mapping exists. There are some
issues here though:

* In theory the kernel driver should map the dma_buf purely for the
duration that a job is using the buffer (and unmap immediately after).
This gives the exporter the knowledge of when the GPU is using the
memory and allows the exporter to page out of the memory if necessary.
In practise this map/unmap operation is expensive (updating the GPU's
page tables) so most drivers don't actually bother and keep the memory
mapped. This means the exporter cannot tell when the buffer is used or
move the pages.

* The CPU mappings can be faulted on demand (performing the necessary
CPU cache invalidate if needed) and shot-down to allow moving the
memory. In theory when the GPU needs the memory it should map the buffer
and the exporter can then shoot down the mappings, perform the CPU cache
clean and then allow the GPU to use the memory. A subsequent CPU access
would then refault the page, ensuring a CPU cache invalidate so the
latest data is visible.

* The majority of exporters are simple and deal with uncached memory
(e.g. frame buffers) or are actually exporting back to the same driver
(e.g. window surfaces). In these situations either the driver already
has the necessary "magic" to deal with caches (e.g. kbase provides
explicit cache maintenance operations), or it's "uncached" anyway so it
doesn't matter. This means that hardly anyone tests with the complex
cases...

>From a user space ABI, my understanding is that a dma_buf mmap() mapping
should be coherent, and user space isn't expected to do anything to make
it work. Obviously any importing device might have it's own coherency
details which will be up to the ABI of that device (e.g. Mali has caches
which may need to be flushed - this is usually done at the start/end of
a job chain, so coherency is not guaranteed while the job chain is running).

Steve