[Mesa-dev] [RFC PATCH] mesa: Export BOs in RW mode
Daniel Vetter
daniel at ffwll.ch
Thu Jul 4 09:46:42 UTC 2019
On Thu, Jul 4, 2019 at 11:26 AM Steven Price <steven.price at arm.com> wrote:
>
> On 03/07/2019 17:18, Daniel Vetter wrote:
> > On Wed, Jul 3, 2019 at 6:11 PM Steven Price <steven.price at arm.com> wrote:
> [...]
> >> In theory the exporter should do whatever is required to ensure that the
> >> CPU is synchronised when a user space mapping exists. There are some
> >> issues here though:
> >>
> >> * In theory the kernel driver should map the dma_buf purely for the
> >> duration that a job is using the buffer (and unmap immediately after).
> >> This gives the exporter the knowledge of when the GPU is using the
> >> memory and allows the exporter to page out of the memory if necessary.
> >> In practise this map/unmap operation is expensive (updating the GPU's
> >> page tables) so most drivers don't actually bother and keep the memory
> >> mapped. This means the exporter cannot tell when the buffer is used or
> >> move the pages.
> >
> > Yeah refaulting is too slow if you care the least about performance.
> >
> >> * The CPU mappings can be faulted on demand (performing the necessary
> >> CPU cache invalidate if needed) and shot-down to allow moving the
> >> memory. In theory when the GPU needs the memory it should map the buffer
> >> and the exporter can then shoot down the mappings, perform the CPU cache
> >> clean and then allow the GPU to use the memory. A subsequent CPU access
> >> would then refault the page, ensuring a CPU cache invalidate so the
> >> latest data is visible.
> >
> > We thought that was the answer, until it was clear its not. dma-buf
> > mmap isn't coherent, you need to call the begin/end ioctls.
> >
> >> * The majority of exporters are simple and deal with uncached memory
> >> (e.g. frame buffers) or are actually exporting back to the same driver
> >> (e.g. window surfaces). In these situations either the driver already
> >> has the necessary "magic" to deal with caches (e.g. kbase provides
> >> explicit cache maintenance operations), or it's "uncached" anyway so it
> >> doesn't matter. This means that hardly anyone tests with the complex
> >> cases...
> >>
> >> From a user space ABI, my understanding is that a dma_buf mmap() mapping
> >> should be coherent, and user space isn't expected to do anything to make
> >> it work. Obviously any importing device might have it's own coherency
> >> details which will be up to the ABI of that device (e.g. Mali has caches
> >> which may need to be flushed - this is usually done at the start/end of
> >> a job chain, so coherency is not guaranteed while the job chain is running).
> >
> > See my other reply, but this isn't correct. dma-buf has explicit cache
> > maintenance ops. It's just that generally everyone (ok, display only
> > drivers using the cma helpers) ends up exporting coherent memory and
> > that's why this works. Doesn't make it a bright idea imo ...
>
> Sorry, I'd completely forgotten about the DMA_BUF_IOCTL_SYNC ioctl when
> I wrote this. But to a large extent this ship has already sailed, and
> indeed the current users of dma_buf_mmap() implicitly assume that no
> sync is necessary (since there's no mechanism to forward the syncs onto
> the exporter). Indeed the comment in dma-buf.c says:
>
> * The assumption in the current dma-buf interfaces is that redirecting the
> * initial mmap is all that's needed. A survey of some of the existing
> * subsystems shows that no driver seems to do any nefarious thing like
> * syncing up with outstanding asynchronous processing on the device or
> * allocating special resources at fault time. So hopefully this is good
> * enough, since adding interfaces to intercept pagefaults and allow pte
> * shootdowns would increase the complexity quite a bit.
>
> I'd be happy to kill off dma_buf_mmap() and require user space to mmap
> via the dma_buf handle (and also therefore to handle the manual sync
> ioctls), but I'm not sure how we achieve that while maintaining
> backwards compatibility.
Uh I forgot to fully update this in my doc update in 2016. That text
originally predates the sync ioctl. Care to type a patch?
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
More information about the dri-devel
mailing list