Does gbm_bo_map() implicitly synchronise?

James Jones jajones at nvidia.com
Mon Jun 24 19:08:17 UTC 2024


FWIW, the NVIDIA binary driver's implementation of 
gbm_bo_map()/gbm_bo_unmap():

1) Doesn't do any synchronization against in-flight work. The 
assumption is that if the content is going to be read, the API that 
wrote the data has already established that coherence. Likewise, if 
it's going to be written, the API that reads it afterwards does 
whatever invalidations are needed for coherence. (A sketch of the 
caller-side synchronization this implies follows below.)

2) Doesn't blit anything or do any format conversion, because our GBM 
implementation has no DMA engine access, and I'd like to keep it that 
way. Setting up a DMA-capable driver instance is much more expensive in 
terms of runtime resources than setting up a simple allocator+mmap 
driver, at least in our driver architecture. Our GBM map just does an 
mmap(), and if the buffer isn't linear, you won't be able to interpret 
the data unless you've read up on our tiling formats (see the modifier 
check sketched just after this list). I'm aware this is different from 
Mesa, and no one has complained so far. If we were forced to fix it, I 
imagine we'd ask a shared engine in the kernel to do the blit on 
userspace's behalf, which would probably be slow but would save 
resources.
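
If you need to know whether a mapping is directly interpretable, check 
the buffer's format modifier first. A rough sketch 
(gbm_bo_get_modifier() and DRM_FORMAT_MOD_LINEAR are the generic 
GBM/drm_fourcc.h names, nothing NVIDIA-specific; "bo", "width" and 
"height" are assumed from context):

    #include <gbm.h>
    #include <drm_fourcc.h>

    uint32_t stride;
    void *map_data = NULL;

    /* Only interpret the mapped bytes directly when the layout is
     * linear; a tiled buffer still maps fine, but the byte layout
     * is vendor-specific. */
    if (gbm_bo_get_modifier(bo) == DRM_FORMAT_MOD_LINEAR) {
        void *p = gbm_bo_map(bo, 0, 0, width, height,
                             GBM_BO_TRANSFER_READ, &stride, &map_data);
        /* ... read rows of "stride" bytes via p ... */
        gbm_bo_unmap(bo, map_data);
    }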

Basically, don't use gbm_bo_map() for anything non-trivial on our 
implementation. It's not the right tool for, e.g., reading back or 
populating OpenGL textures or X pixmaps. If you don't need to run on 
the NV implementation, feel free to ignore this advice, but I'd still 
suggest it's not the best tool for most jobs.
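
To make point 1 concrete: if you do map on our implementation, the 
caller has to establish coherence itself before touching the mapping. 
A minimal sketch, assuming the rendering happened in a GL context in 
the same process (glFinish() is the bluntest option; a sync object 
plus glClientWaitSync() would be finer-grained):

    /* Make sure the GPU has finished writing before the CPU reads;
     * gbm_bo_map() itself will not wait. */
    glFinish();

    uint32_t stride;
    void *map_data = NULL;
    void *p = gbm_bo_map(bo, 0, 0, width, height,
                         GBM_BO_TRANSFER_READ, &stride, &map_data);
    /* ... CPU reads through p ... */
    gbm_bo_unmap(bo, map_data);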

Thanks,
-James

On 6/17/24 03:29, Pierre Ossman wrote:
> On 17/06/2024 10:13, Christian König wrote:
>>
>> Let me try to clarify a couple of things:
>>
>> The DMA_BUF_IOCTL_SYNC function is to flush and invalidate caches so 
>> that the GPU can see values written by the CPU and the CPU can see 
>> values written by the GPU. But that IOCTL does *not* wait for any 
>> async GPU operation to finish.
>>
>> If you want to wait for async GPU operations you either need to call 
>> the OpenGL functions to read pixels or do a select() (or poll, epoll 
>> etc...) call on the DMA-buf file descriptor.
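>>
>> Roughly, in code (an untested sketch; "fd" is the DMA-buf file 
>> descriptor, error handling omitted):
>>
>>     #include <linux/dma-buf.h>
>>     #include <sys/ioctl.h>
>>     #include <sys/select.h>
>>
>>     fd_set fds;
>>     FD_ZERO(&fds);
>>     FD_SET(fd, &fds);
>>     /* Blocks until the fences attached to the DMA-buf have
>>      * signalled, i.e. until the async GPU work is done. */
>>     select(fd + 1, &fds, NULL, NULL, NULL);
>>
>>     /* Bracket the CPU access with SYNC_START/SYNC_END so caches
>>      * are flushed and invalidated. This does *not* wait for the
>>      * GPU. */
>>     struct dma_buf_sync sync = {
>>         .flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ,
>>     };
>>     ioctl(fd, DMA_BUF_IOCTL_SYNC, &sync);
>>     /* ... CPU reads the buffer contents ... */
>>     sync.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_READ;
>>     ioctl(fd, DMA_BUF_IOCTL_SYNC, &sync);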
>>
> 
> Thanks for the clarification!
> 
> Just to avoid any uncertainty, are both of these things done implicitly 
> by gbm_bo_map()/gbm_bo_unmap()?
> 
> I did test adding those steps just in case, but unfortunately did not 
> see an improvement. My order was:
> 
> 1. gbm_bo_import(GBM_BO_USE_RENDERING)
> 2. gbm_bo_get_fd()
> 3. Wait for client to request displaying the buffer
> 4. gbm_bo_map(GBM_BO_TRANSFER_READ)
> 5. select(fd+1, &fds, NULL, NULL, NULL)
> 6. ioctl(DMA_BUF_IOCTL_SYNC, &{ .flags = DMA_BUF_SYNC_START | 
> DMA_BUF_SYNC_READ })
> 7. pixman_blt()
> 8. gbm_bo_unmap()
> 
>> So if you want to do some rendering with OpenGL and then see the 
>> result in a buffer memory mapping the correct sequence would be the 
>> following:
>>
>> 1. Issue OpenGL rendering commands.
>> 2. Call glFlush() to make sure the hw actually starts working on the 
>> rendering.
>> 3. Call select() on the DMA-buf file descriptor to wait for the 
>> rendering to complete.
>> 4. Use DMA_BUF_IOCTL_SYNC to make the rendering result CPU visible.
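>>
>> Put together, a rough sketch (same caveats as above; 
>> draw_something() is just a placeholder for the application's GL 
>> rendering):
>>
>>     /* 1. + 2.: issue the rendering and kick the hardware. */
>>     draw_something();
>>     glFlush();
>>
>>     /* 3.: wait on the DMA-buf fd for the rendering to finish. */
>>     fd_set fds;
>>     FD_ZERO(&fds);
>>     FD_SET(fd, &fds);
>>     select(fd + 1, &fds, NULL, NULL, NULL);
>>
>>     /* 4.: make the rendering result CPU visible. */
>>     struct dma_buf_sync sync = {
>>         .flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ,
>>     };
>>     ioctl(fd, DMA_BUF_IOCTL_SYNC, &sync);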
>>
> 
> What I want to do is implement the X server side of DRI3 entirely on 
> the CPU. It works for every application I've tested except gnome-shell.
> 
> I would assume that 1. and 2. are supposed to be done by the X client, 
> i.e. gnome-shell?
> 
> What I need to be able to do is access the result of that rendering, 
> once the X client tries to draw using that GBM-backed pixmap (e.g. via 
> PresentPixmap).
> 
> So far we've only tested Intel GPUs, but we are setting up NVIDIA and 
> AMD GPUs at the moment. It will be interesting to see whether the 
> issue remains on those.
> 
> Regards
