RFC: hardware accelerated bitblt using dma engine

Tue Aug 2 14:04:48 UTC 2016

On Tue, Aug 02, 2016 at 03:21:08PM +0200, Enrico Weigelt, metux IT consult wrote:
> Hi folks,
> 
> 
> I'm currently thinking about adding an hw-accelerated bitblt operation.
> The idea goes like this:
> 
> * we add some bitblt ioctl which copies rects between bo's.
>   (it also handles memory layouts, pixfmt conversion, etc)
> * the driver can decide to let the GPU or IPU do that, if available
> * if we have a suitable DMA engine (maybe only the more complex ones
>   which can handle lines on their own ...) we'll use that
> * as fallback, resort to memcpy().
> 
> 
> Whether a dma engine can/should be used might be highly hw specific,
> so that probably would be configured in DT.
> 
> To use that feature, userland could actually allocate two BO's,
> one that's mapped as a framebuffer to some crtc, another one just
> a memory buffer. It could then render to the fast memory buffer and
> tell the DRM to only copy over the changed regions to the graphics
> memory via DMA (or whatever is best on that particular hw platform).
> 
> 
> What do you think about that idea ?

If you mean "add a generic hw-accelerated bitblt operation": This is not
how drm works. The generic kms stuff is about display only, with just very
basic (hence "dumb") buffer allocation support in a generic way.

If you mean "expose the dma engine I have here to userspace in
driver-private ioctls with the trade-off logic between that, kms
compositing using the display block and memcpy in userspace", then go
ahead ;-) But if you do that, pls don't forget that for any uapi the
drm subsystem requires corresponding open source userspace (in a real
app/compositor, not just some toy test or something similar).
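
For illustration only, here is a rough sketch of what the memcpy()
fallback from the proposal above could look like in userspace: copying
one changed rect, line by line, between two linear buffers. All names
(sketch_buf, sketch_rect, sketch_blit_memcpy) are made up for this
example and are not part of any drm uapi. It assumes both buffers are
linear and share the same pixel size, so no layout or pixfmt conversion
is done.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical description of a linear pixel buffer (not a drm struct). */
struct sketch_buf {
	uint8_t *data;          /* start of pixel data */
	size_t pitch;           /* bytes per scanline */
	size_t cpp;             /* bytes per pixel */
	unsigned width, height; /* size in pixels */
};

/* Hypothetical damage rect in pixels. */
struct sketch_rect {
	unsigned x, y, w, h;
};

/* Return 1 if the rect lies completely inside the buffer. */
static int sketch_rect_fits(const struct sketch_buf *b,
			    const struct sketch_rect *r)
{
	return r->x <= b->width && r->w <= b->width - r->x &&
	       r->y <= b->height && r->h <= b->height - r->y;
}

/*
 * Copy one rect from src to dst at the same position, one scanline
 * at a time. Returns 0 on success, -1 if the pixel sizes differ or
 * the rect does not fit into both buffers.
 */
static int sketch_blit_memcpy(struct sketch_buf *dst,
			      const struct sketch_buf *src,
			      const struct sketch_rect *r)
{
	unsigned line;

	if (dst->cpp != src->cpp)
		return -1; /* no pixfmt conversion in this sketch */
	if (!sketch_rect_fits(src, r) || !sketch_rect_fits(dst, r))
		return -1;

	for (line = 0; line < r->h; line++) {
		const uint8_t *s = src->data +
			(size_t)(r->y + line) * src->pitch +
			(size_t)r->x * src->cpp;
		uint8_t *d = dst->data +
			(size_t)(r->y + line) * dst->pitch +
			(size_t)r->x * dst->cpp;

		memcpy(d, s, (size_t)r->w * src->cpp);
	}
	return 0;
}
```

A driver-private ioctl would need to make the same kind of bounds
checks before handing the rect to a dma engine instead of memcpy().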

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch