RFC: hardware accelerated bitblt using dma engine

Wed Aug 3 11:47:11 UTC 2016

On Wed, Aug 03, 2016 at 11:24:37AM +0200, Marek Szyprowski wrote:
> Hi Enrico,
> 
> 
> On 2016-08-02 15:21, Enrico Weigelt, metux IT consult wrote:
> > I'm currently thinking about adding an hw-accelerated bitblt operation.
> > The idea goes like this:
> > 
> > * we add a bitblt ioctl which copies rects between BOs.
> >    (it also handles memory layouts, pixfmt conversion, etc)
> > * the driver can decide to let the GPU or IPU do that, if available
> > * if we have a suitable DMA engine (maybe only the more complex ones
> >    which can handle lines on their own ...) we'll use that
> > * as fallback, resort to memcpy().
> > 
> > 
> > Whether a DMA engine can/should be used might be highly hw specific,
> > so that probably would be configured in DT.
> > 
> > To use that feature, userland could actually allocate two BOs,
> > one that's mapped as a framebuffer to some crtc, another one just
> > a memory buffer. It could then render to the fast memory buffer and
> > tell the DRM to only copy over the changed regions to the graphics
> > memory via DMA (or whatever is best on that particular hw platform).
> > 
> > 
> > What do you think about that idea ?
> 
> I'm working now on something similar, but more generic. There is already
> a framework for picture processing (converting, scaling, blitting, rotating)
> in Exynos DRM. It is called IPP (Image Post Processing), but its user
> interface is really ugly and limited, so I plan to rewrite it and make
> it really generic. Some discussion of it has already taken place in the
> following thread:
> http://thread.gmane.org/gmane.linux.kernel.samsung-soc/49743
> 
> I plan to propose an API based on DRM object/properties, which will be
> similar to KMS atomic API. I will let you know when I have it ready for
> presenting in public.
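Enrico's proposal above (let a GPU/IPU or DMA engine handle the blit if one can, otherwise fall back to memcpy(), and copy only the changed rectangle from the memory buffer into the scanout buffer) can be sketched roughly as below. This is a minimal userspace illustration under stated assumptions, not existing DRM uapi or kernel helpers: every name here (struct blit_buf, struct blit_rect, blit_backend_fn, blit_memcpy, blit_copy_rect) is hypothetical. It assumes linear, untiled buffers with the same pixel format and a rectangle that fits inside both; tiling, pixfmt conversion and cache handling, which are exactly where real hardware diverges, are left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical descriptor for a linear (untiled) buffer; not a real DRM struct. */
struct blit_buf {
    uint8_t *vaddr;   /* CPU mapping of the buffer object */
    size_t   pitch;   /* bytes per line */
    unsigned cpp;     /* bytes per pixel */
};

/* Hypothetical damaged region, in pixels. */
struct blit_rect {
    unsigned x, y, w, h;
};

/* A hypothetical backend (GPU/IPU, DMA engine, ...) returns 0 on success
 * and nonzero if it cannot handle this particular request. */
typedef int (*blit_backend_fn)(const struct blit_buf *dst,
                               const struct blit_buf *src,
                               const struct blit_rect *r);

/* Fallback path: copy the rectangle line by line with memcpy().
 * Assumes the rect lies inside both buffers; no pixfmt conversion. */
static int blit_memcpy(const struct blit_buf *dst, const struct blit_buf *src,
                       const struct blit_rect *r)
{
    size_t line_bytes = (size_t)r->w * src->cpp;

    if (dst->cpp != src->cpp)
        return -1; /* format conversion is out of scope for this sketch */

    for (unsigned row = 0; row < r->h; row++) {
        size_t src_off = (size_t)(r->y + row) * src->pitch + (size_t)r->x * src->cpp;
        size_t dst_off = (size_t)(r->y + row) * dst->pitch + (size_t)r->x * dst->cpp;
        memcpy(dst->vaddr + dst_off, src->vaddr + src_off, line_bytes);
    }
    return 0;
}

/* Try each backend in order, then fall back to memcpy(). */
static int blit_copy_rect(const blit_backend_fn *backends, size_t n,
                          const struct blit_buf *dst, const struct blit_buf *src,
                          const struct blit_rect *r)
{
    for (size_t i = 0; i < n; i++)
        if (backends[i] && backends[i](dst, src, r) == 0)
            return 0;
    return blit_memcpy(dst, src, r);
}
```

Note how little the generic descriptor can express: one pitch and one bytes-per-pixel value per buffer. Any hardware-specific layout would need extra fields, which is the crux of the objection raised below.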

In case it's not clear from Dave's, Rob's and my replies: generic rendering
of any kind is _very_ unpopular in the drm subsystem. We tried
semi-generic rendering 15 years ago (with some of the drm core code shared
between linux and bsd) and it was a disaster of fake-generic, single-use code.

The reason for that is that hw accel is actually not simple. You
essentially need as little additional abstraction as possible between
your real client api (hw composer, Xrender or whatever it is) and the hw,
because for optimal performance you _must_ supply the commands to the
kernel in a format/layout as close as possible to the one used by the
hardware. That means no shared command submission of any kind. The
other reason is that cache transfers and memory transfers are highly
hardware specific, too, which means no shared buffer management and
mapping interfaces either.

In short, if you want to get this in, you need to disprove the last 15-20
years of linux gfx driver development and show that we've been wrong on
these points. Expect _very_ high resistance to anything remotely resembling
a shared/common blitter uapi. Of course, having some common helper code to
make drivers easier to write (like the cma helpers, ttm, or similar) is
something entirely different; this is about the uapi.

And please don't be discouraged here; I just want to set clear expectations
to avoid disappointment. Supporting blitter hardware is obviously a good
idea, and I think the drm subsystem is the right place for it
(especially if you have a display block or sometimes a real gpu connected
to that blitter).

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch