RFC: hardware accelerated bitblt using dma engine

Thu Aug 4 07:50:03 UTC 2016

On Thu, Aug 04, 2016 at 01:32:57AM +0200, Enrico Weigelt, metux IT consult wrote:
> On 03.08.2016 13:47, Daniel Vetter wrote:
> 
> > Because for optimal performance you _must_ supply the commands to the
> > kernel in an as close to the format/layout used by the hardware as
> > possible. That means no shared command submission of any kind. And the
> > other reason is that cache transfers and memory transfers are highly
> > hardware specific, too. Which means no shared buffer management and
> > mapping interfaces either.
> 
> Right, but I wonder whether that applies to my case.
> Again, I'm talking about using aux IPs (not the actual GPU) for things
> like copying image regions, maybe even pixfmt/colorspace conversions -
> those things, in embedded world, usually aren't done by the gpu, but
> separate IPs.

15+ years ago gpus weren't much more than fancy blitters either ;-)

> > Of course having some common helper code to make drivers easier to type
> > (like cma helpers, or ttm, or similar) is something entirely
> > different, this is about the uapi.
> 
> Well, I'm actually talking about an uapi, as userland somehow needs to
> call it :p
> 
> Doing it in specific drivers doesn't seem to be a good way, as sooner
> or later we'd have to implement that into lots of different drivers
> (plus corresponding userland support), as it's pretty orthogonal to
> GPU, as well as fbs/crtcs. Just in some cases, it **might** also be done
> via GPU, if applicable (maybe only when it's idle anyway), but that's
> not the usual case. Instead the usual case would be employing some DMA
> controller or IPU.

One problem with 2d blitters is that there's no common userspace
interface, but many: Xrender, hwc, old X drawing api, various attempts by
khronos to standardize something, cairo, ... It's probably worse than
video decoding even, and definitely not like on the 3d side where there's
GL (and now vulkan) and that's it.

So you'll end up with tons of glue code everywhere anyway. Adding yet
another kernel uapi doesn't help, but forcing it to be generic will make
sure it's inefficient. Which means someone else then will create another
one.

> > And please don't be discouraged here, I just want to set clear expectations
> > to avoid disappointment. Supporting blitter hardware is obviously a good
> > idea, and I think the drm subsystem is the right place for that
> > (especially if you have a display block or sometimes a real gpu connected
> > to that blitter).
> 
> Okay, where else should we put it ? Invent an entirely new device for
> that ?

If the blitter is always attached to the display block just add a few gem
based ioctls there (like with desktop gpus) for submitting blit workloads.
Otherwise new driver I guess.
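
As a rough sketch of what the payload of such a driver-private blit ioctl
might look like, and the kind of checking the kernel side would need before
touching hardware: everything named foo_* below is made up for illustration
and is not an existing uapi, and a real driver would shape the struct after
whatever its blitter actually consumes.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical rectangle in pixels; not part of any real uapi. */
struct foo_blit_rect {
	uint32_t x, y, w, h;
};

/* Hypothetical driver-private blit request, referencing buffers by GEM handle. */
struct foo_gem_blit {
	uint32_t src_handle;	/* GEM handle of the source buffer */
	uint32_t dst_handle;	/* GEM handle of the destination buffer */
	uint32_t src_pitch;	/* bytes per row in the source */
	uint32_t dst_pitch;	/* bytes per row in the destination */
	uint32_t cpp;		/* bytes per pixel, same for both buffers */
	uint32_t flags;		/* reserved, must be zero */
	struct foo_blit_rect src;
	struct foo_blit_rect dst;
};

/* uapi structs need a fixed layout: only 32-bit fields, no padding. */
static_assert(sizeof(struct foo_gem_blit) == 56, "unexpected struct layout");

/* Check that one rectangle lies inside a buffer of 'size' bytes. */
static int foo_rect_fits(const struct foo_blit_rect *r, uint32_t pitch,
			 uint32_t cpp, uint64_t size)
{
	uint64_t row_end, last_byte;

	if (r->w == 0 || r->h == 0 || cpp == 0)
		return 0;

	/* 64-bit math so that x + w and the byte offsets cannot wrap. */
	row_end = ((uint64_t)r->x + r->w) * cpp;
	if (row_end > pitch)
		return 0;

	last_byte = ((uint64_t)r->y + r->h - 1) * pitch + row_end;
	return last_byte <= size;
}

/*
 * Validate a request against the sizes of the two buffers, which a real
 * driver would look up from the GEM handles. Returns 0 or -EINVAL.
 */
static int foo_blit_validate(const struct foo_gem_blit *req,
			     uint64_t src_size, uint64_t dst_size)
{
	if (req->flags)
		return -EINVAL;		/* unknown flags are rejected */

	/* This sketch assumes a plain copy: no scaling. */
	if (req->src.w != req->dst.w || req->src.h != req->dst.h)
		return -EINVAL;

	if (!foo_rect_fits(&req->src, req->src_pitch, req->cpp, src_size) ||
	    !foo_rect_fits(&req->dst, req->dst_pitch, req->cpp, dst_size))
		return -EINVAL;

	return 0;
}
```

Rejecting non-zero flags up front is what leaves room to extend the ioctl
later without breaking existing userspace.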

Either way it'll probably be a bit more painful than a kms driver, since
on the gem side the helpers aren't that full-featured (yet).
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch