[RFC 0/4] Exynos DRM: add Picture Processor extension

Wed Apr 26 14:53:27 UTC 2017

Le mercredi 26 avril 2017 à 01:21 +0300, Sakari Ailus a écrit :
> Hi Marek,
> 
> On Thu, Apr 20, 2017 at 01:23:09PM +0200, Marek Szyprowski wrote:
> > Hi Laurent,
> > 
> > On 2017-04-20 12:25, Laurent Pinchart wrote:
> > > Hi Marek,
> > > 
> > > (CC'ing Sakari Ailus)
> > > 
> > > Thank you for the patches.
> > > 
> > > On Thursday 20 Apr 2017 11:13:36 Marek Szyprowski wrote:
> > > > Dear all,
> > > > 
> > > > This is an updated proposal for extending EXYNOS DRM API with generic
> > > > support for hardware modules, which can be used for processing image data
> > > > from the one memory buffer to another. Typical memory-to-memory operations
> > > > are: rotation, scaling, colour space conversion or mix of them. This is a
> > > > follow-up of my previous proposal "[RFC 0/2] New feature: Framebuffer
> > > > processors", which has been rejected as "not really needed in the DRM
> > > > core":
> > > > http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html
> > > > 
> > > > In this proposal I moved all the code to Exynos DRM driver, so now this
> > > > will be specific only to Exynos DRM. I've also changed the name from
> > > > framebuffer processor (fbproc) to picture processor (pp) to avoid confusion
> > > > with fbdev API.
> > > > 
> > > > Here is a bit more information what picture processors are:
> > > > 
> > > > Embedded SoCs are known to have a number of hardware blocks, which perform
> > > > such operations. They can be used in paralel to the main GPU module to
> > > > offload CPU from processing grapics or video data. One of example use of
> > > > such modules is implementing video overlay, which usually requires color
> > > > space conversion from NV12 (or similar) to RGB32 color space and scaling to
> > > > target window size.
> > > > 
> > > > The proposed API is heavily inspired by atomic KMS approach - it is also
> > > > based on DRM objects and their properties. A new DRM object is introduced:
> > > > picture processor (called pp for convenience). Such objects have a set of
> > > > standard DRM properties, which describes the operation to be performed by
> > > > respective hardware module. In typical case those properties are a source
> > > > fb id and rectangle (x, y, width, height) and destination fb id and
> > > > rectangle. Optionally a rotation property can be also specified if
> > > > supported by the given hardware. To perform an operation on image data,
> > > > userspace provides a set of properties and their values for given fbproc
> > > > object in a similar way as object and properties are provided for
> > > > performing atomic page flip / mode setting.
> > > > 
> > > > The proposed API consists of the 3 new ioctls:
> > > > - DRM_IOCTL_EXYNOS_PP_GET_RESOURCES: to enumerate all available picture
> > > >   processors,
> > > > - DRM_IOCTL_EXYNOS_PP_GET: to query capabilities of given picture
> > > >   processor,
> > > > - DRM_IOCTL_EXYNOS_PP_COMMIT: to perform operation described by given
> > > >   property set.
> > > > 
> > > > The proposed API is extensible. Drivers can attach their own, custom
> > > > properties to add support for more advanced picture processing (for example
> > > > blending).
> > > > 
> > > > This proposal aims to replace Exynos DRM IPP (Image Post Processing)
> > > > subsystem. IPP API is over-engineered in general, but not really extensible
> > > > on the other side. It is also buggy, with significant design flaws - the
> > > > biggest issue is the fact that the API covers memory-2-memory picture
> > > > operations together with CRTC writeback and duplicating features, which
> > > > belongs to video plane. Comparing with IPP subsystem, the PP framework is
> > > > smaller (1807 vs 778 lines) and allows driver simplification (Exynos
> > > > rotator driver smaller by over 200 lines).

Just a side note, we have written code in GStreamer using the Exnynos 4
FIMC IPP driver. I don't know how many, if any, deployment still exist
(Exynos 4 is relatively old now), but there exist userspace for the
FIMC driver. We use this for color transformation (from tiled to
linear) and scaling. The FIMC driver is in fact quite stable in
upstream kernel today. The GScaler V4L2 M2M driver on Exynos 5 is
largely based on it and has received some maintenance to properly work
in GStreamer. unlike this DRM API, you can reuse the same userspace
code across multiple platforms (which we do already). We have also
integrated this driver in Chromium in the past (not upstream though).

I am well aware that the blitter driver has not got much attention
though. But again, V4L2 offers a generic interface to userspace
application. Fixing this driver could enable some work like this one:

https://bugzilla.gnome.org/show_bug.cgi?id=772766

This work in progress feature is a generic hardware accelerated video
mixer. It has been tested with IMX.6 v4l2 m2m blitter driver (which I
believe is in staging right now). Again, unlike the exynos/drm, this
code could be reused between platforms.

In general, the problem with the DRM approach is that it only targets
displays. We often need to use these IP block for stream pre/post
processing outside a "playback" use case.

What I'd like so see instead here, is an approach that helps both world
 instead of trying to win the control over the IP block. Renesas
development seems to lead toward the right direction by creating
drivers that can be both interfaced in DRM and V4L2. For IPP and
GScaler on Exynos, this would be a greater benefit and finally the code
could be shared, having a single place to fix when we find bugs.

> > > 
> > > This seems to be the kind of hardware that is typically supported by V4L2.
> > > Stupid question, why DRM ?
> > 
> > Let me elaborate a bit on the reasons for implementing it in Exynos DRM:
> > 
> > 1. we want to replace existing Exynos IPP subsystem:
> >  - it is used only in some internal/vendor trees, not in open-source
> >  - we want it to have sane and potentially extensible userspace API
> >  - but we don't want to loose its functionality
> > 
> > 2. we want to have simple API for performing single image processing
> > operation:
> >  - typically it will be used by compositing window manager, this means that
> >    some parameters of the processing might change on each vblank (like
> >    destination rectangle for example). This api allows such change on each
> >    operation without any additional cost. V4L2 requires to reinitialize
> >    queues with new configuration on such change, what means that a bunch of
> >    ioctls has to be called.
> 
> What do you mean by re-initialising the queue? Format, buffers or something
> else?
> 
> If you need a larger buffer than what you have already allocated, you'll
> need to re-allocate, V4L2 or not.
> 
> We also do lack a way to destroy individual buffers in V4L2. It'd be up to
> implementing that and some work in videobuf2.
> 
> Another thing is that V4L2 is very stream oriented. For most devices that's
> fine as a lot of the parameters are not changeable during streaming,
> especially if the pipeline is handled by multiple drivers. That said, for
> devices that process data from memory to memory performing changes in the
> media bus formats and pipeline configuration is not very efficient
> currently, largely for the same reason.
> 
> The request API that people have been working for a bit different use cases
> isn't in mainline yet. It would allow more efficient per-request
> configuration than what is currently possible, but it has turned out to be
> far from trivial to implement.
> 
> >  - validating processing parameters in V4l2 API is really complicated,
> >    because the parameters (format, src&dest rectangles, rotation) are being
> >    set incrementally, so we have to either allow some impossible,
> > transitional
> >    configurations or complicate the configuration steps even more (like
> >    calling some ioctls multiple times for both input and output). In the end
> >    all parameters have to be again validated just before performing the
> >    operation.
> 
> You have to validate the parameters in any case. In a MC pipeline this takes
> place when the stream is started.
> 
> > 
> > 3. generic approach (to add it to DRM core) has been rejected:
> > http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html
> 
> For GPUs I generally understand the reasoning: there's a very limited number
> of users of this API --- primarily because it's not an application
> interface.
> 
> If you have a device that however falls under the scope of V4L2 (at least
> API-wise), does this continue to be the case? Will there be only one or two
> (or so) users for this API? Is it the case here?
> 
> Using a device specific interface definitely has some benefits: there's no
> need to think how would you generalise the interface for other similar
> devices. There's no need to consider backwards compatibility as it's not a
> requirement. The drawback is that the applications that need to support
> similar devices will bear the burden of having to support different APIs.
> 
> I don't mean to say that you should ram whatever under V4L2 / MC
> independently of how unworkable that might be, but there are also clear
> advantages in using a standardised interface such as V4L2.
> 
> V4L2 has a long history behind it and if it was designed today, I bet it
> would look quite different from what it is now.
> 
> > 
> > 4. this api can be considered as extended 'blit' operation, other DRM
> > drivers
> >    (MGA, R128, VIA) already have ioctls for such operation, so there is also
> >    place in DRM for it

Note that I am convince that using these custom IOCTL within a
"compositor" implementation is much easier and uniform compared to
using a v4l2 driver. It probably offers lower latency. But these are
non-generic and are not a great fit for streaming purpose. Request API
and probably explicit fence may mitigate this though. Meanwhile, there
is some indication that even though complex, there is already some
people that do think implementing a compositor combining V4L2 and DRM
is feasible.

http://events.linuxfoundation.org/sites/events/files/slides/als2015_way
land_weston_v2.pdf

> 
> Added LMML to cc.

Thanks.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 181 bytes
Desc: This is a digitally signed message part
URL: <https://lists.freedesktop.org/archives/dri-devel/attachments/20170426/785be9fc/attachment-0001.sig>