[RFC] Plane color pipeline KMS uAPI

Thu May 4 21:10:46 UTC 2023

On 5/4/23 11:22, Simon Ser wrote:
> Hi all,
> 
> The goal of this RFC is to expose a generic KMS uAPI to configure the color
> pipeline before blending, i.e. after a pixel is tapped from a plane's
> framebuffer and before it's blended with other planes. With this new uAPI we
> aim to reduce the battery life impact of color management and HDR on mobile
> devices, to improve performance and to decrease latency by skipping
> composition on the 3D engine. This proposal is the result of discussions at
> the Red Hat HDR hackfest [1] which took place a few days ago. Engineers
> familiar with the AMD, Intel and NVIDIA hardware have participated in the
> discussion.
> 

Thanks for typing this up. It does a great job describing the vision.

> This proposal takes a prescriptive approach instead of a descriptive approach.
> Drivers describe the available hardware blocks in terms of low-level
> mathematical operations, then user-space configures each block. We decided
> against a descriptive approach where user-space would provide a high-level
> description of the colorspace and other parameters: we want to give more
> control and flexibility to user-space, e.g. to be able to replicate exactly the
> color pipeline with shaders and switch between shaders and KMS pipelines
> seamlessly, and to avoid forcing user-space into a particular color management
> policy.
> 
> We've decided against mirroring the existing CRTC properties
> DEGAMMA_LUT/CTM/GAMMA_LUT onto KMS planes. Indeed, the color management
> pipeline can significantly differ between vendors and this approach cannot
> accurately abstract all hardware. In particular, the availability, ordering and
> capabilities of hardware blocks are different on each display engine. So, we've
> decided to go for a highly detailed hardware capability discovery.
> 
> This new uAPI should not be in conflict with existing standard KMS properties,
> since there are none which control the pre-blending color pipeline at the
> moment. It does conflict with any vendor-specific properties like
> NV_INPUT_COLORSPACE or the patches on the mailing list adding AMD-specific
> properties. Drivers will need to either reject atomic commits configuring both
> uAPIs, or alternatively we could add a DRM client cap which hides the vendor
> properties and shows the new generic properties when enabled.
> 
> To use this uAPI, first user-space needs to discover hardware capabilities via
> KMS objects and properties, then user-space can configure the hardware via an
> atomic commit. This works similarly to the existing KMS uAPI, e.g. planes.
> 
> Our proposal introduces a new "color_pipeline" plane property, and a new KMS
> object type, "COLOROP" (short for color operation). The "color_pipeline" plane
> property is an enum, each enum entry represents a color pipeline supported by
> the hardware. The special zero entry indicates that the pipeline is in
> "bypass"/"no-op" mode. For instance, the following plane properties describe a
> primary plane with 2 supported pipelines but currently configured in bypass
> mode:
> 
>      Plane 10
>      ├─ "type": immutable enum {Overlay, Primary, Cursor} = Primary
>      ├─ …
>      └─ "color_pipeline": enum {0, 42, 52} = 0
> 
> The non-zero entries describe color pipelines as a linked list of COLOROP KMS
> objects. The entry value is an object ID pointing to the head of the linked
> list (the first operation in the color pipeline).
> 
> The new COLOROP objects also expose a number of KMS properties. Each has a
> type, a reference to the next COLOROP object in the linked list, and other
> type-specific properties. Here is an example for a 1D LUT operation:
> 
>      Color operation 42
>      ├─ "type": enum {Bypass, 1D curve} = 1D curve
>      ├─ "1d_curve_type": enum {LUT, sRGB, PQ, BT.709, HLG, …} = LUT
>      ├─ "lut_size": immutable range = 4096
>      ├─ "lut_data": blob
>      └─ "next": immutable color operation ID = 43
> 
> To configure this hardware block, user-space can fill a KMS blob with 4096 u32
> entries, then set "lut_data" to the blob ID. Other color operation types might
> have different properties.
> 
> Here is another example with a 3D LUT:
> 
>      Color operation 42
>      ├─ "type": enum {Bypass, 3D LUT} = 3D LUT
>      ├─ "lut_size": immutable range = 33
>      ├─ "lut_data": blob
>      └─ "next": immutable color operation ID = 43
> 
> And one last example with a matrix:
> 
>      Color operation 42
>      ├─ "type": enum {Bypass, Matrix} = Matrix
>      ├─ "matrix_data": blob
>      └─ "next": immutable color operation ID = 43
> 
> [Simon note: having "Bypass" in the "type" enum, and making "type" mutable is
> a bit weird. Maybe we can just add an "active"/"bypass" boolean property on
> blocks which can be bypassed instead.]

I would favor a "bypass" boolean property.

> 
> [Jonas note: perhaps a single "data" property for both LUTs and matrices
> would make more sense. And a "size" prop for both 1D and 3D LUTs.]
> 

I concur. We'll probably want to document which color operation types 
each property applies to.
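
To make that concrete, here is a rough sketch of the kind of table the 
documentation could carry, written as a small Python model. All names here 
are assumptions for illustration: it presumes the unified "data"/"size" 
properties Jonas suggests and the "bypass" boolean from above, none of 
which exist in any kernel today.

```python
# Hypothetical model of "which property applies to which colorop type".
# Property names follow the suggestions in this thread; nothing here is
# an existing kernel or libdrm interface.
COLOROP_PROPS = {
    # "size" and "data" would only be meaningful when 1d_curve_type = LUT.
    "1D curve": {"bypass", "1d_curve_type", "size", "data", "next"},
    "3D LUT": {"bypass", "size", "data", "next"},
    "Matrix": {"bypass", "data", "next"},
    # Informational only: nothing for user-space to configure.
    "Scaling": {"next"},
}


def inapplicable_props(colorop_type, props):
    """Return the sorted property names that do not apply to this type."""
    allowed = COLOROP_PROPS[colorop_type]
    return sorted(set(props) - allowed)


# A matrix has no "size", so setting it would be flagged.
print(inapplicable_props("Matrix", {"data": b"", "size": 33}))
```

A driver (or IGT) could use a table like this to reject commits that set a 
property on a colorop type where it has no meaning.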

> If some hardware supports re-ordering operations in the color pipeline, the
> driver can expose multiple pipelines with different operation ordering, and
> user-space can pick the ordering it prefers by selecting the right pipeline.
> The same scheme can be used to expose hardware blocks supporting multiple
> precision levels.
> 
> That's pretty much all there is to it, but as always the devil is in the
> details.
> 

One such detail that might need some thought is whether the specific 
pipeline configuration exposed by a driver becomes uAPI. In theory I 
might break existing userspace use-cases if I change my color pipeline, 
but the new pipeline would still be discoverable and usable if userspace 
uses the uAPI in a truly vendor-neutral way.

Thoughts?
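
By "vendor-neutral" I mean something like the sketch below: userspace 
walks the "next" links and matches the operations it needs by type, 
instead of hard-coding object IDs or block positions. This is only a 
Python model of the idea. The dict layout and the "bypassable" flag are 
assumptions of mine, and a real client would read all of this through KMS 
properties.

```python
# Hypothetical in-memory model of the AMD example from this RFC:
# colorop ID -> properties. "bypassable" stands in for a "bypass" prop.
COLOROPS = {
    42: {"type": "Matrix", "bypassable": True, "next": 43},
    43: {"type": "Scaling", "bypassable": False, "next": 44},
    44: {"type": "1D curve", "bypassable": True, "next": 45},
    45: {"type": "Matrix", "bypassable": True, "next": 46},
    46: {"type": "1D curve", "bypassable": True, "next": 47},
    47: {"type": "3D LUT", "bypassable": True, "next": 48},
    48: {"type": "1D curve", "bypassable": True, "next": 0},
}


def match_pipeline(colorops, head_id, wanted):
    """Map the wanted op types, in order, onto colorop IDs.

    Unused ops must be bypassable (Scaling is informational and is
    skipped). Returns a list of IDs, or None if the pipeline cannot
    express the request.
    """
    remaining = list(wanted)
    chosen = []
    op_id = head_id
    while op_id != 0:  # "next" = 0 terminates the linked list
        op = colorops[op_id]
        if remaining and op["type"] == remaining[0]:
            chosen.append(op_id)
            remaining.pop(0)
        elif not op["bypassable"] and op["type"] != "Scaling":
            return None  # a fixed block we did not ask for
        op_id = op["next"]
    return chosen if not remaining else None


print(match_pipeline(COLOROPS, 42, ["Matrix", "1D curve", "3D LUT"]))
```

If userspace matches this way, a driver reordering or renumbering its 
blocks only breaks clients whose required sequence can no longer be 
expressed, which seems like the right definition of a regression.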

> First, we realized that we need a way to indicate where the scaling operation
> is happening. The contents of the framebuffer attached to the plane might be
> scaled up or down depending on the CRTC_W and CRTC_H properties. Depending on
> the colorspace scaling is applied in, the result will be different, so we need
> a way for the kernel to indicate which hardware blocks are pre-scaling, and
> which ones are post-scaling. We introduce a special "scaling" operation type,
> which is part of the pipeline like other operations but serves an informational
> role only (effectively, the operation cannot be configured by user-space, all
> of its properties are immutable). For example:
> 
>      Color operation 43
>      ├─ "type": immutable enum {Scaling} = Scaling
>      └─ "next": immutable color operation ID = 44
> 
> [Simon note: an alternative would be to split the color pipeline into two, by
> having two plane properties ("color_pipeline_pre_scale" and
> "color_pipeline_post_scale") instead of a single one. This would be similar to
> the way we want to split pre-blending and post-blending. This could be less
> expressive for drivers, there may be hardware where there are dependencies
> between the pre- and post-scaling pipeline?]
> 

I would prefer to avoid splitting the pipeline again. We can't easily 
avoid the pre-/post-blending split, but for scaling it might be more 
straightforward to add a read-only scaling op. This isn't a strong 
preference since I could see either way working out well.
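
With a read-only scaling op, userspace can still recover the pre-/post-scale 
split on its own by walking the list once. A minimal Python model, 
assuming the same linked-list layout as in the RFC (the dict 
representation is mine, not a proposed interface):

```python
# Hypothetical model: colorop ID -> properties, as in the RFC examples.
COLOROPS = {
    42: {"type": "Matrix", "next": 43},
    43: {"type": "Scaling", "next": 44},
    44: {"type": "1D curve", "next": 0},
}


def split_at_scaling(colorops, head_id):
    """Return (pre_scale_ids, post_scale_ids) for one pipeline.

    If the pipeline has no Scaling op, every ID lands in the first list;
    a real client would have to treat the scaler position as unknown.
    """
    pre, post = [], []
    seen_scaling = False
    op_id = head_id
    while op_id != 0:  # "next" = 0 terminates the linked list
        op = colorops[op_id]
        if op["type"] == "Scaling":
            seen_scaling = True
        elif seen_scaling:
            post.append(op_id)
        else:
            pre.append(op_id)
        op_id = op["next"]
    return pre, post


print(split_at_scaling(COLOROPS, 42))
```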

> Then, Alex from NVIDIA described how their hardware works. NVIDIA hardware
> contains some fixed-function blocks which convert from LMS to ICtCp and cannot
> be disabled/bypassed. NVIDIA hardware has been designed for descriptive APIs
> where user-space provides a high-level description of the colorspace
> conversions it needs to perform, and this is at odds with our KMS uAPI
> proposal. To address this issue, we suggest adding a special block type which
> describes a fixed conversion from one colorspace to another and cannot be
> configured by user-space. Then user-space will need to accommodate its pipeline
> for these special blocks. Such fixed hardware blocks need to be well enough
> documented so that they can be implemented via shaders.
> 
> We also noted that it should always be possible for user-space to completely
> disable the color pipeline and switch back to bypass/identity without a
> modeset. Some drivers will need to fail atomic commits for some color
> pipelines, in particular for some specific LUT payloads. For instance, AMD
> doesn't support curves which are too steep, and Intel doesn't support curves
> which decrease. This isn't something which routinely happens, but there might
> be more cases where the hardware needs to reject the pipeline. Thus, when
> user-space has a running KMS color pipeline, then hits a case where the
> pipeline cannot keep running (gets rejected by the driver), user-space needs to
> be able to immediately fall back to shaders without any glitch. This doesn't
> seem to be an issue for AMD, Intel and NVIDIA.
> 
> This uAPI is extensible: we can add more color operations, and we can add more
> properties for each color operation type. For instance, we might want to add
> support for Intel piece-wise linear (PWL) 1D curves, or might want to advertise
> the effective precision of the LUTs. The uAPI is deliberately somewhat minimal
> to keep the scope of the proposal manageable.
> 
> Later on, we plan to re-use the same machinery for post-blending color
> pipelines. There are some more details about post-blending which have been
> separately debated at the hackfest, but we believe it's a viable plan. This
> solution would supersede the existing DEGAMMA_LUT/CTM/GAMMA_LUT properties, so
> we'd like to introduce a client cap to hide the old properties and show the new
> post-blending color pipeline properties.
> 
> We envision a future user-space library to translate a high-level descriptive
> color pipeline into low-level prescriptive KMS color pipeline ("libliftoff but
> for color pipelines"). The library could also offer a translation into shaders.
> This should help share more infrastructure between compositors and ease KMS
> offloading. This should also help dealing with the NVIDIA case.
> 
> To wrap things up, let's take a real-world example: how would gamescope [2]
> configure the AMD DCN 3.0 hardware for its color pipeline? The gamescope color
> pipeline is described in [3]. The AMD DCN 3.0 hardware is described in [4].
> 
> AMD would expose the following objects and properties:
> 
>      Plane 10
>      ├─ "type": immutable enum {Overlay, Primary, Cursor} = Primary
>      └─ "color_pipeline": enum {0, 42} = 0
>      Color operation 42 (input CSC)
>      ├─ "type": enum {Bypass, Matrix} = Matrix
>      ├─ "matrix_data": blob
>      └─ "next": immutable color operation ID = 43
>      Color operation 43
>      ├─ "type": enum {Scaling} = Scaling
>      └─ "next": immutable color operation ID = 44
>      Color operation 44 (DeGamma)
>      ├─ "type": enum {Bypass, 1D curve} = 1D curve
>      ├─ "1d_curve_type": enum {sRGB, PQ, …} = sRGB
>      └─ "next": immutable color operation ID = 45
>      Color operation 45 (gamut remap)
>      ├─ "type": enum {Bypass, Matrix} = Matrix
>      ├─ "matrix_data": blob
>      └─ "next": immutable color operation ID = 46
>      Color operation 46 (shaper LUT RAM)
>      ├─ "type": enum {Bypass, 1D curve} = 1D curve
>      ├─ "1d_curve_type": enum {LUT} = LUT
>      ├─ "lut_size": immutable range = 4096
>      ├─ "lut_data": blob
>      └─ "next": immutable color operation ID = 47
>      Color operation 47 (3D LUT RAM)
>      ├─ "type": enum {Bypass, 3D LUT} = 3D LUT
>      ├─ "lut_size": immutable range = 17
>      ├─ "lut_data": blob
>      └─ "next": immutable color operation ID = 48
>      Color operation 48 (blend gamma)
>      ├─ "type": enum {Bypass, 1D curve} = 1D curve
>      ├─ "1d_curve_type": enum {LUT, sRGB, PQ, …} = LUT
>      ├─ "lut_size": immutable range = 4096
>      ├─ "lut_data": blob
>      └─ "next": immutable color operation ID = 0
> 
> To configure the pipeline for an HDR10 PQ plane (path at the top) and an HDR
> display, gamescope would perform an atomic commit with the following property
> values:
> 
>      Plane 10
>      └─ "color_pipeline" = 42
>      Color operation 42 (input CSC)
>      └─ "matrix_data" = PQ → scRGB (TF)
>      Color operation 44 (DeGamma)
>      └─ "type" = Bypass
>      Color operation 45 (gamut remap)
>      └─ "matrix_data" = scRGB (TF) → PQ
>      Color operation 46 (shaper LUT RAM)
>      └─ "lut_data" = PQ → Display native
>      Color operation 47 (3D LUT RAM)
>      └─ "lut_data" = Gamut mapping + tone mapping + night mode
>      Color operation 48 (blend gamma)
>      └─ "1d_curve_type" = PQ
> 
> I hope comparing these properties to the diagrams linked above can help
> understand how the uAPI would be used and give an idea of its viability.
> 
> Please feel free to provide feedback! It would be especially useful to have
> someone familiar with Arm SoCs look at this, to confirm that this proposal
> would work there.
> 

This is the major gap we have with this proposal, so I hope someone 
working on the Arm SoC drivers sees this and can comment.

Again, thanks for typing this up, Simon.

Harry

> Unless there is a show-stopper, we plan to follow up this RFC with
> implementations for AMD, Intel, NVIDIA, gamescope, and IGT.
> 
> Many thanks to everybody who contributed to the hackfest, on-site or remotely!
> Let's work together to make this happen!
> 
> Simon, on behalf of the hackfest participants
> 
> [1]: https://wiki.gnome.org/Hackfests/ShellDisplayNext2023
> [2]: https://github.com/ValveSoftware/gamescope
> [3]: https://github.com/ValveSoftware/gamescope/blob/5af321724c8b8a29cef5ae9e31293fd5d560c4ec/src/docs/Steam%20Deck%20Display%20Pipeline.png
> [4]: https://kernel.org/doc/html/latest/_images/dcn3_cm_drm_current.svg