[RFC] Plane color pipeline KMS uAPI

Fri Jun 9 16:30:01 UTC 2023

Hi Christopher,

On Friday, June 9th, 2023 at 17:52, Christopher Braga <quic_cbraga at quicinc.com> wrote:

> > The new COLOROP objects also expose a number of KMS properties. Each has a
> > type, a reference to the next COLOROP object in the linked list, and other
> > type-specific properties. Here is an example for a 1D LUT operation:
> >
> >      Color operation 42
> >      ├─ "type": enum {Bypass, 1D curve} = 1D curve
> >      ├─ "1d_curve_type": enum {LUT, sRGB, PQ, BT.709, HLG, …} = LUT
> The options sRGB / PQ / BT.709 / HLG would select hard-coded 1D
> curves? Will different hardware be allowed to expose a subset of these
> enum values?

Yes. Only hardcoded LUTs supported by the HW are exposed as enum entries.

> >      ├─ "lut_size": immutable range = 4096
> >      ├─ "lut_data": blob
> >      └─ "next": immutable color operation ID = 43
> >
> Some hardware has per channel 1D LUT values, while others use the same
> LUT for all channels.  We will definitely need to expose this in the
> UAPI in some form.

Hm, I was assuming per-channel 1D LUTs here, just like the existing GAMMA_LUT/
DEGAMMA_LUT properties work. If some hardware can't support that, it'll need
to get exposed as another color operation block.

> > To configure this hardware block, user-space can fill a KMS blob with
> > 4096 u32 entries, then set "lut_data" to the blob ID. Other color
> > operation types might have different properties.
> >
> The bit-depth of the LUT is an important piece of information we should
> include by default. Are we assuming that the DRM driver will always
> reduce the input values to the resolution supported by the pipeline?
> This could result in differences between the hardware behavior
> and the shader behavior.
> 
> Additionally, some pipelines are floating point while others are fixed.
> How would user space know if it needs to pack 32 bit integer values vs
> 32 bit float values?

Again, I'm deferring to the existing GAMMA_LUT/DEGAMMA_LUT. These use a common
definition of LUT blob (u16 elements) and it's up to the driver to convert.
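
To illustrate the "user-space sends high precision data" model, here is a
minimal sketch of filling a per-channel LUT in the style of the existing
GAMMA_LUT/DEGAMMA_LUT blobs. The entry layout mirrors struct drm_color_lut
(four u16 fields); the names color_lut_entry, quantize_u16, fill_lut and
identity_curve are made up for this example, and the blob format of the
proposed "lut_data" property is not settled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the struct drm_color_lut layout, so the sketch builds
 * without the DRM uAPI headers. Name is hypothetical. */
struct color_lut_entry {
	uint16_t red;
	uint16_t green;
	uint16_t blue;
	uint16_t reserved;
};

/* One transfer curve, mapping a normalized input to a normalized output. */
typedef double (*curve_fn)(double);

/* Quantize a normalized value to the full u16 range, clamping out-of-range
 * inputs (NaN is mapped to 0). */
static uint16_t quantize_u16(double v)
{
	if (!(v >= 0.0))
		return 0;
	if (v > 1.0)
		v = 1.0;
	return (uint16_t)(v * 65535.0 + 0.5);
}

/* Sample one curve per channel at evenly spaced points in [0.0, 1.0].
 * Hardware sharing one LUT across channels would get the same curve three
 * times. */
static void fill_lut(struct color_lut_entry *lut, size_t size,
		     curve_fn r, curve_fn g, curve_fn b)
{
	assert(size >= 2);
	for (size_t i = 0; i < size; i++) {
		double x = (double)i / (double)(size - 1);

		lut[i].red = quantize_u16(r(x));
		lut[i].green = quantize_u16(g(x));
		lut[i].blue = quantize_u16(b(x));
		lut[i].reserved = 0;
	}
}

/* Identity curve, handy as a bypass-equivalent LUT. */
static double identity_curve(double x)
{
	return x;
}
```

The driver would then be responsible for reducing these u16 samples to
whatever bit depth or number format the hardware LUT actually stores.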

Using a very precise format for the uAPI has the nice property of making the
uAPI much simpler to use. User-space sends high precision data and it's up to
drivers to map that to whatever the hardware accepts.

Exposing the actual hardware precision is something we've talked about during
the hackfest. It'll probably be useful to some extent, but will require some
discussion to figure out how to design the uAPI. Maybe a simple property is
enough, maybe not (e.g. fully describing the precision of segmented LUTs would
probably be trickier).

I'd rather keep things simple for the first pass; we can always add more
properties for bit depth etc. later on.

> > Here is another example with a 3D LUT:
> >
> >      Color operation 42
> >      ├─ "type": enum {Bypass, 3D LUT} = 3D LUT
> >      ├─ "lut_size": immutable range = 33
> >      ├─ "lut_data": blob
> >      └─ "next": immutable color operation ID = 43
> >
> We are going to need to expose the packing order here to avoid any
> programming uncertainty. I don't think we can safely assume all hardware
> is equivalent.

The driver can easily change the layout of the LUT data and do any conversion
necessary when programming the hardware. We do need to document which layout
the uAPI uses, for sure.
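
As a rough sketch of what such a driver-side conversion could look like:
the uAPI ordering below (red slowest, blue fastest) and the hardware
ordering (blue slowest, red fastest) are both assumptions picked for the
example, as are all the names. Nothing in this thread fixes the real layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 3D LUT entry, one u16 per channel plus padding. */
struct lut3d_entry {
	uint16_t red;
	uint16_t green;
	uint16_t blue;
	uint16_t reserved;
};

/* ASSUMED uAPI layout: red is the slowest-varying index, blue the fastest. */
static size_t lut3d_index_uapi(size_t size, size_t r, size_t g, size_t b)
{
	assert(r < size && g < size && b < size);
	return (r * size + g) * size + b;
}

/* ASSUMED hardware layout: blue is the slowest-varying index, red the
 * fastest. */
static size_t lut3d_index_hw(size_t size, size_t r, size_t g, size_t b)
{
	assert(r < size && g < size && b < size);
	return (b * size + g) * size + r;
}

/* Copy every entry from the uAPI ordering to the hardware ordering.
 * Both arrays must hold size * size * size entries. */
static void lut3d_repack(const struct lut3d_entry *uapi,
			 struct lut3d_entry *hw, size_t size)
{
	for (size_t r = 0; r < size; r++)
		for (size_t g = 0; g < size; g++)
			for (size_t b = 0; b < size; b++)
				hw[lut3d_index_hw(size, r, g, b)] =
					uapi[lut3d_index_uapi(size, r, g, b)];
}
```

With "lut_size" = 33 as in the example above, both arrays would hold
33 * 33 * 33 = 35937 entries.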

> > And one last example with a matrix:
> >
> >      Color operation 42
> >      ├─ "type": enum {Bypass, Matrix} = Matrix
> >      ├─ "matrix_data": blob
> >      └─ "next": immutable color operation ID = 43
> >
> It is unclear to me what the default sizing of this matrix is. Any
> objections to exposing these details with an additional property?

The existing CTM property uses 9 uint64 (S31.32) values. Is there a case where
that wouldn't be enough?
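
For reference, a small sketch of packing a 3x3 matrix into that format. It
assumes the sign-magnitude encoding documented for the existing CTM blob
(bit 63 is the sign, the low 63 bits are the magnitude with 32 fractional
bits), and the helper names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a double to S31.32 sign-magnitude, as assumed for the existing
 * CTM blob: NOT two's complement. */
static uint64_t double_to_s31_32(double v)
{
	uint64_t sign = 0;

	if (v < 0.0) {
		sign = 1ULL << 63;
		v = -v;
	}
	/* The magnitude must fit in 31 integer bits (also rejects NaN). */
	assert(v < 2147483648.0);
	/* Scale by 2^32 and round to the nearest representable step. */
	return sign | (uint64_t)(v * 4294967296.0 + 0.5);
}

/* Pack a row-major 3x3 matrix of doubles into 9 S31.32 values. */
static void fill_ctm(uint64_t out[9], const double in[9])
{
	for (int i = 0; i < 9; i++)
		out[i] = double_to_s31_32(in[i]);
}
```

For instance, 1.0 becomes 0x0000000100000000 and -0.5 becomes
0x8000000080000000.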

> Dithering logic exists in some pipelines. I think we need a plan to
> expose that here as well.

Hm, I'm not too familiar with dithering. Do you think it would make sense to
expose it as an additional colorop block? Do you think it would have more
consequences on the design?

I want to reiterate that we don't need to ship all features from day 1. We
just need to come up with a uAPI design on which new features can be built.

> > [Simon note: an alternative would be to split the color pipeline into
> > two, by having two plane properties ("color_pipeline_pre_scale" and
> > "color_pipeline_post_scale") instead of a single one. This would be
> > similar to the way we want to split pre-blending and post-blending.
> > This could be less expressive for drivers, there may be hardware where
> > there are dependencies between the pre- and post-scaling pipeline?]
> >
> As others have noted, breaking up the pipeline with immutable blocks
> makes the most sense to me here. This way we don't have to predict ahead
> of time every type of block that may be affected by pipeline ordering.
> Splitting the pipeline into two properties now means future
> logical splits would require introduction of further plane properties.

Right, if there are more "breaking points", then we'll need immutable blocks
anyways.

> > Then, Alex from NVIDIA described how their hardware works. NVIDIA hardware
> > contains some fixed-function blocks which convert from LMS to ICtCp and
> > cannot be disabled/bypassed. NVIDIA hardware has been designed for
> > descriptive APIs where user-space provides a high-level description of the
> > colorspace conversions it needs to perform, and this is at odds with our
> > KMS uAPI proposal. To address this issue, we suggest adding a special block
> > type which describes a fixed conversion from one colorspace to another and
> > cannot be configured by user-space. Then user-space will need to accommodate
> > its pipeline for these special blocks. Such fixed hardware blocks need to be
> > well enough documented so that they can be implemented via shaders.
> >
> A few questions here. What is the current plan for documenting the
> mathematical model for each exposed block? Will each defined 'type' enum
> value be locked to a definition in the kernel documents? As an example,
> when we say '3D LUT' in this proposal does this mean the block will
> expose a tri-linear interpolated 3D LUT interface? Is '1D Curve' a
> direct in to out LUT mapping?

I think we'll want to document these things, yes. We do want to give _some_
slack to drivers, so that they can e.g. implement the "1D LUT" colorop via
hardware segmented LUTs with a different number of elements per LUT segment.
But being mathematically precise (probably with formulae in the docs) is
definitely a goal, and absolutely necessary to implement a shader-based
fallback.
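
As an example of the kind of definition the docs would need to pin down,
here is a sketch of a single-channel 1D LUT lookup with linear interpolation
between neighbouring entries. Linear interpolation, evenly spaced sample
points and u16 entries are all assumptions made for illustration; the
interpolation behaviour of the "1D curve" colorop has not been decided in
this thread, and lut1d_sample is a made-up name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Evaluate a 1D LUT at normalized input x, interpolating linearly between
 * the two nearest entries. Returns a normalized value in [0.0, 1.0]. */
static double lut1d_sample(const uint16_t *lut, size_t size, double x)
{
	assert(size >= 2);

	/* Clamp the input to the LUT domain (NaN is treated as 0). */
	if (!(x >= 0.0))
		x = 0.0;
	if (x > 1.0)
		x = 1.0;

	/* Position in LUT index space, and the lower neighbour. */
	double pos = x * (double)(size - 1);
	size_t i0 = (size_t)pos;

	/* Keep i0 + 1 in bounds when x is exactly 1.0. */
	if (i0 >= size - 1)
		i0 = size - 2;

	double frac = pos - (double)i0;
	double a = lut[i0] / 65535.0;
	double b = lut[i0 + 1] / 65535.0;

	return a + (b - a) * frac;
}
```

A shader-based fallback would perform the same computation per channel; a
driver using segmented hardware LUTs would need to stay within whatever
tolerance the documentation ends up allowing relative to such a reference.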

> Overall I am a fan of this proposal though. The prescriptive color
> pipeline UAPI is simple and easy to follow.

Thank you for the comments! Let me know if you disagree with some of the above,
or if my answers are unclear.