[RFC v2 02/22] drm: Add Enhanced Gamma and color lut range attributes

Fri Nov 5 11:49:47 UTC 2021

On Thu, Nov 04, 2021 at 12:27:56PM -0400, Harry Wentland wrote:
> 
> 
> On 2021-11-04 04:38, Pekka Paalanen wrote:
> > On Wed, 3 Nov 2021 11:08:13 -0400
> > Harry Wentland <harry.wentland at amd.com> wrote:
> > 
> >> On 2021-09-06 17:38, Uma Shankar wrote:
> >>> Existing LUT precision structure is having only 16 bit
> >>> precision. This is not enough for upcoming enhanced hardwares
> >>> and advance usecases like HDR processing. Hence added a new
> >>> structure with 32 bit precision values.
> >>>
> >>> This also defines a new structure to define color lut ranges,
> >>> along with related macro definitions and enums. This will help
> >>> describe multi segmented lut ranges in the hardware.
> >>>
> >>> Signed-off-by: Uma Shankar <uma.shankar at intel.com>
> >>> ---
> >>>  include/uapi/drm/drm_mode.h | 58 +++++++++++++++++++++++++++++++++++++
> >>>  1 file changed, 58 insertions(+)
> >>>
> >>> diff --git a/include/uapi/drm/drm_mode.h b/include/uapi/drm/drm_mode.h
> >>> index 90c55383f1ee..1079794c86c3 100644
> >>> --- a/include/uapi/drm/drm_mode.h
> >>> +++ b/include/uapi/drm/drm_mode.h
> >>> @@ -903,6 +903,64 @@ struct hdr_output_metadata {
> >>>  	};
> >>>  };
> >>>  
> >>> +/*
> >>> + * DRM_MODE_LUT_GAMMA|DRM_MODE_LUT_DEGAMMA is legal and means the LUT
> >>> + * can be used for either purpose, but not simultaneously. To expose
> >>> + * modes that support gamma and degamma simultaneously the gamma mode
> >>> + * must declare distinct DRM_MODE_LUT_GAMMA and DRM_MODE_LUT_DEGAMMA
> >>> + * ranges.
> >>> + */
> >>> +/* LUT is for gamma (after CTM) */
> >>> +#define DRM_MODE_LUT_GAMMA BIT(0)
> >>> +/* LUT is for degamma (before CTM) */
> >>> +#define DRM_MODE_LUT_DEGAMMA BIT(1)
> >>> +/* linearly interpolate between the points */
> >>> +#define DRM_MODE_LUT_INTERPOLATE BIT(2)
> >>> +/*
> >>> + * the last value of the previous range is the
> >>> + * first value of the current range.
> >>> + */
> >>> +#define DRM_MODE_LUT_REUSE_LAST BIT(3)
> >>> +/* the curve must be non-decreasing */
> >>> +#define DRM_MODE_LUT_NON_DECREASING BIT(4)
> >>> +/* the curve is reflected across origin for negative inputs */
> >>> +#define DRM_MODE_LUT_REFLECT_NEGATIVE BIT(5)
> >>> +/* the same curve (red) is used for blue and green channels as well */
> >>> +#define DRM_MODE_LUT_SINGLE_CHANNEL BIT(6)
> >>> +
> >>> +struct drm_color_lut_range {
> >>> +	/* DRM_MODE_LUT_* */
> >>> +	__u32 flags;
> >>> +	/* number of points on the curve */
> >>> +	__u16 count;
> >>> +	/* input/output bits per component */
> >>> +	__u8 input_bpc, output_bpc;
> >>> +	/* input start/end values */
> >>> +	__s32 start, end;
> >>> +	/* output min/max values */
> >>> +	__s32 min, max;
> >>> +};
> >>> +
> >>> +enum lut_type {
> >>> +	LUT_TYPE_DEGAMMA = 0,
> >>> +	LUT_TYPE_GAMMA = 1,
> >>> +};
> >>> +
> >>> +/*
> >>> + * Creating 64 bit palette entries for better data
> >>> + * precision. This will be required for HDR and
> >>> + * similar color processing usecases.
> >>> + */
> >>> +struct drm_color_lut_ext {
> >>> +	/*
> >>> +	 * Data is U32.32 fixed point format.
> >>> +	 */
> >>> +	__u64 red;
> >>> +	__u64 green;
> >>> +	__u64 blue;
> >>> +	__u64 reserved;
> >>> +};  
> >>
> >> I've been drawing out examples of drm_color_lut_range defined PWLs
> >> and understand a bit better what you and Ville are trying to accomplish
> >> with it. It actually makes a lot of sense and would allow for a generic
> >> way to populate different PWL definitions with a generic function.
> >>
> >> But I still have some key questions that either are not answered in these
> >> patch sets or that I somehow overlooked.
> >>
> >> Can you explain how the U32.32 fixed point format relates to the input_bpc
> >> and output_bpc in drm_color_lut_range, as we as to the pixel coming in from
> >> the framebuffer.
> >>
> >> E.g. if we have ARGB2101010 what happens to a 0x3ff red value (assuming alpha
> >> is non-multiplied)?
> >>
> >> If the drm_color_lut_range segments are defined with input_bpc of 24 bpc will
> >> 0x3ff be zero-expanded to 24-bit? Is the 24 bpc an integer? I.e. would our 3xff
> >> be interpreted as 0x3ff << (24-10)? 
> >>
> >> Assuming the output_bpc is 16 bpc and the programmed LUT makes this 1-to-1 would
> >> the output value be 0x3ff << (16-10)?
> >>
> >> On AMD HW the pipe-internal format is a custom floating point format. We could
> >> probably express that in terms of input/output_bpc and do the translation in our
> >> driver between that and the internal floating point format, depending on the
> >> framebuffer format, but there is the added complication of the magnitude of the
> >> pixel data and correlating HDR with SDR planes.
> >>
> >> E.g. any SDR data would map from 0.0 to 1.0 floating point, while HDR content would
> >> map from 0.0 to some value larger than 1.0. I don't (yet) have a clear picture how
> >> to represent that with the drm_color_lut_range definition.
> > 
> > 
> > Hi Harry,
> > 
> > I think you just would not. Conceptually an SDR plane gets its very own
> > LUT that converts the FB [0.0, 1.0] range to any appropriate [a >= 0.0,
> > b <= 1.0] range in HDR. This is purely conceptual, in the terms of the
> > abstract KMS color pipeline, where [0.0, 1.0] is always the full
> > dynamic range at any point of the pipeline, meaning it is relative to
> > its placement in the pipeline. If you want to use values >1.0 in hw,
> > you can do so under the hood.
> > 
> > At least that is how I would imagine things. With LUTs in general, I
> > don't think I have ever seen LUT input domain being explicitly defined
> > to something else than [0.0, 1.0] relative to the elements in the LUT
> > where 0.0 maps exactly to the first element and 1.0 maps exactly to the
> > last element.
> > 
> > I'm of course open to other suggestions, but having values outside of
> > [0.0, 1.0] range in the abstract pipeline will always raise the
> > question: how do you feed those to the LUT next in the pipeline.
> > 
> 
> AMD HW defines the LUT addressing in floating point space and allows
> for addressing beyond 1.0. In fact on other OSes our driver uses
> [0.0, 1.0] for SDR LUTs and [0.0, 128.0] for HDR LUTs.
> 
> There are color spaces that extend beyond 1.0 and even into the negative
> range: https://en.wikipedia.org/wiki/ScRGB
> 
> I don't think we should define the LUT to be limited to [0.0, 1.0].

That is not the intention, or at least wasn't my intention when
originally I proposed this gamma mode stuff. I specifically wanted
support for extended values. So 0.0-1.0 is supposed to be just the
range for the in gamut values.

> 
> If the framebuffer is not in FP16 the question then becomes how
> the integer pixel values relate to LUT addressing.

The idea again is that 0.0-1.0 is the range for the in gamut
values. IIRC our hw does have the possibility of scaling all
the fp16 input values by some programmable constant factor,
but exposing that would require yet another uapi addition.

> 
> As well, LUT entries are defined to be U32.32 fixed point, also
> allowing for entries much greater than 1.0. If those are programmed
> as entries in the input (degamma) LUT how will they be used to address
> entries in the output (gamma) LUT?

The u32.32 is a mistake I think. IMO we should be going for signed
values everywhere. Though our hw does not actually let us directly
program negative values for the LUT, rather the hw just reflects
the whole curve across the origin so the lookup is basically just
something like:
output = input < 0 ? -lut[abs(input)] : lut[input];

That is why there is that proposed DRM_MODE_LUT_REFLECT_NEGATIVE flag.

I think nouveau had something funny in its lut programming that
made me think that it might actually have straight up programmable
LUT entries for negative inputs as well. But I'm not sure if anyone
has actually tested that.

> 
> Maybe we want to say the actual input values are being used to
> address the LUT entries? But if we look at segment 1 of Uma's
> d13_degamma_hdr definition we see that the range of 0 to
> (1 << 24) -1 is covered by 128 (1 << 7) entries, so the range
> of 0 to 256 (for RGB with 8 bpc) would only be covered by 2
> LUT entries. So this interpretation doesn't make sense.
> 
> You can see, I'm still confused by these definitions. An IGT
> test would help explain the API intentions a bit better, though
> I would like to see more precise documentation.
> 
> Despite my confusion I think the segmented PWL definitions are
> a neat way to concisely describe the HW capabilities to
> userspace and a concise way for userspace to provide the LUT
> more precisely than with a uniformly spaced LUT.
> 
> Harry
> 
> > Yeah, I have no idea what it should mean if an FB has a format that
> > allows values beyond [0.0, 1.0].
> > 
> > 
> > Thanks,
> > pq
> > 
> > 
> >> If some of these questions should be obvious I apologize for being a bit dense,
> >> though it helps to make this accessible to the lowest common denominator
> >> to ensure not only the smartest devs can work with this.
> >>
> >> Harry
> >>
> >>> +
> >>>  #define DRM_MODE_PAGE_FLIP_EVENT 0x01
> >>>  #define DRM_MODE_PAGE_FLIP_ASYNC 0x02
> >>>  #define DRM_MODE_PAGE_FLIP_TARGET_ABSOLUTE 0x4
> >>>   
> >>
> > 

-- 
Ville Syrjälä
Intel