[RFC PATCH v2 00/18] Add DRM CRTC 3D LUT interface

Tue Feb 14 09:19:47 UTC 2023

On Mon, 13 Feb 2023 18:26:55 -0100
Melissa Wen <mwen at igalia.com> wrote:

> On 02/10, Pekka Paalanen wrote:
> > On Thu, 9 Feb 2023 13:27:02 -0100
> > Melissa Wen <mwen at igalia.com> wrote:
> >   
> > > On 01/31, Pekka Paalanen wrote:  
> > > > On Mon, 9 Jan 2023 14:38:09 -0100
> > > > Melissa Wen <mwen at igalia.com> wrote:
> > > >     
> > > > > On 01/09, Melissa Wen wrote:    
> > > > > > Hi,
> > > > > > 
> > > > > > After collecting comments in different places, here is a second version
> > > > > > of the work on adding DRM CRTC 3D LUT support to the current DRM color
> > > > > > mgmt interface. In comparison to previous proposals [1][2][3], here we
> > > > > > add 3D LUT before gamma 1D LUT, but also a shaper 1D LUT before 3D LUT,
> > > > > > that means the following DRM CRTC color correction pipeline:
> > > > > > 
> > > > > > Blend -> Degamma 1D LUT -> CTM -> Shaper 1D LUT -> 3D LUT -> Gamma 1D LUT    
> > > > 
> > > > Hi Melissa,
> > > > 
> > > > that makes sense to me, for CRTCs. It would be really good to have that
> > > > as a diagram in the KMS UAPI documentation.
> > > >     
> > > 
> > > Hi Pekka,
> > > 
> > > Thanks for your feedbacks and your time reviewing this proposal.  
> > 
> > No problem, and sorry it took so long!
> > 
> > I'm just finishing the catch-up with everything that happened during
> > winter holidays.
> >   
> > > > If someone wants to add a 3D LUT to KMS planes as well, then I'm not
> > > > sure if it should be this order or swapped. I will probably have an
> > > > opinion about that once Weston is fully HDR capable and has been tried
> > > > in the wild for a while with the HDR color operations fine-tuned based
> > > > on community feedback. IOW, not for a long time. The YUV to RGB
> > > > conversion factors in there as well.
> > > >     
> > > I see, this is also the reason I reuse here Alex Hung's proposal for
> > > pre-blending API. I'll work on better documentation.
> > >   
> > > >     
> > > > > > 
> > > > > > and we also add a DRM CRTC LUT3D_MODE property, based on Alex Hung
> > > > > > proposal for pre-blending 3D LUT [4] (Thanks!), instead of just a
> > > > > > LUT3D_SIZE, that allows userspace to use different supported settings of
> > > > > > 3D LUT, fitting VA-API and new color API better. In this sense, I
> > > > > > adjusted the pre-blending proposal for post-blending usage.
> > > > > > 
> > > > > > Patches 1-6 targets the addition of shaper LUT and 3D LUT properties to
> > > > > > the current DRM CRTC color mgmt pipeline. Patch 6 can be considered an
> > > > > > extra/optional patch to define a default value for LUT3D_MODE, inspired
> > > > > > by what we do for the plane blend mode property (pre-multiplied).
> > > > > > 
> > > > > > Patches 7-18 targets AMD display code to enable shaper and 3D LUT usage
> > > > > > on DCN 301 (our HW case). Patches 7-9 performs code cleanups on current
> > > > > > AMD DM colors code, patch 10 updates AMD stream in case of user 3D LUT
> > > > > > changes, patch 11/12 rework AMD MPC 3D LUT resource handling by context
> > > > > > for DCN 301 (easily extendible to other DCN families). Finally, from
> > > > > > 13-18, we wire up SHAPER LUT, LUT3D and LUT3D MODE to AMD display
> > > > > > driver, exposing modes supported by HW and programming user shaper and
> > > > > > 3D LUT accordingly.
> > > > > > 
> > > > > > Our target userspace is Gamescope/SteamOS.
> > > > > > 
> > > > > > Basic IGT tests were based on [5][6] and are available here (in-progress):
> > > > > > https://gitlab.freedesktop.org/mwen/igt-gpu-tools/-/commits/crtc-lut3d-api
> > > > > > 
> > > > > > [1] https://lore.kernel.org/all/20201221015730.28333-1-laurent.pinchart+renesas@ideasonboard.com/
> > > > > > [2] https://github.com/vsyrjala/linux/commit/4d28e8ddf2a076f30f9e5bdc17cbb4656fe23e69
> > > > > > [3] https://lore.kernel.org/amd-gfx/20220619223104.667413-1-mwen@igalia.com/
> > > > > > [4] https://lore.kernel.org/dri-devel/20221004211451.1475215-1-alex.hung@amd.com/
> > > > > > [5] https://patchwork.freedesktop.org/series/90165/
> > > > > > [6] https://patchwork.freedesktop.org/series/109402/
> > > > > > [VA_API] http://intel.github.io/libva/structVAProcFilterParameterBuffer3DLUT.html
> > > > > > [KMS_pipe_API] https://gitlab.freedesktop.org/pq/color-and-hdr/-/issues/11
> > > > > > 
> > > > > > Let me know your thoughts.      
> > > > > 
> > > > > +Simon Ser, +Pekka Paalanen who might also be interested in this series.    
> > > > 
> > > > Unfortunately I don't have the patch emails to reply to, so here's a
> > > > messy bunch of comments. I'll concentrate on the UAPI design as always.    
> > > 
> > > Sorry, the patchset is here: https://lore.kernel.org/dri-devel/20230109143846.1966301-1-mwen@igalia.com/
> > > In the next version, I won't forget cc'ing you at first.  
> > > > 
> > > > +/*
> > > > + * struct drm_mode_lut3d_mode - 3D LUT mode information.
> > > > + * @lut_size: number of valid points on every dimension of 3D LUT.
> > > > + * @lut_stride: number of points on every dimension of 3D LUT.
> > > > + * @bit_depth: number of bits of RGB. If color_mode defines entries with higher
> > > > + *             bit_depth the least significant bits will be truncated.
> > > > + * @color_format: fourcc values, ex. DRM_FORMAT_XRGB16161616 or DRM_FORMAT_XBGR16161616.
> > > > + * @flags: flags for hardware-sepcific features
> > > > + */
> > > > +struct drm_mode_lut3d_mode {
> > > > +	__u16 lut_size;
> > > > +	__u16 lut_stride[3];
> > > > +	__u16 bit_depth;
> > > > +	__u32 color_format;
> > > > +	__u32 flags;
> > > > +};
> > > > 
> > > > Why is lut_stride an array of 3, but lut_size is not?    
> > > 
> > > It comes from VA-API:
> > > https://intel.github.io/libva/structVAProcFilterParameterBuffer3DLUT.html#a682756be15d09327ba725b74a863cbcc
> > > 
> > > In short, the reason is that lut_size is the valid points and is
> > > the same for every dimensions, but lut_stride may vary.  
> > 
> > Ok, so lut_size is what I would have guessed it to be. Just needs to be
> > doc'd - pretty much all my questions are lack of docs.
> > 
> > I see that lut_stride is supposed to be in points, and not bytes. That
> > is surprising. Are you sure it's good?
> >   
> > > > 
> > > > What is the color_mode the comment is referring to?    
> > > 
> > > It refers to FB color_mode/bpp. I'm not using it in post-blending 3D LUT
> > > implementation (should I?), it comes from pre-blending use case.  Maybe
> > > the main issue here is whether reusing the pre-blending 3D LUT mode struct is
> > > a good approach or better to create a specific one for post-blending.  
> > 
> > Sorry, I have no idea what FB color_mode is. I do not recall any such
> > thing in the KMS uAPI.
> > 
> >   
> > > > 
> > > > What is "number of bits of RGB"? Input precision? Output precision?
> > > > Integer or floating point?    
> > > 
> > > It's the bit depth of the 3D LUT values, the same for every channel. In
> > > the AMD case, it supports 10-bit and 12-bit, for example.  
> > 
> > Ok. So e.g. r5g6b5 is not a possible 3D LUT element type on any
> > hardware ever?
> > 
> > What exactly is the truncation the comment refers to?
> > 
> > It sounds like if input has higher precision than the LUT elements,
> > then "truncation" occurs. I can kind of see that, but I also think it
> > is a false characterisation. The LUT input precision affects the
> > precision of LUT indexing and the precision of interpolation between
> > the LUT elements. I would not expect those two precisions to be
> > truncated to the LUT element precision (but they could be truncated to
> > something else hardware specific). Instead, I do expect the
> > interpolation result to be truncated to the LUT output precision, which
> > probably is the same as the LUT element precision, but not necessarily.
> > 
> > Maybe the comment about truncation should simply be removed? The result
> > is obvious if we know the LUT input, element, and output precision, and
> > what exactly happens with the indexing and interpolation is probably
> > good enough to be left hardware-specific if it is difficult to describe
> > in generic terms across different hardware.
> >   
> > > > 
> > > > Flags cannot be hardware specific, because it makes the whole KMS UAPI
> > > > hardware specific. That won't work. You have to have driver-agnostic
> > > > definitions for all possible flags.
> > > > 
> > > > Why is this the whole first patch? There is no documentation for the
> > > > UAPI on how this struct works, so I cannot review this. Explaining just
> > > > the individual fields is not enough to understand it. Is this something
> > > > the kernel fills in and is read-only to userspace? Is userspace filling
> > > > this in?    
> > > 
> > > I see. I'll work on explaining/documenting it better.  
> > > > 
> > > > 
> > > > + * “LUT3D”:
> > > > + *	Blob property to set the 3D LUT mapping pixel data after the color
> > > > + *	transformation matrix and before gamma 1D lut correction. The
> > > > + *	data is interpreted as an array of &struct drm_color_lut elements.
> > > > + *	Hardware might choose not to use the full precision of the LUT
> > > > + *	elements.
> > > > + *
> > > > + *	Setting this to NULL (blob property value set to 0) means a the output
> > > > + *	color is identical to the input color. This is generally the driver
> > > > + *	boot-up state too. Drivers can access this blob through
> > > > + *	&drm_crtc_state.gamma_lut.
> > > > + *
> > > > 
> > > > You need to define how the 1-D array of drm_color_lut elements blob
> > > > will be interpreted as a 3-D array for the 3D LUT, and how the
> > > > dimensions match to the R, G and B channels. It's a bit like the
> > > > question about row-major or column-major storage for matrices, except
> > > > more complicated and not in those words.    
> > > 
> > > ack  
> > > > 
> > > > + * “LUT3D_MODE”:
> > > > + *	Enum property to give the mode of the 3D lookup table to be set on the
> > > > + *	LUT3D property. A mode specifies size, stride, bit depth and color
> > > > + *	format and depends on the underlying hardware). If drivers support
> > > > + *	multiple 3D LUT modes, they should be declared in a array of
> > > > + *	drm_color_lut3d_mode and they will be advertised as an enum.
> > > > 
> > > > How does that work exactly? I didn't get it. I could guess, but having
> > > > to guess on API is bad.    
> > > 
> > > The driver advertises all supported modes (each combination of values)
> > > in a array as a enum, userspace can check all accepted modes and set the
> > > one that fits the user 3D LUT settings. I think it's possible to get the
> > > idea from this IGT test:
> > > https://gitlab.freedesktop.org/mwen/igt-gpu-tools/-/commit/8771f444c3dcd126d7590d5a9b1b0db9706bbf6e#ed5dbc960ac210e3fbacd2361fe0270709767aaa_205_205  
> > > >   
> > 
> > You lost me at "an array as an enum".
> > 
> > I understand there is a blob containing an array of struct
> > drm_mode_lut3d_mode. What I don't understand is that you say LUT3D_MODE
> > is an enum property. Where does the blob come from, then? What property
> > provides the blob?
> > 
> > Am I correct in guessing that the values of LUT3D_MODE enum property
> > are indices into the array in the blob, and that userspace will set it?
> > That sounds good to me, if it's the integer value of the enum. But enum
> > values also need string names, because that is how the values are
> > usually recognized, so what name strings will be used?  
> 
> So, in this proposal, LUT3D_MODE is a list of indexes for a blob that describes a supported
> 3D LUT mode

Hi Melissa,

are you sure? I believe you are looking at and explaining some kernel
internal APIs, and not the uAPI which is the important part. Internal
APIs can always be changed later, uAPI cannot.

After I had sent that email, I might have understood how it was
supposed to work instead: there is no array of struct
drm_mode_lut3d_mode.

Instead, LUT3D_MODE is a KMS property of type enum. Each integer value
of the enum is also a blob id. Each blob named by those ids is a single
struct drm_mode_lut3d_mode that userspace needs to retrieve individually
to understand what that specific enum value means.
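
A minimal sketch of how userspace might consume that, assuming my reading
above is right. The struct mirrors the one proposed in the patch; struct
fake_blob, fake_get_blob() and pick_lut3d_mode() are hypothetical stand-ins
for fetching a blob by id and are not part of libdrm or the kernel:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the struct proposed in the patch under review (not merged uAPI). */
struct drm_mode_lut3d_mode {
	uint16_t lut_size;
	uint16_t lut_stride[3];
	uint16_t bit_depth;
	uint32_t color_format;
	uint32_t flags;
};

/* Hypothetical stand-in for a KMS blob fetched by id. */
struct fake_blob {
	uint32_t id;
	size_t length;
	const void *data;
};

/* Hypothetical lookup: NULL if no blob has that id. Id 0 never names a blob. */
static const struct fake_blob *
fake_get_blob(const struct fake_blob *blobs, size_t n, uint32_t id)
{
	if (id == 0)
		return NULL;
	for (size_t i = 0; i < n; i++)
		if (blobs[i].id == id)
			return &blobs[i];
	return NULL;
}

/*
 * Treat every LUT3D_MODE enum value as a blob id, read the single
 * description stored in that blob, and return the enum value whose
 * description matches. Returns 0 if nothing matches.
 */
static uint64_t
pick_lut3d_mode(const uint64_t *enum_values, size_t n_values,
		const struct fake_blob *blobs, size_t n_blobs,
		uint16_t want_size, uint16_t want_depth)
{
	assert(enum_values && blobs);
	for (size_t i = 0; i < n_values; i++) {
		const struct fake_blob *b =
			fake_get_blob(blobs, n_blobs, (uint32_t)enum_values[i]);
		const struct drm_mode_lut3d_mode *m;

		/* One blob, one struct: skip anything of a different size. */
		if (!b || b->length != sizeof(*m))
			continue;
		m = b->data;
		if (m->lut_size == want_size && m->bit_depth == want_depth)
			return enum_values[i];
	}
	return 0;
}
```

The point of the sketch is that the decision is made from the contents of
each blob, never from the enum's string name.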

> i.e. a `struct drm_mode_lut3d_mode` with size, stride, bit depth, etc. Strings here follow this pattern
> `lut3d_{size}_{bit_depth}bit` [1]. When enabling 3D LUT support, the
> driver should pass an array of `struct drm_mode_lut3d_mode` as supported
> modes, with at least one element.

To be clear, I do not care about kernel internal interfaces at all. I
only care about the uAPI. Therefore talking about kernel internal API
will only confuse me and every other userspace developer. Evidently it
has even confused some IGT developers, as I point out some IGT code
problems below.

If the struct drm_mode_lut3d_mode is the authoritative definition of
what each enum value means, then I think the string names should not
attempt to convey any meaning nor information. The strings should be
just "blob id 77" etc. That makes it clear what the value is.

Otherwise userspace will be tempted to use the string names only (as is
the usual uAPI design) to find the right enum value, and not look into
the struct drm_mode_lut3d_mode at all. Therefore, if you don't encode
absolutely everything of struct drm_mode_lut3d_mode into the string
name, you should encode nothing in the name. You might also have two
different descriptions ending up with the same string name under the
same enum property instance, and that must not happen.
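
To illustrate the collision risk, here is a small sketch. Both helper names
are hypothetical; the descriptive pattern is the one quoted above, and the
opaque pattern is the "blob id %u" naming I am suggesting:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Opaque name: carries no meaning beyond identifying the blob. */
static void
opaque_mode_name(uint32_t blob_id, char *buf, size_t len)
{
	assert(buf && len > 0);
	snprintf(buf, len, "blob id %u", (unsigned int)blob_id);
}

/*
 * Descriptive name in the style of the proposal. It encodes only size and
 * bit depth, so two modes that differ in stride, color format or flags
 * end up with the same string.
 */
static void
descriptive_mode_name(uint16_t lut_size, uint16_t bit_depth,
		      char *buf, size_t len)
{
	assert(buf && len > 0);
	snprintf(buf, len, "lut3d_%u_%ubit",
		 (unsigned int)lut_size, (unsigned int)bit_depth);
}
```

Two modes with lut_size 17 and bit_depth 12 but different color_format
would both be named "lut3d_17_12bit", while their opaque names stay distinct
as long as the blob ids are distinct.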

> I think this drm_info output helps to understand the idea. Currently, we
> have 5 DRM CRTC color mgmt properties. The drm_info output for them:
> 
> │           ├───"DEGAMMA_LUT": blob = 0
> │           ├───"DEGAMMA_LUT_SIZE" (immutable): range [0, UINT32_MAX] = 4096
> │           ├───"CTM": blob = 0
> │           ├───"GAMMA_LUT": blob = 0
> │           ├───"GAMMA_LUT_SIZE" (immutable): range [0, UINT32_MAX] = 4096
> 
> The idea is to add 4 new properties.
> 
> - The 3D LUT blob (LUT3D) and the indexes of supported modes
>   (LUT3D_MODE), as follows (example with AMD supported modes):
> 
> │           ├───"LUT3D": blob = 0
> │           ├───"LUT3D_MODE": enum {lut3d_17_12bit, lut3d_17_10bit, lut3d_9_12bit, lut3d_9_10bit} = lut3d_17_12bit

Ok, but this does not show the difference between an index into an
array of struct drm_mode_lut3d_mode, and each enum value being a blob
id containing a single struct drm_mode_lut3d_mode.

> 
> - And shaper (1D) LUT - following what we already have for degamma/gamma LUT:
> 
> │           ├───"SHAPER_LUT": blob = 0
> │           └───"SHAPER_LUT_SIZE" (immutable): range [0, UINT32_MAX] = 4096
> 
> 
> Each LUT3D_MODE element works as an index to get the blob for the
> corresponding `struct drm_mode_lut3d_mode` ([2] may work as an
> example). I've also included a patch to set the first element of the
> list of supported modes as a predefined value [3], but maybe it's not
> necessary (or even error prone).

Are you sure about that terminology in the uAPI?

A blob id is not an index. If you do have an index, then which KMS
property is going to tell userspace what is the blob id that contains
the whole array? I haven't seen any KMS property for that yet.

> [1] https://lore.kernel.org/dri-devel/20230109143846.1966301-5-mwen@igalia.com/

Actually, in that patch, I see a loop that creates a new blob from each
element in a kernel-private array of struct drm_mode_lut3d_mode.

> [2] https://gitlab.freedesktop.org/mwen/igt-gpu-tools/-/commit/a7b11bab9d573e38e26982b5cc5cf62e9dca6a75#ed5dbc960ac210e3fbacd2361fe0270709767aaa_886_886

That is userspace code, which does:

	mode_3dlut = (struct drm_mode_3dlut_mode *) blob->data;
	count = blob->length / sizeof(mode_3dlut[0]);
	igt_assert(count);

	for (i = 0; i < count; i++) {

That code is semantically wrong. The blob data is never an array. It is
always a single struct drm_mode_lut3d_mode. This code is misleading the
reader really hard.
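
Something along these lines would match the semantics better. This is only
a sketch: lut3d_mode_from_blob() is a hypothetical helper, the struct is
copied from the patch, and it assumes the struct size stays fixed:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirrors the struct proposed in the patch under review (not merged uAPI). */
struct drm_mode_lut3d_mode {
	uint16_t lut_size;
	uint16_t lut_stride[3];
	uint16_t bit_depth;
	uint32_t color_format;
	uint32_t flags;
};

/*
 * Interpret blob contents as exactly one mode description.
 * Returns 0 and fills *out on success, -1 if the blob is missing or its
 * length is not exactly one struct.
 */
static int
lut3d_mode_from_blob(const void *data, size_t length,
		     struct drm_mode_lut3d_mode *out)
{
	assert(out);
	if (!data || length != sizeof(*out))
		return -1;
	memcpy(out, data, sizeof(*out));
	return 0;
}
```

There is no count and no loop, because there is nothing to iterate over
inside one blob. The iteration happens over the enum values instead.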

> [3] https://lore.kernel.org/dri-devel/20230109143846.1966301-7-mwen@igalia.com/

Well, it's an enum property. A property always has some value. An enum
must always enumerate all possible values. Therefore, you cannot ever
avoid setting some value as the default.

If you set zero as a value, then zero must also be enumerated.

I mentioned before that the drm_mode_lut3d_mode values should be named
"blob id %u". The zero value is not a blob id, it is a true constant
instead, so its name can be e.g. "pass-through" if that is what it
means. Or maybe that doesn't make sense, because setting the LUT3D
property to zero means "pass-through", in which case zero is not a
valid value for LUT3D_MODE ever, not even in the kernel (but sounds
like the DRM core does not guard against the driver setting enum values
that do not exist).

Note that value zero literally means "no data". It is not the index to
the first struct drm_mode_lut3d_mode element. So the userspace code of
[2] is actually broken with respect to this: it will attempt to fetch a
blob id 0, and crash an assert.

If it does not crash, then I think the enum description is lying to
userspace: zero is a valid value that is not listed as existing.

Or is there some uAPI specification somewhere saying that any enum
property can be set to zero as well, regardless of what values are
listed as possible? I would hope not.

> > 
> > In that code example, I cannot see any connection between the array of
> > drm_mode_lut3d_mode entries, the 'name' string, and the actual data
> > generated for the LUT. They must all connect somehow, but it just
> > doesn't seem to be in the code. It is just hardcoding mode_lut3d[0],
> > and the data generator never sees even that - it gets the lut size, but
> > none of the other parameters. I cannot see that working across multiple
> > drivers.
> > 
> > Why is it setting a linear shaper LUT explicitly instead of unsetting
> > it? I mean this preparation:
> > 
> > shaper_linear = generate_table(data->shaper_lut_size, 1.0);
> > 
> > "Linear" actually means "identity", right? That's what people usually
> > write when they mean identity.
> > 
> > Surely it's not actually a constant curve? That would make the whole
> > test useless for proving the 3D LUT works.  
> 
> So, the idea of the test is to set only a 3D LUT, without any shaper LUT,
> and see if the 3D LUT maps to max color values. However, from my tests
> on AMD HW with 3D LUT support, I was not able to program a 3D LUT
> without a shaper LUT, that means a shaper LUT in BYPASS mode. By
> comparing CRC results I could see that if I don't set any shaper LUT,
> the driver sets shaper LUT in BYPASS mode but the 3D LUT is just
> ignored, so the combo shaper + 3D LUT seems a HW requirement. To
> overcome this requirement and keep the expected behavior, the test sets
> a linear shaper LUT to act as a BYPASS LUT. Am I missing something?

Sorry, no, I just had no idea what generate_table(..., 1.0) does. I
confused it with generate_table_max() which I have seen used before.

However, this exposes a kernel bug: the driver requires both the shaper
and the 3D LUT. If userspace sets the shaper to pass-through, you say
the 3D LUT gets ignored. That's a bug: if the driver cannot make 3D
LUT work without a shaper, then it must invent an identity shaper when
userspace sets it to pass-through. Or fail the atomic commit.

Userspace could as well set the 3D LUT to pass-through and the shaper
to something else, and expect it to work. If the atomic test commit
does not fail, it must do what userspace prescribes.

You just cannot expose arbitrary hardware limitations to userspace like
that and silently malfunction if they are violated. At the very least
fail the atomic commit if you don't want to make it work in the driver.
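
As a rough sketch of the two acceptable behaviours, with a made-up state
struct and function name (this is not the amdgpu or DRM core API, only an
illustration of the logic):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified view of the CRTC color state. */
struct fake_crtc_color_state {
	bool has_shaper_lut;      /* userspace set SHAPER_LUT to a blob */
	bool has_lut3d;           /* userspace set LUT3D to a blob */
	bool use_identity_shaper; /* driver-internal, invisible to userspace */
};

/*
 * Assumed hardware constraint: the 3D LUT block only works when the shaper
 * block is programmed too. Either hide that by programming an identity
 * shaper, or reject the configuration. Never accept it and then ignore
 * the 3D LUT.
 */
static int
fake_color_atomic_check(struct fake_crtc_color_state *s,
			bool can_synthesize_identity)
{
	assert(s);
	s->use_identity_shaper = false;

	if (s->has_lut3d && !s->has_shaper_lut) {
		if (!can_synthesize_identity)
			return -EINVAL; /* fail the atomic commit */
		s->use_identity_shaper = true;
	}

	/* Shaper without 3D LUT, both, or neither: nothing to fix up here. */
	return 0;
}
```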

Thanks,
pq

> 
> It's a really simple test case, but I don't see how this approach
> invalidates the test results. To generalize the test for other devices,
> I can restrict shaper LUT usage for AMD.
> 
> Melissa
> 
> > 
> >   
> > > > 
> > > > +	/**
> > > > +	 * @lut3d:
> > > > +	 *
> > > > +	 * 3D Lookup table for converting pixel data. Position where it takes
> > > > +	 * place depends on hw design, after @ctm or @gamma_lut. See
> > > > +	 * drm_crtc_enable_color_mgmt(). The blob (if not NULL) is an array of
> > > > +	 * &struct drm_color_lut.
> > > > +	 */
> > > > +	struct drm_property_blob *lut3d;
> > > > 
> > > > I do not like the wording of "depends on hw design", and it is used in
> > > > very many places here. The KMS UAPI semantics cannot vary based on
> > > > hardware. Your cover letter defines the order in the color pipeline, so
> > > > I don't understand how this here can depend on hw.
> > > > 
> > > > What can depend on hardware is which KMS UAPI properties are exposed,
> > > > and how you map a property to a hardware unit (which can even change
> > > > based on the exact pipeline configuration as long as the results are as
> > > > the UAPI doc defines). But this comment here is talking about the UAPI
> > > > properties, not hw elements.
> > > >     
> > > 
> > > You are right! My initial idea was to explain that it's possible for
> > > other vendors color pipeline to fit this pipeline internally, if they
> > > need a 1D LUT before the 3D LUT, but not the 1D LUT in the end.
> > >   
> > > > 
> > > > I'm happy that the 3D LUT interface is being developed, but as you can
> > > > see from my questions, the UAPI documentation is practically missing. I
> > > > would have no idea how to use this as is.    
> > > 
> > > Thank you again for your valuable comments. I'll address your comments
> > > in a next version by better explaining all these points.  
> > 
> > Thank you for working on this!
> > pq  
> 
> 
