[RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color pipeline is needed

Tue Nov 7 16:52:16 UTC 2023

On 2023-10-26 15:25, Alex Goins wrote:
> On Thu, 26 Oct 2023, Sebastian Wick wrote:
> 
>> On Thu, Oct 26, 2023 at 11:57:47AM +0300, Pekka Paalanen wrote:
>>> On Wed, 25 Oct 2023 15:16:08 -0500 (CDT)
>>> Alex Goins <agoins at nvidia.com> wrote:
>>>
>>>> Thank you Harry and all other contributors for your work on this. Responses
>>>> inline -
>>>>
>>>> On Mon, 23 Oct 2023, Pekka Paalanen wrote:
>>>>
>>>>> On Fri, 20 Oct 2023 11:23:28 -0400
>>>>> Harry Wentland <harry.wentland at amd.com> wrote:
>>>>>
>>>>>> On 2023-10-20 10:57, Pekka Paalanen wrote:
>>>>>>> On Fri, 20 Oct 2023 16:22:56 +0200
>>>>>>> Sebastian Wick <sebastian.wick at redhat.com> wrote:
>>>>>>>
>>>>>>>> Thanks for continuing to work on this!
>>>>>>>>
>>>>>>>> On Thu, Oct 19, 2023 at 05:21:22PM -0400, Harry Wentland wrote:

snip

>>>
>>> If we look at BT.2100, there is no such encoding even mentioned where
>>> 125.0 would correspond to 10k cd/m². That 125.0 convention already has
>>> a built-in assumption what the color spaces are and what the conversion
>>> is aiming to do. IOW, I would say that choice is opinionated from the
>>> start. The multiplier in BT.2100 is always 10000.
> 
> Be that as it may, the convention of FP16 125.0 corresponding to 10k nits is
> baked in our hardware, so it's unavoidable at least for NVIDIA pipelines.
> 

Yeah, that's not just NVidia, it's basically the same for AMD. Though I
think we can work without that assumption, but the PQ TF you get from AMD
will map to [0.0, 125.0].

snip

>>
>> We could simply fail commits when the pipeline and pixel format don't
>> work together. We'll probably need some kind of ingress no-op node
>> anyway and maybe could list pixel formats there if required to make it
>> easier to find a working configuration.
> 
> Yeah, we could, but having to figure that out through trial and error would be
> unfortunate. Per above, it might be easiest to just tag pipelines with a pixel
> format instead of trying to include the pixel format conversion as a color op.
> 

Agreed, We've been looking at libliftoff a bit but one of the problem is
that it does a lot of atomic checks to figure out an optimal HW plane
configuration and we run out of time budget before we're able to check
all options.

Atomic check failure is really not well suited for this stuff.

>>> "Without the need to define a new type" is something I think we need to
>>> consider case by case. I have a hard time giving a general opinion.
>>>
>>>>>>>
>>>>>>> Counter-example 2: image size scaling colorop. It might not be
>>>>>>> configurable, it is controlled by the plane CRTC_* and SRC_*
>>>>>>> properties. You still need to understand what it does, so you can
>>>>>>> arrange the scaling to work correctly. (Do not want to scale an image
>>>>>>> with PQ-encoded values as Josh demonstrated in XDC.)
>>>>>>>
>>>>>>
>>>>>> IMO the position of the scaling operation is the thing that's important
>>>>>> here as the color pipeline won't define scaling properties.
>>>>
>>>> I agree that blending should ideally be done in linear space, and I remember
>>>> that from Josh's presentation at XDC, but I don't recall the same being said for
>>>> scaling. In fact, the NVIDIA pre-blending scaler exists in a stage of the
>>>> pipeline that is meant to be in PQ space (more on this below), and that was
>>>> found to achieve better results at HDR/SDR boundaries. Of course, this only
>>>> bolsters the argument that it would be helpful to have an informational "scaler"
>>>> element to understand at which stage scaling takes place.
>>>
>>> Both blending and scaling are fundamentally the same operation: you
>>> have two or more source colors (pixels), and you want to compute a
>>> weighted average of them following what happens in nature, that is,
>>> physics, as that is what humans are used to.
>>>
>>> Both blending and scaling will suffer from the same problems if the
>>> operation is performed on not light-linear values. The result of the
>>> weighted average does not correspond to physics.
>>>
>>> The problem may be hard to observe with natural imagery, but Josh's
>>> example shows it very clearly. Maybe that effect is sometimes useful
>>> for some imagery in some use cases, but it is still an accidental
>>> side-effect. You might get even better results if you don't rely on
>>> accidental side-effects but design a separate operation for the exact
>>> goal you have.
>>>
>>> Mind, by scaling we mean changing image size. Not scaling color values.
>>>
> 
> Fair enough, but it might not always be a choice given the hardware.
> 

I'm thinking of this as an information element, not a programmable.
Some HW could define this as programmable, but I probably wouldn't
on AMD HW.

snip

>>>
>>> What I was left puzzled about after the XDC workshop is that is it
>>> possible to pre-load configurations in the background (slow), and then
>>> quickly switch between them? Hardware-wise I mean.
> 
> This works fine for our "fast" LUTs, you just point them to a surface in video
> memory and they flip to it. You could keep multiple surfaces around and flip
> between them without having to reprogram them in software. We can easily do that
> with enumerated curves, populating them when the driver initializes instead of
> waiting for the client to request them. You can even point multiple hardware
> LUTs to the same video memory surface, if they need the same curve.
> 

Ultimately I think that's the best way to solve this problem, but it needs
HW that can do this.

snip

>>
>> The prepare-commit idea for blob properties would help to make the
>> pipelines usable again, but until then it's probably a good idea to just
>> not expose those pipelines.
> 
> The prepare-commit idea actually wouldn't work for these LUTs, because they are
> programmed using methods instead of pointing them to a surface. I'm actually not
> sure how slow it actually is, would need to benchmark it. I think not exposing
> them at all would be overkill, since it would mean you can't use the preblending
> scaler or tonemapper, and animation isn't necessary for that.
> 

I tend to agree. Maybe a "Heavy Operation" flag that tells userspace they can
use it but it might come at a significant cost.

Harry

> The AMD 3DLUT is another example of a LUT that is slow to update, and it would
> obviously be a major loss if that wasn't exposed. There just needs to be some
> way for clients to know if they are going to kill performance by trying to
> change it every frame.
> 
> Thanks,
> Alex
> 
>>
>>>
>>>
>>> Thanks,
>>> pq
>>
>>