[Intel-gfx] [PATCH 0/6] Pipe level color management

Fri Jan 22 09:07:53 PST 2016

Hi,

On 22 January 2016 at 16:21, Daniel Vetter <daniel at ffwll.ch> wrote:
> On Fri, Jan 22, 2016 at 04:06:15PM +0000, Lionel Landwerlin wrote:
>> On 22/01/16 15:04, Daniel Stone wrote:
>> >Now with everything just using split-gamma mode, I'm much happier with
>> >how this is looking. I took a look at some other architectures to see
>> >how this would fit, and also had a chat with Richard Hughes to clear
>> >some things up. AMD seems to support every possible mode under the
>> >sun, so should support any API we came up with. Most other
>> >architectures only implemented a single gamma table (equivalent to
>> >legacy gamma ramp), but there was one I have fairly detailed
>> >documentation for and also supports everything.
>>
>> There might be some interest from others to have a single 12bits post csc
>> gamma LUT.
>> So I was going to submit another serie to enable this as a special case just
>> for some generations of the Intel hardware once this work lands.
>>
>> Obviously in order to program the hardware in that mode you would need to a
>> userspace specially tuned for the particular platform on which it's running.
>> This mode wouldn't be exposed through the current set of properties.
>
> Yeah, the idea there is that we don't tell this userspace explicitly, but
> it can be used, i.e.
> a) in the config properties we advertise the sizes of the split gamma
> table
> b) but when userspace supplies an atomic request that matches the larger
> 12bit gamma, and no pre-ctm gamma, then we'll use that.

Right, exactly. My concern was mainly that it seemed incredibly easy
to program broken state (12-bit LUT seemingly active both pre- and
post-CTM). I honestly wasn't sure whether the code or my expectations
were wrong though. But adding it back properly is just fine.

>> >Intel supports two quite fun properties of matrix output. Firstly, the
>> >range is (-3.0..3.0) rather than the (0.0..1.0) you might expect.
>> >Negative values are axis-mirrored, i.e. lut2_index =
>> >fabs(matrix_output), thus giving us a LUT range of (0.0..3.0).
>> >Secondly, whilst (0.0..1.0) is represented by linear LUT entries, the
>> >LUT values for (1.0..3.0) are calculated by a linear interpolation
>> >between LUT entry #512 (i.e. that for 1.0) and a bonus entry #513
>> >(value for 3.0). I haven't seen this supported anywhere else, so would
>> >tend towards mirroring the last value into the extra supernumerary
>> >entry, i.e. emulate saturation for matrix output values to 1.0.
>>
>> It's good you mention this, because I wrote a test assuming negative values
>> would be clamped to (0.0..1.0) and the test passes :/
>> So maybe there is something fishy here...

Hmm, the doc explicitly mentions the axis-mirroring/sign-stripping,
but is either so weird or so incorrect that I think you'd probably
only be able to tell empirically. ;)

>> >I don't really know what to do about negative values as CTM output,
>> >since the doc I have here is silent on whether negative values are
>> >similarly axis-mirrored/sign-stripped, or whether they are instead
>> >clamped to 0.0. Either way, I'm not really sure it's behaviour we can
>> >rely upon to be portable.
>>
>> Maybe we should compute both and verify at least one works?

Well, I'd have to write the entire CM support for that platform first,
which might be a little while ... ;)

>> We also discussed briefly the multiplication results. My tests show the
>> Intel hardware seems to clamp the results.
>> Let's say you have a 255 pixel going through a 0.5 coefficient, you should
>> get 127.5 which would be rounded to 128.
>> Intel hardware gives us 127.
>> Same for a pixel at 255 and a coefficient of 0.25. You would get 63.75 so 64
>> rounded and my results show 63.
>>
>> Again, computing both hypothesis might be a solution.

The docs I have here are silent on what happens. Not much we can do
about it really, so I think we can just leave it as
platform-dependent.

>> >As a detail, the architecture I'm looking at has mixed granularity for
>> >the second (post-CTM) LUT: lower RGB-value entries have higher
>> >granularity (precision in LUT indexing), with lower granularity for
>> >higher entries. I don't think this is a problem though, since we can
>> >just decimate in the kernel (i.e. ignore every n'th LUT entry, to
>> >write a smaller LUT to hardware than we received to userspace).
>
> This was the original plan (iirc one of the byt luts is like this too).

Cool; it's the only thing I could think of which made any sense.

>> >Anyway, beyond that, it seems there are a few things we agree on:
>> >   - optional pre-matrix ('degamma') per-channel LUT of variable
>> >length, but (from a userspace point of view) fixed precision, input &
>> >output ranges 0.0..1.0
>> >   - optional 3x3 matrix with input range [0.0,1.0], with output values
>> >saturating to 1.0, and negative values producing undefined behaviour
>> >   - optional post-matrix ('gamma') per-channel LUT of variable length,
>> >but (from a userspace point of view) fixed precision, input & output
>> >ranges 0.0..1.0
>
> Yeah I thought same here, we'll support 0.0..1.0 and anything that
> overshoots is platform-defined. I have no idea what to do with the
> overshoot lut table entries, so I guess we'll just leave those until
> someone screams badly ;-)

Program the overshoot register (LUT entry #513) to the same as the
final LUT entry (#512), and overshoot becomes saturation, i.e. desired
semantics. Not really sure what to do with undershoot (clamp to 0 or
mirror axes), but given that's completely hardware dependent and there
doesn't seem much we can do about it, I'd leave it at documenting
'don't do that'.

Cheers,
Daniel