[Intel-gfx] [PATCH v2 00/10] Color Manager Implementation

Tue Jul 14 01:17:09 PDT 2015

On 07/13/15 16:07, Daniel Vetter wrote:
> On Mon, Jul 13, 2015 at 12:11:08PM +0200, Hans Verkuil wrote:
>> On 07/13/2015 11:54 AM, Daniel Vetter wrote:
>>> On Mon, Jul 13, 2015 at 11:43:31AM +0200, Hans Verkuil wrote:
>>>> On 07/13/2015 11:18 AM, Daniel Vetter wrote:
>>>>> On Mon, Jul 13, 2015 at 10:29:32AM +0200, Hans Verkuil wrote:
>>>>>> On 06/15/2015 08:53 AM, Daniel Vetter wrote:
>>>>>>> On Tue, Jun 09, 2015 at 01:50:48PM +0100, Damien Lespiau wrote:
>>>>>>>> On Thu, Jun 04, 2015 at 07:12:31PM +0530, Kausal Malladi wrote:
>>>>>>>>> From: Kausal Malladi <Kausal.Malladi at intel.com>
>>>>>>>>>
>>>>>>>>> This patch set adds color manager implementation in drm/i915 layer.
>>>>>>>>> Color Manager is an extension in i915 driver to support color 
>>>>>>>>> correction/enhancement. Various Intel platforms support several
>>>>>>>>> color correction capabilities. Color Manager provides abstraction
>>>>>>>>> of these properties and allows a user space UI agent to 
>>>>>>>>> correct/enhance the display.
>>>>>>>>
>>>>>>>> So I did a first rough pass on the API itself. The big question that
>>>>>>>> isn't solved at the moment is: do we want to try to do generic KMS
>>>>>>>> properties for pre-LUT + matrix + post-LUT or not. "Generic" has 3 levels:
>>>>>>>>
>>>>>>>>   1/ Generic for all KMS drivers
>>>>>>>>   2/ Generic for i915 supported platforms
>>>>>>>>   3/ Specific to each platform
>>>>>>>>
>>>>>>>> At this point, I'm quite tempted to say we should give 1/ a shot. We
>>>>>>>> should be able to have pre-LUT + matrix + post-LUT on CRTC objects and
>>>>>>>> guarantee that, when the drivers expose such properties, user space can
>>>>>>>> at least give 8 bits LUT + 3x3 matrix + 8 bits LUT.
>>>>>>>>
>>>>>>>> It may be possible to use the "try" version of the atomic ioctl to
>>>>>>>> explore the space of possibilities from a generic user space to use
>>>>>>>> bigger LUTs as well. A HAL layer (which is already there in some but not
>>>>>>>> all OSes) would still be able to use those generic properties to load
>>>>>>>> "precision optimized" LUTs with some knowledge of the hardware.
>>>>>>>
>>>>>>> Yeah, imo 1/ should be doable. For the matrix we should be able to be
>>>>>>> fully generic with a 16.16 format. For gamma one option would be to have
>>>>>>
>>>>>> I know I am late replying, apologies for that.
>>>>>>
>>>>>> I've been working on CSC support for V4L2 as well (still work in progress)
>>>>>> and I would like to at least end up with the same low-level fixed point
>>>>>> format as DRM so we can share matrix/vector calculations.
>>>>>>
>>>>>> Based on my experiences I have concerns about the 16.16 format: the precision
>>>>>> is quite low which can be a problem when such values are used in matrix
>>>>>> multiplications.
>>>>>>
>>>>>> In addition, while the precision may be sufficient for 8 bit color component
>>>>>> values, I'm pretty sure it will be insufficient when dealing with 12 or 16 bit
>>>>>> color components.
>>>>>>
>>>>>> In earlier versions of my CSC code I used a 12.20 format, but in the latest I
>>>>>> switched to 32.32. This fits nicely in a u64 and it's easy to extract the
>>>>>> integer and fractional parts.
>>>>>>
>>>>>> If this is going to be a generic and future proof API, then my suggestion
>>>>>> would be to increase the precision of the underlying data type.
>>>>>
>>>>> We discussed this a bit more internally and figured it would be nice to have the same
>>>>> fixed point for both CSC matrix and LUT/gamma tables. Current consensus
>>>>> seems to be to go with 8.24 for both. Since LUTs are fairly big I think it
>>>>> makes sense if we try to be not too wasteful (while still future-proof
>>>>> ofc).
>>>>
>>>> The .24 should have enough precision, but I am worried about the 8: while
>>>> this works for 8 bit components, you can't use it to represent values
>>>> greater than 255, which might be needed (now or in the future) for 10, 12 or 16 bit
>>>> color components.
>>>>
>>>> It's why I ended up with 32.32: it's very generic so usable for other
>>>> things besides CSC.
>>>>
>>>> Note that 8.24 is really 7.24 + one sign bit. So 255 can't be represented
>>>> in this format.
>>>>
>>>> That said, all values I'm working with in my current code are small integers
>>>> (say between -4 and 4 worst case), so 8.24 would work. But I am not at all
>>>> confident that this is future proof. My gut feeling is that you need to be
>>>> able to represent at least the max component value + a sign bit + 7 decimals
>>>> precision. Which makes 17.24.
>>>
>>> The idea is to steal from GL and always normalize everything to [0.0,
>>> 1.0], irrespective of the source color format. We need that in drm since
>>> if you blend together planes with different formats it's completely
>>> undefined which one you should pick. 8 bits of precision for values out of
>>> range should be enough ;-)
>>
>> That doesn't really help much, using a [0-1] range just means that you need
>> more precision for the fraction since the integer precision is now added to
>> the fractional precision.
>>
>> So for 16-bit color components the 8.24 format will leave you with only 8 bits
>> precision if you scale each component to the [0-1] range. That's slightly more
>> than 2 decimals. I don't believe that is enough. If you do a gamma table lookup
>> and then feed the result to a CSC matrix you need more precision if you want
>> to get accurate results.
> 
> Hm, why do we need 8 bits more precision than source data? At least in the
> intel hw I've seen the most bits we can stuff into the hw is 0.12 (again
> for rescaled range to 0.0-1.0). 24 bits means as-is we'll throw 12 bits
> away. What would you want to use these bits for?

The Intel hardware uses 12 bits today, but what about the next-gen? If you are
defining an API and data type just for the hardware the kernel supports today,
then 12 bits might be enough precision. If you want to be future proof then you
need to be prepared for more capable future hardware.

So 0.12 will obviously not be enough if you want to support 16 bit color components
in the future.
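
To make that concrete, here is a rough sketch (not taken from any driver; the
function names are made up for illustration) that rescales every possible
component value to the [0.0, 1.0] range with a given number of fractional bits
and counts how many distinct fixed point codes come out. It assumes
comp_bits <= 16 and frac_bits <= 24.

```c
#include <stdint.h>

/*
 * Hypothetical helper: rescale a component value v in [0, 2^comp_bits - 1]
 * to [0.0, 1.0] with frac_bits fractional bits, rounding to nearest.
 * The result 1.0 is the code 2^frac_bits, so one integer bit is assumed.
 */
static uint32_t normalize_to_fixed(uint32_t v, unsigned comp_bits,
                                   unsigned frac_bits)
{
    uint64_t max = ((uint64_t)1 << comp_bits) - 1;
    uint64_t one = (uint64_t)1 << frac_bits;

    return (uint32_t)(((uint64_t)v * one + max / 2) / max);
}

/* Count the distinct codes that all 2^comp_bits input values map to. */
static uint32_t count_distinct_codes(unsigned comp_bits, unsigned frac_bits)
{
    uint32_t max = ((uint32_t)1 << comp_bits) - 1;
    uint32_t prev = normalize_to_fixed(0, comp_bits, frac_bits);
    uint32_t count = 1;

    for (uint32_t v = 1; v <= max; v++) {
        uint32_t code = normalize_to_fixed(v, comp_bits, frac_bits);

        /* The mapping is monotonic, so a change means a new code. */
        if (code != prev)
            count++;
        prev = code;
    }
    return count;
}
```

With these definitions count_distinct_codes(16, 12) comes out at 4097, so the
65536 possible 16 bit inputs collapse onto roughly 4096 levels, while
count_distinct_codes(16, 24) keeps all 65536 of them apart.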

In addition, to fully support HW colorspace conversion (e.g. sRGB to Rec.709) where
lookup tables are used for implementing the transfer functions (normal and inverse),
you need more precision than just the number of bits per component or you will
get quite large errors in the calculation.

It all depends on how a LUT is used: if the value from the LUT is the 'final' value,
then you don't need more precision than the number of bits of a color component. But
if it is used in other calculations (3x3 matrices, full/limited range scaling, etc),
then the LUT should provide more bits precision.
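
A small experiment shows the effect. The sketch below is only an illustration
under simplifying assumptions: it uses a pure gamma 2.0 curve (y = x^2) as a
stand-in for a real transfer function, 8 bit components, and made-up function
names. It sends each 8 bit code through the curve, stores the intermediate
value with frac_bits fractional bits, applies the inverse curve, and counts
the codes that do not survive the round trip.

```c
#include <stdint.h>

#define MAX8    255u
#define MAX8_SQ 65025u  /* 255 * 255 */

/*
 * Stand-in transfer function y = x^2 on [0.0, 1.0]. The result is
 * rounded to nearest and stored with frac_bits fractional bits.
 */
static uint64_t to_linear(unsigned x, unsigned frac_bits)
{
    uint64_t one = (uint64_t)1 << frac_bits;

    return ((uint64_t)x * x * one + MAX8_SQ / 2) / MAX8_SQ;
}

/*
 * Matching inverse x = sqrt(y), rounded to the nearest 8 bit code.
 * Integer only: accept candidate r + 1 while
 * (r + 1 - 0.5)^2 <= 255^2 * linear / 2^frac_bits.
 */
static unsigned from_linear(uint64_t linear, unsigned frac_bits)
{
    unsigned r = 0;

    while (r < MAX8) {
        uint64_t t = 2 * (uint64_t)(r + 1) - 1;

        if ((t * t) << frac_bits > 4 * (uint64_t)MAX8_SQ * linear)
            break;
        r++;
    }
    return r;
}

/* Number of 8 bit codes that change after curve + inverse curve. */
static unsigned count_round_trip_errors(unsigned frac_bits)
{
    unsigned errors = 0;

    for (unsigned x = 0; x <= MAX8; x++)
        if (from_linear(to_linear(x, frac_bits), frac_bits) != x)
            errors++;
    return errors;
}
```

With 8 fractional bits the intermediate value for inputs 1 through 11 already
rounds to zero, so those codes are lost. With 12 bits only a few of the
darkest codes are affected, and with 16 bits every code survives. A real
pipeline with a matrix between the two LUTs would behave differently in
detail, but the trend is the same.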

Which seems to be the case with Intel hardware: 12 bits is 4 bits more than the 8 bits
per component it probably uses.

I would guess that a LUT supporting 16 bit color components would need a precision
of 0.20 or so (assuming the resulting values are used in further calculations).

High dynamic range video will be an important driving force towards higher bit depths
and accurate color handling, so you can expect to see this become much more important
in the coming years.

And, as I mentioned, another consideration is that this fixed point data type might
be useful elsewhere in the kernel where you need to do some precision arithmetic.
So using a standard type that anyone can use with functions in lib/ to do basic
operations can be very useful indeed beyond just DRM and V4L2.
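
As a sketch of what such a shared type could look like (purely hypothetical;
none of these names exist in lib/ or in any posted patch), a signed 32.32
value fits in 64 bits and the basic operations are short. The multiply below
works on magnitudes with 32 bit halves so it needs no 128 bit arithmetic, and
it does not check for overflow of the result.

```c
#include <stdint.h>

/* Hypothetical type: signed 32.32 fixed point stored in 64 bits. */
typedef int64_t fix32_32;

#define FIX32_32_ONE ((fix32_32)1 << 32)

static inline fix32_32 fix32_32_from_int(int32_t v)
{
    return (fix32_32)v * FIX32_32_ONE;
}

/* Integer part, truncated toward zero. */
static inline int32_t fix32_32_int_part(fix32_32 v)
{
    return (int32_t)(v / FIX32_32_ONE);
}

/* Fractional part of the magnitude, in units of 2^-32. */
static inline uint32_t fix32_32_frac_part(fix32_32 v)
{
    uint64_t mag = v < 0 ? 0 - (uint64_t)v : (uint64_t)v;

    return (uint32_t)(mag & 0xffffffffu);
}

/* Multiply two 32.32 values; the low 32 bits of the product are dropped. */
static inline fix32_32 fix32_32_mul(fix32_32 a, fix32_32 b)
{
    int neg = (a < 0) != (b < 0);
    uint64_t ua = a < 0 ? 0 - (uint64_t)a : (uint64_t)a;
    uint64_t ub = b < 0 ? 0 - (uint64_t)b : (uint64_t)b;
    uint64_t ah = ua >> 32, al = ua & 0xffffffffu;
    uint64_t bh = ub >> 32, bl = ub & 0xffffffffu;
    /* (ah.al * bh.bl) >> 32, built from 32x32 partial products. */
    uint64_t r = ((ah * bh) << 32) + ah * bl + al * bh + ((al * bl) >> 32);

    return neg ? -(int64_t)r : (int64_t)r;
}
```

For example, fix32_32_mul(FIX32_32_ONE / 2, FIX32_32_ONE / 2) gives
FIX32_32_ONE / 4, i.e. 0.5 * 0.5 = 0.25. A matrix multiply for CSC would then
just be a sum of such products per output component.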

> 
>>> Oh and we might need those since for CSC and at least some LUTs you can do
>>> this.
>>
>> Sorry, I don't understand this sentence. What do 'those' and 'this' refer to?
> 
> I meant that the higher bits before the decimal are needed in some cases
> by the hw since it allows values > logical 1.0. At least some hw supports
> 16bit (half) floats as scanout sources too.

Ah, OK. Thanks for the clarification.

Regards,

	Hans