[Intel-gfx] [PATCH v2 00/10] Color Manager Implementation

Mon Jul 13 02:43:31 PDT 2015

On 07/13/2015 11:18 AM, Daniel Vetter wrote:
> On Mon, Jul 13, 2015 at 10:29:32AM +0200, Hans Verkuil wrote:
>> On 06/15/2015 08:53 AM, Daniel Vetter wrote:
>>> On Tue, Jun 09, 2015 at 01:50:48PM +0100, Damien Lespiau wrote:
>>>> On Thu, Jun 04, 2015 at 07:12:31PM +0530, Kausal Malladi wrote:
>>>>> From: Kausal Malladi <Kausal.Malladi at intel.com>
>>>>>
>>>>> This patch set adds a color manager implementation to the drm/i915 layer.
>>>>> Color Manager is an extension to the i915 driver to support color
>>>>> correction/enhancement. Various Intel platforms support several
>>>>> color correction capabilities. Color Manager provides an abstraction
>>>>> of these properties and allows a user space UI agent to
>>>>> correct/enhance the display.
>>>>
>>>> So I did a first rough pass on the API itself. The big question that
>>>> isn't solved at the moment is: do we want to try to do generic KMS
>>>> properties for pre-LUT + matrix + post-LUT or not. "Generic" has 3 levels:
>>>>
>>>>   1/ Generic for all KMS drivers
>>>>   2/ Generic for i915 supported platforms
>>>>   3/ Specific to each platform
>>>>
>>>> At this point, I'm quite tempted to say we should give 1/ a shot. We
>>>> should be able to have pre-LUT + matrix + post-LUT on CRTC objects and
>>>> guarantee that, when the drivers expose such properties, user space can
>>>> at least give an 8-bit LUT + 3x3 matrix + an 8-bit LUT.
>>>>
>>>> It may be possible to use the "try" version of the atomic ioctl to
>>>> explore the space of possibilities from a generic user space to use
>>>> bigger LUTs as well. A HAL layer (which is already there in some but not
>>>> all OSes) would still be able to use those generic properties to load
>>>> "precision optimized" LUTs with some knowledge of the hardware.
>>>
>>> Yeah, imo 1/ should be doable. For the matrix we should be able to be
>>> fully generic with a 16.16 format. For gamma one option would be to have
>>
>> I know I am late replying, apologies for that.
>>
>> I've been working on CSC support for V4L2 as well (still work in progress)
>> and I would like to at least end up with the same low-level fixed point
>> format as DRM so we can share matrix/vector calculations.
>>
>> Based on my experiences I have concerns about the 16.16 format: the precision
>> is quite low which can be a problem when such values are used in matrix
>> multiplications.
>>
>> In addition, while the precision may be sufficient for 8 bit color component
>> values, I'm pretty sure it will be insufficient when dealing with 12 or 16 bit
>> color components.
>>
>> In earlier versions of my CSC code I used a 12.20 format, but in the latest I
>> switched to 32.32. This fits nicely in a u64 and it's easy to extract the
>> integer and fractional parts.
>>
>> If this is going to be a generic and future proof API, then my suggestion
>> would be to increase the precision of the underlying data type.
> 
> We discussed this a bit more internally and figured it would be nice to
> have the same fixed point format for both the CSC matrix and the LUT/gamma
> tables. Current consensus seems to be to go with 8.24 for both. Since LUTs
> are fairly big I think it makes sense to try not to be too wasteful (while
> still being future-proof, of course).

The .24 should have enough precision, but I am worried about the 8: while
this works for 8 bit components, you can't use it to represent values
above 255, which might be needed (now or in the future) for 10, 12 or
16 bit color components.

It's why I ended up with 32.32: it's very generic so usable for other
things besides CSC.
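
To illustrate why 32.32 is easy to take apart, here is a minimal sketch.
The type and helper names are hypothetical and not taken from any DRM or
V4L2 patch, and sign handling is deliberately left out:

```c
#include <stdint.h>

/* Hypothetical 32.32 fixed point type: upper 32 bits hold the integer
 * part, lower 32 bits hold the fraction in units of 2^-32. */
typedef uint64_t fix32_32_t;

/* Build a 32.32 value from separate integer and fractional parts. */
static inline fix32_32_t fix32_32_from_parts(uint32_t integer, uint32_t frac)
{
	return ((uint64_t)integer << 32) | frac;
}

/* Extract the integer part: a single shift. */
static inline uint32_t fix32_32_int(fix32_32_t v)
{
	return (uint32_t)(v >> 32);
}

/* Extract the fractional part: a single truncation. */
static inline uint32_t fix32_32_frac(fix32_32_t v)
{
	return (uint32_t)(v & 0xffffffffu);
}
```

For example, 1.5 would be fix32_32_from_parts(1, 0x80000000), since
0x80000000 / 2^32 is one half.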

Note that 8.24 is really 7.24 + one sign bit, so the largest integer it
can hold is 127. So 255 can't be represented in this format.
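
A rough sketch of that range limit, assuming 8.24 is stored as a two's
complement 32-bit value (the helper name is made up for illustration):

```c
#include <stdbool.h>
#include <stdint.h>

#define S8_24_FRAC_BITS 24

/* Hypothetical helper: does a whole number fit in signed 8.24?
 * The value is scaled by 2^24 and compared against the int32_t range.
 * Only meant for small inputs; very large values would overflow the
 * 64-bit intermediate. */
static inline bool s8_24_fits_int(int64_t value)
{
	int64_t raw = value * ((int64_t)1 << S8_24_FRAC_BITS);

	return raw >= INT32_MIN && raw <= INT32_MAX;
}
```

With this, s8_24_fits_int(127) and s8_24_fits_int(-128) are true, while
s8_24_fits_int(128) and s8_24_fits_int(255) are false.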

That said, all values I'm working with in my current code are small integers
(say between -4 and 4 worst case), so 8.24 would work. But I am not at all
confident that this is future proof. My gut feeling is that you need to be
able to represent at least the max component value (16 bits) + a sign bit +
7 decimals of precision (24 fractional bits), which makes 17.24.
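
As a back-of-the-envelope check of that bit budget (the helper is purely
illustrative and assumes one sign bit on top of the component width):

```c
#include <stdint.h>

/* Hypothetical helper: total storage bits for a signed fixed point
 * format with component_bits integer bits and frac_bits fraction bits. */
static inline unsigned int fixed_point_total_bits(unsigned int component_bits,
						  unsigned int frac_bits)
{
	return 1u + component_bits + frac_bits;
}
```

For 16 bit components with 24 fractional bits this gives 41 bits, so such
a format no longer fits in 32 bits but fits comfortably in 64.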

Regards,

	Hans

> 
> But yeah agreeing on the underlying layout would be good so that we could
> share in-kernel code. We're aiming to not have any LUT interpolation in
> the kernel (just dropping samples at most if e.g. the hw table doesn't
> have linear sample positions). But with the LUT we might need to multiply
> it with an in-kernel one (we need the CSC unit on some platforms to
> compress the color output range for hdmi). And maybe compress the LUTs
> too.
> -Daniel
>