[Intel-gfx] [PATCH v2 00/10] Color Manager Implementation

Tue Jul 14 02:11:17 PDT 2015

On Tue, Jul 14, 2015 at 10:17:09AM +0200, Hans Verkuil wrote:
> On 07/13/15 16:07, Daniel Vetter wrote:
> > On Mon, Jul 13, 2015 at 12:11:08PM +0200, Hans Verkuil wrote:
> >> On 07/13/2015 11:54 AM, Daniel Vetter wrote:
> >>> On Mon, Jul 13, 2015 at 11:43:31AM +0200, Hans Verkuil wrote:
> >>>> On 07/13/2015 11:18 AM, Daniel Vetter wrote:
> >>>>> On Mon, Jul 13, 2015 at 10:29:32AM +0200, Hans Verkuil wrote:
> >>>>>> On 06/15/2015 08:53 AM, Daniel Vetter wrote:
> >>>>>>> On Tue, Jun 09, 2015 at 01:50:48PM +0100, Damien Lespiau wrote:
> >>>>>>>> On Thu, Jun 04, 2015 at 07:12:31PM +0530, Kausal Malladi wrote:
> >>>>>>>>> From: Kausal Malladi <Kausal.Malladi at intel.com>
> >>>>>>>>>
> >>>>>>>>> This patch set adds color manager implementation in drm/i915 layer.
> >>>>>>>>> Color Manager is an extension in i915 driver to support color 
> >>>>>>>>> correction/enhancement. Various Intel platforms support several
> >>>>>>>>> color correction capabilities. Color Manager provides abstraction
> >>>>>>>>> of these properties and allows a user space UI agent to 
> >>>>>>>>> correct/enhance the display.
> >>>>>>>>
> >>>>>>>> So I did a first rough pass on the API itself. The big question that
> >>>>>>>> isn't solved at the moment is: do we want to try to do generic KMS
> >>>>>>>> properties for pre-LUT + matrix + post-LUT or not. "Generic" has 3 levels:
> >>>>>>>>
> >>>>>>>>   1/ Generic for all KMS drivers
> >>>>>>>>   2/ Generic for i915 supported platfoms
> >>>>>>>>   3/ Specific to each platform
> >>>>>>>>
> >>>>>>>> At this point, I'm quite tempted to say we should give 1/ a shot. We
> >>>>>>>> should be able to have pre-LUT + matrix + post-LUT on CRTC objects and
> >>>>>>>> guarantee that, when the drivers expose such properties, user space can
> >>>>>>>> at least give 8 bits LUT + 3x3 matrix + 8 bits LUT.
> >>>>>>>>
> >>>>>>>> It may be possible to use the "try" version of the atomic ioctl to
> >>>>>>>> explore the space of possibilities from a generic user space to use
> >>>>>>>> bigger LUTs as well. A HAL layer (which is already there in some but not
> >>>>>>>> all OSes) would still be able to use those generic properties to load
> >>>>>>>> "precision optimized" LUTs with some knowledge of the hardware.
> >>>>>>>
> >>>>>>> Yeah, imo 1/ should be doable. For the matrix we should be able to be
> >>>>>>> fully generic with a 16.16 format. For gamma one option would be to have
> >>>>>>
> >>>>>> I know I am late replying, apologies for that.
> >>>>>>
> >>>>>> I've been working on CSC support for V4L2 as well (still work in progress)
> >>>>>> and I would like to at least end up with the same low-level fixed point
> >>>>>> format as DRM so we can share matrix/vector calculations.
> >>>>>>
> >>>>>> Based on my experiences I have concerns about the 16.16 format: the precision
> >>>>>> is quite low which can be a problem when such values are used in matrix
> >>>>>> multiplications.
> >>>>>>
> >>>>>> In addition, while the precision may be sufficient for 8 bit color component
> >>>>>> values, I'm pretty sure it will be insufficient when dealing with 12 or 16 bit
> >>>>>> color components.
> >>>>>>
> >>>>>> In earlier versions of my CSC code I used a 12.20 format, but in the latest I
> >>>>>> switched to 32.32. This fits nicely in a u64 and it's easy to extract the
> >>>>>> integer and fractional parts.
> >>>>>>
> >>>>>> If this is going to be a generic and future proof API, then my suggestion
> >>>>>> would be to increase the precision of the underlying data type.
> >>>>>
> >>>>> We discussed this a bit more internally and figured it would be nice to have the same
> >>>>> fixed point for both CSC matrix and LUT/gamma tables. Current consensus
> >>>>> seems to be to go with 8.24 for both. Since LUTs are fairly big I think it
> >>>>> makes sense if we try to be not too wasteful (while still future-proof
> >>>>> ofc).
> >>>>
> >>>> The .24 should have enough precision, but I am worried about the 8: while
> >>>> this works for 8 bit components, you can't use it to represent values
> >>>>> 255, which might be needed (now or in the future) for 10, 12 or 16 bit
> >>>> color components.
> >>>>
> >>>> It's why I ended up with 32.32: it's very generic so usable for other
> >>>> things besides CSC.
> >>>>
> >>>> Note that 8.24 is really 7.24 + one sign bit. So 255 can't be represented
> >>>> in this format.
> >>>>
> >>>> That said, all values I'm working with in my current code are small integers
> >>>> (say between -4 and 4 worst case), so 8.24 would work. But I am not at all
> >>>> confident that this is future proof. My gut feeling is that you need to be
> >>>> able to represent at least the max component value + a sign bit + 7 decimals
> >>>> precision. Which makes 17.24.
> >>>
> >>> The idea is to steal from GL and always normalize everything to [0.0,
> >>> 1.0], irrespective of the source color format. We need that in drm since
> >>> if you blend together planes with different formats it's completely
> >>> undefined which one you should pick. 8 bits of precision for values out of
> >>> range should be enough ;-)
> >>
> >> That doesn't really help much, using a [0-1] range just means that you need
> >> more precision for the fraction since the integer precision is now added to
> >> the fractional precision.
> >>
> >> So for 16-bit color components the 8.24 format will leave you with only 8 bits
> >> precision if you scale each component to the [0-1] range. That's slightly more
> >> than 2 decimals. I don't believe that is enough. If you do a gamma table lookup
> >> and then feed the result to a CSC matrix you need more precision if you want
> >> to get accurate results.
> > 
> > Hm, why do we need 8 bits more precision than source data? At least in the
> > intel hw I've seen the most bits we can stuff into the hw is 0.12 (again
> > for rescaled range to 0.0-1.0). 24 bits means as-is we'll throw 12 bits
> > away. What would you want to use these bits for?
> 
> The intel hardware uses 12 bits today, but what about the next-gen? If you are
> defining an API and data type just for the hardware the kernel supports today,
> then 12 bits might be enough precision. If you want to be future proof then you
> need to be prepared for more capable future hardware.
> 
> So 0.12 will obviously not be enough if you want to support 16 bit color components
> in the future.
> 
> In addition, to fully support HW colorspace conversion (e.g. sRGB to Rec.709) where
> lookup tables are used for implementing the transfer functions (normal and inverse),
> then you need more precision then just the number of bits per component or you will
> get quite large errors in the calculation.
> 
> It all depends how a LUT is used: if the value from the LUT is the 'final' value,
> then you don't need more precision than the number of bits of a color component. But
> if it is used in other calculations (3x3 matrices, full/limited range scaling, etc),
> then the LUT should provide more bits precision.
> 
> Which seems to be the case with Intel hardware: 12 bits is 4 bits more than the 8 bits
> per component it probably uses.

Intel hw supports a 12bpp pixel pipeline. They didnt add _any_ additional
precision at all afaik. Which is why I wonder why we need it. I'm also not
aware of any plans for pushing past 12bpp of data sent to the sink, but I
honestly don't have much clue really.

I guess input is a different story, todays cmos already easily to 14bit
with more to come I guess with all the noise about HDR. We probably need
more headroom on v4l input side than we ever need on drm display side.
Still 24bits is an awful lot of headroom, at least for the next few years.
Or do you expect to hit that already soonish on v4l side?

> I would guess that a LUT supporting 16 bit color components would need a precision
> of 0.20 or so (assuming the resulting values are used in further calculations).
> 
> High dynamic range video will be an important driving force towards higher bit depths
> and accurate color handling, so you can expect to see this become much more important
> in the coming years.
> 
> And as I mentioned another consideration is that this fixed point data type might
> be useful elsewhere in the kernel where you need to do some precision arithmetic.
> So using a standard type that anyone can use with functions in lib/ to do basic
> operations can be very useful indeed beyond just DRM and V4L2.

0.20 would still comfortably fit into 8.24. And yeah worst-case (in 10
years or so) we need to add a high-bpp variant if it really comes to it.
But for the next few years (and that's the pace at which we completely
rewrite gfx drivers anyway) I don't see anything going past .24. .16 was
indeed a bit too little I think, which is why we decided to move the fixed
point a bit.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch