[Intel-gfx] [PATCH v2 00/10] Color Manager Implementation

Tue Jul 14 03:16:00 PDT 2015

On Tue, Jul 14, 2015 at 11:35:30AM +0200, Hans Verkuil wrote:
> On 07/14/15 11:11, Daniel Vetter wrote:
> > On Tue, Jul 14, 2015 at 10:17:09AM +0200, Hans Verkuil wrote:
> >> On 07/13/15 16:07, Daniel Vetter wrote:
> >>> On Mon, Jul 13, 2015 at 12:11:08PM +0200, Hans Verkuil wrote:
> >>>> On 07/13/2015 11:54 AM, Daniel Vetter wrote:
> >>>>> On Mon, Jul 13, 2015 at 11:43:31AM +0200, Hans Verkuil wrote:
> >>>>>> On 07/13/2015 11:18 AM, Daniel Vetter wrote:
> >>>>>>> On Mon, Jul 13, 2015 at 10:29:32AM +0200, Hans Verkuil wrote:
> >>>>>>>> On 06/15/2015 08:53 AM, Daniel Vetter wrote:
> >>>>>>>>> On Tue, Jun 09, 2015 at 01:50:48PM +0100, Damien Lespiau wrote:
> >>>>>>>>>> On Thu, Jun 04, 2015 at 07:12:31PM +0530, Kausal Malladi wrote:
> >>>>>>>>>>> From: Kausal Malladi <Kausal.Malladi at intel.com>
> >>>>>>>>>>>
> >>>>>>>>>>> This patch set adds color manager implementation in drm/i915 layer.
> >>>>>>>>>>> Color Manager is an extension in i915 driver to support color 
> >>>>>>>>>>> correction/enhancement. Various Intel platforms support several
> >>>>>>>>>>> color correction capabilities. Color Manager provides abstraction
> >>>>>>>>>>> of these properties and allows a user space UI agent to 
> >>>>>>>>>>> correct/enhance the display.
> >>>>>>>>>>
> >>>>>>>>>> So I did a first rough pass on the API itself. The big question that
> >>>>>>>>>> isn't solved at the moment is: do we want to try to do generic KMS
> >>>>>>>>>> properties for pre-LUT + matrix + post-LUT or not. "Generic" has 3 levels:
> >>>>>>>>>>
> >>>>>>>>>>   1/ Generic for all KMS drivers
> >>>>>>>>>>   2/ Generic for i915 supported platforms
> >>>>>>>>>>   3/ Specific to each platform
> >>>>>>>>>>
> >>>>>>>>>> At this point, I'm quite tempted to say we should give 1/ a shot. We
> >>>>>>>>>> should be able to have pre-LUT + matrix + post-LUT on CRTC objects and
> >>>>>>>>>> guarantee that, when the drivers expose such properties, user space can
> >>>>>>>>>> at least give 8 bits LUT + 3x3 matrix + 8 bits LUT.
> >>>>>>>>>>
> >>>>>>>>>> It may be possible to use the "try" version of the atomic ioctl to
> >>>>>>>>>> explore the space of possibilities from a generic user space to use
> >>>>>>>>>> bigger LUTs as well. A HAL layer (which is already there in some but not
> >>>>>>>>>> all OSes) would still be able to use those generic properties to load
> >>>>>>>>>> "precision optimized" LUTs with some knowledge of the hardware.
> >>>>>>>>>
> >>>>>>>>> Yeah, imo 1/ should be doable. For the matrix we should be able to be
> >>>>>>>>> fully generic with a 16.16 format. For gamma one option would be to have
> >>>>>>>>
> >>>>>>>> I know I am late replying, apologies for that.
> >>>>>>>>
> >>>>>>>> I've been working on CSC support for V4L2 as well (still work in progress)
> >>>>>>>> and I would like to at least end up with the same low-level fixed point
> >>>>>>>> format as DRM so we can share matrix/vector calculations.
> >>>>>>>>
> >>>>>>>> Based on my experiences I have concerns about the 16.16 format: the precision
> >>>>>>>> is quite low which can be a problem when such values are used in matrix
> >>>>>>>> multiplications.
> >>>>>>>>
> >>>>>>>> In addition, while the precision may be sufficient for 8 bit color component
> >>>>>>>> values, I'm pretty sure it will be insufficient when dealing with 12 or 16 bit
> >>>>>>>> color components.
> >>>>>>>>
> >>>>>>>> In earlier versions of my CSC code I used a 12.20 format, but in the latest I
> >>>>>>>> switched to 32.32. This fits nicely in a u64 and it's easy to extract the
> >>>>>>>> integer and fractional parts.
> >>>>>>>>
> >>>>>>>> If this is going to be a generic and future proof API, then my suggestion
> >>>>>>>> would be to increase the precision of the underlying data type.
> >>>>>>>
> >>>>>>> We discussed this a bit more internally and figured it would be nice to have the same
> >>>>>>> fixed point for both CSC matrix and LUT/gamma tables. Current consensus
> >>>>>>> seems to be to go with 8.24 for both. Since LUTs are fairly big I think it
> >>>>>>> makes sense if we try to be not too wasteful (while still future-proof
> >>>>>>> ofc).
> >>>>>>
> >>>>>> The .24 should have enough precision, but I am worried about the 8: while
> >>>>>> this works for 8 bit components, you can't use it to represent values > 255,
> >>>>>> which might be needed (now or in the future) for 10, 12 or 16 bit
> >>>>>> color components.
> >>>>>>
> >>>>>> It's why I ended up with 32.32: it's very generic so usable for other
> >>>>>> things besides CSC.
> >>>>>>
> >>>>>> Note that 8.24 is really 7.24 + one sign bit. So 255 can't be represented
> >>>>>> in this format.
> >>>>>>
> >>>>>> That said, all values I'm working with in my current code are small integers
> >>>>>> (say between -4 and 4 worst case), so 8.24 would work. But I am not at all
> >>>>>> confident that this is future proof. My gut feeling is that you need to be
> >>>>>> able to represent at least the max component value + a sign bit + 7 decimals
> >>>>>> precision. Which makes 17.24.
> >>>>>
> >>>>> The idea is to steal from GL and always normalize everything to [0.0,
> >>>>> 1.0], irrespective of the source color format. We need that in drm since
> >>>>> if you blend together planes with different formats it's completely
> >>>>> undefined which one you should pick. 8 bits of precision for values out of
> >>>>> range should be enough ;-)
> >>>>
> >>>> That doesn't really help much, using a [0-1] range just means that you need
> >>>> more precision for the fraction since the integer precision is now added to
> >>>> the fractional precision.
> >>>>
> >>>> So for 16-bit color components the 8.24 format will leave you with only 8 bits
> >>>> precision if you scale each component to the [0-1] range. That's slightly more
> >>>> than 2 decimals. I don't believe that is enough. If you do a gamma table lookup
> >>>> and then feed the result to a CSC matrix you need more precision if you want
> >>>> to get accurate results.
> >>>
> >>> Hm, why do we need 8 bits more precision than source data? At least in the
> >>> intel hw I've seen the most bits we can stuff into the hw is 0.12 (again
> >>> for rescaled range to 0.0-1.0). 24 bits means as-is we'll throw 12 bits
> >>> away. What would you want to use these bits for?
> >>
> >> The intel hardware uses 12 bits today, but what about the next-gen? If you are
> >> defining an API and data type just for the hardware the kernel supports today,
> >> then 12 bits might be enough precision. If you want to be future proof then you
> >> need to be prepared for more capable future hardware.
> >>
> >> So 0.12 will obviously not be enough if you want to support 16 bit color components
> >> in the future.
> >>
> >> In addition, to fully support HW colorspace conversion (e.g. sRGB to Rec.709) where
> >> lookup tables are used for implementing the transfer functions (normal and inverse),
> >> then you need more precision than just the number of bits per component or you will
> >> get quite large errors in the calculation.
> >>
> >> It all depends how a LUT is used: if the value from the LUT is the 'final' value,
> >> then you don't need more precision than the number of bits of a color component. But
> >> if it is used in other calculations (3x3 matrices, full/limited range scaling, etc),
> >> then the LUT should provide more bits precision.
> >>
> >> Which seems to be the case with Intel hardware: 12 bits is 4 bits more than the 8 bits
> >> per component it probably uses.
> > 
> > Intel hw supports a 12bpp pixel pipeline. They didn't add _any_ additional
> > precision at all afaik. Which is why I wonder why we need it. I'm also not
> > aware of any plans for pushing past 12bpp of data sent to the sink, but I
> > honestly don't have much clue really.
> > 
> > I guess input is a different story, today's CMOS sensors already easily do 14bit
> > with more to come I guess with all the noise about HDR. We probably need
> > more headroom on v4l input side than we ever need on drm display side.
> > Still 24bits is an awful lot of headroom, at least for the next few years.
> > Or do you expect to hit that already soonish on v4l side?
> 
> I think 24 bits precision is enough, but that assumes that the integer part
> will be between -128 and 127. And I am not so sure that that is a valid assumption.

The idea is always that you'd normalize to 0.0-1.0 of the range going over
the wire to the sink. The 7 bits of headroom is just for smoother clamping
when your colorspaces don't match up. The most I've seen in intel hw is 3
additional bits used there.
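
To make the normalization concrete, here is a rough sketch in C. It is not
from the patch series, and the helper names component_to_s8_24() and
s8_24_to_code() are made up for illustration. It assumes an s8.24 value
stored in an int32_t, maps an unsigned component of up to 16 bits onto
0.0-1.0, and clamps out-of-range values before rescaling to an output depth:

```c
#include <stdint.h>

#define S8_24_FRAC_BITS 24
#define S8_24_ONE ((int32_t)1 << S8_24_FRAC_BITS) /* 1.0 in s8.24 */

/*
 * Hypothetical helper: normalize an unsigned component of the given bit
 * depth (1..16) so that 0 maps to 0.0 and the maximum code value maps to
 * exactly 1.0. Uses a 64-bit intermediate and rounds to nearest.
 */
int32_t component_to_s8_24(uint32_t value, unsigned int bits)
{
	uint32_t max = (1u << bits) - 1;

	return (int32_t)(((uint64_t)value * S8_24_ONE + max / 2) / max);
}

/*
 * Hypothetical helper: clamp an s8.24 value to [0.0, 1.0] and rescale it
 * to an unsigned code value of the given bit depth (1..16), rounding to
 * nearest. The integer headroom only matters before the clamp.
 */
uint32_t s8_24_to_code(int32_t v, unsigned int bits)
{
	uint32_t max = (1u << bits) - 1;

	if (v < 0)
		v = 0;
	if (v > S8_24_ONE)
		v = S8_24_ONE;

	return (uint32_t)(((uint64_t)v * max + S8_24_ONE / 2) >> S8_24_FRAC_BITS);
}
```

With these definitions, 255 at 8 bits and 65535 at 16 bits both normalize to
exactly 1.0 (1 << 24), and 1.0 rescales to 4095 at 12 bits.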
> 
> It's true today, but what if you have a HW LUT that maps integer values and expects
> 16.0 or perhaps 12.4?
> 
> BTW, I am assuming that the proposed 8.24 format is a signed format: the CSC
> 3x3 matrices contain negative values, so any fixed point data type has to be signed.

Yeah, it's s8.24 really. We definitely need a signed integer part, agreed
on that.
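
For reference, a minimal sketch of what a signed s8.24 value can hold,
assuming two's complement storage in an int32_t. The type name s8_24 and the
helpers below are hypothetical, not an existing kernel API. The range is
[-128.0, 128.0), so small CSC coefficients like -4 fit, while 255 does not:

```c
#include <stdint.h>

typedef int32_t s8_24; /* hypothetical type: sign + 7 integer + 24 fraction bits */

#define S8_24_SCALE 16777216.0 /* 2^24 */

/*
 * Hypothetical helper: convert a double to s8.24, rounding to nearest.
 * Returns 0 on success, -1 if x is outside the representable range.
 */
int s8_24_from_double(double x, s8_24 *out)
{
	double scaled = x * S8_24_SCALE;

	if (scaled > 2147483647.0 || scaled < -2147483648.0)
		return -1;

	/* Round half away from zero, then truncate via the cast. */
	scaled += (scaled >= 0.0) ? 0.5 : -0.5;
	*out = (s8_24)scaled;
	return 0;
}

/* Hypothetical helper: convert s8.24 back to a double. */
double s8_24_to_double(s8_24 v)
{
	return (double)v / S8_24_SCALE;
}
```

Here s8_24_from_double(-4.0, &v) succeeds, while s8_24_from_double(255.0, &v)
and s8_24_from_double(128.0, &v) both return -1.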

> I'm just wondering: is it really such a big deal to use a 32.32 format? Yes, the
> amount of data doubles, but it's quite rare that you need to configure a LUT, right?
> 
> For a 12 bit LUT it's 16 kB vs 32 kB. Yes, it's more data, but the advantage is that
> the data type is future proof (well, probably :-) ) and much more likely to be usable
> in other subsystems.

We need to bash this stuff into hw under spin_lock_irqsave in i915. Yeah
props to our hw engineers for screwing things up, but I'd like to not be
too wasteful. But otoh the mmios will totally swamp any kind of memory
loads we're doing.
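
The 16 kB vs 32 kB figures work out if you assume a single table with one
entry per 12 bit input code. A throwaway sketch of that arithmetic, where
lut_bytes() is a made-up helper and a real LUT with separate R, G and B
tables would be three times as big:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: size in bytes of a single-channel LUT with one
 * entry per input code of the given bit depth.
 */
size_t lut_bytes(unsigned int index_bits, size_t entry_bytes)
{
	return ((size_t)1 << index_bits) * entry_bytes;
}
```

lut_bytes(12, sizeof(int32_t)) gives 16384 bytes for s8.24 entries, and
lut_bytes(12, sizeof(int64_t)) gives 32768 bytes for 32.32 entries.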

The other bit is that our android folks are a bit over the top with
reducing overhead sometimes, e.g. they don't like keeping around
metadata-only drm_framebuffer objects for xrgb vs. argb because it takes
away a few bytes ;-)

> >> I would guess that a LUT supporting 16 bit color components would need a precision
> >> of 0.20 or so (assuming the resulting values are used in further calculations).
> >>
> >> High dynamic range video will be an important driving force towards higher bit depths
> >> and accurate color handling, so you can expect to see this become much more important
> >> in the coming years.
> >>
> >> And as I mentioned another consideration is that this fixed point data type might
> >> be useful elsewhere in the kernel where you need to do some precision arithmetic.
> >> So using a standard type that anyone can use with functions in lib/ to do basic
> >> operations can be very useful indeed beyond just DRM and V4L2.
> > 
> > 0.20 would still comfortably fit into 8.24. And yeah worst-case (in 10
> > years or so) we need to add a high-bpp variant if it really comes to it.
> 
> I think this is much closer than you think. I agree that you are not likely to see
> this soon for consumer graphics cards, but for professional equipment and high-end
> consumer electronics this is another story.
> 
> And if it is being done for input, then output will need it as well: after all,
> what's the point of 16-bit color components if you can't display it? Whether Intel
> will support it is another matter, but there are other vendors, you know... :-)

Input is different because of post-processing - you need that much depth
to be able to get useful data out of the dark areas, without the risk for
the highlights to clip. While processing you need that depth to avoid
banding (because integer math sucks). But tbh I haven't seen anything but
12bpc (and those usually use dithered 10bpc panels internally) anywhere
and the common screens top out at 10bpc.

So from my pov of drm s8.24 will be enough for a long time, but if you're
convinced that the input side needs this soon I guess it makes sense to go
with the bit more overhead and 32.32. Otoh we'll never need 32 of integer
part if we normalize to 0.0-1.0, and that normalization is really
something I think we want.
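
Either way the two formats convert cleanly in one direction. A rough sketch,
assuming two's complement s8.24 in an int32_t and signed 32.32 in an int64_t
(both helper names are hypothetical): widening is always lossless, and
narrowing only fails once the integer part leaves [-128.0, 128.0), which
shouldn't happen for normalized values.

```c
#include <stdint.h>

/* Hypothetical helper: widen s8.24 to s32.32. Lossless, since the target
 * has 8 more fractional bits (scale by 2^8) and a wider integer part. */
int64_t s8_24_to_s32_32(int32_t v)
{
	return (int64_t)v * 256;
}

/*
 * Hypothetical helper: narrow s32.32 to s8.24. Returns -1 if the value is
 * outside [-128.0, 128.0); otherwise drops the low 8 fractional bits
 * (truncating toward zero) and returns 0.
 */
int s32_32_to_s8_24(int64_t v, int32_t *out)
{
	const int64_t limit = (int64_t)128 << 32; /* 128.0 in s32.32 */

	if (v < -limit || v >= limit)
		return -1;

	*out = (int32_t)(v / 256);
	return 0;
}
```
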
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch