[Intel-gfx] [PATCH] sna/uxa: Fix colormap handling at screen depth 30.

Mario Kleiner mario.kleiner.de at gmail.com
Thu Mar 15 23:19:22 UTC 2018


Oops, didn't reply yet, sorry!

On Thu, Mar 15, 2018 at 5:14 PM, Chris Wilson <chris at chris-wilson.co.uk> wrote:
> Quoting Ville Syrjälä (2018-03-15 16:02:42)
>> On Thu, Mar 15, 2018 at 03:28:18PM +0000, Chris Wilson wrote:
>> > Quoting Ville Syrjälä (2018-03-01 11:12:53)
>> > > On Thu, Mar 01, 2018 at 02:20:48AM +0100, Mario Kleiner wrote:
>> > > > The various clut handling functions like a setup
>> > > > consistent with the x-screen color depth. Otherwise
>> > > > we observe improper sampling in the gamma tables
>> > > > at depth 30.
>> > > >
>> > > > Therefore replace hard-coded bitsPerRGB = 8 by actual
>> > > > bits per channel scrn->rgbBits. Also use this for call
>> > > > to xf86HandleColormaps().
>> > > >
>> > > > Tested for uxa and sna at depths 8, 16, 24 and 30 on
>> > > > IvyBridge, and tested at depth 24 and 30 that xgamma
>> > > > and gamma table animations work, and with measurement
>> > > > equipment to make sure identity gamma ramps actually
>> > > > are identity mappings at the output.
>> > >
>> > > You mean identity mapping at 8bpc? We don't support higher precision
>> > > gamma on pre-bdw atm, and the ddx doesn't use the higher precision
>> > > stuff even on bdw+. I'm working on fixing both, but it turned out to
>> > > be a bit more work than I anticipated so will take a while.
>> > >
>> > > >
>> > > > Signed-off-by: Mario Kleiner <mario.kleiner.de at gmail.com>
>> > > > ---
>> > > >  src/sna/sna_driver.c   | 5 +++--
>> > > >  src/uxa/intel_driver.c | 3 ++-
>> > > >  2 files changed, 5 insertions(+), 3 deletions(-)
>> > > >
>> > > > diff --git a/src/sna/sna_driver.c b/src/sna/sna_driver.c
>> > > > index 2643e6c..9c4bcd4 100644
>> > > > --- a/src/sna/sna_driver.c
>> > > > +++ b/src/sna/sna_driver.c
>> > > > @@ -1145,7 +1145,7 @@ sna_screen_init(SCREEN_INIT_ARGS_DECL)
>> > > >       if (!miInitVisuals(&visuals, &depths, &nvisuals, &ndepths, &rootdepth,
>> > > >                          &defaultVisual,
>> > > >                          ((unsigned long)1 << (scrn->bitsPerPixel - 1)),
>> > > > -                        8, -1))
>> > > > +                        scrn->rgbBits, -1))
>> > > >               return FALSE;
>> > > >
>> > > >       if (!miScreenInit(screen, NULL,
>> > > > @@ -1217,7 +1217,8 @@ sna_screen_init(SCREEN_INIT_ARGS_DECL)
>> > > >               return FALSE;
>> > > >
>> > > >       if (sna->mode.num_real_crtc &&
>> > > > -         !xf86HandleColormaps(screen, 256, 8, sna_load_palette, NULL,
>> > > > +         !xf86HandleColormaps(screen, 1 << scrn->rgbBits, scrn->rgbBits,
>> > > > +                              sna_load_palette, NULL,
>> > > >                                CMAP_RELOAD_ON_MODE_SWITCH |
>> > > >                                CMAP_PALETTED_TRUECOLOR))
>> > >
>> > > I already forgot what this does prior to your randr fix. IIRC bumping
>> > > the 8 alone would cause the thing to segfault, but I guess bumping both
>> > > was fine?
>> > >

We always need this fix for X-Screen depth 30, even on older servers.
With the current maxColors=256, bitsPerRGB=8 setting and color depth
30, the server does out-of-bounds reads in its gamma handling code,
and we get a crash at server startup. It's a bit of a matter of luck
to reproduce: I had it running for months without problems, then after
some Ubuntu system upgrade I got crashes at server startup under sna
and at server shutdown under uxa.
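
To illustrate (a rough sketch, not the server's actual code, names made
up): at depth 30 the root visual has 10 bits per channel, so the gamma
code iterates over 1024 slots while the table was still sized for the
hard-coded 8 bits, i.e. 256 entries:

  /* Rough illustration only, not the server's actual code. */
  #include <stdint.h>

  static uint16_t gamma_table[1 << 8]; /* sized from the hard-coded bitsPerRGB = 8 */

  static uint16_t sample_slot(int i)
  {
          /* At depth 30 the caller iterates i over 1 << scrn->rgbBits ==
           * 1024 slots, so everything from i == 256 on is an out of
           * bounds read into whatever happens to follow the table. */
          return gamma_table[i];
  }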

Without raising the bitsPerRGB to 10, we get a bottleneck in the way
the server mushes together the old XF86VidMode gamma ramps
(per-x-screen) and the new RandR per-crtc gamma ramps, so there are
artifacts in the gamma table that finally gets uploaded to the hw.

For DefaultDepth=24 this patch doesn't change anything.

On X-Server < 1.20 however, without my fix, at color depth 30 this
will get us stuck on an identity gamma ramp, as the update code in the
server effectively no-ops. Or so I think; I tested so many
permutations of so many things on intel, amd and nouveau with
different mesa, server and ddx branches lately that I may misremember
something. Ilia reported some odd behavior on 1.19 with the
corresponding fix to nouveau-ddx... I'll give it another try in
testing. I actually want to get that fix cherry-picked into server
1.19.7, so depth 30 would be usable on the 1.19 series.

>> > > Hmm. So the server always initializes crtc->gamma_size to 256
>> > > (which does match the normal hw LUT size), and so before your
>> > > fix we will always get gamma_slots==0 at >8bpc and so the hw LUT
>> > > is never actually updated?
>> >

Yes. The kernel KMS reports the true hw gamma size, but no ddx ever
made use of that info on any driver, so changing it to anything other
than 256 on the kernel side would break all existing userspace
drivers, at least for the old gammaset ioctl.
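
For reference, a minimal sketch of what that legacy path boils down to
(simplified, the wrapper name here is made up): every ddx hands the
kernel exactly crtc->gamma_size entries, and the server initializes
that to 256 regardless of what the hw actually has:

  #include <stdint.h>
  #include <xf86drmMode.h>

  /* Simplified sketch of the legacy gammaset path (wrapper name made up). */
  static int load_legacy_lut(int fd, uint32_t crtc_id,
                             uint16_t red[256], uint16_t green[256],
                             uint16_t blue[256])
  {
          /* DRM_IOCTL_MODE_SETGAMMA under the hood; size has to match the
           * gamma_size reported for this crtc, assumed to be 256 here. */
          return drmModeCrtcSetGamma(fd, crtc_id, 256, red, green, blue);
  }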

>> > Was there any conclusion to this? Aiui, these bits should be set based
>> > on the underlying HW gamma table depth which is not the same as the
>> > screen colour depth. It stuck at 256 for ages as that is all anyone
>> > ever expected...
>>

As far as I understand, these bits in the end define what is used as
the index into the hw tables. The server derives the sizes of its
tables partially from these numbers and partially from the
red/green/blueSize of the selected root visual. It allocates various
tables for color palettes, xf86vidmode per-screen LUTs and RandR
per-crtc LUTs, and uses values from one table to index into the next
one, to keep the RandR and XF86VidMode gamma ramps and the X color
palettes all working, by concatenating everything into a final
256-slot hw LUT and passing it to the kernel.

At the very end, the actual values of the RandR tables get written
into the slots of the hw tables and are used with whatever precision
those tables have in the hw. The values in the color palettes and
xf86vidmode LUTs only influence which RandR slots actually get copied
into the hw LUT.
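
In pseudo-C, the chaining looks roughly like this (a hedged sketch with
invented names, not the literal server code):

  #include <stdint.h>

  /* Sketch only: each stage's 16-bit value is scaled down to an index into
   * the next table; the RandR per-crtc value is what lands in the hw slot. */
  static void compute_hw_lut(const uint16_t palette[256],   /* X colormap */
                             const uint16_t vidmode[256],   /* per-screen */
                             const uint16_t randr[256],     /* per-crtc   */
                             uint16_t hw_lut[256])
  {
          for (int i = 0; i < 256; i++) {
                  int j = palette[i] >> 8;   /* 0..255 index into vidmode */
                  int k = vidmode[j] >> 8;   /* 0..255 index into randr   */
                  hw_lut[i] = randr[k];
          }
  }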

>> IIRC there was some kind of agreement that Mario's approach here is
>> generally what we want to do. The server randr code will simply throw
>> away the LUT entries we can't use and driver will still get the
>> expected 256 entry LUT via the randr hooks. Obviously that means the
>> output may not be exactly what the user wanted, but for normal
>> gamma curve type of stuff it's pretty close.
>>
>> I'm mostly concerned what happens when the server doesn't have
>> Mario's gamma fixes. IIRC something will either crash, or simply
>> not update the gamma LUT ever.
>
> I guess we just make it conditional on xorg_version >= 1.20 to be on the
> safe side.
>
>> Once I manage to pimp up the i915 gamma LUT stuff I plan to tackle
>> the ddx side to actually make use of the higher precision gamma modes
>> in the hardware. Actually I already started on the ddx side a bit just
>> didn't get the kernel quite ready yet. Once it's all done we should
>> have 10bit gamma to match the screen depth on all platforms. On gen4
>> it will be an interpolated curve though, so if the user programs an
>> arbitrary LUT we'll not be able to replicate it correctly. But again,
>> for normal gamma curve type of stuff it'll work just fine. For ilk+
>> we will get a full 1024 entry LUT.

That would be great. My aim with all the 10 bit stuff was to allow
regular X11/GLX apps to get at least 10 bits of precision for
rendering and display in OpenGL, possibly even more at the output
side, with support for at least Ironlake and later. To be useful for
scientific/medical research applications, the high precision needs to
be accessible under X11 with the existing RandR per-crtc gamma ramp
API, as Wayland isn't yet an option for most of the needs of such
apps.

I made a proof-of-concept patchset against i915-kms to use the
1024-slot, 10-bit-wide hw LUTs starting with Ironlake, and to upsample
from the legacy 256 slots to those 1024-slot LUTs, plus some hacks to
enable dithering from 10 bit -> 8 bit panels via a module parameter,
because I currently don't have a true 10 bit panel to test with.
That's what I used for testing the Mesa patches, but the code wasn't
quite ready for upstreaming, and I've spent all the time since then on
the userspace bits.
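
The upsampling itself was nothing fancy, roughly along these lines (a
from-memory sketch, not the actual patchset): linear interpolation
between neighbouring legacy slots to fill the 1024-entry, 10-bit-wide
hw LUT:

  #include <stdint.h>

  /* Sketch: expand a legacy 256-entry ramp to 1024 hw slots by linearly
   * interpolating between adjacent legacy entries. */
  static void upsample_256_to_1024(const uint16_t in[256], uint16_t out[1024])
  {
          for (int i = 0; i < 1024; i++) {
                  int lo   = i >> 2;                    /* source slot        */
                  int hi   = lo < 255 ? lo + 1 : 255;   /* clamp at the top   */
                  int frac = i & 3;                     /* 0..3 between slots */
                  out[i] = (uint16_t)((in[lo] * (4 - frac) + in[hi] * frac) / 4);
          }
  }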

Have you thought about also exposing the 512-slot, 12-bit (or was it
14 bits?) wide hw LUTs, with degamma and CTM disabled? It wouldn't
provide a slot for each of the 1024 framebuffer output values, but it
would give some extra precision for gamma correction on the output
side for HDMI/DisplayPort deep color with 12 bit or 16 bit panels, or
even on 10 bit panels + dithering.

-mario

>
> Sounds good.
> -Chris

