[Nouveau] nouveau 30bpp / deep color status

Sun Feb 4 23:50:45 UTC 2018

In case anyone's curious about 30bpp framebuffer support, here's the
current status:

Kernel:

Ben and I have switched the code to using a 256-based LUT for Kepler+,
and I've also written a patch to cause the addfb ioctl to use the
proper format. You can pick this up at:

https://github.com/skeggsb/linux/commits/linux-4.16 (note the branch!)
https://patchwork.freedesktop.org/patch/202322/

With these two, you should be able to use "X -depth 30" again on any
G80+ GPU to bring up a screen (as you could in kernel 4.9 and
earlier). However this still has some deficiencies, some of which I've
addressed:

xf86-video-nouveau:

DRI3 was broken, and Xv was broken. Patches available at:

https://github.com/imirkin/xf86-video-nouveau/commits/master

mesa:

The NVIDIA hardware (pre-Kepler) can only do XBGR scanout. Further the
nouveau KMS doesn't add XRGB scanout for Kepler+ (although it could).
Mesa was only enabled for XRGB, so I've piped XBGR through all the
same places:

https://github.com/imirkin/mesa/commits/30bpp

libdrm:

For testing, I added a modetest gradient pattern split horizontally.
Top half is 10bpc, bottom half is 8bpc. This is useful for seeing
whether you're really getting 10bpc, or if things are getting
truncated along the way. Definitely hacky, but ... wasn't intending on
upstreaming it anyways:

https://github.com/imirkin/drm/commit/9b8776f58448b5745675c3a7f5eb2735e3989441

-------------------------------------

Results with the patches (tested on a GK208B and a "deep color" TV over HDMI):
 - modetest with a 10bpc gradient shows up smoother than an 8bpc
gradient. However it's still dithered to 8bpc, not "real" 10bpc.
 - things generally work in X -- dri2 and dri3, xv, and obviously
regular X rendering / acceleration
 - lots of X software can't handle 30bpp modes (mplayer hates it for
xv and x11 rendering, aterm bails on shading the root pixmap, probably
others)

I'm also told that with DP, it should actually send the higher-bpc
data over the wire. With HDMI, we're still stuck at 24bpp for now
(although the hardware can do 36bpp as well). This is why my gradient
result above was still dithered.

Things to do - mostly nouveau specific, but probably some general
infra needed too:
 - Figure out how to properly expose the 1024-sized LUT
 - Add fp16 scanout
 - Stop relying on the max bpc of the monitor/connector and make
decisions based on the "effective" bpc (e.g. based on the
currently-set fb format, take hdmi/dp into account, etc). This will
also affect the max clock somehow. Perhaps there should be a way to
force a connector to a certain bpc.
 - Add higher-bpc HDMI support
 - Add 10bpc dithering (only makes sense if >= 10bpc output is
*actually* enabled first)
 - Investigate YUV HDMI modes (esp since they can enable 4K at 60 on HDMI
1.4 hardware)
 - Test out Wayland compositors
 - Teach xf86-video-modesetting about addfb2 or that nouveau's
ordering is different.

I don't necessarily plan on working further on this, so if there are
interested parties, they should definitely try to pick it up. I'll try
to upstream all my changes though.

Cheers,

  -ilia