Hi,
this patch series adds the fourccs for 16 bit fixed point unorm framebuffers to the core, and then an implementation for AMD GPUs with DisplayCore.
This is intended to allow for pageflipping to, and direct scanout of, Vulkan swapchain images in the format VK_FORMAT_R16G16B16A16_UNORM. I have patched AMD's GPUOpen amdvlk OSS driver to enable this format for swapchains, mapping to DRM_FORMAT_XBGR16161616: Link: https://github.com/kleinerm/pal/commit/a25d4802074b13a8d5f7edc96ae45469ecbac...
My main motivation for this is squeezing every bit of precision out of the hardware for scientific and medical research applications, where fp16 in the unorm range is limited to ~11 bpc effective linear precision in the upper half [0.5;1.0] of the unorm range, although the hardware could do at least 12 bpc.
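To put numbers on the precision argument, here is a minimal sketch (an illustration, not part of the patches). It assumes IEEE 754 binary16 with 10 explicit mantissa bits and only handles normal positive values; fp16_step() and unorm_step() are names made up for this example.

```c
#include <assert.h>

/*
 * Spacing between adjacent binary16 values around x (hypothetical helper).
 * binary16 has 10 explicit mantissa bits, so within [2^k, 2^(k+1)) the
 * spacing is 2^(k - 10). Only valid for normal values, x >= 2^-14.
 */
static double fp16_step(double x)
{
	double lo = 1.0;		/* lower bound of the current binade */
	double step = 1.0 / 1024.0;	/* spacing in [1, 2) is 2^-10 */

	assert(x >= 1.0 / 16384.0 && x <= 65504.0);
	while (x < lo) {		/* walk down to the binade holding x */
		lo /= 2.0;
		step /= 2.0;
	}
	while (x >= 2.0 * lo) {		/* or walk up */
		lo *= 2.0;
		step *= 2.0;
	}
	return step;
}

/* Step size of an n-bit unsigned normalized value: 1 / (2^n - 1). */
static double unorm_step(int bits)
{
	assert(bits > 0 && bits < 32);
	return 1.0 / (double)((1u << bits) - 1u);
}
```

With these, fp16_step(0.75) is 2^-11 (about 4.9e-4), while unorm_step(12) is 1/4095 (about 2.4e-4), so in [0.5;1.0] fp16 is coarser than what a 12 bpc pipeline can resolve.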
It has been successfully tested on AMD RavenRidge (DCN-1), and with Polaris11 (DCE-11.2). Up to two displays were active on RavenRidge (DP 2560x1440@144Hz + HDMI 2560x1440@120Hz), the maximum supported on my hw, both running at 10 bpc DP output depth.
Up to three displays were active on the Polaris (DP 2560x1440@144Hz + 2560x1440@100Hz USB-C DP-altMode-to-HDMI converter + eDP 2880x1800@60Hz Apple Retina panel), all running at 10 bpc output depth.
No malfunctions, visual artifacts or other oddities were observed (apart from an adventurous mess of cables and adapters on my desk), suggesting it works.
I used my automatic photometer measurement procedure to verify the effective output precision of 10 bpc DP native signal + spatial dithering in the gpu as enabled by the amdgpu driver. Results show the expected 12 bpc precision i hoped for -- the current upper limit for AMD display hw afaik.
So it seems to work in the way i hoped :).
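For readers unfamiliar with how a 10 bpc signal plus spatial dithering can carry about 12 bpc of effective precision, here is a generic sketch. It is an assumption for illustration only: a textbook 2x2 ordered dither, not the algorithm the AMD hardware or the amdgpu driver uses, and dither_12_to_10() and block_sum() are made-up names.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Reduce a 12 bit value to 10 bits with a 2x2 ordered dither.
 * The threshold matrix is a textbook choice (assumption), not AMD's.
 */
static uint16_t dither_12_to_10(uint16_t v12, unsigned int x, unsigned int y)
{
	static const uint8_t threshold[2][2] = { { 0, 2 }, { 3, 1 } };
	uint16_t base = v12 >> 2;	/* the 10 most significant bits */
	uint16_t frac = v12 & 3;	/* the 2 bits that do not fit */
	unsigned int out;

	assert(v12 <= 4095);
	/* frac of the 4 pixels in a block get bumped up by one step */
	out = base + (frac > threshold[y & 1][x & 1] ? 1u : 0u);
	return (uint16_t)(out > 1023 ? 1023 : out);	/* clamp at the top */
}

/* Sum of the four 10 bit outputs of one 2x2 block. */
static unsigned int block_sum(uint16_t v12)
{
	return dither_12_to_10(v12, 0, 0) + dither_12_to_10(v12, 1, 0) +
	       dither_12_to_10(v12, 0, 1) + dither_12_to_10(v12, 1, 1);
}
```

Averaged over a 2x2 block, the four 10 bit outputs reproduce the 12 bit input exactly (block_sum(v) equals v), except for the top three codes where the clamp kicks in.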
Some open questions wrt. AMD DC, to be addressed in this patch series, or follow-up patches if necessary:
- For the atomic check for plane scaling, the current patch will apply the same hw limits as for other rgb fixed point fb's, e.g., for 8 bpc rgb8. Is this correct? Or would we need to use the fp16 limits, because this is also a 64 bpp format? Or something new entirely?
- I haven't added the new fourcc to the DCC tables yet. Should i?
- I had to change an assert for DCE to allow 36bpp linebuffers (patch 4/5). It looks to me as if that assert was inconsistent with other places in the driver where COLOR_DEPTH121212 is supported, and looking at the code, the change seems harmless. At least on DCE-11.2 the change didn't cause any noticeable (by myself) or measurable (by my equipment) problems on any of the 3 connected displays.
- Related to that change, while i needed to increase lb pixelsize to 36bpp to get > 10 bpc effective precision on DCN, i didn't need to do that on DCE. Also no change of lb pixelsize was needed on either DCN or DCE to get > 10 bpc precision for fp16 framebuffers, so something seems to behave differently for floating point 16 vs. fixed point 16. This all seems to suggest one could leave lb pixelsize at the old 30 bpp value on at least DCE-11.2 and still get the > 10 bpc precision if one wanted to avoid the changes of patch 4/5.
Thanks, -mario
These are 16 bits per color channel unsigned normalized formats. They are supported by at least AMD display hw, and suitable for direct scanout of Vulkan swapchain images in the format VK_FORMAT_R16G16B16A16_UNORM.
Signed-off-by: Mario Kleiner mario.kleiner.de@gmail.com
---
 drivers/gpu/drm/drm_fourcc.c  | 4 ++++
 include/uapi/drm/drm_fourcc.h | 7 +++++++
 2 files changed, 11 insertions(+)

diff --git a/drivers/gpu/drm/drm_fourcc.c b/drivers/gpu/drm/drm_fourcc.c
index 03262472059c..ce13d2be5d7b 100644
--- a/drivers/gpu/drm/drm_fourcc.c
+++ b/drivers/gpu/drm/drm_fourcc.c
@@ -203,6 +203,10 @@ const struct drm_format_info *__drm_format_info(u32 format)
 		{ .format = DRM_FORMAT_ARGB16161616F, .depth = 0, .num_planes = 1, .cpp = { 8, 0, 0 }, .hsub = 1, .vsub = 1, .has_alpha = true },
 		{ .format = DRM_FORMAT_ABGR16161616F, .depth = 0, .num_planes = 1, .cpp = { 8, 0, 0 }, .hsub = 1, .vsub = 1, .has_alpha = true },
 		{ .format = DRM_FORMAT_AXBXGXRX106106106106, .depth = 0, .num_planes = 1, .cpp = { 8, 0, 0 }, .hsub = 1, .vsub = 1, .has_alpha = true },
+		{ .format = DRM_FORMAT_XRGB16161616, .depth = 0, .num_planes = 1, .cpp = { 8, 0, 0 }, .hsub = 1, .vsub = 1 },
+		{ .format = DRM_FORMAT_XBGR16161616, .depth = 0, .num_planes = 1, .cpp = { 8, 0, 0 }, .hsub = 1, .vsub = 1 },
+		{ .format = DRM_FORMAT_ARGB16161616, .depth = 0, .num_planes = 1, .cpp = { 8, 0, 0 }, .hsub = 1, .vsub = 1, .has_alpha = true },
+		{ .format = DRM_FORMAT_ABGR16161616, .depth = 0, .num_planes = 1, .cpp = { 8, 0, 0 }, .hsub = 1, .vsub = 1, .has_alpha = true },
 		{ .format = DRM_FORMAT_RGB888_A8, .depth = 32, .num_planes = 2, .cpp = { 3, 1, 0 }, .hsub = 1, .vsub = 1, .has_alpha = true },
 		{ .format = DRM_FORMAT_BGR888_A8, .depth = 32, .num_planes = 2, .cpp = { 3, 1, 0 }, .hsub = 1, .vsub = 1, .has_alpha = true },
 		{ .format = DRM_FORMAT_XRGB8888_A8, .depth = 32, .num_planes = 2, .cpp = { 4, 1, 0 }, .hsub = 1, .vsub = 1, .has_alpha = true },
diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
index f76de49c768f..f7156322aba5 100644
--- a/include/uapi/drm/drm_fourcc.h
+++ b/include/uapi/drm/drm_fourcc.h
@@ -168,6 +168,13 @@ extern "C" {
 #define DRM_FORMAT_RGBA1010102 fourcc_code('R', 'A', '3', '0') /* [31:0] R:G:B:A 10:10:10:2 little endian */
 #define DRM_FORMAT_BGRA1010102 fourcc_code('B', 'A', '3', '0') /* [31:0] B:G:R:A 10:10:10:2 little endian */
 
+/* 64 bpp RGB */
+#define DRM_FORMAT_XRGB16161616 fourcc_code('X', 'R', '4', '8') /* [63:0] x:R:G:B 16:16:16:16 little endian */
+#define DRM_FORMAT_XBGR16161616 fourcc_code('X', 'B', '4', '8') /* [63:0] x:B:G:R 16:16:16:16 little endian */
+
+#define DRM_FORMAT_ARGB16161616 fourcc_code('A', 'R', '4', '8') /* [63:0] A:R:G:B 16:16:16:16 little endian */
+#define DRM_FORMAT_ABGR16161616 fourcc_code('A', 'B', '4', '8') /* [63:0] A:B:G:R 16:16:16:16 little endian */
+
 /*
  * Floating point 64bpp RGB
  * IEEE 754-2008 binary16 half-precision float
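As a quick illustration of what these definitions mean for userspace, here is a small sketch. The FOURCC() macro mirrors the kernel's fourcc_code(); the names FMT_XBGR16161616, pack_xbgr16161616() and min_pitch_bytes() are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Same construction as fourcc_code() in include/uapi/drm/drm_fourcc.h:
 * four ASCII characters packed into a 32 bit value, first char lowest.
 */
#define FOURCC(a, b, c, d) \
	((uint32_t)(a) | ((uint32_t)(b) << 8) | \
	 ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))

#define FMT_XBGR16161616 FOURCC('X', 'B', '4', '8')

/*
 * Pack one pixel as [63:0] x:B:G:R 16:16:16:16, i.e. red in the lowest
 * 16 bits. Stored little endian, the components sit in memory in the
 * order R, G, B, x, which is the component order of
 * VK_FORMAT_R16G16B16A16_UNORM. The x bits are ignored; zero is used here.
 */
static uint64_t pack_xbgr16161616(uint16_t r, uint16_t g, uint16_t b)
{
	return (uint64_t)r | ((uint64_t)g << 16) | ((uint64_t)b << 32);
}

/*
 * Minimum pitch in bytes of a linear buffer, using cpp = 8 from the
 * drm_format_info entries. Real drivers may need extra alignment.
 */
static uint64_t min_pitch_bytes(uint32_t width)
{
	const uint32_t cpp = 8;

	assert(width > 0);
	return (uint64_t)width * cpp;
}
```

For a 2560 pixel wide framebuffer this gives a minimum pitch of 20480 bytes, twice that of a 32 bpp format.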
On Fri, Mar 19, 2021 at 10:03:13PM +0100, Mario Kleiner wrote:
These are 16 bits per color channel unsigned normalized formats. They are supported by at least AMD display hw, and suitable for direct scanout of Vulkan swapchain images in the format VK_FORMAT_R16G16B16A16_UNORM.
Signed-off-by: Mario Kleiner mario.kleiner.de@gmail.com
drivers/gpu/drm/drm_fourcc.c | 4 ++++ include/uapi/drm/drm_fourcc.h | 7 +++++++ 2 files changed, 11 insertions(+)
diff --git a/drivers/gpu/drm/drm_fourcc.c b/drivers/gpu/drm/drm_fourcc.c index 03262472059c..ce13d2be5d7b 100644 --- a/drivers/gpu/drm/drm_fourcc.c +++ b/drivers/gpu/drm/drm_fourcc.c @@ -203,6 +203,10 @@ const struct drm_format_info *__drm_format_info(u32 format) { .format = DRM_FORMAT_ARGB16161616F, .depth = 0, .num_planes = 1, .cpp = { 8, 0, 0 }, .hsub = 1, .vsub = 1, .has_alpha = true }, { .format = DRM_FORMAT_ABGR16161616F, .depth = 0, .num_planes = 1, .cpp = { 8, 0, 0 }, .hsub = 1, .vsub = 1, .has_alpha = true }, { .format = DRM_FORMAT_AXBXGXRX106106106106, .depth = 0, .num_planes = 1, .cpp = { 8, 0, 0 }, .hsub = 1, .vsub = 1, .has_alpha = true },
{ .format = DRM_FORMAT_XRGB16161616, .depth = 0, .num_planes = 1, .cpp = { 8, 0, 0 }, .hsub = 1, .vsub = 1 },
{ .format = DRM_FORMAT_XBGR16161616, .depth = 0, .num_planes = 1, .cpp = { 8, 0, 0 }, .hsub = 1, .vsub = 1 },
{ .format = DRM_FORMAT_ARGB16161616, .depth = 0, .num_planes = 1, .cpp = { 8, 0, 0 }, .hsub = 1, .vsub = 1, .has_alpha = true },
{ .format = DRM_FORMAT_ABGR16161616, .depth = 0, .num_planes = 1, .cpp = { 8, 0, 0 }, .hsub = 1, .vsub = 1, .has_alpha = true }, { .format = DRM_FORMAT_RGB888_A8, .depth = 32, .num_planes = 2, .cpp = { 3, 1, 0 }, .hsub = 1, .vsub = 1, .has_alpha = true }, { .format = DRM_FORMAT_BGR888_A8, .depth = 32, .num_planes = 2, .cpp = { 3, 1, 0 }, .hsub = 1, .vsub = 1, .has_alpha = true }, { .format = DRM_FORMAT_XRGB8888_A8, .depth = 32, .num_planes = 2, .cpp = { 4, 1, 0 }, .hsub = 1, .vsub = 1, .has_alpha = true },
diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h index f76de49c768f..f7156322aba5 100644 --- a/include/uapi/drm/drm_fourcc.h +++ b/include/uapi/drm/drm_fourcc.h @@ -168,6 +168,13 @@ extern "C" { #define DRM_FORMAT_RGBA1010102 fourcc_code('R', 'A', '3', '0') /* [31:0] R:G:B:A 10:10:10:2 little endian */ #define DRM_FORMAT_BGRA1010102 fourcc_code('B', 'A', '3', '0') /* [31:0] B:G:R:A 10:10:10:2 little endian */
+/* 64 bpp RGB */ +#define DRM_FORMAT_XRGB16161616 fourcc_code('X', 'R', '4', '8') /* [63:0] x:R:G:B 16:16:16:16 little endian */ +#define DRM_FORMAT_XBGR16161616 fourcc_code('X', 'B', '4', '8') /* [63:0] x:B:G:R 16:16:16:16 little endian */
+#define DRM_FORMAT_ARGB16161616 fourcc_code('A', 'R', '4', '8') /* [63:0] A:R:G:B 16:16:16:16 little endian */ +#define DRM_FORMAT_ABGR16161616 fourcc_code('A', 'B', '4', '8') /* [63:0] A:B:G:R 16:16:16:16 little endian */
These look reasonable enough to me. IIRC we should be able to expose them on some recent Intel hw as well.
Reviewed-by: Ville Syrjälä ville.syrjala@linux.intel.com
On Fri, Mar 19, 2021 at 10:16 PM Ville Syrjälä ville.syrjala@linux.intel.com wrote:
These look reasonable enough to me. IIRC we should be able to expose them on some recent Intel hw as well.
Reviewed-by: Ville Syrjälä ville.syrjala@linux.intel.com
Thanks Ville!
Indeed i looked over the Intel PRM's, and while fp16 support seems to be rather recent (Gen8? Gen9? Gen10? Can't remember atm.), iirc, I found references to rgb16 fixed point back to gen5 / Ironlake. That would be pretty cool! The precision limit for the encoders on Intel is also 12 bpc atm., right?
-mario
-- Ville Syrjälä Intel
On Fri, Mar 19, 2021 at 10:45:10PM +0100, Mario Kleiner wrote:
Thanks Ville!
Indeed i looked over the Intel PRM's, and while fp16 support seems to be rather recent (Gen8? Gen9? Gen10? Can't remember atm.), iirc, I found references to rgb16 fixed point back to gen5 / Ironlake.
fp16 has been around since forever (gen4+). uint16 is much more recent; IIRC it's something ~glk+.
That would be pretty cool! The precision limit for the encoders on Intel is also 12 bpc atm., right?
Yes.
On Sat, Mar 20, 2021 at 04:09:47AM +0200, Ville Syrjälä wrote:
fp16 has been around since forever (gen4+). uint16 is much more recent; IIRC it's something ~glk+.
FYI I just hacked something together for i915: git://github.com/vsyrjala/linux.git uint16
Tests seem to pass on a glk here at least.
On Thu, May 6, 2021 at 8:37 AM Ville Syrjälä ville.syrjala@linux.intel.com wrote:
On Sat, Mar 20, 2021 at 04:09:47AM +0200, Ville Syrjälä wrote:
fp16 has been around since forever (gen4+). uint16 is much more recent; IIRC it's something ~glk+.
FYI I just hacked something together for i915: git://github.com/vsyrjala/linux.git uint16
Tests seem to pass on a glk here at least.
Great! Thanks for doing this. I reviewed those 3 patches of yours and they look good to me. I also added R-b's to the individual patches on your git://github.com/vsyrjala/linux.git uint16:
Reviewed-by: Mario Kleiner mario.kleiner.de@gmail.com
Too bad uint16 isn't supported already on KBL hw, which is the most modern Intel hw i have atm, so i can't test them.
-mario
Add the necessary format definition, bandwidth and pixel size mappings, prescaler setup, and pixelformat selection, following the logic already present for SURFACE_PIXEL_FORMAT_GRPH_ARGB16161616.
The new SURFACE_PIXEL_FORMAT_GRPH_ABGR16161616 is implemented as the old SURFACE_PIXEL_FORMAT_GRPH_ARGB16161616 format, but with swapped red <-> blue color channel, by use of the hardware xbar.
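In software terms, ABGR16161616 and ARGB16161616 differ only in which of the red and blue components occupies the low 16 bits of the 64 bit pixel. The sketch below shows the equivalent CPU-side operation; it is an illustration under that assumption, not a model of the xbar registers, and swap_red_blue_16161616() is a made-up name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert between [63:0] A:R:G:B and [63:0] A:B:G:R by exchanging the
 * components in bits 15:0 and bits 47:32. Alpha and green stay put.
 */
static uint64_t swap_red_blue_16161616(uint64_t px)
{
	const uint64_t keep = 0xffff0000ffff0000ull;	/* alpha and green */
	uint64_t low = px & 0xffffull;			/* bits 15:0 */
	uint64_t high = (px >> 32) & 0xffffull;		/* bits 47:32 */
	uint64_t out = (px & keep) | (low << 32) | high;

	/* postcondition: alpha and green are untouched */
	assert(((out ^ px) & keep) == 0);
	return out;
}
```

Applying the function twice returns the original pixel, which is why one helper covers both directions.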
Please note that on the DCN 1/2/3 display engines, the pixelformat in hubp and dpp setup for the old SURFACE_PIXEL_FORMAT_GRPH_ARGB16161616 and the new SURFACE_PIXEL_FORMAT_GRPH_ABGR16161616 was changed from format id 22 to id 26. See amd/include/navi10_enum.h for the meaning of the id's.
For format 22, the display engine read the framebuffer in 16 bpc format, but truncated to the 12 bpc actually supported by later pipeline stages. However, the engine took the 12 LSB of each color component for truncation, which is incompatible with rendering at least under Vulkan, where content is 16 bit wide, and a 12 MSB alignment would be appropriate, if any. Format 20 for ARGB16161616_12MSB does work, but even better, we can choose format 26 for ARGB16161616_UNORM, keeping all 16 bits around until later stages of the display pipeline.
This allows directly consuming what the rendering hw produces under Vulkan for swapchain format VK_FORMAT_R16G16B16A16_UNORM, as tested with a patched version of the current AMD open-source amdvlk driver which maps swapchain format VK_FORMAT_R16G16B16A16_UNORM onto DRM_FORMAT_XBGR16161616.
The old id 22 would cause colorful pixeltrash to be displayed instead.
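A small sketch of why taking the 12 LSB of full-range 16 bit content produces garbage while taking the 12 MSB does not. This models only the truncation behavior described in this commit message; it is not derived from hardware documentation, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Keep the 12 most significant bits of a 16 bit unorm component. */
static uint16_t keep_12_msb(uint16_t c)
{
	uint16_t out = c >> 4;

	assert(out <= 0x0fff);
	return out;
}

/*
 * Keep the 12 least significant bits instead. The top 4 bits are
 * dropped, so the result wraps around every 4096 input steps.
 */
static uint16_t keep_12_lsb(uint16_t c)
{
	uint16_t out = c & 0x0fff;

	assert(out <= 0x0fff);
	return out;
}
```

A mid-gray component of 0x8000 becomes 0x800 (still mid-gray) with MSB truncation, but 0x000 (black) with LSB truncation. Since each channel wraps independently, smooth 16 bit gradients turn into unrelated per-channel sawtooth patterns.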
Tested under DCN-1.0 and DCE-11.2.
Signed-off-by: Mario Kleiner mario.kleiner.de@gmail.com
---
 drivers/gpu/drm/amd/display/dc/calcs/dce_calcs.c            | 2 ++
 drivers/gpu/drm/amd/display/dc/calcs/dcn_calcs.c            | 2 ++
 drivers/gpu/drm/amd/display/dc/core/dc_resource.c           | 2 ++
 drivers/gpu/drm/amd/display/dc/dc_hw_types.h                | 2 ++
 drivers/gpu/drm/amd/display/dc/dce/dce_mem_input.c          | 2 ++
 drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c | 1 +
 drivers/gpu/drm/amd/display/dc/dce110/dce110_mem_input_v.c  | 1 +
 drivers/gpu/drm/amd/display/dc/dcn10/dcn10_dpp.c            | 6 ++++--
 drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hubbub.c         | 1 +
 drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hubp.c           | 4 +++-
 drivers/gpu/drm/amd/display/dc/dcn20/dcn20_dpp.c            | 3 ++-
 drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hubbub.c         | 1 +
 drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hubp.c           | 4 +++-
 drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c       | 1 +
 drivers/gpu/drm/amd/display/dc/dcn30/dcn30_dpp.c            | 3 ++-
 15 files changed, 29 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/calcs/dce_calcs.c b/drivers/gpu/drm/amd/display/dc/calcs/dce_calcs.c
index e633f8a51edb..4e3664db7456 100644
--- a/drivers/gpu/drm/amd/display/dc/calcs/dce_calcs.c
+++ b/drivers/gpu/drm/amd/display/dc/calcs/dce_calcs.c
@@ -2827,6 +2827,7 @@ static void populate_initial_data(
 				data->bytes_per_pixel[num_displays + 4] = 4;
 				break;
 			case SURFACE_PIXEL_FORMAT_GRPH_ARGB16161616:
+			case SURFACE_PIXEL_FORMAT_GRPH_ABGR16161616:
 			case SURFACE_PIXEL_FORMAT_GRPH_ABGR16161616F:
 				data->bytes_per_pixel[num_displays + 4] = 8;
 				break;
@@ -2930,6 +2931,7 @@ static void populate_initial_data(
 				data->bytes_per_pixel[num_displays + 4] = 4;
 				break;
 			case SURFACE_PIXEL_FORMAT_GRPH_ARGB16161616:
+			case SURFACE_PIXEL_FORMAT_GRPH_ABGR16161616:
 			case SURFACE_PIXEL_FORMAT_GRPH_ABGR16161616F:
 				data->bytes_per_pixel[num_displays + 4] = 8;
 				break;
diff --git a/drivers/gpu/drm/amd/display/dc/calcs/dcn_calcs.c b/drivers/gpu/drm/amd/display/dc/calcs/dcn_calcs.c
index d4df4da5b81a..0e18df1283b6 100644
--- a/drivers/gpu/drm/amd/display/dc/calcs/dcn_calcs.c
+++ b/drivers/gpu/drm/amd/display/dc/calcs/dcn_calcs.c
@@ -236,6 +236,7 @@ static enum dcn_bw_defs tl_pixel_format_to_bw_defs(enum surface_pixel_format for
 	case SURFACE_PIXEL_FORMAT_GRPH_ABGR2101010_XR_BIAS:
 		return dcn_bw_rgb_sub_32;
 	case SURFACE_PIXEL_FORMAT_GRPH_ARGB16161616:
+	case SURFACE_PIXEL_FORMAT_GRPH_ABGR16161616:
 	case SURFACE_PIXEL_FORMAT_GRPH_ARGB16161616F:
 	case SURFACE_PIXEL_FORMAT_GRPH_ABGR16161616F:
 		return dcn_bw_rgb_sub_64;
@@ -375,6 +376,7 @@ static void pipe_ctx_to_e2e_pipe_params (
 		input->src.viewport_height_c = input->src.viewport_height / 2;
 		break;
 	case SURFACE_PIXEL_FORMAT_GRPH_ARGB16161616:
+	case SURFACE_PIXEL_FORMAT_GRPH_ABGR16161616:
 	case SURFACE_PIXEL_FORMAT_GRPH_ARGB16161616F:
 	case SURFACE_PIXEL_FORMAT_GRPH_ABGR16161616F:
 		input->src.source_format = dm_444_64;
diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
index 0c26c2ade782..f1aed40b3124 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
@@ -562,6 +562,7 @@ static enum pixel_format convert_pixel_format_to_dalsurface(
 		dal_pixel_format = PIXEL_FORMAT_420BPP10;
 		break;
 	case SURFACE_PIXEL_FORMAT_GRPH_ARGB16161616:
+	case SURFACE_PIXEL_FORMAT_GRPH_ABGR16161616:
 	default:
 		dal_pixel_format = PIXEL_FORMAT_UNKNOWN;
 		break;
@@ -2990,6 +2991,7 @@ unsigned int resource_pixel_format_to_bpp(enum surface_pixel_format format)
 #endif
 		return 32;
 	case SURFACE_PIXEL_FORMAT_GRPH_ARGB16161616:
+	case SURFACE_PIXEL_FORMAT_GRPH_ABGR16161616:
 	case SURFACE_PIXEL_FORMAT_GRPH_ARGB16161616F:
 	case SURFACE_PIXEL_FORMAT_GRPH_ABGR16161616F:
 		return 64;
diff --git a/drivers/gpu/drm/amd/display/dc/dc_hw_types.h b/drivers/gpu/drm/amd/display/dc/dc_hw_types.h
index b41e6367b15e..87f8b1b486d3 100644
--- a/drivers/gpu/drm/amd/display/dc/dc_hw_types.h
+++ b/drivers/gpu/drm/amd/display/dc/dc_hw_types.h
@@ -182,6 +182,8 @@ enum surface_pixel_format {
 	SURFACE_PIXEL_FORMAT_GRPH_ABGR2101010_XR_BIAS,
 	/*64 bpp */
 	SURFACE_PIXEL_FORMAT_GRPH_ARGB16161616,
+	/*swapped*/
+	SURFACE_PIXEL_FORMAT_GRPH_ABGR16161616,
 	/*float*/
 	SURFACE_PIXEL_FORMAT_GRPH_ARGB16161616F,
 	/*swaped & float*/
diff --git a/drivers/gpu/drm/amd/display/dc/dce/dce_mem_input.c b/drivers/gpu/drm/amd/display/dc/dce/dce_mem_input.c
index 79a6f261a0da..4cdd4dacb761 100644
--- a/drivers/gpu/drm/amd/display/dc/dce/dce_mem_input.c
+++ b/drivers/gpu/drm/amd/display/dc/dce/dce_mem_input.c
@@ -566,6 +566,7 @@ static void program_grph_pixel_format(
 		 * should problem swap endian*/
 		format == SURFACE_PIXEL_FORMAT_GRPH_ABGR2101010 ||
 		format == SURFACE_PIXEL_FORMAT_GRPH_ABGR2101010_XR_BIAS ||
+		format == SURFACE_PIXEL_FORMAT_GRPH_ABGR16161616 ||
 		format == SURFACE_PIXEL_FORMAT_GRPH_ABGR16161616F) {
 		/* ABGR formats */
 		red_xbar = 2;
@@ -606,6 +607,7 @@ static void program_grph_pixel_format(
 		fallthrough;
 	case SURFACE_PIXEL_FORMAT_GRPH_ARGB16161616F: /* shouldn't this get float too? */
 	case SURFACE_PIXEL_FORMAT_GRPH_ARGB16161616:
+	case SURFACE_PIXEL_FORMAT_GRPH_ABGR16161616:
 		grph_depth = 3;
 		grph_format = 0;
 		break;
diff --git a/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c b/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
index caee1c9f54bd..a4eec436ba2e 100644
--- a/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
+++ b/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
@@ -263,6 +263,7 @@ static void build_prescale_params(struct ipp_prescale_params *prescale_params,
 		prescale_params->scale = 0x2008;
 		break;
 	case SURFACE_PIXEL_FORMAT_GRPH_ARGB16161616:
+	case SURFACE_PIXEL_FORMAT_GRPH_ABGR16161616:
 	case SURFACE_PIXEL_FORMAT_GRPH_ABGR16161616F:
 		prescale_params->scale = 0x2000;
 		break;
diff --git a/drivers/gpu/drm/amd/display/dc/dce110/dce110_mem_input_v.c b/drivers/gpu/drm/amd/display/dc/dce110/dce110_mem_input_v.c
index 8bbb499067f7..db7557a1c613 100644
--- a/drivers/gpu/drm/amd/display/dc/dce110/dce110_mem_input_v.c
+++ b/drivers/gpu/drm/amd/display/dc/dce110/dce110_mem_input_v.c
@@ -393,6 +393,7 @@ static void program_pixel_format(
 		grph_format = 1;
 		break;
 	case SURFACE_PIXEL_FORMAT_GRPH_ARGB16161616:
+	case SURFACE_PIXEL_FORMAT_GRPH_ABGR16161616:
 	case SURFACE_PIXEL_FORMAT_GRPH_ABGR16161616F:
 	case SURFACE_PIXEL_FORMAT_GRPH_ARGB16161616F:
 		grph_depth = 3;
diff --git a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_dpp.c b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_dpp.c
index 7f8456b9988b..a77e7bd3b8d5 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_dpp.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_dpp.c
@@ -257,7 +257,8 @@ static void dpp1_setup_format_flags(enum surface_pixel_format input_format,\
 	if (input_format == SURFACE_PIXEL_FORMAT_GRPH_ARGB16161616F ||
 		input_format == SURFACE_PIXEL_FORMAT_GRPH_ABGR16161616F)
 		*fmt = PIXEL_FORMAT_FLOAT;
-	else if (input_format == SURFACE_PIXEL_FORMAT_GRPH_ARGB16161616)
+	else if (input_format == SURFACE_PIXEL_FORMAT_GRPH_ARGB16161616 ||
+		input_format == SURFACE_PIXEL_FORMAT_GRPH_ABGR16161616)
 		*fmt = PIXEL_FORMAT_FIXED16;
 	else
 		*fmt = PIXEL_FORMAT_FIXED;
@@ -368,7 +369,8 @@ void dpp1_cnv_setup (
 		select = INPUT_CSC_SELECT_ICSC;
 		break;
 	case SURFACE_PIXEL_FORMAT_GRPH_ARGB16161616:
-		pixel_format = 22;
+	case SURFACE_PIXEL_FORMAT_GRPH_ABGR16161616:
+		pixel_format = 26; /* ARGB16161616_UNORM */
 		break;
 	case SURFACE_PIXEL_FORMAT_GRPH_ARGB16161616F:
 		pixel_format = 24;
diff --git a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hubbub.c b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hubbub.c
index 6f42d10dd772..f4f423d0b8c3 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hubbub.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hubbub.c
@@ -785,6 +785,7 @@ static bool hubbub1_dcc_support_pixel_format(
 		*bytes_per_element = 4;
 		return true;
 	case SURFACE_PIXEL_FORMAT_GRPH_ARGB16161616:
+	case SURFACE_PIXEL_FORMAT_GRPH_ABGR16161616:
 	case SURFACE_PIXEL_FORMAT_GRPH_ARGB16161616F:
 	case SURFACE_PIXEL_FORMAT_GRPH_ABGR16161616F:
 		*bytes_per_element = 8;
diff --git a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hubp.c b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hubp.c
index 9e796dfeac20..4e2ac6c5e35d 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hubp.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hubp.c
@@ -245,6 +245,7 @@ void hubp1_program_pixel_format(
 	if (format == SURFACE_PIXEL_FORMAT_GRPH_ABGR8888
 			|| format == SURFACE_PIXEL_FORMAT_GRPH_ABGR2101010
 			|| format == SURFACE_PIXEL_FORMAT_GRPH_ABGR2101010_XR_BIAS
+			|| format == SURFACE_PIXEL_FORMAT_GRPH_ABGR16161616
 			|| format == SURFACE_PIXEL_FORMAT_GRPH_ABGR16161616F) {
 		red_bar = 2;
 		blue_bar = 3;
@@ -277,8 +278,9 @@ void hubp1_program_pixel_format(
 				SURFACE_PIXEL_FORMAT, 10);
 		break;
 	case SURFACE_PIXEL_FORMAT_GRPH_ARGB16161616:
+	case SURFACE_PIXEL_FORMAT_GRPH_ABGR16161616: /*we use crossbar already*/
 		REG_UPDATE(DCSURF_SURFACE_CONFIG,
-				SURFACE_PIXEL_FORMAT, 22);
+				SURFACE_PIXEL_FORMAT, 26); /* ARGB16161616_UNORM */
 		break;
 	case SURFACE_PIXEL_FORMAT_GRPH_ARGB16161616F:
 	case SURFACE_PIXEL_FORMAT_GRPH_ABGR16161616F:/*we use crossbar already*/
diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_dpp.c b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_dpp.c
index 4af96cc5d9d6..f2f44ddf522a 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_dpp.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_dpp.c
@@ -166,7 +166,8 @@ static void dpp2_cnv_setup (
 		select = DCN2_ICSC_SELECT_ICSC_A;
 		break;
 	case SURFACE_PIXEL_FORMAT_GRPH_ARGB16161616:
-		pixel_format = 22;
+	case SURFACE_PIXEL_FORMAT_GRPH_ABGR16161616:
+		pixel_format = 26; /* ARGB16161616_UNORM */
 		break;
 	case SURFACE_PIXEL_FORMAT_GRPH_ARGB16161616F:
 		pixel_format = 24;
diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hubbub.c b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hubbub.c
index 6d03d98fca22..91a9305d42e8 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hubbub.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hubbub.c
@@ -158,6 +158,7 @@ bool hubbub2_dcc_support_pixel_format(
 		*bytes_per_element = 4;
 		return true;
 	case SURFACE_PIXEL_FORMAT_GRPH_ARGB16161616:
+	case SURFACE_PIXEL_FORMAT_GRPH_ABGR16161616:
 	case SURFACE_PIXEL_FORMAT_GRPH_ARGB16161616F:
 	case SURFACE_PIXEL_FORMAT_GRPH_ABGR16161616F:
 		*bytes_per_element = 8;
diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hubp.c b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hubp.c
index 0df0da2e6a4d..05c5494bf00f 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hubp.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hubp.c
@@ -428,6 +428,7 @@ void hubp2_program_pixel_format(
 	if (format == SURFACE_PIXEL_FORMAT_GRPH_ABGR8888
 			|| format == SURFACE_PIXEL_FORMAT_GRPH_ABGR2101010
 			|| format == SURFACE_PIXEL_FORMAT_GRPH_ABGR2101010_XR_BIAS
+			|| format == SURFACE_PIXEL_FORMAT_GRPH_ABGR16161616
 			|| format == SURFACE_PIXEL_FORMAT_GRPH_ABGR16161616F) {
 		red_bar = 2;
 		blue_bar = 3;
@@ -460,8 +461,9 @@ void hubp2_program_pixel_format(
 				SURFACE_PIXEL_FORMAT, 10);
 		break;
 	case
SURFACE_PIXEL_FORMAT_GRPH_ARGB16161616: + case SURFACE_PIXEL_FORMAT_GRPH_ABGR16161616: /*we use crossbar already*/ REG_UPDATE(DCSURF_SURFACE_CONFIG, - SURFACE_PIXEL_FORMAT, 22); + SURFACE_PIXEL_FORMAT, 26); /* ARGB16161616_UNORM */ break; case SURFACE_PIXEL_FORMAT_GRPH_ARGB16161616F: case SURFACE_PIXEL_FORMAT_GRPH_ABGR16161616F:/*we use crossbar already*/ diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c index 2c2dbfcd8957..4083075c1ee6 100644 --- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c +++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c @@ -2358,6 +2358,7 @@ int dcn20_populate_dml_pipes_from_context( pipes[pipe_cnt].pipe.src.source_format = dm_420_10; break; case SURFACE_PIXEL_FORMAT_GRPH_ARGB16161616: + case SURFACE_PIXEL_FORMAT_GRPH_ABGR16161616: case SURFACE_PIXEL_FORMAT_GRPH_ARGB16161616F: case SURFACE_PIXEL_FORMAT_GRPH_ABGR16161616F: pipes[pipe_cnt].pipe.src.source_format = dm_444_64; diff --git a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_dpp.c b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_dpp.c index 6e864b1a95c4..0bc5c5eba7af 100644 --- a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_dpp.c +++ b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_dpp.c @@ -245,7 +245,8 @@ static void dpp3_cnv_setup ( select = INPUT_CSC_SELECT_ICSC; break; case SURFACE_PIXEL_FORMAT_GRPH_ARGB16161616: - pixel_format = 22; + case SURFACE_PIXEL_FORMAT_GRPH_ABGR16161616: + pixel_format = 26; /* ARGB16161616_UNORM */ break; case SURFACE_PIXEL_FORMAT_GRPH_ARGB16161616F: pixel_format = 24;
Testing with the photometer shows that at least Raven Ridge DCN-1.0 does not achieve more than 10 bpc effective output precision with a 16 bpc unorm surface of type SURFACE_PIXEL_FORMAT_GRPH_ABGR16161616, unless linebuffer depth is increased from LB_PIXEL_DEPTH_30BPP to LB_PIXEL_DEPTH_36BPP. Otherwise precision gets truncated somewhere to 10 bpc effective depth.
Strangely this increase was not needed on Polaris11 DCE-11.2 during testing to get 12 bpc effective precision. It also is not needed for fp16 framebuffers.
Tested on DCN-1.0 and DCE-11.2.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
---
 drivers/gpu/drm/amd/display/dc/core/dc_resource.c          | 7 +++++--
 drivers/gpu/drm/amd/display/dc/dce/dce_transform.c         | 6 ++++--
 drivers/gpu/drm/amd/display/dc/dce110/dce110_transform_v.c | 3 ++-
 drivers/gpu/drm/amd/display/dc/dcn10/dcn10_dpp.c           | 3 ++-
 drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c  | 2 +-
 drivers/gpu/drm/amd/display/dc/dcn20/dcn20_dpp.c           | 3 ++-
 drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c         | 2 +-
 drivers/gpu/drm/amd/display/dc/dcn30/dcn30_dpp.c           | 3 ++-
 8 files changed, 19 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
index f1aed40b3124..51e91b546d69 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
@@ -1167,9 +1167,12 @@ bool resource_build_scaling_params(struct pipe_ctx *pipe_ctx)
 
 	/**
 	 * Setting line buffer pixel depth to 24bpp yields banding
-	 * on certain displays, such as the Sharp 4k
+	 * on certain displays, such as the Sharp 4k. 36bpp is needed
+	 * to support SURFACE_PIXEL_FORMAT_GRPH_ARGB16161616 and
+	 * SURFACE_PIXEL_FORMAT_GRPH_ABGR16161616 with actual > 10 bpc
+	 * precision on at least DCN display engines.
 	 */
-	pipe_ctx->plane_res.scl_data.lb_params.depth = LB_PIXEL_DEPTH_30BPP;
+	pipe_ctx->plane_res.scl_data.lb_params.depth = LB_PIXEL_DEPTH_36BPP;
 	pipe_ctx->plane_res.scl_data.lb_params.alpha_en = plane_state->per_pixel_alpha;
 
 	pipe_ctx->plane_res.scl_data.recout.x += timing->h_border_left;
diff --git a/drivers/gpu/drm/amd/display/dc/dce/dce_transform.c b/drivers/gpu/drm/amd/display/dc/dce/dce_transform.c
index 151dc7bf6d23..92b53a30d954 100644
--- a/drivers/gpu/drm/amd/display/dc/dce/dce_transform.c
+++ b/drivers/gpu/drm/amd/display/dc/dce/dce_transform.c
@@ -1647,7 +1647,8 @@ void dce_transform_construct(
 	xfm_dce->lb_pixel_depth_supported =
 			LB_PIXEL_DEPTH_18BPP |
 			LB_PIXEL_DEPTH_24BPP |
-			LB_PIXEL_DEPTH_30BPP;
+			LB_PIXEL_DEPTH_30BPP |
+			LB_PIXEL_DEPTH_36BPP;
 
 	xfm_dce->lb_bits_per_entry = LB_BITS_PER_ENTRY;
 	xfm_dce->lb_memory_size = LB_TOTAL_NUMBER_OF_ENTRIES; /*0x6B0*/
@@ -1675,7 +1676,8 @@ void dce60_transform_construct(
 	xfm_dce->lb_pixel_depth_supported =
 			LB_PIXEL_DEPTH_18BPP |
 			LB_PIXEL_DEPTH_24BPP |
-			LB_PIXEL_DEPTH_30BPP;
+			LB_PIXEL_DEPTH_30BPP |
+			LB_PIXEL_DEPTH_36BPP;
 
 	xfm_dce->lb_bits_per_entry = LB_BITS_PER_ENTRY;
 	xfm_dce->lb_memory_size = LB_TOTAL_NUMBER_OF_ENTRIES; /*0x6B0*/
diff --git a/drivers/gpu/drm/amd/display/dc/dce110/dce110_transform_v.c b/drivers/gpu/drm/amd/display/dc/dce110/dce110_transform_v.c
index 29438c6050db..45bca0db5e5e 100644
--- a/drivers/gpu/drm/amd/display/dc/dce110/dce110_transform_v.c
+++ b/drivers/gpu/drm/amd/display/dc/dce110/dce110_transform_v.c
@@ -708,7 +708,8 @@ bool dce110_transform_v_construct(
 	xfm_dce->lb_pixel_depth_supported =
 			LB_PIXEL_DEPTH_18BPP |
 			LB_PIXEL_DEPTH_24BPP |
-			LB_PIXEL_DEPTH_30BPP;
+			LB_PIXEL_DEPTH_30BPP |
+			LB_PIXEL_DEPTH_36BPP;
 
 	xfm_dce->prescaler_on = true;
 	xfm_dce->lb_bits_per_entry = LB_BITS_PER_ENTRY;
diff --git a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_dpp.c b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_dpp.c
index a77e7bd3b8d5..91fdfcd8a14e 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_dpp.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_dpp.c
@@ -568,7 +568,8 @@ void dpp1_construct(
 	dpp->lb_pixel_depth_supported =
 		LB_PIXEL_DEPTH_18BPP |
 		LB_PIXEL_DEPTH_24BPP |
-		LB_PIXEL_DEPTH_30BPP;
+		LB_PIXEL_DEPTH_30BPP |
+		LB_PIXEL_DEPTH_36BPP;
 
 	dpp->lb_bits_per_entry = LB_BITS_PER_ENTRY;
 	dpp->lb_memory_size = LB_TOTAL_NUMBER_OF_ENTRIES; /*0x1404*/
diff --git a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c
index 89912bb5014f..25d198f60a1c 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c
@@ -2470,7 +2470,7 @@ static void update_scaler(struct pipe_ctx *pipe_ctx)
 			pipe_ctx->plane_state->per_pixel_alpha && pipe_ctx->bottom_pipe;
 
 	pipe_ctx->plane_res.scl_data.lb_params.alpha_en = per_pixel_alpha;
-	pipe_ctx->plane_res.scl_data.lb_params.depth = LB_PIXEL_DEPTH_30BPP;
+	pipe_ctx->plane_res.scl_data.lb_params.depth = LB_PIXEL_DEPTH_36BPP;
 	/* scaler configuration */
 	pipe_ctx->plane_res.dpp->funcs->dpp_set_scaler(
 			pipe_ctx->plane_res.dpp, &pipe_ctx->plane_res.scl_data);
diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_dpp.c b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_dpp.c
index f2f44ddf522a..a9e420c7d75a 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_dpp.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_dpp.c
@@ -432,7 +432,8 @@ bool dpp2_construct(
 	dpp->lb_pixel_depth_supported =
 		LB_PIXEL_DEPTH_18BPP |
 		LB_PIXEL_DEPTH_24BPP |
-		LB_PIXEL_DEPTH_30BPP;
+		LB_PIXEL_DEPTH_30BPP |
+		LB_PIXEL_DEPTH_36BPP;
 
 	dpp->lb_bits_per_entry = LB_BITS_PER_ENTRY;
 	dpp->lb_memory_size = LB_TOTAL_NUMBER_OF_ENTRIES; /*0x1404*/
diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c
index 0726fb435e2a..cd924f4688e1 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c
@@ -1467,7 +1467,7 @@ static void dcn20_update_dchubp_dpp(
 			plane_state->update_flags.bits.per_pixel_alpha_change ||
 			pipe_ctx->stream->update_flags.bits.scaling) {
 		pipe_ctx->plane_res.scl_data.lb_params.alpha_en = pipe_ctx->plane_state->per_pixel_alpha;
-		ASSERT(pipe_ctx->plane_res.scl_data.lb_params.depth == LB_PIXEL_DEPTH_30BPP);
+		ASSERT(pipe_ctx->plane_res.scl_data.lb_params.depth == LB_PIXEL_DEPTH_36BPP);
 		/* scaler configuration */
 		pipe_ctx->plane_res.dpp->funcs->dpp_set_scaler(
 				pipe_ctx->plane_res.dpp, &pipe_ctx->plane_res.scl_data);
diff --git a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_dpp.c b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_dpp.c
index 0bc5c5eba7af..9c8138e52ded 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_dpp.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_dpp.c
@@ -1443,7 +1443,8 @@ bool dpp3_construct(
 	dpp->lb_pixel_depth_supported =
 		LB_PIXEL_DEPTH_18BPP |
 		LB_PIXEL_DEPTH_24BPP |
-		LB_PIXEL_DEPTH_30BPP;
+		LB_PIXEL_DEPTH_30BPP |
+		LB_PIXEL_DEPTH_36BPP;
 
 	dpp->lb_bits_per_entry = LB_BITS_PER_ENTRY;
 	dpp->lb_memory_size = LB_TOTAL_NUMBER_OF_ENTRIES; /*0x1404*/
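For readers who want the arithmetic behind the 30 bpp vs. 36 bpp line buffer depths, a minimal sketch follows. It is only an illustration: sketch_lb_bpc() and sketch_lb_levels() are made-up helpers, not DC functions, and they assume the line buffer depth is split evenly across the three color components (alpha is ignored).

```c
#include <assert.h>

/* Hypothetical helper (not part of DC): bits per color component for a
 * line buffer depth given in bits per pixel, assuming an even RGB split. */
static unsigned int sketch_lb_bpc(unsigned int lb_bpp)
{
	assert(lb_bpp % 3 == 0);
	return lb_bpp / 3;
}

/* Number of distinct levels per component at that line buffer depth. */
static unsigned int sketch_lb_levels(unsigned int lb_bpp)
{
	return 1u << sketch_lb_bpc(lb_bpp);
}
```

Under that assumption 30 bpp carries 10 bpc (1024 levels) and 36 bpp carries 12 bpc (4096 levels), which would be consistent with the truncation to 10 bpc observed on DCN with the old setting.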
This is needed to avoid warnings with linebuffer depth 36 bpp. Testing on a Polaris11, DCE-11.2 on a 10 bit HDR-10 monitor showed no obvious problems, and this 12 bpc limit is consistent with what other functions in the DCE bit depth reduction path use.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
---
 drivers/gpu/drm/amd/display/dc/dce/dce_transform.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dce/dce_transform.c b/drivers/gpu/drm/amd/display/dc/dce/dce_transform.c
index 92b53a30d954..d9fd4ec60588 100644
--- a/drivers/gpu/drm/amd/display/dc/dce/dce_transform.c
+++ b/drivers/gpu/drm/amd/display/dc/dce/dce_transform.c
@@ -794,7 +794,7 @@ static void program_bit_depth_reduction(
 	enum dcp_out_trunc_round_mode trunc_mode;
 	bool spatial_dither_enable;
 
-	ASSERT(depth < COLOR_DEPTH_121212); /* Invalid clamp bit depth */
+	ASSERT(depth <= COLOR_DEPTH_121212); /* Invalid clamp bit depth */
 
 	spatial_dither_enable = bit_depth_params->flags.SPATIAL_DITHER_ENABLED;
 	/* Default to 12 bit truncation without rounding */
@@ -854,7 +854,7 @@ static void dce60_program_bit_depth_reduction(
 	enum dcp_out_trunc_round_mode trunc_mode;
 	bool spatial_dither_enable;
 
-	ASSERT(depth < COLOR_DEPTH_121212); /* Invalid clamp bit depth */
+	ASSERT(depth <= COLOR_DEPTH_121212); /* Invalid clamp bit depth */
 
 	spatial_dither_enable = bit_depth_params->flags.SPATIAL_DITHER_ENABLED;
 	/* Default to 12 bit truncation without rounding */
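The effect of relaxing the assert from "<" to "<=" can be shown with a small standalone sketch. The enum below is a stand-in invented for this example; its member names and ordering are assumptions and are not copied from the DC headers. The only point is that the old bound rejects the 12 bpc value itself, while the new bound accepts it and still rejects anything deeper.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in enum for illustration only; ordering is assumed, not taken
 * from the driver headers. */
enum sketch_color_depth {
	SKETCH_DEPTH_666,
	SKETCH_DEPTH_888,
	SKETCH_DEPTH_101010,
	SKETCH_DEPTH_121212,
	SKETCH_DEPTH_141414,
};

/* Old bound: 12 bpc itself is treated as invalid. */
static bool sketch_depth_ok_old(enum sketch_color_depth d)
{
	return d < SKETCH_DEPTH_121212;
}

/* New bound: 12 bpc is allowed, anything deeper is still invalid. */
static bool sketch_depth_ok_new(enum sketch_color_depth d)
{
	return d <= SKETCH_DEPTH_121212;
}

/* Mirrors the shape of the patched check. */
static void sketch_program_depth(enum sketch_color_depth d)
{
	assert(sketch_depth_ok_new(d)); /* Invalid clamp bit depth */
}
```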
This is intended to enable direct high-precision scanout and pageflip of Vulkan swapchain images in format VK_FORMAT_R16G16B16A16_UNORM.
Expose DRM_FORMAT_XRGB16161616, DRM_FORMAT_ARGB16161616, DRM_FORMAT_XBGR16161616 and DRM_FORMAT_ABGR16161616 as 16 bpc unsigned normalized formats. These allow taking full advantage of the maximum precision of the display hardware, i.e. currently up to 12 bpc.
Searching through old AMD M56, M76 and RV630 hw programming docs suggests that these 16 bpc formats are supported by all DCE and DCN display engines, so we can expose the formats unconditionally.
Successfully tested on AMD Polaris11 DCE-11.2 and RavenRidge DCN-1.0 with an HDR-10 monitor over 10 bpc DP output with spatial dithering enabled by the driver. Picture looks good, and my photometer measurement procedure confirms an effective 12 bpc color reproduction.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 94cd5ddd67ef..1a6e90e20f10 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -4563,6 +4563,14 @@ fill_dc_plane_info_and_addr(struct amdgpu_device *adev,
 	case DRM_FORMAT_ABGR16161616F:
 		plane_info->format = SURFACE_PIXEL_FORMAT_GRPH_ABGR16161616F;
 		break;
+	case DRM_FORMAT_XRGB16161616:
+	case DRM_FORMAT_ARGB16161616:
+		plane_info->format = SURFACE_PIXEL_FORMAT_GRPH_ARGB16161616;
+		break;
+	case DRM_FORMAT_XBGR16161616:
+	case DRM_FORMAT_ABGR16161616:
+		plane_info->format = SURFACE_PIXEL_FORMAT_GRPH_ABGR16161616;
+		break;
 	default:
 		DRM_ERROR(
 			"Unsupported screen format %s\n",
@@ -6541,6 +6549,10 @@ static const uint32_t rgb_formats[] = {
 	DRM_FORMAT_XBGR2101010,
 	DRM_FORMAT_ARGB2101010,
 	DRM_FORMAT_ABGR2101010,
+	DRM_FORMAT_XRGB16161616,
+	DRM_FORMAT_XBGR16161616,
+	DRM_FORMAT_ARGB16161616,
+	DRM_FORMAT_ABGR16161616,
 	DRM_FORMAT_XBGR8888,
 	DRM_FORMAT_ABGR8888,
 	DRM_FORMAT_RGB565,
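The fourcc-to-surface-format mapping added by this patch can be tried outside the kernel with a small standalone sketch. Everything prefixed SKETCH_ or sketch_ is invented for this example. The character codes used for the four formats are my reading of drm_fourcc.h and should be treated as an assumption to be checked against the header. As in the patch, the X and A variants of each component order share one surface format.

```c
#include <assert.h>
#include <stdint.h>

/* Little-endian four-character packing, redefined here so the sketch is
 * self-contained. */
#define SKETCH_FOURCC(a, b, c, d) \
	((uint32_t)(a) | ((uint32_t)(b) << 8) | \
	 ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))

/* Assumed character codes; verify against drm_fourcc.h before relying on them. */
#define SKETCH_XRGB16161616 SKETCH_FOURCC('X', 'R', '4', '8')
#define SKETCH_ARGB16161616 SKETCH_FOURCC('A', 'R', '4', '8')
#define SKETCH_XBGR16161616 SKETCH_FOURCC('X', 'B', '4', '8')
#define SKETCH_ABGR16161616 SKETCH_FOURCC('A', 'B', '4', '8')

/* Stand-in for the two DC surface formats used by the patch. */
enum sketch_surface_format {
	SKETCH_SURFACE_UNSUPPORTED = -1,
	SKETCH_SURFACE_ARGB16161616,
	SKETCH_SURFACE_ABGR16161616,
};

/* Map a fourcc to a surface format; alpha and non-alpha variants share one. */
static enum sketch_surface_format sketch_map_format(uint32_t fourcc)
{
	switch (fourcc) {
	case SKETCH_XRGB16161616:
	case SKETCH_ARGB16161616:
		return SKETCH_SURFACE_ARGB16161616;
	case SKETCH_XBGR16161616:
	case SKETCH_ABGR16161616:
		return SKETCH_SURFACE_ABGR16161616;
	default:
		return SKETCH_SURFACE_UNSUPPORTED;
	}
}

/* Small self-check showing the intended use. */
static void sketch_map_format_selftest(void)
{
	assert(sketch_map_format(SKETCH_XRGB16161616) == SKETCH_SURFACE_ARGB16161616);
	assert(sketch_map_format(SKETCH_ARGB16161616) == SKETCH_SURFACE_ARGB16161616);
	assert(sketch_map_format(SKETCH_XBGR16161616) == SKETCH_SURFACE_ABGR16161616);
	assert(sketch_map_format(SKETCH_ABGR16161616) == SKETCH_SURFACE_ABGR16161616);
}
```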
On Fri, Mar 19, 2021 at 10:03:12PM +0100, Mario Kleiner wrote:
Hi,
this patch series adds the fourcc's for 16 bit fixed point unorm framebuffers to the core, and then an implementation for AMD gpu's with DisplayCore.
This is intended to allow for pageflipping to, and direct scanout of, Vulkan swapchain images in the format VK_FORMAT_R16G16B16A16_UNORM. I have patched AMD's GPUOpen amdvlk OSS driver to enable this format for swapchains, mapping to DRM_FORMAT_XBGR16161616: Link: https://github.com/kleinerm/pal/commit/a25d4802074b13a8d5f7edc96ae45469ecbac...
We should also add support for these formats into igt. Should be semi-easy by just adding the suitable float<->uint16 conversion stuff.
On Mon, Mar 22, 2021 at 4:52 PM Ville Syrjälä ville.syrjala@linux.intel.com wrote:
We should also add support for these formats into igt. Should be semi-easy by just adding the suitable float<->uint16 conversion stuff.
Hi Ville,
Could you point me to a specific test case / file that I should look at for adding this?
thanks, -mario
-- Ville Syrjälä Intel
On Fri, Apr 16, 2021 at 06:27:23PM +0200, Mario Kleiner wrote:
On Mon, Mar 22, 2021 at 4:52 PM Ville Syrjälä ville.syrjala@linux.intel.com wrote:
We should also add support for these formats into igt. Should be semi-easy by just adding the suitable float<->uint16 conversion stuff.
Hi Ville,
Could you point me to a specific test case / file that I should look at for adding this?
lib/igt_fb.c is the main thing. It has a bunch of conversion magic to support rendering into all kinds of weird framebuffer formats via cairo.
It should be mostly a matter of adding convert_uint16_to_float() and convert_float_to_uint16(), plugging those into fb_convert(), and declaring the new formats in format_desc[]. There might be a few little extra details I'm forgetting though.
Once igt_fb has the required stuff kms_plane/pixel-format* should automagically pick it up if the kernel reports the format as supported.
Oh, and you need some >1.17 version of cairo for the float support.
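As a rough idea of what the per-component part of such a float<->uint16 conversion could look like, here is a minimal sketch. The function names are made up and are not the igt signatures; the real igt helpers would operate on whole framebuffers rather than single values. The sketch assumes plain unorm scaling by 65535 with round-to-nearest and clamping to [0.0, 1.0].

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-component helper: float in [0.0, 1.0] to 16 bit unorm,
 * rounding to nearest and clamping out-of-range input. */
static uint16_t sketch_float_to_unorm16(float v)
{
	if (!(v > 0.0f))	/* negative, zero or NaN */
		return 0;
	if (v >= 1.0f)
		return UINT16_MAX;
	return (uint16_t)(v * 65535.0f + 0.5f);
}

/* Hypothetical per-component helper: 16 bit unorm to float in [0.0, 1.0]. */
static float sketch_unorm16_to_float(uint16_t v)
{
	return (float)v / 65535.0f;
}

/* Every 16 bit code should survive a trip through float and back. */
static void sketch_roundtrip_check(void)
{
	uint32_t u;

	for (u = 0; u <= UINT16_MAX; u++)
		assert(sketch_float_to_unorm16(sketch_unorm16_to_float((uint16_t)u)) == u);
}
```

Single precision float has a 24 bit significand, so it can represent all 65536 unorm codes distinctly. That is why the round trip is lossless in this direction.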
Friendly ping to the AMD people. Nicholas, Harry, Alex, any feedback? Would be great to get this in sooner than later.
Thanks and have a nice weekend, -mario
On Fri, Mar 19, 2021 at 10:03 PM Mario Kleiner mario.kleiner.de@gmail.com wrote:
Hi,
this patch series adds the fourcc's for 16 bit fixed point unorm framebuffers to the core, and then an implementation for AMD gpu's with DisplayCore.
This is intended to allow for pageflipping to, and direct scanout of, Vulkan swapchain images in the format VK_FORMAT_R16G16B16A16_UNORM. I have patched AMD's GPUOpen amdvlk OSS driver to enable this format for swapchains, mapping to DRM_FORMAT_XBGR16161616: Link: https://github.com/kleinerm/pal/commit/a25d4802074b13a8d5f7edc96ae45469ecbac...
My main motivation for this is squeezing every bit of precision out of the hardware for scientific and medical research applications, where fp16 in the unorm range is limited to ~11 bpc effective linear precision in the upper half [0.5;1.0] of the unorm range, although the hardware could do at least 12 bpc.
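The ~11 bpc figure follows from the fp16 (IEEE 754 binary16) layout: 10 explicit mantissa bits plus the implicit leading one, so adjacent values in [0.5, 1.0) are 2^-11 apart. A small sketch to check the numbers; the helper names are invented for this illustration and only cover normal fp16 values below 2.0.

```c
#include <assert.h>

/* Spacing between adjacent binary16 values around v, for normal numbers
 * in [2^-14, 2.0). Illustrative helper, not taken from any driver. */
static double sketch_fp16_spacing(double v)
{
	double lo = 1.0;		/* lower edge of the binade [1.0, 2.0) */
	double step = 1.0 / 1024.0;	/* 10 mantissa bits: 2^-10 in that binade */

	assert(v >= 1.0 / 16384.0 && v < 2.0);
	while (v < lo) {
		/* One binade down: half the range, half the spacing. */
		lo /= 2.0;
		step /= 2.0;
	}
	return step;
}

/* Step size of an n-bit unsigned normalized value. */
static double sketch_unorm_step(unsigned int bits)
{
	assert(bits >= 1 && bits <= 16);
	return 1.0 / (double)((1u << bits) - 1u);
}
```

With these, sketch_fp16_spacing(0.75) is 1/2048, which is coarser than the 12 bpc step of 1/4095 but finer than the 10 bpc step of 1/1023. So in the upper half of the unorm range fp16 cannot address every 12 bpc level, while a 16 bit fixed point format can.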
It has been successfully tested on AMD RavenRidge (DCN-1), and with Polaris11 (DCE-11.2). Up to two displays were active on RavenRidge (DP 2560x1440@144Hz + HDMI 2560x1440@120Hz), the maximum supported on my hw, both running at 10 bpc DP output depth.
Up to three displays were active on the Polaris (DP 2560x1440@144Hz + 2560x1440@100Hz USB-C DP-altMode-to-HDMI converter + eDP 2880x1800@60Hz Apple Retina panel), all running at 10 bpc output depth.
No malfunctions, visual artifacts or other oddities were observed (apart from an adventurous mess of cables and adapters on my desk), suggesting it works.
I used my automatic photometer measurement procedure to verify the effective output precision of 10 bpc DP native signal + spatial dithering in the gpu as enabled by the amdgpu driver. Results show the expected 12 bpc precision I hoped for -- the current upper limit for AMD display hw afaik.
So it seems to work in the way I hoped :).
Some open questions wrt. AMD DC, to be addressed in this patch series, or follow up patches if necessary:
- For the atomic check for plane scaling, the current patch will apply the same hw limits as for other rgb fixed point fb's, e.g., for 8 bpc rgb8. Is this correct? Or would we need to use the fp16 limits, because this is also a 64 bpp format? Or something new entirely?
- I haven't added the new fourcc to the DCC tables yet. Should I?
- I had to change an assert for DCE to allow 36bpp linebuffers (patch 4/5). It looks to me as if that assert was inconsistent with other places in the driver where COLOR_DEPTH_121212 is supported, and looking at the code, the change seems harmless. At least on DCE-11.2 the change didn't cause any noticeable (by myself) or measurable (by my equipment) problems on any of the 3 connected displays.
- Related to that change, while I needed to increase lb pixelsize to 36bpp to get > 10 bpc effective precision on DCN, I didn't need to do that on DCE. Also no change of lb pixelsize was needed on either DCN or DCE to get > 10 bpc precision for fp16 framebuffers, so something seems to behave differently for floating point 16 vs. fixed point 16. This all seems to suggest one could leave lb pixelsize at the old 30 bpp value on at least DCE-11.2 and still get the > 10 bpc precision if one wanted to avoid the changes of patch 4/5.
Thanks, -mario
On Fri, Apr 16, 2021 at 12:29 PM Mario Kleiner mario.kleiner.de@gmail.com wrote:
Friendly ping to the AMD people. Nicholas, Harry, Alex, any feedback? Would be great to get this in sooner than later.
No objections from me.
Alex
dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
On Tue, Apr 20, 2021 at 5:25 PM Alex Deucher alexdeucher@gmail.com wrote:
On Fri, Apr 16, 2021 at 12:29 PM Mario Kleiner mario.kleiner.de@gmail.com wrote:
Friendly ping to the AMD people. Nicholas, Harry, Alex, any feedback? Would be great to get this in sooner than later.
No objections from me.
I don't have any objections to merging this. Are the IGT tests available?
Alex
On Wed, Apr 28, 2021 at 5:21 PM Alex Deucher alexdeucher@gmail.com wrote:
On Tue, Apr 20, 2021 at 5:25 PM Alex Deucher alexdeucher@gmail.com wrote:
On Fri, Apr 16, 2021 at 12:29 PM Mario Kleiner mario.kleiner.de@gmail.com wrote:
Friendly ping to the AMD people. Nicholas, Harry, Alex, any feedback? Would be great to get this in sooner than later.
No objections from me.
I don't have any objections to merging this. Are the IGT tests available?
Any preference on whether I merge this through the AMD tree or drm-misc?
Alex
On Tue, May 4, 2021 at 9:22 PM Alex Deucher alexdeucher@gmail.com wrote:
On Wed, Apr 28, 2021 at 5:21 PM Alex Deucher alexdeucher@gmail.com wrote:
On Tue, Apr 20, 2021 at 5:25 PM Alex Deucher alexdeucher@gmail.com wrote:
On Fri, Apr 16, 2021 at 12:29 PM Mario Kleiner mario.kleiner.de@gmail.com wrote:
Friendly ping to the AMD people. Nicholas, Harry, Alex, any feedback? Would be great to get this in sooner than later.
No objections from me.
I don't have any objections to merging this. Are the IGT tests available?
Any preference on whether I merge this through the AMD tree or drm-misc?
Alex
Hi Alex, in case the question is addressed to me: I prefer whichever route gets it into drm-next soonest, so we can sync the drm_fourcc.h headers from drm-next to the IGT tests, libdrm, amdvlk etc.
Another thing: unless this still makes it into the Linux 5.13 merge window, we'd also need a KMS_DRIVER_MINOR bump 41 -> 42. That way amdgpu-pro's Vulkan driver could detect support for the new 16 bpc pixel formats when the out-of-tree amdgpu-dkms package runs on older kernels.
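A minimal sketch of how a userspace driver might act on such a version bump, assuming it has already obtained the driver's major and minor version (for instance via libdrm's drmGetVersion()). The function name and the assumed major version 3 are illustrative and not taken from the patches; 42 is the minor version proposed above.

```python
# Assumed values for illustration only: the amdgpu KMS major version is taken
# to be 3, and 42 is the minor version proposed in this thread.
ASSUMED_KMS_MAJOR = 3
MINOR_WITH_16BPC_FORMATS = 42


def has_16bpc_scanout_formats(major: int, minor: int) -> bool:
    """Return True if the reported KMS driver version implies 16 bpc support."""
    # Tuple comparison orders by major version first, then by minor version.
    return (major, minor) >= (ASSUMED_KMS_MAJOR, MINOR_WITH_16BPC_FORMATS)


if __name__ == "__main__":
    for version in [(3, 41), (3, 42), (4, 0)]:
        print(version, has_16bpc_scanout_formats(*version))
```

A version check like this only matters for out-of-tree builds; with an in-tree kernel, the format list reported by the plane itself is the more direct source of truth.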
thanks, -mario
On Wed, Apr 28, 2021 at 11:22 PM Alex Deucher alexdeucher@gmail.com wrote:
On Tue, Apr 20, 2021 at 5:25 PM Alex Deucher alexdeucher@gmail.com wrote:
On Fri, Apr 16, 2021 at 12:29 PM Mario Kleiner mario.kleiner.de@gmail.com wrote:
Friendly ping to the AMD people. Nicholas, Harry, Alex, any feedback? Would be great to get this in sooner than later.
No objections from me.
I don't have any objections to merging this. Are the IGT tests available?
Alex
IGT patches are out now, already r-b by Ville and cc'd to you. As mentioned in their cover letter, the new 16 bpc test cases for the kms_plane test, on top of IGT master, now work nicely on my RavenRidge. However, I had to add hacks on top of kms_plane to make it work at all on RV, i.e. to get it to the point where it could execute the tests for the new formats. Unmodified kms_plane from master doesn't even work on RV with Linux 5.8. Seems IGT is quite a bit out of date wrt. the kernel?
Things I had to do or work around:
- Skip all tests for modifiers other than linear. --> Test requirements wrt. tiling not met. Seems all the modifier support for DCC, DCC_RETILE on Vega+ is missing from IGT so far?
- Skip the test for format DRM_FORMAT_RGB565. CRC mismatch. Probably because a 5 bpc container can't represent the net 8 bpc content from the reference test image? Maybe all tests for < 8 bpc formats should be skipped?
- Skip tests for YUV planar formats with BT2020 color space: limited range is unsupported by DC, full range causes a CRC mismatch.
- Problems with expected vs. actual CRC vblank count for planar YUV formats.
- If the tests try to test more than the primary plane, igt_pipe_crc_start() fails to open the crtc/crc/data file with -EIO.
See the attached patch with all the needed hacks. Not sure which of these are limitations of the IGT test, and which are amdgpu bugs or hw limitations, but applying this hack-patch on top of the patches for the new formats makes kms_plane pass.
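To illustrate the RGB565 guess above, here is a small sketch that reduces each 8 bit channel value to 5 bits and expands it back, counting how many values do not survive the round trip. The rounding scheme is an assumption chosen for illustration; it is not claimed to match what IGT or the display hardware actually does, and the CRC mismatch may well have other causes.

```python
def quantize_roundtrip(value8: int, bits: int) -> int:
    """Reduce an 8 bit channel value to `bits` bits, then expand back to 8 bits."""
    max_level = (1 << bits) - 1
    reduced = round(value8 * max_level / 255)   # store in the narrower container
    return round(reduced * 255 / max_level)     # read back at 8 bit precision


def lossy_values(bits: int) -> list:
    """Return all 8 bit values that change after the round trip."""
    return [v for v in range(256) if quantize_roundtrip(v, bits) != v]


if __name__ == "__main__":
    for bits in (5, 6, 8):
        print(bits, "bits:", len(lossy_values(bits)), "of 256 values altered")
```

Under this model only the values that land exactly on one of the container's levels come back unchanged, so an 8 bpc reference image generally cannot produce the same CRC after passing through a 5 or 6 bit channel.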
-mario