[PATCH v6 18/44] drm/vkms: Use s32 for internal color pipeline precision

Wed Dec 18 21:12:06 UTC 2024

On 2024-10-04 07:43, Louis Chauvet wrote:
> On 03/10/24 - 16:01, Harry Wentland wrote:
>> Certain operations require us to preserve values below 0.0 and
>> above 1.0 (0x0 and 0xffff respectively in 16 bpc unorm). One
>> such operation is a BT709 encoding operation followed by its
>> decoding operation, or the reverse.
>>
>> We'll use s32 values as intermediate in and outputs of our
>> color operations, for the operations where it matters.
>>
>> For now this won't apply to LUT operations. We might want to
>> update those to work on s32 as well, but it's unclear how
>> that should work for unorm LUT definitions. We'll revisit
>> that once we add LUT + CTM tests.
>>
>> In order to allow for this we'll also invert the nesting of our
>> colorop processing loops. We now use the pixel iteration loop
>> on the outside and the colorop iteration on the inside.
>>
>> Signed-off-by: Harry Wentland <harry.wentland at amd.com>
>> ---
>> v6:
>>  - use clamp_val instead of manual clamping (Louis Chauvet)
> 
> Perfect!
>  
>> v4:
>>  - Clarify that we're packing 16-bit UNORM into s32, not
>>    converting values to a different representation (Pekka)
>>
>> v3:
>>  - Use new colorop->next pointer
>>
>>  drivers/gpu/drm/vkms/vkms_composer.c | 27 +++++++++++++++++++++++++--
>>  drivers/gpu/drm/vkms/vkms_drv.h      |  4 ++++
>>  2 files changed, 29 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
>> index b4aaad2bf45f..01fa81ebc93b 100644
>> --- a/drivers/gpu/drm/vkms/vkms_composer.c
>> +++ b/drivers/gpu/drm/vkms/vkms_composer.c
>> @@ -159,7 +159,7 @@ static void apply_lut(const struct vkms_crtc_state *crtc_state, struct line_buff
>>  	}
>>  }
>>
>> -static void apply_colorop(struct pixel_argb_u16 *pixel, struct drm_colorop *colorop)
>> +static void apply_colorop(struct pixel_argb_s32 *pixel, struct drm_colorop *colorop)
> 
> I agree with this change, but I think we may face conversion issues 
> in apply_lut_to_channel_value, as it takes u16 and returns u16, but with 
> your change, you pass s32 and expect s32.
> 

apply_lut() still passes and expects a u16.

apply_colorop() should be fine with passing/getting u16 values to
apply_lut_to_channel_value(). The only way I could see this become
an issue is if someone uses a CTM that results in negative or greater
than 16-bit (1.0f) values that then feed into a LUT. But I'm not
sure how realistic such a use-case is and would rather address this
when we see a need for it. An IGT test that creates such a scenario
would fail, but we don't have such a test currently.

I also don't have time to dig into it, rework apply_lut_to_channel_value
and do correct clipping where needed and I'd rather not touch it unless
I have time to be thorough with it.

So, I think it's fine for now but we might want to rework it at some
point.

Harry

> If it is not an issue: Reviewed-by: Louis Chauvet <louis.chauvet at bootlin.com>
> 
>>  {
>>  	struct drm_colorop_state *colorop_state = colorop->state;
>>
>> @@ -185,9 +185,26 @@ static void apply_colorop(struct pixel_argb_u16 *pixel, struct drm_colorop *colo
>>
>>  static void pre_blend_color_transform(const struct vkms_plane_state *plane_state, struct line_buffer *output_buffer)
>>  {
>> +	struct pixel_argb_s32 pixel;
>> +
>>  	for (size_t x = 0; x < output_buffer->n_pixels; x++) {
>>  		struct drm_colorop *colorop = plane_state->base.base.color_pipeline;
>>
>> +		/*
>> +		 * Some operations, such as applying a BT709 encoding matrix,
>> +		 * followed by a decoding matrix, require that we preserve
>> +		 * values above 1.0 and below 0.0 until the end of the pipeline.
>> +		 *
>> +		 * Pack the 16-bit UNORM values into s32 to give us head-room to
>> +		 * avoid clipping until we're at the end of the pipeline. Clip
>> +		 * intentionally at the end of the pipeline before packing
>> +		 * UNORM values back into u16.
>> +		 */
>> +		pixel.a = output_buffer->pixels[x].a;
>> +		pixel.r = output_buffer->pixels[x].r;
>> +		pixel.g = output_buffer->pixels[x].g;
>> +		pixel.b = output_buffer->pixels[x].b;
>> +
>>  		while (colorop) {
>>  			struct drm_colorop_state *colorop_state;
>>
>> @@ -197,10 +214,16 @@ static void pre_blend_color_transform(const struct vkms_plane_state *plane_state
>>  				return;
>>
>>  			if (!colorop_state->bypass)
>> -				apply_colorop(&output_buffer->pixels[x], colorop);
>> +				apply_colorop(&pixel, colorop);
>>
>>  			colorop = colorop->next;
>>  		}
>> +
>> +		/* clamp values */
>> +		output_buffer->pixels[x].a = clamp_val(pixel.a, 0, 0xffff);
>> +		output_buffer->pixels[x].r = clamp_val(pixel.r, 0, 0xffff);
>> +		output_buffer->pixels[x].g = clamp_val(pixel.g, 0, 0xffff);
>> +		output_buffer->pixels[x].b = clamp_val(pixel.b, 0, 0xffff);
>>  	}
>>  }
>>
>> diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
>> index 278cf3533f58..b78bc2611746 100644
>> --- a/drivers/gpu/drm/vkms/vkms_drv.h
>> +++ b/drivers/gpu/drm/vkms/vkms_drv.h
>> @@ -36,6 +36,10 @@ struct vkms_frame_info {
>>  	unsigned int cpp;
>>  };
>>
>> +struct pixel_argb_s32 {
>> +	s32 a, r, g, b;
>> +};
>> +
>>  struct pixel_argb_u16 {
>>  	u16 a, r, g, b;
>>  };
>> --
>> 2.46.2
>>