[PATCH v3] drm/plane: Add documentation about software color conversion.

Mon Sep 11 10:18:15 UTC 2023

Hi

Am 11.09.23 um 10:38 schrieb Pekka Paalanen:
[...]
>> In the kernel, we reduce the copying to the changed parts, if we have
>> damage information from userspace. IDK Mesa's software renderer, but it
>> could certainly apply a similar optimization.
> 
> I have already assumed that everything uses damage information to
> optimize the regions to copy. It's still two copies instead of one.
> Actually, it's slightly more than two copies.
> 
> Due to damage tracking and the driver needing to flip between two VRAM
> buffers, it is always copying current + previous damage, not only
> current damage.

All dumb buffers are in system memory. From there, we always copy into 
the same region of VRAM. So from the kernel's perspective, it's really 
just the latest damage.

> 
>>> I suspect the intermediate buffers (dumb buffers in this case) need to
>>> be full frame size rather than a scanline, too. I'm not sure why the
>>> driver would need any scratch buffers beyond a couple dozen bytes while
>>> doing a software conversion, just enough to have the lowest common
>>> multiple of 3 bytes and optimal bus write width.
>>
>> For the conversion in the kernel we allocate enough temporary memory to
>> hold the part of each scanline that changed. Then we go over dirty
>> scanlines, convert each into the output format and store it in that
>> temporary buffer. Then copy the temporary buffer to VRAM. The buffer can
>> be reused for the scanlines of a single conversion. But the nature of
>> the framebuffer (buffers are possibly shared with concurrent access from
>> multiple planes) makes it hard to keep that temporary memory around.
>> Hence it's freed after each conversion. The code is at [1].
> 
> Yes, that's the conversion in the kernel. However, before the kernel
> can run that conversion, userspace must have already allocated full
> sized dumb buffers to convert its full sized internal framebuffer into.
> Since KMS is modernly used with double-buffering, there must always be
> two dumb buffers, which means updating a dumb buffer needs to copy not
> just current damage but also previous damage. Userspace has no way to
> know that single-buffering would be equally good in this case for this
> particular driver.

I think that this would be a good optimization. If we let userspace know 
that a certain GEM BO is a shadow buffer, the compositor could avoid 
that second dumb buffer entirely. We have plenty of DRM drivers that 
would benefit from this (grep the DRM code for 'shadow_plane').

It would also allow a format conversion in userspace with little memory 
overhead compared to the two dumb buffers of the current code.

> 
> If userspace gave its internal framebuffer to the kernel without doing
> the conversion to RGB888, then that internal buffer would be the dumb
> buffer, and there would be no need to allocate a second full size
> buffer (third, because double-buffering towards KMS).
> 
> The driver allocating even full scanlines would be a lot less memory
> touched if userspace didn't convert to RGB888, and if the driver used a
> tailored conversion function (it literally needs to handle only a single
> combination of input and output conditions), it wouldn't need even that
> nor to allocate and free on every conversion. I understand you do not
> want to pay the cost having such special-case code, and that's ok. It
> would just be even further optimization after getting rid of a static
> full sized buffer allocation.
> 
> 
>> I assume that a userspace software renderer could do a much better job
>> at keeping the temporary memory allocated.
> 
> I'm not sure why the kernel can't keep the temporary scanline buffer
> allocated with the CRTC. It only needs to be reallocated if it's too
> small. Sure, people probably don't want to spend time on such code.

I wanted to make that happen at some point. But it's a bit of a hassle. 
Either we pass a temporary buffer to the kernel's format-conversion code 
or we store it in one of the involved data structures, preferable 
drm_framebuffer.  The former solution complicates the caller and is 
prone to memory leaks; the latter is complicated as drm_framebuffer 
could be used concurrently.

> 
> All the above discussion is based on the assumption that dumb buffers
> are allocated in sysram.
> 
> It's fine to say you don't want to optimize, but I want to be clear on
> the exact trade-off.
> 
> The trade-offs vary greatly depending on each particular use case,
> which is why I wouldn't make a hard rule of no in-kernel conversions,
> just a rule of thumb since it's *usually* not useful or is actively
> hurting. Here we are talking about XRGB8888 which is already exempt
> from the rule of thumb, with two more special conditions: dumb buffers
> in sysram, and the performance traits of RGB888 on the specific
> hardware.

I don't like this optimization from an architectural POV. But I also 
haven't seen that it makes a difference in practice. As I said in 
another email, the current XRGB-based code is not too slow to be 
unusable in the case of a full-screen update. And as we're mostly moving 
mouse cursors around the screen, screen updates are small most of the time.

Best regards
Thomas

> 
> 
> Thanks,
> pq
> 
>>
>> Best regards
>> Thomas
>>
>> [1]
>> https://elixir.bootlin.com/linux/v6.5/source/drivers/gpu/drm/drm_format_helper.c#L88
>>
>>>
>>>
>>> Thanks,
>>> pq
>>>    
>>>>
>>>> Best regards
>>>> Thomas
>>>>   
>>>>>
>>>>>
>>>>> Thanks,
>>>>> pq
>>>>>       
>>>>>>> + *     On most hardware, VRAM read access are slow, so when doing the software
>>>>>>> + *     conversion, the dumb buffer should be allocated in system RAM in order to
>>>>>>> + *     have decent performance.
>>>>>>> + *     Extra care should be taken when doing software conversion with
>>>>>>> + *     DRM_CAP_DUMB_PREFER_SHADOW, there are more detailed explanations here:
>>>>>>> + *     https://lore.kernel.org/dri-devel/20230818162415.2185f8e3@eldfell/
>>>>>>>       */
>>>>>>>      
>>>>>>>      static unsigned int drm_num_planes(struct drm_device *dev)
>>>>>>>
>>>>>>> base-commit: 82d750e9d2f5d0594c8f7057ce59127e701af781
>>>>>>      
>>>>>       
>>>>   
>>>    
>>
> 

-- 
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstrasse 146, 90461 Nuernberg, Germany
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
HRB 36809 (AG Nuernberg)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: OpenPGP_signature.asc
Type: application/pgp-signature
Size: 840 bytes
Desc: OpenPGP digital signature
URL: <https://lists.freedesktop.org/archives/dri-devel/attachments/20230911/db6af244/attachment.sig>