[PATCH v3] drm/plane: Add documentation about software color conversion.

Jocelyn Falempe jfalempe at redhat.com
Mon Sep 11 10:05:36 UTC 2023


On 08/09/2023 17:37, Thomas Zimmermann wrote:
> Hi
> 
> Am 08.09.23 um 16:48 schrieb Jocelyn Falempe:
>> On 08/09/2023 15:56, Thomas Zimmermann wrote:
>>> Hi
>>>
>>> Am 08.09.23 um 13:16 schrieb Pekka Paalanen:
>>>> On Fri, 8 Sep 2023 11:21:51 +0200
>>>> Thomas Zimmermann <tzimmermann at suse.de> wrote:
>>>>
>>>>> Hi
>>>>>
>>>>> Am 25.08.23 um 16:04 schrieb Jocelyn Falempe:
>>>>> [...]
>>>>>> + *
>>>>>> + *     But there are two exceptions only for dumb buffers:
>>>>>> + *     * To support XRGB8888 if it's not supported by the hardware.
>>>>>
>>>>>
>>>>>> + *     * Any driver is free to modify its internal representation 
>>>>>> of the format,
>>>>>> + *       as long as it doesn't alter the visible content in any 
>>>>>> way, and doesn't
>>>>>> + *       modify the user-provided buffer. An example would be to 
>>>>>> drop the
>>>>>> + *       padding component from a format to save some memory 
>>>>>> bandwidth.
>>>>>
>>>>> I have strong objections to this point, _especially_ as you're
>>>>> apparently trying to sneak this in after our discussion. NAK on this
>>>>> part from my side.
>>>>>
>>>>> If you want userspace to be able to use a certain format, then export
>>>>> the corresponding 4cc code. Then let userspace decide what to do about
>>>>> it. If userspace pick a certain format, go with it.
>>>>
>>>> What is the reason for your objection, exactly?
>>>>
>>>>> Hence, no implicit conversion from XRGB888 to RGB888, just because 
>>>>> it's
>>>>> possible.
>>>>
>>>> For the particular driver in question though, the conversion allows
>>>> using a display resolution that is otherwise not possible. I also hear
>>>> it improves performance since 25% less data needs to travel across a
>>>> slow bus. There is also so little VRAM, than all dumb buffers need to
>>>> be allocated from sysram instead anyway, so a copy is always necessary.
>>>>
>>>> Since XRGB8888 is the one format that is recommended to be supported by
>>>> all drivers, I don't see a problem here. Did you test on your
>>>> incredibly slow g200 test rig if the conversion ends up hurting instead
>>>> of helping performance there?
>>>>
>>>> If it hurts, then I see that you have a good reason to NAK this.
>>>>
>>>> It's hard to imagine how it would hurt, since you always need a copy
>>>> from sysram dumb buffers to VRAM - or do you?
>>>
>>> I have a number of concerns. My point it not that we shouldn't 
>>> optimize. I just don't want it in the kernel. Mgag200 can export 
>>> DRM_FORMAT_RGB888 for userspace to use.
>>
>> It already does, it's just userspace doesn't want to support it.
>>>
>>> AFAICT the main argument against userspace is that Mesa doesn't like 
>>> 3-byte pixels. But I don't see how this conversion cannot be a 
>>> post-processing step within Mesa: do the rendering in RGB32 and then 
>>> convert to a framebuffer in RGB24. Userspace can do that more 
>>> efficiently than the kernel. This has all of the upsides of reduced 
>>> bandwidth, but none of the downsides of kernel code. Applications 
>>> and/or Mesa would be in control of the buffer format and apply the 
>>> optimization where it makes sense. And it would be available for all 
>>> drivers that are similar to mgag200.
>>
>> For this particular case, user-space would use more memory and CPU, 
>> because the copy to VRAM is done on kernel side, and it is where the 
>> conversion must be done for maximum performances. So there is no way 
>> for userspace to be as efficient as the kernel in this use-case.
>>
>> For the conversion, the kernel allocate only 1 line, and convert/copy 
>> one line at a time. And because the CPU is out-of-order, it can start 
>> converting the next line using CPU registers while waiting for the bus.
> 
> Access is writecombined, so you cannot throw large amounts of data 
> towards the bus and do something else meanwhile.
> 
>>
>> Userspace would need to allocate a whole framebuffer, and can't use 
>> the "wasted" CPU cycle to do the conversion.
> 
> Yes, userspace would probably need a full extra framebuffer to store the 
> RGB32 data. But just as in the kernel, userspace can do format 
> conversion of only the damaged areas. No extra CPU overhead here.
> 
>>
>>>
>>> My main point is simplicity of the driver: I prefer the driver to be 
>>> simple without unnecessary indirection or overhead. Optimizations 
>>> like these my or may not work on a given system with a certain 
>>> workload. I'd better leave this heuristic to userspace.
>>
>> I agree that the driver is simpler without optimizing thing. But I 
>> think it's the role of the driver to get the maximum from the 
>> hardware, even if it's old and broken like g200. And userspace don't 
>> want to optimize for such hardware.
> 
> Optimization always depends on the workload; something that the driver 
> doesn't know. For example, as we mostly move the mouse cursor around the 
> screen, the damages areas are usually small. Optimizing this might be 
> pointless in any case.

So your point is we should not optimize because sometime it might not be 
necessary ? And even for cursor update, the conversion is still 25% faster.

> 
>>
>>>
>>> Another point of concern is CPU consumption: Slow I/O buses may stall 
>>> the display thread, but the CPU could do something else in the 
>>> meantime. Doing format conversion on the CPU prevents that, hence 
>>> affecting other parts of the system negatively. Of course, that's 
>>> more of a gut feeling than hard data.
>>
>> I think it's the reverse. Without dropping the X data, the CPU is 
>> actually stalling much longer sending useless bytes to the mgag200's 
>> VRAM. And the CPU can't do anything else while doing memcpy_toio().
> 
> Hyperthreading.

I still doubt a user-space conversion would do a better job than the kernel.
> 
> You are also arguing against XRGB in general, which is a different topic.

yes, the issue is human eyes only sees 3 colors, and it's not a power of 
two. So compromise have been made, and that Matrox card, is from the era 
of the transition from 16bits to 32bits, and works significantly better 
in 24bits. And it's probably the only remaining GPU with this problem.

> 
>> Using DMA is the only way to free the CPU during the copy, but it 
>> appears to be either broken or significantly slower on most mgag200 
>> hardware.
>>
>> I actually have made the work to support DMA, and I'm a bit sad it 
>> didn't work on all g200, so this optimization is really a last resort, 
>> on a really broken hardware.
>>
>>>
>>> Please note that the kernel's conversion code uses memory allocation 
>>> of intermediate buffers. We even recently had a discussion about 
>>> allocation overhead during display updates. Userspace can surely do a 
>>> better job at keeping such buffers around.
>>>
>>> And finally a note the hardware itself: on low-end hardware like 
>>> those Matrox chips, just switch to RGB16. That will be pretty and 
>>> fast enough for these chips' server systems. Anyone who cares about 
>>> fast and beautiful should buy a real graphics card.
>>
>> There are still sysadmin users that occasionally have to browse the 
>> web to find answer, or read their mail in a GUI client. It turns out 
>> that rgb16 is pretty ugly for today standard, and buying an external 
>> card would be a bit too much, and won't work for remote control.
> 
> I'm sure sysadmins have a computer for work with a decent GPU and don't 
> need to browse the web on their server systems.

The GUI applications also include graphical installer, that obviously 
you can't run on other system.
I do have bug reports, and I already fixed a few regressions in the 
mgag200 driver from this reports.
But if you think they shouldn't use this GPU, then why maintaining a 
driver in the first place ? Simpledrm is enough if you don't use graphics.

> 
> Best regards
> Thomas
> 
>>
>>
>> Best regards,
>>
> 

Best regards,

-- 

Jocelyn






More information about the dri-devel mailing list