[PATCH v2 25/25] drm/xlnx: Compute dumb-buffer sizes with drm_mode_size_dumb()

Sun Jan 19 20:14:43 UTC 2025

On Mon, Jan 20, 2025 at 12:26:30AM +0800, Sui Jingfeng wrote:
> On 2025/1/19 23:22, Tomi Valkeinen wrote:
> > On 19/01/2025 16:59, Sui Jingfeng wrote:
> >
> >>>>> But userspace must be able to continue allocating YUV buffers through
> >>>>> CREATE_DUMB.
> >>>>
> >>>> I think, allocating YUV buffers through CREATE_DUMB interface is just
> >>>> an *abuse* and *misuse* of this API for now.
> >>>>
> >>>> Take the NV12 format as an example, NV12 is YUV420 planar format, have
> >>>> two planar: the Y-planar and the UV-planar. The Y-planar appear first
> >>>> in memory as an array of unsigned char values. The Y-planar is followed
> >>>> immediately by the UV-planar, which is also an array of unsigned char
> >>>> values that contains packed U (Cb) and V (Cr) samples.
> >>>>
> >>>> But the 'drm_mode_create_dumb' structure is only intend to provide
> >>>> descriptions for *one* planar.
> >>>>
> >>>> struct drm_mode_create_dumb {
> >>>>      __u32 height;
> >>>>      __u32 width;
> >>>>      __u32 bpp;
> >>>>      __u32 flags;
> >>>>      __u32 handle;
> >>>>      __u32 pitch;
> >>>>      __u64 size;
> >>>> };
> >>>>
> >>>> An width x height NV12 image takes up width*height*(1 + 1/4 + 1/4) 
> >>>> bytes.
> >>>>
> >>>> So we can allocate an *equivalent* sized buffer to store the NV12 
> >>>> raw data.
> >>>>
> >>>> Either 'width * (height * 3/2)' where each pixel take up 8 bits,
> >>>> or just 'with * height' where each pixels take up 12 bits.
> >>>>
> >>>> However, all those math are just equivalents description to the original
> >>>> NV12 format, neither are concrete correct physical description.
> >>>
> >>> I don't see the problem. Allocating dumb buffers, if we don't have 
> >>> any heuristics related to RGB behind it, is essentially just 
> >>> allocating a specific amount of memory, defined by width, height and 
> >>> bitsperpixel.
> >>>
> >> I think, the problem will be that the 'width', 'height' and 'bpp'
> >> are originally used to describe one plane. Those three parameters
> >> has perfectly defined physical semantics.
> >>
> >> But with multi planar formats, take NV12 image as an example,
> >> for a 2×2 square of pixels, there are 4 Y samples but only 1 U
> >> sample and 1 V sample. This format requires 4x8+1x8+1x8=48 bits
> >> to store the 2x2 square.
> >>
> >> So its depth is 12 bits per pixel (48 / (2 * 2)).
> >>
> >> so my problem is that the mentioned 12bpp in this example only
> >> make sense in mathematics, it doesn't has a good physical
> >> interpret. Do you agree with me on this technique point?
> >>
> >>> If I want to create an NV12 framebuffer, I allocate two dumb 
> >>> buffers, one for Y and one for UV planes, and size them accordingly. 
> >>> And then create the DRM framebuffer with those.
> >>>
> >> Then how you fill the value of the 'width', 'height' and 'bpp' of 
> >> each dumb buffers?
> >
> > For 640x480-NV12:
> > plane 0: width = 640, height = 480, bpp = 8
> > plane 1: width = 640 / 2, height = 480 / 2, bpp = 16
>
> But i think this should be hardware dependent. The hardware I'm using
> load NV12  raw data as a whole. I only need to feed gpuva of the backing
> memory to the hardware register once.
> 
> Not familiar with your hardware, so I can't talk more on this software
> design. Perhaps someone know more could have a comment on this.

Layout of planes in memory is just one hardware constraint, the same way
we have constraints on alignment and strides. Some devices require the
planes to be contiguous (likely with some alignment constraints), some
can work with planes being in discontiguous pieces of memory, and even
require them to be discontiguous and located in separate DRAM banks.

> >> Why not allocate storage for the whole on one shoot?
> >
> > You can, if you adjust the parameters accordingly. However, if the 
> > strides of the planes are not equal, I guess it might cause problems 
> > on some platforms.
> >
> > But I think it's usually simpler to allocate one buffer per plane, and 
> > perhaps even better as it doesn't require as large contiguous memory 
> > area.
> >
> >> The modetest in libdrm can be an good example, send it[1] to you as 
> >> an reference.
> >
> > Right, so modetest already does it successfully. So... What is the issue?
>
> But then, the problem will become that it override the 'height' parameter.
> What's the physical interpretation of the 'height' parameter when creating
> an NV12 image with the dump API then?

I wouldn't be too concerned about physical interpretations. Yes, the
height, width and bpp parameters were likely designed with RGB formats
in mind. Yes, using DUMB_CREATE for YUV formats is probably something
that the original authors didn't envision. And yes, from that point of
view, it could be seen by the original authors as an abuse of the API.
But I don't think that's a problem as such.

An API is just an API. True, it would be nicer if the usage of the ioctl
parameters was more intuitive for YUV formats, but I believe we could
still standardize how the existing parameters map to linear scanout YUV
formats without causing the world to end. As has been said before, lots
of drivers are using DUMB_CREATE for this purpose, and we can't change
that.

This doesn't mean we shouldn't work on improving memory allocation, but
I see that as a separate issue.

> I guess, solving complex problems with simple APIs may see the limitation,
> sooner or later. But I not very sure and might be wrong. So other peoples
> can override me words.
> 
> > Everyone agrees that CREATE_DUMB is not the best ioctl to allocate 
> > buffers, and one can't consider it to work identically across the 
> > platforms. But it's what we have and what has been used for ages.
>
> Yeah, your request are not unreasonable. It can be seen as a kind of rigid demand.
> Since GEM DMA helpers doesn't export an more advanced interface to userspace so far.
> As a result, drivers that employing GEM DMA has no other choice, but to abuse the
> dumb buffer API to do allocation for the more complex format buffers.
> 
> The dumb buffer API doesn't support to specify buffer format, tile status and
> placement etc. The more advance drivers has been exposed the xxx_create_gem()
> to user-space. It seems that a few more experienced programmers hint us to
> create an new ioctl at above thread, so that we can keep employing simple API
> to do simple things and to suit complex needs with the more advanced APIs.

I'd really like to explore adding new ioctls to exposure memory
allocation constraints, and allocating the memory itself from DMA heaps.

-- 
Regards,

Laurent Pinchart